Cloud Engineer
Experience: 5+ Years
Summary
We are looking for a passionate, innovative professional to join our cloud services team. You'll work in a collaborative and inclusive environment that values diverse perspectives and continuous learning, with industry-leading benefits and unmatched opportunities for career growth. Key accountabilities include developing and maintaining cloud platforms, services, and components that enable safe, enterprise-wide use of common cloud functionality.
Requirements
- Bachelor's degree in Computer Science or a related engineering field, or equivalent experience.
- 4+ years of experience in public cloud infrastructure, especially Azure and AWS.
- Good understanding of cloud infrastructure and deployment models.
- Familiarity with cloud networking and security solutions such as load balancers, firewalls, WAF, CSPM, and security groups.
- Good understanding of identity and access management solutions such as Active Directory, Azure AD, Conditional Access, IAM, and other vendor-specific solutions.
- Good understanding of Linux- and Windows-based systems.
- Understanding of SQL and NoSQL databases, including IaaS and PaaS models.
- Experience with policy management, governance, monitoring, and alerting.
- Knowledge of microservices, DevOps, and Infrastructure as Code (IaC) tools such as Terraform and Ansible.
- Azure AZ-104 or AWS administrator certification is an advantage.
- Excellent communication and interpersonal skills.
Job responsibilities
- Assist application teams in deploying solutions to the cloud environment.
- Maintain infrastructure security and governance in line with client requirements and standards.
- Support other team members (database, network, security, etc.) in configuring and maintaining their respective solutions.
- Actively participate in discussions about new solution implementations, design creation, and other cloud infrastructure topics.
- Deliver proof-of-concept (POC) deployments, documentation, and technical presentations.
Linux Hosting and Administration
- Install, configure, and maintain Linux servers, ensuring optimal performance and security.
- Handle Linux-based hosting solutions, including web servers, databases, and other services.
- Apply patches and updates to Linux servers as required, and automate routine tasks.
- Monitor system performance, troubleshoot issues, and conduct root cause analysis for any server downtime.
Kubernetes Operations
- Deploy, manage, and maintain containerized applications using Kubernetes.
- Create and manage Kubernetes manifests, Helm charts, and operators for complex application architectures.
- Scale applications based on resource utilization and requirements.
- Monitor the health and performance of Kubernetes clusters and take corrective actions as needed.
DevOps Integration
- Implement and maintain CI/CD pipelines for automated testing and deployments.
- Assist in incorporating containerization and orchestration into the DevOps process.
Rancher/OpenShift Expertise (Nice to Have)
- Experience in deploying and managing Kubernetes clusters using Rancher or OpenShift.
- Implement monitoring, logging, and auto-scaling solutions in Rancher or OpenShift environments.
Application Support
- Gain a thorough understanding of the applications running within containers to provide first-level application support.
- Collaborate with development teams to debug application issues in staging and production environments.
Azure Infrastructure
- Deploy and manage resources on Azure, including but not limited to VMs, databases, and Kubernetes clusters.
- Implement Infrastructure as Code practices using Azure Resource Manager (ARM) templates or Terraform.
Monitoring and Alerting Using Open-Source Tools (experience with any one of the following)
ELK Stack
- Implement and manage the ELK (Elasticsearch, Logstash, Kibana) stack for real-time log aggregation, monitoring, and analysis.
- Customize Kibana dashboards for different system metrics and logs to aid in quick issue resolution.
Grafana
- Develop and maintain Grafana dashboards to visualize key performance indicators and system metrics.
- Integrate Grafana with other data sources and monitoring tools for comprehensive analytics.
Loki
- Set up and manage Loki for aggregating and storing logs.
- Integrate Loki with Grafana for unified querying and visualization of metrics and logs.
Prometheus
- Deploy and configure Prometheus for monitoring system and application metrics.
- Write custom Prometheus queries and alerting rules to detect anomalies and performance issues.
Mimir/Cortex (preferable)
- Implement Mimir or Cortex for scalable, long-term storage of Prometheus metrics.