Location: Abu Dhabi 
Duration: Yearly Renewable Contract 
Role Summary: 
We are looking for a Site Reliability Engineer (SRE) to maintain the availability, scalability, and performance of critical services deployed across cloud and on-premises environments.
This role combines software engineering and systems engineering to automate operations and improve reliability in CI/CD and production environments.
Key Responsibilities: 
- Maintain uptime and performance of applications deployed across hybrid infrastructure
- Implement observability (logging, metrics, tracing) using Prometheus, Grafana, ELK, and Azure Monitor
- Troubleshoot production issues and participate in incident response and root cause analysis
- Automate infrastructure, monitoring, and runbooks using IaC tools and scripting
- Implement and track SLOs, SLIs, and error budgets
- Build self-healing systems and resilient deployments
- Collaborate with developers, security teams, and cloud engineers to enforce reliability practices
  
Required Skills: 
- Experience with Azure/AWS/GCP monitoring tools and on-prem observability stacks
- Strong Linux/Unix administration and scripting skills (Python, Bash)
- Hands-on experience with CI/CD pipelines, Kubernetes, and Helm
- Good understanding of load balancing, failover, and high-availability (HA) architecture
- Familiarity with incident management, postmortem writing, and runbook creation
Preferred Qualifications: 
- Experience with Terraform, Ansible, or Pulumi
- Knowledge of service mesh (Istio, Linkerd) and API gateway configurations
- Certifications: SRE Foundation, Azure/AWS Cloud Practitioner, or Certified Kubernetes Administrator (CKA)
- Awareness of compliance standards (CIS, NIST, ISO 27001)