
AI Infrastructure Engineer Opus

AppliedAI | Abu Dhabi, United Arab Emirates



Job Description

As an Opus AI Infrastructure Engineer, you will lead the optimization and scaling of AI pipelines that serve foundational models in live production environments.

You will focus on evolving real-time and batch inference systems for reliability, low latency, and seamless integration with product logic.

This senior engineering role operates at the core of AI delivery, requiring strong system design, infrastructure fluency, and a deep commitment to performance and operational excellence.

You will work across modern cloud environments and manage a diverse and evolving portfolio of LLMs, both proprietary and open-source.

You will play a key role in evaluating model trade-offs, adapting to rapid model iteration, and ensuring smooth transitions as providers update APIs, capabilities, and service tiers.

You will also coordinate directly with foundational model vendors to align on roadmap requirements, resolve performance issues, and plan deployment optimizations.

Key Responsibilities

AI Serving Pipeline Optimization

* Design, rewrite, and mature inference pipelines for real-time, streaming, and batch workloads

* Optimize throughput, latency, and reliability via architectural evolution and model-specific strategies

* Manage orchestration of heterogeneous LLMs with varying performance, cost, and response profiles

* Implement fallback logic, request routing, and intelligent retry systems for availability and graceful degradation (see the sketch after this list)

* Build tooling for profiling and benchmarking pipelines involving LLMs and agentic orchestration frameworks

* Adapt infrastructure and integrations to support rapidly changing LLM APIs, model versions, and provider behavior

* Design and deploy self-hosted LLM inference pipelines, including model loading, quantization, batching, and runtime optimization on GPU/TPU environments
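
To make the fallback and routing bullet concrete, here is a minimal sketch of ordered provider fallback with exponential-backoff retries. The provider names, the ProviderError type, and the call_model stub are hypothetical stand-ins for illustration, not part of this role's actual stack.

```python
import random
import time

# Hypothetical ordered preference list: primary model first, cheaper or
# more available fallbacks after it. Names are illustrative only.
PROVIDERS = ["primary-llm", "fallback-llm", "open-weight-llm"]

class ProviderError(Exception):
    """Raised when a single provider call fails (timeout, quota, 5xx)."""

def call_model(provider: str, prompt: str) -> str:
    """Stand-in for a real inference call; fails randomly to exercise retries."""
    if random.random() < 0.3:
        raise ProviderError(f"{provider} unavailable")
    return f"[{provider}] response to: {prompt!r}"

def route_with_fallback(prompt: str, retries: int = 2, base_delay: float = 0.1) -> str:
    """Try each provider in order, retrying transient failures with backoff.

    Exhausting one provider's retries degrades gracefully to the next,
    so a single outage does not take down the serving path.
    """
    for provider in PROVIDERS:
        for attempt in range(retries + 1):
            try:
                return call_model(provider, prompt)
            except ProviderError:
                if attempt < retries:
                    # Exponential backoff with jitter before the next attempt.
                    time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
    raise RuntimeError("all providers exhausted")

if __name__ == "__main__":
    print(route_with_fallback("ping"))
```

A production router would extend this with the per-model cost, latency, and response profiles the orchestration bullet above refers to.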

Production Infrastructure & Runtime Efficiency

* Own the live AI execution layer: coordinate model calls, resource scheduling, and latency-critical paths

* Monitor and improve key metrics: latency, token throughput, error rates, and autoscaling responsiveness (a metrics sketch follows this list)

* Deploy and scale LLM services across cloud environments (AWS, GCP, Azure, on-prem), optimizing for regional availability and regulatory constraints

* Ensure robust observability, failover, rollback, and health monitoring across all deployed models

* Collaborate with infra teams to maximize compute efficiency across CPU/GPU/TPU backends
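
For the metrics bullet above, here is a minimal sketch of how such signals might be exposed with the prometheus_client library, consistent with the Prometheus/Grafana stack named under Skills. The metric names and the fake_inference stand-in are assumptions for illustration only.

```python
import random
import time

# Requires: pip install prometheus-client
from prometheus_client import Counter, Histogram, start_http_server

# Metric names here are illustrative, not taken from the posting.
LATENCY = Histogram("inference_latency_seconds", "End-to-end inference latency")
TOKENS = Counter("generated_tokens_total", "Tokens produced by the serving layer")
ERRORS = Counter("inference_errors_total", "Failed inference requests")

def fake_inference(prompt: str) -> str:
    """Stand-in for a model call; sleeps to simulate generation time."""
    time.sleep(random.uniform(0.05, 0.2))
    return prompt[::-1]

def handle_request(prompt: str) -> str:
    with LATENCY.time():  # records the call duration into the histogram
        try:
            reply = fake_inference(prompt)
        except Exception:
            ERRORS.inc()
            raise
        TOKENS.inc(len(reply.split()))  # crude token proxy for the sketch
        return reply

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        handle_request("hello world")
```

Grafana dashboards and autoscaling signals would then read these series from the /metrics endpoint.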

Model Vendor Coordination & External Integrations

* Serve as a technical counterpart to foundational model providers, communicating product needs, debugging issues, and tracking performance updates

* Maintain high reliability across provider transitions, including model deprecations, quota shifts, and new capability rollouts

* Evaluate and experiment with emerging models across different providers, providing comparative benchmarks and integration plans (a timing-harness sketch follows this list)
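
One way to produce those comparative benchmarks is a small latency harness run over a fixed prompt set. Everything below (the candidate table, the stub clients, the prompt list) is a hypothetical illustration rather than an actual vendor integration.

```python
import statistics
import time

# Hypothetical stand-ins for client SDK calls to two different providers.
def provider_a(prompt: str) -> str:
    time.sleep(0.05)
    return "a:" + prompt

def provider_b(prompt: str) -> str:
    time.sleep(0.08)
    return "b:" + prompt

CANDIDATES = {"provider-a": provider_a, "provider-b": provider_b}
PROMPTS = ["summarize this", "translate that", "classify the other"]

def benchmark(runs: int = 5) -> None:
    """Print p50/p95 latency per candidate over the same prompt set."""
    for name, call in CANDIDATES.items():
        samples = []
        for _ in range(runs):
            for prompt in PROMPTS:
                start = time.perf_counter()
                call(prompt)
                samples.append(time.perf_counter() - start)
        samples.sort()
        p50 = statistics.median(samples)
        p95 = samples[int(0.95 * (len(samples) - 1))]
        print(f"{name}: p50={p50 * 1000:.1f} ms  p95={p95 * 1000:.1f} ms")

if __name__ == "__main__":
    benchmark()
```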

System Integration & Engineering Excellence

* Integrate pipelines cleanly with APIs, orchestration layers, and application logic

* Refactor legacy systems for modularity, observability, and performance

* Promote reusable, maintainable infrastructure via tooling and shared abstractions

* Uphold engineering standards through code reviews, performance audits, and technical mentorship

Qualifications

Education

* Bachelor's or Master's degree in Computer Science, Software Engineering, or a related field

Experience

* 5+ years in backend, ML, or infrastructure engineering with a focus on live AI systems

* Demonstrated experience building and scaling real-time inference infrastructure

* Proven track record in latency optimization, fault tolerance, and production observability

Skills

* Proficient in Python (optionally Go or Rust); strong software design and debugging skills

* Experience with orchestration and serving tools

* Deep familiarity with containerization, Kubernetes, and cloud-native deployment (EKS, GKE, etc.)

* Hands-on with observability stacks (Prometheus, Grafana, etc.)

* Understanding of inference-level optimizations: batching, quantization, caching, and sharding

* Operational experience with LLMs (OpenAI, Anthropic, open-weight models) in both hosted and self-managed setups

* Experience building and maintaining self-hosted inference stacks using frameworks such as vLLM, Hugging Face Transformers, or DeepSpeed-Inference (see the sketch after this list)

* Familiarity with agentic AI systems and tooling (LangGraph, Semantic Kernel, CrewAI)

* Cross-cloud deployment experience (AWS, GCP, Azure) and awareness of compliance/latency trade-offs

* Comfortable managing technical communication with external vendors and adapting to fast-moving dependencies
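
As a pointer for the self-hosted stack requirement, here is a minimal sketch using vLLM's offline generation API. It assumes a CUDA-capable host with vllm installed; the model identifier is an arbitrary small placeholder, not a model this role necessarily uses.

```python
# Requires: pip install vllm, plus a CUDA-capable GPU.
from vllm import LLM, SamplingParams

# Model identifier is a placeholder; any open-weight model the
# hardware can fit would work the same way.
llm = LLM(model="facebook/opt-125m")

params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=64)

# vLLM batches these prompts internally (continuous batching), which is
# one of the inference-level optimizations listed above.
prompts = ["What is model quantization?", "Explain KV-cache reuse briefly."]
outputs = llm.generate(prompts, params)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text.strip())
```

vLLM's continuous batching and paged KV cache are concrete instances of the batching and caching optimizations named in the skills list.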

