Job Description
Grade Level (for internal use): 10
Key Responsibilities
- Design and build agentic AI platform components including agents, tools, workflows, and integrations with internal systems.
- Implement observability across the AI lifecycle: tracing, logging, metrics, and evaluation pipelines to monitor agent quality, cost, and reliability.
- Translate business problems into agentic AI solutions by collaborating with product, SMEs, and platform teams on data, model, and orchestration requirements.
- Develop and maintain data pipelines, features, and datasets for training, evaluation, grounding, and safety of LLM-based agents.
- Lead experimentation and benchmarking: test prompts, models, and agent workflows; analyze results and drive iterative improvements.
- Implement guardrails, safety checks, and policy controls across prompts, tool usage, access, and output filtering to ensure safe and compliant operation.
- Create documentation, runbooks, and best practices; mentor peers on agentic AI patterns, observability-first engineering, and data/ML hygiene.
Core Skills Required
- Proficiency in programming languages such as Python with strong software engineering fundamentals.
- Solid understanding of LLM / GenAI fundamentals: prompting, embeddings, vector search, RAG, and basic agentic patterns (tool use, planning, orchestration).
- Experience running production systems or data pipelines on cloud computing platforms such as AWS / Azure / GCP, using containers, serverless, and managed storage/services.
- Hands-on familiarity with observability tools (OpenTelemetry, Prometheus, Grafana, ELK, etc.) across logs, metrics, and traces.
- Comfort working with structured and unstructured data; strong SQL plus experience with Pandas / Spark / dbt or similar frameworks.
- Ability to reason clearly about reliability, performance, and cost trade-offs.
- Strong collaboration and communication skills; ability to translate complex concepts for platform, product, data, security, and compliance teams.
Qualifications
- 3–6 years of experience in software engineering, data engineering, ML engineering, data science, or MLOps roles.
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or equivalent practical experience.
- Experience with CI/CD, code reviews, and modern engineering best practices.
Nice to Have:
- Exposure to agentic AI frameworks (LangChain, LangGraph, OpenAI Agents, etc.).
- Experience with LLM observability, eval frameworks, or prior work on production LLM/agent systems.
What We're Looking For
Beyond skills and experience, we want engineers who:
- Build for scale: Think like platform builders and design systems that work across teams, not just for today's use case.
- Lead with observability: Instrument first, debug with data, and deliver dashboards that reveal the truth.
- Ship safely: Never deploy without guardrails or validations, even if it adds upfront effort.
- Make thoughtful trade-offs: Clearly articulate decisions around cost, quality, latency, and reliability.
- Own the end-to-end stack: Move comfortably between data pipelines, agent logic, infrastructure, and production monitoring.
- Learn through experimentation: Test ideas, study failures, iterate rapidly, and improve continuously.
- Communicate with impact: Explain complex AI concepts in simple, business-relevant terms to technical and non-technical stakeholders.
- Stay ahead of the curve: Actively explore emerging technologies such as agentic frameworks (e.g., LangGraph) and new LLM capabilities.