
The Artificial General Intelligence (AGI) team is looking for passionate, talented, and inventive engineers to play a pivotal role in the development and maintenance of industry-leading multi-modal and multi-lingual large language models (LLMs). The AGI team's mission is to leverage our hyper-scalable, general-purpose large model training and inference systems to develop and deploy cutting-edge sensory AI foundational models that revolutionize machine perception, interpretation, and interaction with humans and the physical world.
Key job responsibilities
Provide support for cluster and node management, ensuring smooth operation of GenAI infrastructure.
Continuously improve and automate cluster, capacity, and maintenance upgrades.
Troubleshoot issues, thoroughly research root causes, and fix defects.
Develop automation tools that improve operational excellence.
Candidates should be well-versed in core AWS services, including EC2, Lambda, and EKS.
Candidates should have experience setting up and managing CI/CD pipelines using tools such as AWS CodePipeline, GitHub Actions, or similar platforms.
Familiarity with Infrastructure as Code (IaC) tools such as AWS CloudFormation, Terraform, or the AWS CDK is a valuable asset. An understanding of networking concepts such as VPCs, subnets, security groups, load balancers, and Route 53 is also desirable.
Candidates should have hands-on experience with Kubernetes.
- 1+ years of systems development experience
- Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, or Rust
- Experience with Linux/Unix
- Experience with CI/CD pipelines and build processes
