
The Artificial General Intelligence (AGI) team is looking for passionate, talented, and inventive engineers to play a pivotal role in the development and maintenance of industry-leading multi-modal and multi-lingual large language models (LLMs). The AGI team's mission is to leverage our hyper-scalable, general-purpose large model training and inference systems to develop and deploy cutting-edge sensory AI foundational models that revolutionize machine perception, interpretation, and interaction, both with humans and with the physical world.
Key job responsibilities
- Provide support for cluster and node management, ensuring smooth operation of GenAI infrastructure.
- Continuously improve and automate our cluster, capacity, and maintenance upgrades.
- Troubleshoot issues, thoroughly research root causes, and fix defects.
- Develop automation tools for improving operational excellence.
- Candidates should be well-versed in core AWS services, including EC2, Lambda, and EKS.
- Experience setting up and managing CI/CD pipelines using tools such as AWS CodePipeline, GitHub Actions, or similar platforms is expected.
- Familiarity with Infrastructure as Code (IaC) tools such as AWS CloudFormation, Terraform, or the AWS CDK is a valuable asset. An understanding of networking concepts such as VPCs, subnets, security groups, load balancers, and Route 53 is also desirable.
- Candidates should have hands-on experience with Kubernetes.

Basic qualifications
- 1+ years of systems development experience
- Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, Rust
- Experience with Linux/Unix

Preferred qualifications
- Experience with CI/CD pipelines and build processes