NVIDIA is developing processor and system architectures that accelerate deep learning and high-performance computing applications. We are looking for an expert deep learning system performance architect to join our AI performance modeling, analysis, and optimization efforts. In this position, you will work on DL performance modeling, analysis, and optimization on state-of-the-art hardware architectures for a variety of LLM workloads, and you will make your contributions to our dynamic, technology-focused company.
What you'll be doing:
- Analyze state-of-the-art DL networks (LLMs, etc.), and identify and prototype performance opportunities to influence the SW and architecture teams for NVIDIA's current and next-generation inference products.
- Develop analytical models of state-of-the-art deep learning networks and algorithms to inform the design of innovative processor and system architectures for performance and efficiency.
- Specify hardware/software configurations and metrics to analyze performance, power, and accuracy on existing and future uniprocessor and multiprocessor systems.
- Collaborate across the company to guide the direction of next-gen deep learning HW/SW by working with architecture, software, and product teams.
What we need to see:
- BS or higher degree in a relevant technical field (CS, EE, CE, Math, etc.).
- Strong programming skills in Python, C, and C++.
- Strong background in computer architecture.
- Experience with performance modeling, architecture simulation, profiling, and analysis.
- Prior experience with LLMs or generative AI algorithms.
Ways to stand out from the crowd:
- GPU computing and parallel programming models such as CUDA and OpenCL.
- Knowledge of the architecture of, or workload analysis on, other deep learning accelerators.
- Deep neural network training, inference, and optimization in leading frameworks (e.g., PyTorch, TensorRT-LLM, vLLM).
- Open-source AI compilers (OpenAI Triton, MLIR, TVM, XLA, etc.).