βWhat youβll be doing
- Contribute to the design and implementation of CUDA Core Libraries in C++ and/or Python, including parallel algorithms and language idiomatic exposure of core CUDA concepts.
- Design and optimize GPU algorithms and APIs, from high-level interfaces down to low-level performance tuning involving memory, parallelism, and synchronization.
- Improve developer experience: tests, benchmarks, CI, packaging, documentation, and examples.
- Collaborate with experienced CUDA engineers; participate in design reviews, code reviews, and open-source-style workflows.
β
β
What we need to see
- Currently pursuing a BS, MS, or PhD in Computer Science, Computer Engineering, or a related field.
- Strong programming skills in C++, Python, or both, with interest in systems-level development (performance, memory, concurrency, API design).
- Familiarity with modern C++ (templates, generics, standard library) and/or Python library development and packaging.
- Experience with parallel or heterogeneous programming (CUDA, OpenMP, GPU-accelerated Python, or similar) through coursework, projects, or research.
- Experience with software libraries or open-source projects, including testing, performance profiling, and code reviews.
- Ability to work independently and drive a project from exploration to completion.
- Clear written communication for design discussions and documentation.
β
β
Ways to stand out from the crowd
- Knowledge of CPU/GPU architecture and how hardware details impact algorithmic performance.
- Hands-on experience with CUDA C++, CUDA Python, Pytorch, JAX, Numba, CuPy, or related GPU-accelerated Python stacks.
- Familiarity with libraries such as Thrust, CUB, libcudacxx, or similar modern C++/GPU libraries.
- Familiarity with compiler infrastructure and tooling such as LLVM, Clang/LLVM tooling, or MLIR.
- Comfort navigating and debugging large, multi-language codebases (C++, Python, CMake, GitHub Actions CI systems) with demonstrated interest in developer tools, library design, and making other developers faster and more productive.
β