Machine Learning Engineer - Data - AI Research | Canva

About the Role

As a Research MLE at Canva, you'll be responsible for high-performance data acquisition, processing, and annotation to enable the training of cutting-edge models. Your focus will be on sourcing data, automating pipelines, and building performant infrastructure for filtering and analyzing petabyte-scale data. You'll be the crucial link that makes novel model development, training, and evaluation possible, accelerating Canva's research.

Key Focus Areas

Data Acquisition: Developing scalable tools and pipelines for acquiring diverse datasets from multiple sources
Curation: Engineering robust solutions for filtering, deduplicating, assessing the quality of, and curating data to meet specific research requirements and model training criteria
Data Infrastructure: Developing high-throughput tools for interfacing with large-scale data pools, enabling efficient querying, sampling, and extracting valuable statistical insights and patterns

Primary Responsibilities

Work alongside research teams to ensure a continuous flow of high-quality data to active projects, understanding their specific dataset requirements and delivery timelines
Curate targeted subsets of data using ML techniques including clustering, embedding-based similarity search, and automated quality scoring
Extract, visualize, and communicate actionable insights about dataset composition, distributions, biases, and statistical properties to inform research decisions
Build performant, parallel algorithms for gathering and processing data at scale, optimizing for both throughput and cost-efficiency across distributed systems
Engineer intuitive interfaces and tooling to help researchers explore, sample, and interact with large datasets without requiring deep infrastructure knowledge
Work with paired multimodal data (text-image, audio-video, etc.), ensuring alignment quality, handling synchronization challenges, and maintaining multimodal correspondence
Leverage high-performance parallel computing frameworks (Ray, Spark, torch.distributed, DeepSpeed, etc.) and cloud infrastructure for distributed data operations on petabyte-scale datasets

You’re probably a match if you have:

A strong aesthetic sense, with a background or demonstrated passion for visual design or human-computer interaction.
Strong proficiency in Python and ML frameworks (e.g., PyTorch, TensorFlow).
Extensive experience designing and implementing large-scale data processing workflows using libraries like Pandas and data warehousing solutions such as Snowflake.
Solid understanding of statistical methods, including experimental design, A/B testing, and quality evaluation systems.
Experience with generative AI and synthetic data generation (highly desirable).

Nice to have:

Experience with cloud platforms (e.g., AWS, GCP, Azure) for data storage, processing, and MLOps related to dataset management.
Experience with MLOps practices and tools specifically for data versioning, lineage, and pipeline automation.
Ability to develop data visualization or data collection interfaces (e.g., in TypeScript or Python).
