Applied Scientist I - Thomson Reuters Labs
This position will be based in Zug, with hybrid working available.
Want to play a crucial role in advancing the science of evaluating LLMs and intelligent agents? Join Thomson Reuters Labs, where we experiment, build, and deliver cutting-edge AI systems that empower professionals worldwide.
This is a great role for a Master's or PhD graduate with a keen interest in experimental design, data analysis, and communicating technical findings clearly to create operational products. Our flagship AI assistant, CoCounsel, helps legal, tax, and business professionals work smarter. We're expanding our LLM Evaluation team, focused on developing automated, scalable, and trustworthy evaluation frameworks that measure model reasoning, reliability, and alignment.
At Thomson Reuters Labs, we blend applied research with real-world impact. Our scientists work on projects spanning LLM reasoning, benchmarking, grounding, and agentic behavior, all aimed at ensuring our AI systems are effective, explainable, and robust. We believe that rigorous evaluation is the foundation of responsible AI. This role offers the opportunity to push the boundaries of auto-evaluation, LLM-as-a-judge, and agentic evaluation methodologies, influencing how AI systems are measured and improved at scale.
About the Role
As an Applied Scientist I at TR Labs, you will:
- Design and Conduct Evaluations: Develop and execute evaluation pipelines for LLMs and agentic systems, assessing reasoning, factual accuracy, and alignment.
- Automate and Scale: Build tools and frameworks for automatic evaluation, including synthetic dataset creation, LLM-as-a-judge workflows, and continuous benchmarking systems.
- Collaborate and Translate: Partner with applied scientists, ML engineers, and product managers to translate evaluation results into model improvements and product insights.
- Research and Experiment: Prototype new evaluation metrics, contribute to internal reports, and support publications or presentations on evaluation methods.
- Champion Best Practices: Promote reproducibility, transparency, and ethical AI evaluation within the team and broader organization.
About You
You're a great fit for this position as Applied Scientist I at TR Labs if you have:
- PhD in Computer Science, Artificial Intelligence, Machine Learning, or a related field (exceptional Master's candidates with equivalent experience will be considered).
- Research or hands-on experience with large language models, NLP evaluation, or agent-based AI systems.
- Strong understanding of LLM performance measurement, prompt evaluation, and reliability testing.
- Proficiency in Python and familiarity with ML libraries such as PyTorch, Transformers, and LangChain.
- Comfort with experimental design, data analysis, and communicating technical findings clearly.
If you have any of the following, we would like to hear about it in your application:
- Experience with LLM evaluation frameworks (e.g., OpenAI Evals, HELM, LM Harness, or custom auto-eval tools).
- Familiarity with retrieval-augmented generation (RAG), tool-using agents, or agentic evaluation methodologies.
- Experience in cloud-based ML development (AWS, Azure, or GCP).
- Record of publications or preprints in top-tier venues (e.g., NeurIPS, ACL, EMNLP, ICLR) or equivalent research contributions.
- Interest in Responsible AI, fairness, and interpretability research.