Your Team Responsibilities
We are developing cutting-edge software to identify and analyze the climate change risk exposure of companies and real estate investments. Our models translate nature and physical risks into insightful metrics for investors. We want to support investors' decision-making by exposing different types of climate risk and providing insightful, actionable data. For this, we are looking for an intern with a background in location datasets and geospatial analysis. One of the MSCI Climate Risk Center's key capabilities is our GeoSpatial dataset, which is also a key input into many of our climate models, such as physical risk and biodiversity risk.
We are looking for a passionate Data Science Intern to support the integration of AI workflows and quality assurance (QA) protocols into our geospatial asset data pipeline. This internship is ideal for candidates with strong data science foundations, an eye for data integrity, and curiosity about climate risk and geospatial applications. We offer a hands-on opportunity to apply data science in a high-impact climate research environment, contributing to tools and datasets used by global investors.
Your Key Responsibilities
- Co-design and implement QA workflows using statistical diagnostics, validation rules, and anomaly detection techniques.
- Apply AI/machine learning tools to identify inconsistencies, missing values, or suspicious patterns in large-scale asset datasets.
- Build reusable QA tools in Python or R that can be automated and scaled across internal pipelines (a small illustrative sketch follows this list).
- Work with other research teams, our development team, and other stakeholders so that they can make the best use of the asset location data.
- Contribute to model documentation, reproducibility, and version control of data science assets.
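For illustration only, the sketch below shows one possible shape for such a QA step: simple validation rules plus an anomaly-detection pass over asset locations. The column names (asset_id, latitude, longitude), thresholds, and the use of pandas with scikit-learn's IsolationForest are assumptions made for this example, not a description of our internal tooling.

```python
# Hypothetical sketch: column names, thresholds, and data are placeholders,
# not MSCI's actual asset schema or QA tooling.
import pandas as pd
from sklearn.ensemble import IsolationForest


def rule_based_checks(df: pd.DataFrame) -> pd.DataFrame:
    """Flag rows that violate simple validation rules on asset locations."""
    checks = pd.DataFrame(index=df.index)
    checks["missing_coords"] = df["latitude"].isna() | df["longitude"].isna()
    checks["lat_out_of_range"] = df["latitude"].notna() & ~df["latitude"].between(-90, 90)
    checks["lon_out_of_range"] = df["longitude"].notna() & ~df["longitude"].between(-180, 180)
    checks["duplicate_asset_id"] = df["asset_id"].duplicated(keep=False)
    return checks


def anomaly_flags(df: pd.DataFrame, contamination: float = 0.05) -> pd.Series:
    """Flag statistically unusual coordinate pairs with an isolation forest."""
    coords = df[["latitude", "longitude"]].dropna()
    labels = IsolationForest(contamination=contamination, random_state=0).fit_predict(coords)
    return pd.Series(labels == -1, index=coords.index, name="suspected_anomaly")


if __name__ == "__main__":
    # Tiny synthetic sample purely for demonstration.
    assets = pd.DataFrame({
        "asset_id": ["A1", "A2", "A2", "A3"],
        "latitude": [48.14, 52.52, 52.52, None],
        "longitude": [11.58, 13.40, 13.40, 2.35],
    })
    print(rule_based_checks(assets))
    print(anomaly_flags(assets))
```

In practice, checks like these would be packaged as reusable, automated steps within the wider data pipeline rather than run as standalone scripts.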
Your skills and experience that will help you excel
- Proficiency in Python (preferred) or R, including libraries such as:
  - pandas, numpy, scipy for data wrangling and statistical QA
  - scikit-learn, xgboost, lightgbm for lightweight AI/ML modeling
  - pytest, great_expectations, pydantic, or similar tools for data validation and QA (see the illustrative example after this list)
- Familiarity with SQL
- Understanding of data quality dimensions: accuracy, completeness, consistency, uniqueness, timeliness
- Knowledge of exploratory data analysis (EDA) techniques to assess dataset health
- Exposure to MLOps or reproducible-science tools (e.g., DVC, MLflow, Jupyter Notebooks); familiarity with geospatial libraries is a plus
- Experience with automated testing of data pipelines, validation schemas, and data version control, as well as with cloud computing environments (e.g., Google Cloud, Azure, or AWS), is a plus
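As a concrete, purely hypothetical example of the kind of schema-based validation that tools such as pydantic and pytest support, the sketch below defines a minimal AssetRecord schema and two pytest checks; the class and field names are placeholders, not our actual data model.

```python
# Hypothetical sketch: the AssetRecord schema and its fields are placeholders.
import pytest
from pydantic import BaseModel, Field, ValidationError


class AssetRecord(BaseModel):
    """Minimal validation schema for a single asset location row."""
    asset_id: str
    latitude: float = Field(ge=-90, le=90)
    longitude: float = Field(ge=-180, le=180)


def test_accepts_valid_record():
    record = AssetRecord(asset_id="A1", latitude=48.14, longitude=11.58)
    assert record.asset_id == "A1"


def test_rejects_out_of_range_latitude():
    with pytest.raises(ValidationError):
        AssetRecord(asset_id="A2", latitude=123.0, longitude=11.58)
```

Tests of this kind can be run with pytest and wired into automated pipeline checks alongside data version control.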