University of Cambridge · Faculty of Economics

Causal Machine Learning

Reconciling prediction and causal inference in high-dimensional settings
Dr. Melvyn Weeks
Faculty of Economics and Clare College, University of Cambridge
01

Data Science Cambridge

DS300: Causal Inference and Machine Learning. Core module of the MPhil in Economics and Data Science, Faculty of Economics, University of Cambridge.

There are two cultures in the use of statistical modelling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.

Leo Breiman, 2001 — the organising tension of this course

The course covers topics at the intersection of machine learning and econometrics, combining theory and applications. We distinguish between models used to solve a prediction problem and models used to estimate a causal effect, and demonstrate how identification strategies such as unconfoundedness, instrumental variables, and difference-in-differences can be used alongside machine learning methods for prediction.

The tension between parametric and nonparametric approaches reflects fundamental disciplinary differences. Econometricians prioritise interpretable parameters and structural understanding of economic relationships. Machine learning practitioners prioritise nonparametric flexibility and generalisation. Modern causal machine learning confronts the challenge of reconciling these competing objectives.

Course Sessions

01 Introduction
02 Best Predictor and the Conditional Expectation Function
03 Estimation and Inference for Causal Effects
04 High Dimensional Methods for Linear Models
05 Applications of Regularised Regression
06 Double Machine Learning
07 Treatment Effects and Double Robust Estimators
08 Random Forests
09 Architecture of Causal Trees and Generalised Random Forests
10 Generalised Causal Forests
10b Testing for Heterogeneity
11 Introduction to Generative AI and Large Language Models

Applications

Labour Economics

Wages and gender (Lasso). Children and parental labour supply. Impact of job training on earnings. Fertility and labour supply with causal forests.

Finance

Credit card default classification. Forecasting financial crises with tree ensembles. Post-earnings announcement drift. Corporate cash holdings.

NLP & Policy

Central bank communication (FinBERT). Sentiment and Tesla stock price. Impact of microcredit (Crépon et al.). Time-of-use tariffs and smart meter data.

02

Summer School 2026

Details to follow. The summer school draws on DS300 course material, with applied sessions tailored for practitioners and researchers across disciplines.

Format

Intensive sessions covering Double Machine Learning, causal forests, and applications in labour, finance, and policy evaluation.

Audience

Graduate students, applied economists, and data scientists seeking to extend prediction-focused ML skills toward causal estimation.

03

Machine Learning for Causal Inference

A textbook proposal submitted to Cambridge University Press, April 2026. Drawing on course materials developed and tested at the Faculty of Economics, University of Cambridge.

Traditional econometric methods struggle with high-dimensional data and complex heterogeneity. Machine learning approaches lack the architecture for causal estimation. This fundamental tension demands a synthesis.

Machine Learning for Causal Inference — Proposal Narrative

The book's central innovation lies in its presentation of modern causal machine learning methods within a coherent economic framework. Core themes include the reconciliation of prediction and causation, the treatment of high-dimensional nuisance parameter estimation, and the development of cross-fitting and sample-splitting procedures that enable valid statistical inference with flexible machine learning algorithms.
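
As an illustration of the cross-fitting idea mentioned above, the following is a minimal sketch, not the book's or the course's own implementation. It assumes a partially linear model y = theta*d + g(X) + e, uses ridge regression as a stand-in for a flexible machine learning learner, and runs on simulated data. The function names (`ridge_fit_predict`, `cross_fit_plm`), the simulated design, and the true effect of 0.5 are all hypothetical choices made for this example.

```python
import numpy as np


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit ridge regression on the training folds and predict on the held-out fold."""
    mu_x = X_train.mean(axis=0)
    mu_y = y_train.mean()
    Xc = X_train - mu_x
    p = X_train.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ (y_train - mu_y))
    return mu_y + (X_test - mu_x) @ beta


def cross_fit_plm(y, d, X, n_folds=5, alpha=1.0, seed=0):
    """Cross-fitted estimate of theta in y = theta*d + g(X) + e (assumed model)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        # Nuisance functions are fitted only on observations outside the fold.
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        y_res[test_idx] = y[test_idx] - ridge_fit_predict(
            X[train_mask], y[train_mask], X[test_idx], alpha)
        d_res[test_idx] = d[test_idx] - ridge_fit_predict(
            X[train_mask], d[train_mask], X[test_idx], alpha)
    # Residual-on-residual regression gives the estimate of the target parameter.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Heteroskedasticity-robust standard error based on the orthogonal score.
    eps = y_res - theta * d_res
    se = np.sqrt(np.mean(d_res**2 * eps**2) / np.mean(d_res**2) ** 2 / n)
    return theta, se


# Simulated data (hypothetical design): true effect of d on y is 0.5.
rng = np.random.default_rng(42)
n, p = 2000, 20
X = rng.normal(size=(n, p))
gamma = rng.normal(size=p) / np.sqrt(p)
beta = rng.normal(size=p) / np.sqrt(p)
d = X @ gamma + rng.normal(size=n)
y = 0.5 * d + X @ beta + rng.normal(size=n)

theta_hat, se = cross_fit_plm(y, d, X)
print(f"theta_hat = {theta_hat:.3f}, se = {se:.3f}")
```

The design choice to note is that each observation's residuals come from nuisance fits that never saw that observation, which is what allows a flexible learner to be substituted for the ridge step without the overfitting bias contaminating the estimate of theta.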

Book Structure

Part I — Introduction

Looking Ahead. Overview. The two statistical cultures. Prediction versus causation: the fundamental distinction.

Part II — Foundations

Best Predictor and the CEF. Estimation and Inference for Causal Effects. Frisch-Waugh-Lovell Theorem as the unifying bridge.

Part III — High-Dimensional Methods

Lasso and Ridge for linear models. Applications of regularised regression. Double Machine Learning.

Part IV — Modern Causal ML

Treatment Effects and Double Robust Estimators. Random Forests. Causal Trees and Generalised Random Forests. Heterogeneous effects.

Audience

Graduate students in economics. Data scientists moving from prediction to causal reasoning. Applied researchers in policy, finance, and labour economics.

Pedagogical Basis

Materials developed and tested through DS300 at Cambridge. Code throughout in R and Python. Applications drawn from real datasets across multiple domains.