DS300: Causal Inference and Machine Learning. Core module of the MPhil in Economics and Data Science, Faculty of Economics, University of Cambridge.
There are two cultures in the use of statistical modelling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.
Leo Breiman, 2001: the organising tension of this course
The course covers theory and applications at the intersection of machine learning and econometrics. Distinguishing between models used to solve a prediction problem and models used to estimate a causal effect, we demonstrate how empirical strategies such as unconfoundedness, instrumental variables, and difference-in-differences can be used alongside machine learning methods for prediction.
The tension between parametric and nonparametric approaches reflects fundamental disciplinary differences. Econometricians prioritise interpretable parameters and structural understanding of economic relationships. Machine learning practitioners prioritise nonparametric flexibility and generalisation. Modern causal machine learning confronts the challenge of reconciling these competing objectives.
Wages and gender (Lasso). Children and parental labour supply. Impact of job training on earnings. Fertility and labour supply with causal forests.
Credit card default classification. Forecasting financial crises with tree ensembles. Post-earnings announcement drift. Corporate cash holdings.
Central bank communication (FinBERT). Sentiment and Tesla stock price. Impact of microcredit (Crépon et al.). Time-of-use tariffs and smart meter data.
Details to follow. Drawing on DS300 course material, with applied sessions tailored for practitioners and researchers across disciplines.
Intensive sessions covering Double Machine Learning, causal forests, and applications in labour, finance, and policy evaluation.
Graduate students, applied economists, and data scientists seeking to extend prediction-focused ML skills toward causal estimation.
A textbook proposal submitted to Cambridge University Press, April 2026. Drawing on course materials developed and tested at the Faculty of Economics, University of Cambridge.
Traditional econometric methods struggle with high-dimensional data and complex heterogeneity. Machine learning approaches are built for prediction and do not, by themselves, deliver valid causal estimates. This fundamental tension demands a synthesis.
Machine Learning for Causal Inference: Proposal Narrative
The book's central innovation lies in its presentation of modern causal machine learning methods within a coherent economic framework. Core themes include the reconciliation of prediction and causation, the treatment of high-dimensional nuisance parameter estimation, and the development of cross-fitting and sample-splitting procedures that enable valid statistical inference with flexible machine learning algorithms.
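To make the cross-fitting idea named above concrete, here is a minimal sketch of a cross-fitted partialling-out estimator. It is an illustration under stated assumptions, not the book's or the course's implementation. The simulated partially linear model, the true effect of 0.5, and the function names `simulate`, `fit_bin_means`, and `dml_plr` are all hypothetical. A simple bin-mean regression stands in for the flexible machine learning algorithm, so that the example runs with the Python standard library alone.

```python
import math
import random
from statistics import fmean


def simulate(n=2000, theta=0.5, seed=1):
    """Simulated data (an assumption): Y = theta*D + f(X) + u, D = g(X) + v."""
    rng = random.Random(seed)
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    d = [math.sin(xi) + rng.gauss(0.0, 1.0) for xi in x]
    y = [theta * di + xi + xi ** 2 + rng.gauss(0.0, 1.0) for di, xi in zip(d, x)]
    return y, d, x


def fit_bin_means(x, y, n_bins=20):
    """Stand-in nuisance learner: mean of y within equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against a degenerate x

    def bin_of(v):
        # Values outside the training range are clamped to the end bins.
        return min(n_bins - 1, max(0, int((v - lo) / width)))

    sums, counts = [0.0] * n_bins, [0] * n_bins
    for xi, yi in zip(x, y):
        b = bin_of(xi)
        sums[b] += yi
        counts[b] += 1
    overall = fmean(y)  # fallback prediction for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda v: means[bin_of(v)]


def dml_plr(y, d, x, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta and its standard error."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]

    y_res, d_res = [0.0] * n, [0.0] * n
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Nuisance functions are fitted on the other folds only.
        l_hat = fit_bin_means([x[i] for i in train], [y[i] for i in train])
        m_hat = fit_bin_means([x[i] for i in train], [d[i] for i in train])
        # Residuals are formed on the held-out fold.
        for i in held_out:
            y_res[i] = y[i] - l_hat(x[i])
            d_res[i] = d[i] - m_hat(x[i])

    # Residual-on-residual regression without intercept.
    den = sum(dr * dr for dr in d_res)
    theta_hat = sum(dr * yr for dr, yr in zip(d_res, y_res)) / den
    # Heteroskedasticity-robust standard error based on the orthogonal score.
    psi_sq = sum((dr * (yr - theta_hat * dr)) ** 2 for dr, yr in zip(d_res, y_res))
    return theta_hat, math.sqrt(psi_sq) / den


y, d, x = simulate()
theta_hat, se = dml_plr(y, d, x)
print(f"theta_hat = {theta_hat:.3f} (se = {se:.3f})")
```

The design point is that each observation's residuals come from nuisance fits that never saw that observation. This is what allows a flexible first-stage learner to be combined with standard inference on the target parameter. In applied work the bin-mean learner would be replaced by a method such as Lasso or a random forest.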
Looking Ahead. Overview. The two statistical cultures. Prediction versus causation: the fundamental distinction.
Best Predictor and the CEF. Estimation and Inference for Causal Effects. Frisch-Waugh-Lovell Theorem as the unifying bridge.
Lasso and Ridge for linear models. Applications of regularised regression. Double Machine Learning.
Treatment Effects and Double Robust Estimators. Random Forests. Causal Trees and Generalised Random Forests. Heterogeneous effects.
Graduate students in economics. Data scientists moving from prediction to causal reasoning. Applied researchers in policy, finance, and labour economics.
Materials developed and tested through DS300 at Cambridge. Code throughout in R and Python. Applications drawn from real datasets across multiple domains.