Statistical Modeling: The Two Cultures
Leo Breiman (2001)
Two cultures
Leo Breiman’s paper on the two cultures describes two main approaches to reaching conclusions from data:
Data Modeling Culture: This approach comes from classical statistics. It assumes that the data are generated by a stochastic data model (for example, linear or logistic regression) and estimates that model’s parameters from the data, with the goal of gaining insight into the process that generated the data. The focus is on understanding how the variables are related and why. The models are typically grounded in theory and are interpretable: one can understand the role and significance of each variable.
Algorithmic Modeling Culture: This culture is closely aligned with machine learning and focuses on predictive accuracy. It treats the true data-generating process as unknown and complex, and it uses algorithms (such as decision trees, neural networks, and random forests) to find patterns that predict responses for new, unseen data. The models may be treated as “black boxes”, with interpretability often sacrificed for performance.
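To make the contrast concrete, here is a minimal sketch in Python with NumPy. Everything in it is an illustrative assumption rather than anything taken from the paper: a made-up nonlinear data-generating process, a straight-line least-squares fit standing in for the data modeling culture, and k-nearest-neighbours regression standing in for the algorithmic culture (Breiman’s own examples are decision trees, neural nets, and random forests). The helper `knn_predict` and the choice `k=10` are hypothetical. The linear fit yields coefficients one can interpret; the k-NN model offers nothing to interpret and is judged only by its error on held-out data.

```python
import numpy as np

# Hypothetical data-generating process, unknown to the analyst:
# a nonlinear function of one input plus Gaussian noise.
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-3, 3, size=n)
y = np.sin(2 * x) + 0.5 * x + rng.normal(scale=0.3, size=n)

# Hold out the last 100 points to measure predictive accuracy.
x_train, x_test = x[:300], x[300:]
y_train, y_test = y[:300], y[300:]

# Data modeling culture: assume y = b0 + b1 * x + noise and
# estimate the interpretable parameters b0, b1 by least squares.
X_train = np.column_stack([np.ones_like(x_train), x_train])
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
b0, b1 = coef
linear_pred = b0 + b1 * x_test


# Algorithmic modeling culture: k-nearest-neighbours regression makes no
# assumption about the functional form; it predicts each new point by
# averaging the responses of the k closest training points.
def knn_predict(x_train, y_train, x_new, k=10):
    dist = np.abs(x_new[:, None] - x_train[None, :])  # shape (n_new, n_train)
    nearest = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest points
    return y_train[nearest].mean(axis=1)


knn_pred = knn_predict(x_train, y_train, x_test)


def mse(y_true, y_hat):
    """Mean squared prediction error on the held-out data."""
    return float(np.mean((y_true - y_hat) ** 2))


linear_mse = mse(y_test, linear_pred)
knn_mse = mse(y_test, knn_pred)

print(f"linear model: y = {b0:.2f} + {b1:.2f} x, test MSE = {linear_mse:.3f}")
print(f"10-NN model: test MSE = {knn_mse:.3f}")
```

Because the assumed process is nonlinear, the k-NN model should typically reach a noticeably lower test error, while the linear model’s tidy coefficients describe a relationship that does not actually hold in the data. If the process were truly linear, the comparison could easily go the other way, which is why neither culture wins by default.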
Lessons to take away from this paper
Embrace computational and algorithmic thinking: When learning from data generated by a complicated process, search for the model that gives the best solution, whether it is an algorithmic model or a data model.
Be open to handling complexity: Restricting yourself to narrow classes of models can keep statisticians from working on exciting new problems. Real-world data is becoming increasingly messy and complex, so be prepared to deal with large datasets and complex models that may not fit into neat theoretical frameworks.
Think critically: Data models built on incorrect assumptions can lead to questionable scientific conclusions. At the same time, it is important to critically assess whether a black-box model is appropriate for the problem at hand.
Value both cultures: Most importantly, Breiman’s paper encourages the statistical community to recognize the value in both cultures. A well-rounded statistician appreciates the strengths and limitations of each approach.
Summary
Presentation
2001 Paper
If you are unable to view any of the documents above, please download the summary, presentation, and paper.