Greedy Function Approximation: A Gradient Boosting Machine#
Jerry Friedman 1999#
“Greedy Function Approximation: A Gradient Boosting Machine” by Jerome H. Friedman is the seminal paper that introduces the gradient boosting algorithm, now a cornerstone of machine learning for both regression and classification. Friedman approaches function estimation as numerical optimization in function space rather than parameter space. By drawing a connection between stagewise additive expansions and steepest-descent minimization, the paper develops a general gradient-descent “boosting” paradigm for additive expansions based on any fitting criterion.
Key Concepts#
Stagewise Additive Expansions: The method builds models in a stagewise manner, adding one weak learner (typically a decision tree) at a time. Each new learner aims to correct the errors of the combined ensemble of previously added learners.
Gradient Descent in Function Space: Unlike traditional optimization methods that adjust parameters, gradient boosting optimizes by iteratively adding functions to minimize a given loss function.
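In the paper’s notation, each stage fits the new weak learner to the pseudo-residuals, i.e. the negative gradient of the loss evaluated at the current model, and then takes a line-search step:

$$
\tilde{y}_i = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F(x)=F_{m-1}(x)}, \qquad
F_m(x) = F_{m-1}(x) + \rho_m\, h(x; \mathbf{a}_m),
$$

where $h(x;\mathbf{a}_m)$ is the weak learner fit to the $\tilde{y}_i$ and $\rho_m$ is chosen by line search. Under squared-error loss the negative gradient is just the ordinary residual, so the two ideas above collapse into a particularly simple procedure. Below is a minimal sketch, assuming scikit-learn is available; the function names, shrinkage value, and tree depth are illustrative choices, not prescriptions from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm_ls(X, y, n_stages=100, shrinkage=0.1, max_depth=3):
    """Stagewise additive expansion under squared-error loss (sketch)."""
    f0 = y.mean()                      # F_0: the best constant model
    pred = np.full(len(y), f0)         # current ensemble prediction F_{m-1}(x)
    trees = []
    for _ in range(n_stages):
        residuals = y - pred           # negative gradient = plain residuals
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)         # weak learner targets the residuals
        pred += shrinkage * tree.predict(X)  # damped stagewise update
        trees.append(tree)
    return f0, trees

def predict_gbm_ls(f0, trees, X, shrinkage=0.1):
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += shrinkage * tree.predict(X)
    return pred
```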
Friedman presents specific algorithms for several loss functions (their negative gradients are sketched in code after this list), including:
Least Squares: For regression tasks.
Least Absolute Deviation: Robust to outliers.
Huber Loss: Combines the robustness to outliers of least absolute deviation with the efficiency of least squares.
Multiclass Logistic Likelihood: For classification tasks.
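Each loss yields a different pseudo-residual for the weak learner to fit. Here is a hedged sketch of those negative gradients; the function names and the `delta` threshold are illustrative, and the multiclass version assumes one-hot labels `Y` and a matrix `F` of per-class scores.

```python
import numpy as np

def neg_grad_least_squares(y, f):
    # L(y, F) = (y - F)^2 / 2  ->  negative gradient is the residual.
    return y - f

def neg_grad_lad(y, f):
    # L(y, F) = |y - F|  ->  negative gradient is the sign of the residual.
    return np.sign(y - f)

def neg_grad_huber(y, f, delta=1.0):
    # Huber loss: quadratic for |residual| <= delta, linear beyond it,
    # trading some least-squares efficiency for robustness to outliers.
    r = y - f
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))

def neg_grad_multinomial(Y, F):
    # K-class logistic likelihood with per-class scores F_k(x):
    # p_k = softmax(F)_k, and the negative gradient is y_k - p_k.
    P = np.exp(F - F.max(axis=1, keepdims=True))   # numerically stable softmax
    P /= P.sum(axis=1, keepdims=True)
    return Y - P
```

(In the paper, the Huber transition point is not fixed but re-estimated at each iteration as a quantile of the current absolute residuals.)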
Special enhancements are derived for the case where the individual additive components are regression trees: instead of scaling each tree by a single multiplier, a separate optimal constant is computed for every terminal region of the tree (the “TreeBoost” algorithms). These enhancements make the resulting models robust, interpretable, and well suited to noisy data.

Gradient boosting, especially with regression trees, produces competitive, highly robust procedures for both regression and classification, and is particularly useful for mining less-than-clean data. It has strong connections to the boosting methods of Freund and Schapire and of Friedman, Hastie, and Tibshirani.

In summary, the paper lays the foundation for gradient boosting by presenting a robust framework for function approximation using gradient descent in function space, and it provides a detailed methodology for constructing predictive models that are both powerful and interpretable.
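To make the tree-specific enhancement concrete, here is a minimal sketch of one stage of the paper’s LAD TreeBoost idea: the tree is grown on the sign of the residuals, and each terminal node then receives the median residual of its training points as its value, which is the line-search optimum for absolute-error loss. This assumes scikit-learn; names and parameter values are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lad_treeboost_stage(X, y, pred, max_depth=3, shrinkage=0.1):
    """One boosting stage under least-absolute-deviation loss (sketch)."""
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=max_depth)
    tree.fit(X, np.sign(residuals))        # pseudo-residuals for L1 loss
    leaves = tree.apply(X)                 # terminal region of each point
    # Per-region line search: the median residual minimizes L1 loss.
    gamma = {leaf: np.median(residuals[leaves == leaf])
             for leaf in np.unique(leaves)}
    update = np.array([gamma[leaf] for leaf in leaves])
    return pred + shrinkage * update, tree, gamma
```

Mature implementations of this scheme are widely available: scikit-learn’s GradientBoostingRegressor and GradientBoostingClassifier, as well as libraries such as XGBoost and LightGBM, all descend from the algorithms in this paper.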
Conversation with Jerry Friedman#
1999 Paper#