The Risk Decomposition sub-pill fits a ridge regression of portfolio returns onto factor returns. This article explains ridge regression (Hoerl & Kennard 1970), what its single tuning parameter does, and why ridge is the right choice for backtest-scale factor decomposition.
The basic regression
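Given T rebalance periods, portfolio returns $r_{p,t}$, and factor returns $r_{i,t}$ for $i \in \{\text{value}, \text{quality}, \text{momentum}, \text{growth}\}$, the model is:

$$r_{p,t} = \sum_{i} \beta_i \, r_{i,t} + \varepsilon_t, \qquad t = 1, \dots, T$$

OLS estimates the β coefficients that minimise the sum of squared errors:

$$\hat{\beta}^{\text{OLS}} = \arg\min_{\beta} \sum_{t=1}^{T} \Big( r_{p,t} - \sum_{i} \beta_i \, r_{i,t} \Big)^2$$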
The ridge modification
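Ridge adds an L2 penalty on the coefficient magnitudes:

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \sum_{t=1}^{T} \Big( r_{p,t} - \sum_{i} \beta_i \, r_{i,t} \Big)^2 + \lambda \sum_{i} \beta_i^2$$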
where λ (the ridge penalty, named alpha in scikit-learn) controls the strength of the shrinkage. λ = 0 reproduces OLS; large λ pushes all coefficients toward zero.
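As a rough illustration of the `alpha` naming and of the two limits, the sketch below fits scikit-learn's `Ridge` at a mild and a very large penalty and compares both with plain least squares. The data are synthetic, the factor columns are standardised, and `fit_intercept=False` is an assumption made for the example; none of this is taken from FM103 itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
T, K = 40, 4  # 40 rebalance periods, 4 factors (synthetic stand-ins)

# Synthetic, standardised factor returns and a made-up "true" beta.
X = rng.standard_normal((T, K))
true_beta = np.array([0.6, 0.3, -0.2, 0.1])
y = X @ true_beta + 0.5 * rng.standard_normal(T)

# fit_intercept=False is an assumption: the text does not say how the
# intercept is handled, so the example penalises and fits the betas only.
ols = LinearRegression(fit_intercept=False).fit(X, y)
mild = Ridge(alpha=0.1, fit_intercept=False).fit(X, y)    # alpha plays the role of lambda
heavy = Ridge(alpha=1e4, fit_intercept=False).fit(X, y)

for name, model in [("OLS", ols), ("ridge alpha=0.1", mild), ("ridge alpha=1e4", heavy)]:
    print(f"{name:>16}: {np.round(model.coef_, 4)}")
```

With the mild penalty the coefficients sit close to the OLS values; with the very large one they are all pulled close to zero.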
Why shrinkage helps in small samples
OLS is unbiased (on average it recovers the true β), but it has high variance when the regressors are correlated or the sample is small. Ridge introduces bias (the β estimates are shrunk toward zero in expectation) but, for a suitably chosen λ, reduces variance by more than the squared bias it adds, so the mean squared error of the β estimates is lower than under OLS. This is the bias-variance trade-off; for typical backtest sample sizes (T = 20–60) and correlated factor returns, ridge usually wins on MSE.
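The trade-off can be checked numerically. The sketch below is a small Monte Carlo experiment under assumed conditions: a fixed synthetic design with T = 30 and pairwise factor correlation of about 0.9, a made-up true β, and Gaussian noise. Only the noise is redrawn, and the squared bias, variance, and MSE of the OLS and ridge estimates are printed side by side.

```python
import numpy as np

rng = np.random.default_rng(42)
T, K, lam, n_sims = 30, 4, 0.1, 2000
true_beta = np.array([0.5, 0.3, 0.2, 0.1])   # hypothetical "true" exposures

# Fixed design with strongly correlated columns (pairwise correlation about 0.9).
corr = np.full((K, K), 0.9) + 0.1 * np.eye(K)
X = rng.standard_normal((T, K)) @ np.linalg.cholesky(corr).T

ols_proj = np.linalg.solve(X.T @ X, X.T)                      # maps y -> OLS beta
ridge_proj = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T)  # maps y -> ridge beta

# Redraw only the noise; each row of Y is one simulated return series of length T.
noise = 0.5 * rng.standard_normal((n_sims, T))
Y = X @ true_beta + noise
ols_est = Y @ ols_proj.T        # shape (n_sims, K)
ridge_est = Y @ ridge_proj.T

def summarise(est):
    """Return (squared bias, variance, MSE) summed over the K coefficients."""
    bias_sq = np.sum((est.mean(axis=0) - true_beta) ** 2)
    variance = np.sum(est.var(axis=0))
    return bias_sq, variance, bias_sq + variance

for name, est in [("OLS", ols_est), ("ridge", ridge_est)]:
    b, v, m = summarise(est)
    print(f"{name:>5}: bias^2={b:.5f}  variance={v:.5f}  MSE={m:.5f}")
```

Changing `lam`, the noise scale, or the correlation level shows how the balance between the bias and variance terms moves.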
Closed-form solution
Ridge has a closed-form solution. With $X$ as the regressor matrix ($T \times K$) and $y$ as the response vector:

$$\hat{\beta}^{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$$

The $\lambda I$ term is the only change from OLS's $(X^\top X)^{-1}$. It stabilises the inverse when $X^\top X$ is near-singular (which happens with correlated factors).
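A minimal NumPy sketch of the closed form, assuming synthetic data in which two of the four columns are nearly collinear. The helper name `ridge_closed_form` is made up for this example. It prints the condition number of the matrix being inverted with and without the λI term, alongside the resulting coefficients.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X'X + lam*I) beta = X'y; np.linalg.solve avoids forming the inverse."""
    K = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)

rng = np.random.default_rng(1)
T = 24
base = rng.standard_normal(T)
# Two nearly collinear columns plus two independent ones (synthetic data).
X = np.column_stack([
    base,
    base + 0.01 * rng.standard_normal(T),
    rng.standard_normal(T),
    rng.standard_normal(T),
])
y = X @ np.array([0.4, 0.4, 0.2, -0.1]) + 0.3 * rng.standard_normal(T)

gram = X.T @ X
print("cond(X'X)         :", np.linalg.cond(gram))
print("cond(X'X + 0.1 I) :", np.linalg.cond(gram + 0.1 * np.eye(4)))
print("lam = 0   :", np.round(ridge_closed_form(X, y, 0.0), 3))
print("lam = 0.1 :", np.round(ridge_closed_form(X, y, 0.1), 3))
```

Setting `lam` to 0 recovers the ordinary least-squares solution, so the same helper covers both cases.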
Choosing λ
The default in FM103 is λ = 0.1, which is mild. The trade-off:
- λ near 0: behaves like OLS; coefficients are most variable; R² is highest.
- λ ~ 0.1: gentle shrinkage; coefficients are stabilised slightly; R² slightly lower.
- λ > 1: aggressive shrinkage; all β pulled near zero; R² falls materially.
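The three regimes in the list can be traced with a small sweep. The sketch assumes synthetic, standardised factor returns (the size of λ that counts as "mild" depends on the scale of the regressors) and reports the coefficient norm and in-sample R² at each penalty.

```python
import numpy as np

rng = np.random.default_rng(7)
T, K = 40, 4
X = rng.standard_normal((T, K))   # synthetic standardised factors (assumption)
y = X @ np.array([0.6, 0.3, -0.2, 0.1]) + 0.5 * rng.standard_normal(T)

def fit_and_score(lam):
    """Return (coefficient norm, in-sample R^2) for one ridge penalty."""
    beta = np.linalg.solve(X.T @ X + lam * np.eye(K), X.T @ y)
    resid = y - X @ beta
    centred = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    return np.linalg.norm(beta), r2

lams = [0.0, 0.1, 1.0, 10.0, 100.0]
results = [fit_and_score(lam) for lam in lams]
for lam, (norm, r2) in zip(lams, results):
    print(f"lambda={lam:>6}: ||beta||={norm:.4f}  in-sample R^2={r2:.4f}")
```

Both columns fall as λ grows: the drop is barely visible at 0.1 and becomes pronounced at the largest penalties.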
For backtest factor decomposition the goal is interpretable coefficients, not predictive R² maximisation. Mild λ is the right default.
Cross-validation alternative
A more rigorous λ choice uses k-fold cross-validation on the regression sample. FM103 keeps λ fixed because the sample is too small (T = 20–60) for cross-validation to be reliable and because the sub-pill aims for stable, comparable results across users.
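For reference, a k-fold search over a small λ grid might look like the sketch below. The grid, the fold count, the shuffled fold assignment, and the helper names `ridge_fit` and `cv_mse` are all assumptions for illustration; this is not FM103 behaviour. Repeating the search with different fold assignments gives a feel for how stable the selected λ is at T = 40.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    """Mean out-of-fold squared error for one lambda."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for held_out in folds:
        train = np.setdiff1d(idx, held_out)
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[held_out] - X[held_out] @ beta) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(3)
T, K = 40, 4
X = rng.standard_normal((T, K))   # synthetic data, for illustration only
y = X @ np.array([0.6, 0.3, -0.2, 0.1]) + 0.5 * rng.standard_normal(T)

grid = [0.01, 0.1, 1.0, 10.0]
# Different fold assignments (seeds) can pick different lambdas at this sample size.
for seed in range(3):
    scores = {lam: cv_mse(X, y, lam, seed=seed) for lam in grid}
    best = min(scores, key=scores.get)
    print(f"seed={seed}: best lambda={best}, CV MSE={scores[best]:.4f}")
```

Shuffled folds ignore the time ordering of returns; a time-aware split would be the more careful choice for serially dependent data.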
Ridge vs. Lasso
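Lasso (Tibshirani 1996) uses an L1 penalty instead of L2:

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \sum_{t=1}^{T} \Big( r_{p,t} - \sum_{i} \beta_i \, r_{i,t} \Big)^2 + \lambda \sum_{i} |\beta_i|$$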
Lasso's key property: it can set coefficients exactly to zero, performing variable selection. Ridge cannot. For factor decomposition we want all four factors represented in the output (even if one has small β); ridge is the right choice. For high-dimensional factor models with many candidate factors, Lasso's automatic selection becomes useful.
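The difference is easiest to see in the special case of an orthonormal design (X'X = I), where both estimators have simple closed forms. With the penalty added to the plain sum of squared errors, ridge rescales each OLS coefficient by 1/(1 + λ), while Lasso soft-thresholds it at λ/2. The OLS values and λ below are hypothetical numbers chosen for the illustration.

```python
import numpy as np

# Hypothetical OLS betas for (value, quality, momentum, growth) under an
# orthonormal design (X'X = I), where both penalised solutions are closed-form.
ols_beta = np.array([0.80, 0.50, 0.30, 0.05])
lam = 0.2

# Ridge: each coefficient is scaled by 1 / (1 + lam); none becomes exactly zero.
ridge_beta = ols_beta / (1.0 + lam)

# Lasso: soft-thresholding at lam / 2; any |beta| <= lam / 2 is set to exactly 0.
lasso_beta = np.sign(ols_beta) * np.maximum(np.abs(ols_beta) - lam / 2.0, 0.0)

print("ridge:", np.round(ridge_beta, 4))
print("lasso:", np.round(lasso_beta, 4))
print("exact zeros -> ridge:", int(np.sum(ridge_beta == 0)),
      " lasso:", int(np.sum(lasso_beta == 0)))
```

The smallest coefficient survives under ridge in shrunken form but is removed entirely by Lasso, which is the variable-selection behaviour described above.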
Elastic Net
Elastic Net combines L1 and L2: $\lambda_1 \sum_i |\beta_i| + \lambda_2 \sum_i \beta_i^2$. It is useful when you want both shrinkage and variable selection. For the four-factor decomposition in FM103, the added complexity isn't worth it; pure ridge is sufficient.
Interpretation caveats
- Ridge coefficients are shrunk relative to OLS: the coefficient vector as a whole is smaller in magnitude. Don't compare ridge β to OLS β from another study without noting the shrinkage.
- In-sample R² is slightly lower than OLS. The added bias costs a bit of fit.
- Variance decomposition uses the ridge β. The per-factor contribution numbers in the sub-pill are based on the shrunk coefficients, so they're slightly conservative.
Further Reading
Foundational papers
- Hoerl, A. E. & Kennard, R. W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(1), 55–67.
- Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society B, 58(1), 267–288.
Textbook references
- Campbell, J. Y., Lo, A. W. & MacKinlay, A. C. (1997). The Econometrics of Financial Markets. Princeton University Press.
- López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
Try it in QuanterLab
The default ridge λ = 0.1 is mild. Increase it to 1.0 if the per-factor β coefficients look unstable across re-runs; decrease it toward 0 if you want OLS-like fits and have a large enough sample.