Ridge Regression for Factor Decomposition

The Risk Decomposition sub-pill fits a ridge regression of portfolio returns onto factor returns. This article explains ridge regression (Hoerl & Kennard 1970), what its single tuning parameter does, and why ridge is the right choice for backtest-scale factor decomposition.

The basic regression

Given T rebalance periods, portfolio returns $r_{p,t}$, and factor returns $r_{i,t}$ for i in {value, quality, momentum, growth}:

$$r_{p,t} = \alpha + \sum_i \beta_i \, r_{i,t} + \varepsilon_t$$

OLS estimates the β coefficients that minimise the sum of squared errors:

minimise: $\sum_t \varepsilon_t^2$
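As a rough illustration of the least-squares fit, the sketch below estimates α and the four β on synthetic data. The return series, the "true" coefficients, and all variable names are made up for the example; nothing here is taken from FM103.

```python
import numpy as np

# Synthetic data for illustration only: T periods, four factor-return series.
rng = np.random.default_rng(0)
T = 40
factors = ["value", "quality", "momentum", "growth"]
R = rng.normal(0.0, 0.03, size=(T, len(factors)))        # factor returns r_{i,t}
true_alpha = 0.001                                        # hypothetical intercept
true_beta = np.array([0.6, 0.3, -0.2, 0.4])               # hypothetical exposures
r_p = true_alpha + R @ true_beta + rng.normal(0.0, 0.005, size=T)  # portfolio returns

# Prepend a column of ones so the first coefficient is the intercept alpha.
X = np.column_stack([np.ones(T), R])

# Least-squares solution: minimises the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, r_p, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1:]

residuals = r_p - X @ coef
sse = float(residuals @ residuals)

for name, b in zip(factors, beta_hat):
    print(f"{name:>9}: {b:+.3f}")
print(f"alpha: {alpha_hat:+.5f}, SSE: {sse:.6f}")
```

At the OLS solution the residuals are orthogonal to every column of X, which is the first-order condition of the minimisation above.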

The ridge modification

Ridge adds an L2 penalty on the coefficient magnitudes:

minimise: $\sum_t \varepsilon_t^2 + \lambda \sum_i \beta_i^2$

where λ (the ridge penalty, named alpha in scikit-learn; not to be confused with the intercept α above) controls the strength of the shrinkage. λ = 0 reproduces OLS; as λ grows, all coefficients are pushed toward zero.

Why shrinkage helps in small samples

OLS is unbiased: on average it recovers the true β. Its estimates have high variance, however, when the regressors are correlated or the sample is small. Ridge introduces bias (the β coefficients are shrunk toward zero in expectation) but, for a suitably chosen λ, reduces variance enough that the mean squared error of the β estimates is lower than under OLS. This is the bias-variance trade-off; for typical backtest sample sizes (T = 20–60) and correlated factor returns, ridge typically wins on MSE.
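The trade-off can be checked with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not a reproduction of FM103: it uses unit-variance regressors with an assumed pairwise correlation of 0.9, T = 30, no intercept, and an illustrative penalty λ = 3 chosen for that scaling.

```python
import numpy as np

rng = np.random.default_rng(42)
T, K = 30, 4
rho = 0.9                       # assumed pairwise correlation between factors
lam = 3.0                       # illustrative penalty for unit-variance regressors
true_beta = np.array([0.5, 0.3, 0.2, 0.1])   # hypothetical true exposures
noise_sd = 1.0
n_sims = 2000

# Equicorrelated covariance (ones on the diagonal) and its Cholesky factor.
cov = np.full((K, K), rho) + (1.0 - rho) * np.eye(K)
chol = np.linalg.cholesky(cov)

ols_err, ridge_err = [], []
for _ in range(n_sims):
    X = rng.standard_normal((T, K)) @ chol.T
    y = X @ true_beta + noise_sd * rng.standard_normal(T)
    gram = X.T @ X
    b_ols = np.linalg.solve(gram, X.T @ y)
    b_ridge = np.linalg.solve(gram + lam * np.eye(K), X.T @ y)
    # Squared estimation error of the coefficient vector in this draw.
    ols_err.append(np.sum((b_ols - true_beta) ** 2))
    ridge_err.append(np.sum((b_ridge - true_beta) ** 2))

mse_ols = float(np.mean(ols_err))
mse_ridge = float(np.mean(ridge_err))
print(f"OLS   MSE of beta: {mse_ols:.3f}")
print(f"ridge MSE of beta: {mse_ridge:.3f}")
```

In this setup the ridge estimates are biased but far less dispersed, so their average squared error comes out lower. With nearly uncorrelated factors or a much larger T the gap narrows.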

Closed-form solution

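Ridge has a closed-form solution. With X as the regressor matrix (T × K) and y as the response vector, both centred so that the intercept α is left unpenalised:

$$\beta_{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y$$

The $\lambda I$ term is the only change from OLS's $(X^\top X)^{-1}$. It adds λ to every eigenvalue of $X^\top X$, which keeps the smallest eigenvalues away from zero and so stabilises the inverse when $X^\top X$ is near-singular (which happens with correlated factors).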
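A minimal NumPy sketch of the closed form follows. The function name `ridge_closed_form` and the synthetic data are hypothetical; the function centres X and y so that the intercept is not penalised, then recovers the intercept from the means.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Return (intercept, beta) for ridge with an unpenalised intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    K = X.shape[1]
    # Solve (Xc'Xc + lam*I) beta = Xc'yc rather than forming the inverse.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(K), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Four strongly correlated synthetic regressors sharing a common component.
rng = np.random.default_rng(1)
T = 40
common = rng.standard_normal(T)
X = np.column_stack([common + 0.1 * rng.standard_normal(T) for _ in range(4)])
y = X @ np.array([0.6, 0.3, -0.2, 0.4]) + 0.5 * rng.standard_normal(T)

Xc = X - X.mean(axis=0)
gram = Xc.T @ Xc
cond_ols = np.linalg.cond(gram)                       # conditioning of X'X
cond_ridge = np.linalg.cond(gram + 0.1 * np.eye(4))   # after adding lam*I

a0, b0 = ridge_closed_form(X, y, 0.0)   # lam = 0 reproduces OLS
a1, b1 = ridge_closed_form(X, y, 0.1)
print(f"condition number: {cond_ols:.1f} -> {cond_ridge:.1f}")
print("OLS   beta:", np.round(b0, 3))
print("ridge beta:", np.round(b1, 3))
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical-stability choice; the result is the same estimator.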

Choosing λ

The default in FM103 is λ = 0.1, which is mild. The trade-off:

  • λ near 0: behaves like OLS; coefficients are most variable; in-sample R² is highest.
  • λ ~ 0.1: gentle shrinkage; coefficients are stabilised slightly; in-sample R² slightly lower.
  • λ > 1: aggressive shrinkage; all β pulled near zero; in-sample R² falls materially.
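The direction of these effects can be seen by sweeping λ on a fixed sample. The sketch below uses synthetic unit-variance regressors, which is an assumption: the value of λ at which shrinkage becomes aggressive depends on how the factor returns are scaled, so the magnitudes printed here need not match the thresholds listed above.

```python
import numpy as np

rng = np.random.default_rng(7)
T, K = 40, 4
X = rng.standard_normal((T, K))
X[:, 1] += 0.8 * X[:, 0]          # induce correlation between two regressors
y = X @ np.array([0.6, 0.3, -0.2, 0.4]) + 0.5 * rng.standard_normal(T)

# Centre so the intercept drops out and is not penalised.
Xc, yc = X - X.mean(axis=0), y - y.mean()
tss = float(yc @ yc)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
r2, norms = [], []
for lam in lambdas:
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(K), Xc.T @ yc)
    resid = yc - Xc @ beta
    r2.append(1.0 - float(resid @ resid) / tss)       # in-sample R^2
    norms.append(float(np.linalg.norm(beta)))         # overall coefficient size
    print(f"lambda={lam:>6}: R2={r2[-1]:.3f}, ||beta||={norms[-1]:.3f}")
```

Both the in-sample R² and the coefficient norm fall monotonically as λ increases; only the rate depends on the data scaling.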

For backtest factor decomposition the goal is interpretable coefficients, not predictive R² maximisation. Mild λ is the right default.

Cross-validation alternative

A more rigorous λ choice uses k-fold cross-validation on the regression sample. FM103 keeps λ fixed because the sample is too small (T = 20–60) for cross-validation to be reliable and because the sub-pill aims for stable, comparable results across users.
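For readers who want to try the cross-validated choice anyway, here is a minimal k-fold sketch in NumPy. The function name `kfold_ridge_mse`, the λ grid, and the synthetic data are hypothetical, and the folds are drawn by random shuffling, which ignores the time ordering of rebalance periods.

```python
import numpy as np

def kfold_ridge_mse(X, y, lambdas, k=5, seed=0):
    """Mean out-of-fold squared error for each candidate lambda."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = {}
    for lam in lambdas:
        fold_mse = []
        for val_idx in folds:
            train = np.ones(len(y), dtype=bool)
            train[val_idx] = False
            # Centre with training-fold means only, to avoid leakage.
            x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
            Xc, yc = X[train] - x_mean, y[train] - y_mean
            beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
            pred = y_mean + (X[~train] - x_mean) @ beta
            fold_mse.append(np.mean((y[~train] - pred) ** 2))
        scores[lam] = float(np.mean(fold_mse))
    return scores

rng = np.random.default_rng(3)
T, K = 40, 4
X = rng.standard_normal((T, K))
y = X @ np.array([0.6, 0.3, -0.2, 0.4]) + 0.5 * rng.standard_normal(T)

lambdas = [0.01, 0.1, 1.0, 10.0]
scores = kfold_ridge_mse(X, y, lambdas)
best_lam = min(scores, key=scores.get)
for lam, mse in scores.items():
    print(f"lambda={lam:>5}: CV MSE={mse:.4f}")
print("selected lambda:", best_lam)
```

With T = 40 each validation fold holds only eight observations, so re-running with a different `seed` can change the selected λ. That instability is the reliability concern described above.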

Ridge vs. Lasso

Lasso (Tibshirani 1996) uses an L1 penalty instead of L2:

minimise: $\sum_t \varepsilon_t^2 + \lambda \sum_i |\beta_i|$

Lasso's key property: it can set coefficients exactly to zero, performing variable selection. Ridge cannot. For factor decomposition we want all four factors represented in the output (even if one has small β); ridge is the right choice. For high-dimensional factor models with many candidate factors, Lasso's automatic selection becomes useful.
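The difference is easiest to see in the special case of orthonormal regressors (XᵀX = I), where both estimators have simple closed forms in terms of the OLS coefficients: ridge rescales every coefficient by 1/(1 + λ), while Lasso soft-thresholds at λ/2. The sketch below assumes that special case; the OLS betas are hypothetical numbers, and real factor returns are not orthonormal.

```python
import numpy as np

def ridge_orthonormal(beta_ols, lam):
    """Ridge solution when X^T X = I: uniform proportional shrinkage."""
    return beta_ols / (1.0 + lam)

def lasso_orthonormal(beta_ols, lam):
    """Lasso solution when X^T X = I: soft-thresholding at lam / 2."""
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - lam / 2.0, 0.0)

# Hypothetical OLS betas for value, quality, momentum, growth.
beta_ols = np.array([0.60, 0.30, -0.05, 0.40])
lam = 0.2

beta_ridge = ridge_orthonormal(beta_ols, lam)
beta_lasso = lasso_orthonormal(beta_ols, lam)
print("ridge:", np.round(beta_ridge, 3))
print("lasso:", np.round(beta_lasso, 3))
```

The small third coefficient survives under ridge, only scaled down, whereas Lasso sets it exactly to zero because its magnitude is below the threshold λ/2.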

Elastic Net

Elastic Net combines L1 and L2: $\lambda_1 \sum_i |\beta_i| + \lambda_2 \sum_i \beta_i^2$. Useful when you want both shrinkage and variable selection. For the four-factor decomposition in FM103, the added complexity isn't worth it; pure ridge is sufficient.

Interpretation caveats

  • Ridge coefficients are shrunk toward zero relative to OLS. Don't compare ridge β to OLS β from another study without noting the shrinkage.
  • In-sample R² is slightly lower than OLS. The bias trade-off costs a bit of fit.
  • Variance decomposition uses the ridge β. The per-factor contribution numbers in the sub-pill are based on the shrunk coefficients, so they're slightly conservative.

Further Reading

Foundational papers

  • Hoerl, A. E. & Kennard, R. W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(1), 55–67.
  • Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society B, 58(1), 267–288.

Textbook references

  • Campbell, J. Y., Lo, A. W. & MacKinlay, A. C. (1997). The Econometrics of Financial Markets. Princeton University Press.
  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.

Try it in QuanterLab

The default ridge λ = 0.1 is mild. Increase to 1.0 if the per-factor β coefficients look unstable across re-runs; decrease toward 0 if you want OLS-like fits and have enough sample.
