A backtest produces point estimates: Sharpe 1.8, max DD 12%, win rate 62%. Those numbers are estimates from a finite sample, and like any estimate they have uncertainty around them. The bootstrap is a standard tool for measuring that uncertainty.
The Idea
You have N return observations from your backtest. The bootstrap procedure:
- Resample N observations with replacement from your original returns.
- Compute your metric (Sharpe, max DD, etc.) on the resampled series.
- Repeat 1,000 or 10,000 times.
- The 2.5th and 97.5th percentiles of the resampled metric values form a 95% confidence interval.
The result: instead of "Sharpe 1.8" you get "Sharpe 1.8, 95% CI [0.7, 2.6]". That interval answers the question you should actually be asking: how confidently can I claim a Sharpe greater than zero?
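The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names `sharpe` and `bootstrap_ci`, the zero risk-free rate, the 252-day annualization, and the synthetic return series are all assumptions made for the example.

```python
import numpy as np

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    returns = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

def bootstrap_ci(returns, metric, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI using i.i.d. resampling with replacement."""
    returns = np.asarray(returns, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(returns)
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        # Step 1: draw N observations with replacement.
        sample = returns[rng.integers(0, n, size=n)]
        # Step 2: compute the metric on the resampled series.
        stats[b] = metric(sample)
    # Step 4: the alpha/2 and 1 - alpha/2 percentiles bound the interval.
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Synthetic daily returns standing in for backtest output (illustrative only).
data_rng = np.random.default_rng(42)
daily_returns = data_rng.normal(loc=0.0008, scale=0.01, size=504)

point = sharpe(daily_returns)
lo, hi = bootstrap_ci(daily_returns, sharpe)
print(f"Sharpe {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Any function that maps a return series to a number can be passed as `metric`, so the same loop works for max DD or win rate.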
Why Plain Bootstrap Sometimes Lies
Naïve bootstrap resampling assumes returns are independent. They usually aren't: there is autocorrelation, volatility clustering, and trend persistence. Resampling individual returns destroys those patterns and, when the dependence is positive, typically produces CIs that are too tight.
The fix is the stationary bootstrap (Politis & Romano, 1994): instead of resampling individual returns, resample blocks of consecutive returns whose lengths are random (geometrically distributed). Keeping returns together within a block preserves short-run dependence; randomizing the block starts and lengths preserves the resampling logic and keeps the resampled series stationary.
For daily-bar strategies, use a stationary bootstrap with average block length around 10–20 days. For minute-bar strategies, scale up to 1–2 days. The exact length matters less than using some block-based scheme rather than i.i.d. resampling.
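A rough sketch of the stationary bootstrap follows, assuming daily returns and an average block length of 15 days. The helper names (`stationary_bootstrap_indices`, `stationary_bootstrap_ci`, `max_drawdown`), the zero risk-free rate, and the synthetic data are hypothetical choices for illustration. Each step continues the current block with probability 1 - 1/L and otherwise jumps to a fresh random start, which gives geometric block lengths with mean L; indices wrap around at the end of the series.

```python
import numpy as np

def stationary_bootstrap_indices(n, avg_block_len, rng):
    """One resample of indices 0..n-1 built from blocks of geometric length."""
    p = 1.0 / avg_block_len              # probability of starting a new block
    new_block = rng.random(n) < p
    starts = rng.integers(0, n, size=n)  # candidate block start points
    idx = np.empty(n, dtype=int)
    idx[0] = starts[0]
    for t in range(1, n):
        # Either jump to a fresh start or continue the block (with wraparound).
        idx[t] = starts[t] if new_block[t] else (idx[t - 1] + 1) % n
    return idx

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve (positive fraction)."""
    equity = np.concatenate(([1.0], np.cumprod(1.0 + returns)))
    peak = np.maximum.accumulate(equity)
    return float(np.max(1.0 - equity / peak))

def stationary_bootstrap_ci(returns, metric, avg_block_len=15,
                            n_resamples=1_000, alpha=0.05, seed=0):
    """Percentile CI for `metric` under stationary-bootstrap resampling."""
    returns = np.asarray(returns, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(returns)
    stats = np.array([
        metric(returns[stationary_bootstrap_indices(n, avg_block_len, rng)])
        for _ in range(n_resamples)
    ])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Synthetic daily returns standing in for backtest output (illustrative only).
data_rng = np.random.default_rng(7)
daily_returns = data_rng.normal(loc=0.0008, scale=0.01, size=504)

sharpe_lo, sharpe_hi = stationary_bootstrap_ci(daily_returns, sharpe)
dd_lo, dd_hi = stationary_bootstrap_ci(daily_returns, max_drawdown)
print(f"Sharpe 95% CI [{sharpe_lo:.2f}, {sharpe_hi:.2f}]")
print(f"Max DD 95% CI [{dd_lo:.1%}, {dd_hi:.1%}]")
```

Setting `avg_block_len=1` makes every step a fresh draw, which reduces this to the plain i.i.d. bootstrap.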
What CIs to Look At
- Sharpe CI. Most important. If the lower bound is below zero, you do not have statistically significant evidence of a positive Sharpe — even if the point estimate is high.
- Max DD CI. Useful for risk planning. Your historical max DD is a single realization, and future drawdowns can exceed it; the upper end of the CI is closer to a realistic worst case.
- Win rate CI. Particularly important for low-trade-count strategies. A 65% win rate from 20 trades has a 95% CI of roughly 44% to 86% under a normal approximation, which includes 50%.
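The win rate case is easy to check numerically. The snippet below is a small sketch using a hypothetical trade log of 13 winners in 20 trades, and it assumes trade outcomes are independent, so plain resampling is used rather than blocks.

```python
import numpy as np

# Hypothetical trade log: 13 winners out of 20 trades (65% win rate).
wins = np.array([1] * 13 + [0] * 7)

rng = np.random.default_rng(0)
n_resamples = 10_000

# Each row is one resample of the 20 trades, drawn with replacement.
resampled = rng.choice(wins, size=(n_resamples, wins.size), replace=True)
win_rates = resampled.mean(axis=1)

lo, hi = np.percentile(win_rates, [2.5, 97.5])
print(f"Win rate {wins.mean():.0%}, 95% CI [{lo:.0%}, {hi:.0%}]")
```

With so few trades the interval spans roughly 45% to 85%, so a coin-flip strategy cannot be ruled out.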
How to Use Bootstrap CIs in Practice
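- After any backtest, ask: is the Sharpe CI strictly positive? If not, the strategy's Sharpe is not statistically distinguishable from zero.
- When comparing two strategies, do not rely on whether their Sharpe CIs overlap: overlapping intervals can still hide a statistically meaningful difference, especially when the two return series are correlated. Bootstrap the difference in Sharpe, resampling both strategies over the same dates, and check whether that CI excludes zero. If it does not, you cannot honestly prefer one over the other on Sharpe alone.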
- For risk planning, take the worst end of the max DD CI seriously. That is roughly the drawdown you should be psychologically and financially prepared for.
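For the two-strategy comparison, one option is to bootstrap the Sharpe difference directly, reusing the same resampled dates for both strategies so their correlation is preserved. The sketch below is illustrative only: `sharpe_diff_ci` is a hypothetical helper, the return series are synthetic, and i.i.d. index draws are used for brevity (block-based indices could be substituted).

```python
import numpy as np

def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std(ddof=1)

def sharpe_diff_ci(returns_a, returns_b, n_resamples=5_000, alpha=0.05, seed=0):
    """Percentile CI for Sharpe(A) - Sharpe(B) using paired resampling."""
    a = np.asarray(returns_a, dtype=float)
    b = np.asarray(returns_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both return series must cover the same dates")
    rng = np.random.default_rng(seed)
    n = len(a)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # The same resampled dates are applied to both strategies.
        idx = rng.integers(0, n, size=n)
        diffs[i] = sharpe(a[idx]) - sharpe(b[idx])
    lo, hi = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Synthetic strategies sharing a common market component (illustrative only).
data_rng = np.random.default_rng(3)
market = data_rng.normal(0.0003, 0.008, size=756)
strat_a = market + data_rng.normal(0.0005, 0.004, size=756)
strat_b = market + data_rng.normal(0.0002, 0.004, size=756)

diff_point = sharpe(strat_a) - sharpe(strat_b)
lo, hi = sharpe_diff_ci(strat_a, strat_b)
print(f"Sharpe difference {diff_point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

If the resulting interval contains zero, the data do not support ranking one strategy above the other on Sharpe.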
The Bottom Line
Point estimates lie about their precision. Bootstrap CIs tell you the truth about how much you actually know. Always read CIs alongside point estimates — and when in doubt, trust the interval, not the point.
Further Reading
Foundational papers
- Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics, 7(1), 1–26.
- Politis, D. N. & Romano, J. P. (1994). The Stationary Bootstrap. Journal of the American Statistical Association, 89(428), 1303–1313.
Textbook references
- López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
- Campbell, J. Y., Lo, A. W. & MacKinlay, A. C. (1997). The Econometrics of Financial Markets. Princeton University Press.
Try it in QuanterLab
When a backtest shows a confidence interval (e.g., "Sharpe 1.8, 95% CI [0.6, 2.9]"), the wider the interval, the less you should trust the point estimate. A tight CI usually reflects a larger sample and therefore less uncertainty.