Bootstrap Confidence Intervals for Backtests

A backtest produces point estimates: Sharpe 1.8, max DD 12%, win rate 62%. Each of those numbers is computed from a finite sample, so each carries sampling uncertainty. The bootstrap is a standard, assumption-light tool for measuring that uncertainty.

The Idea

You have N return observations from your backtest. The bootstrap procedure:

Resample N observations with replacement from your original returns.
Compute your metric (Sharpe, max DD, etc.) on the resampled series.
Repeat 1,000 or 10,000 times.
The 2.5th and 97.5th percentiles of the resampled metric values form a 95% confidence interval.

The result: instead of "Sharpe 1.8" you get "Sharpe 1.8, 95% CI [0.7, 2.6]". The interval answers the question you should actually be asking: how confidently can I claim a Sharpe greater than zero?
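The procedure above can be sketched in plain Python. This is a minimal illustration of the i.i.d. percentile bootstrap, not a production implementation. It assumes simple periodic returns, a zero risk-free rate, and annualization with 252 periods per year; those choices and the function names `sharpe`, `percentile`, and `bootstrap_ci` are illustrative assumptions.

```python
import math
import random
import statistics


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return float("nan")  # degenerate resample with no variation
    return statistics.fmean(returns) / sd * math.sqrt(periods_per_year)


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def bootstrap_ci(returns, metric=sharpe, n_boot=1000, alpha=0.05, seed=0):
    """I.i.d. percentile bootstrap: resample N returns with replacement, recompute the metric."""
    rng = random.Random(seed)
    n = len(returns)
    stats = []
    for _ in range(n_boot):
        sample = rng.choices(returns, k=n)  # N draws with replacement
        value = metric(sample)
        if not math.isnan(value):
            stats.append(value)
    stats.sort()
    # The 2.5th and 97.5th percentiles for alpha = 0.05.
    return percentile(stats, 100 * alpha / 2), percentile(stats, 100 * (1 - alpha / 2))


# Usage on synthetic daily returns (illustration only, not real data).
demo_rng = random.Random(42)
daily = [demo_rng.gauss(0.0005, 0.01) for _ in range(750)]
print(sharpe(daily), bootstrap_ci(daily))
```

Any function that maps a list of returns to a number can be passed as `metric`, which is how the same loop yields CIs for other statistics.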

Why Plain Bootstrap Sometimes Lies

Naïve (i.i.d.) bootstrap resampling assumes returns are independent and identically distributed. They usually aren't: real returns show autocorrelation, volatility clustering, and trend persistence. Resampling individual returns destroys those patterns and typically produces CIs that are too narrow.

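A standard fix is the stationary bootstrap (Politis & Romano, 1994): instead of resampling individual returns, resample blocks of consecutive returns whose lengths are random, drawn from a geometric distribution. Keeping returns together within a block preserves short-run dependence; randomizing the block lengths, rather than fixing them, keeps the resampled series stationary, which is where the method gets its name.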
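A minimal sketch of the Politis & Romano resampling scheme, in plain Python. Each output position either continues the current block (the next consecutive index, wrapping around the end of the series) or, with probability `p = 1 / mean_block`, jumps to a fresh uniformly random start; that rule gives geometrically distributed block lengths with mean `mean_block`. The function names, the defaults, and the per-period Sharpe metric with a zero risk-free rate are illustrative assumptions.

```python
import random
import statistics


def stationary_indices(n, mean_block, rng):
    """Politis-Romano stationary bootstrap: indices for one resampled series of length n."""
    p = 1.0 / mean_block  # probability of starting a new block at each step
    idx = [rng.randrange(n)]
    for _ in range(n - 1):
        if rng.random() < p:
            idx.append(rng.randrange(n))  # start a new block at a random position
        else:
            idx.append((idx[-1] + 1) % n)  # extend the current block, wrapping circularly
    return idx


def stationary_bootstrap_ci(returns, metric, mean_block=15, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for `metric` using stationary-bootstrap resamples of `returns`."""
    rng = random.Random(seed)
    n = len(returns)
    stats = sorted(
        metric([returns[i] for i in stationary_indices(n, mean_block, rng)])
        for _ in range(n_boot)
    )
    # Simple index-based percentiles keep the sketch short.
    lo = stats[int(alpha / 2 * (n_boot - 1))]
    hi = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def daily_sharpe(returns):
    """Per-period Sharpe ratio (zero risk-free rate); multiply by sqrt(252) to annualize daily data."""
    return statistics.fmean(returns) / statistics.stdev(returns)
```

Setting `mean_block=1` makes every step start a new block, which reduces the scheme to the i.i.d. bootstrap; larger values keep longer stretches of history intact.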

Rule of Thumb

For daily-bar strategies, use a stationary bootstrap with an average block length of around 10–20 days. For minute-bar strategies, scale the average block up to 1–2 trading days' worth of bars. The exact length matters less than using some block-based scheme rather than i.i.d. resampling.
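The rule of thumb is stated in days, while a stationary resampler works in bars and a restart probability. A small, hypothetical helper can translate one into the other. The 390 bars per day below assumes one-minute bars over a regular 6.5-hour US equity session; that figure is an illustrative assumption, and markets with other session lengths need a different value.

```python
def block_settings(block_days, bars_per_day=1):
    """Convert a mean block length in trading days into bars and the geometric restart probability."""
    mean_block_bars = block_days * bars_per_day
    p_new_block = 1.0 / mean_block_bars  # chance of starting a new block at each bar
    return mean_block_bars, p_new_block


# Daily bars: 10-20 day blocks.
for days in (10, 20):
    print("daily", days, block_settings(days))

# One-minute bars, assuming 390 bars per regular US equity session: 1-2 day blocks.
for days in (1, 2):
    print("minute", days, block_settings(days, bars_per_day=390))
```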

What CIs to Look At

Sharpe CI. Most important. If the lower bound is below zero, you do not have statistically significant evidence (at the 95% level) of a positive Sharpe, even if the point estimate is high.
Max DD CI. Useful for risk planning. Your historical max DD is a single realization from one path of history, not a ceiling on what you might experience; the upper end of the CI is closer to a realistic worst case.
Win rate CI. Particularly important for low-trade-count strategies. A 65% win rate from 20 trades (13 wins) has a 95% CI of roughly [0.43, 0.82] (Wilson score interval), which comfortably includes 50%.
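Two helpers for the metrics above, as a sketch with illustrative names. `max_drawdown` computes the largest peak-to-trough decline of the compounded equity curve and can be passed as the metric to the bootstrap procedures described above; a block scheme is the better fit, since drawdowns depend on runs of consecutive losses. For win rate, the sketch swaps in the Wilson score interval, an analytic binomial interval rather than a bootstrap, which treats each trade as an independent win or loss.

```python
import math
from statistics import NormalDist


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def wilson_interval(wins, trades, confidence=0.95):
    """Wilson score interval for a binomial proportion such as win rate."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = wins / trades
    denom = 1 + z * z / trades
    center = (p_hat + z * z / (2 * trades)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trades + z * z / (4 * trades * trades)) / denom
    return center - half, center + half


# 13 wins out of 20 trades (a 65% win rate).
print(wilson_interval(13, 20))  # roughly (0.433, 0.819): the interval includes 0.5
```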

How to Use Bootstrap CIs in Practice

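After any backtest, ask: is the lower bound of the Sharpe CI above zero? If not, the strategy's Sharpe is not statistically distinguishable from zero at that confidence level.
When comparing two strategies, do not treat overlapping Sharpe CIs as proof that they are equivalent. Two intervals can overlap even when the difference between the strategies is statistically significant, particularly when their returns are correlated. The more direct check is to bootstrap the difference in Sharpe, resampling the same blocks of dates for both strategies, and ask whether that CI excludes zero. If it does not, you cannot honestly prefer one over the other on Sharpe alone.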
For risk planning, take the worst end of the max DD CI seriously. That is roughly the drawdown you should be psychologically and financially prepared for.
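For the comparison step, a sketch of the paired approach: draw one set of stationary-bootstrap indices per replicate and apply it to both strategies' return series, so the resamples keep the dates aligned and preserve the correlation between the two strategies. It assumes both series cover the same dates with one return per period; the function name, the 15-period mean block, and the per-period Sharpe with a zero risk-free rate are illustrative assumptions.

```python
import random
import statistics


def sharpe_difference_ci(returns_a, returns_b, mean_block=15, n_boot=1000, alpha=0.05, seed=0):
    """Paired stationary-bootstrap percentile CI for Sharpe(A) - Sharpe(B), per period."""
    if len(returns_a) != len(returns_b):
        raise ValueError("series must be aligned on the same dates")
    n = len(returns_a)
    p = 1.0 / mean_block  # probability of starting a new block at each step
    rng = random.Random(seed)

    def sharpe(xs):
        return statistics.fmean(xs) / statistics.stdev(xs)

    diffs = []
    for _ in range(n_boot):
        # One index path per replicate, shared by both strategies.
        idx = [rng.randrange(n)]
        for _step in range(n - 1):
            idx.append(rng.randrange(n) if rng.random() < p else (idx[-1] + 1) % n)
        diffs.append(sharpe([returns_a[i] for i in idx]) - sharpe([returns_b[i] for i in idx]))
    diffs.sort()
    return diffs[int(alpha / 2 * (n_boot - 1))], diffs[int((1 - alpha / 2) * (n_boot - 1))]
```

If the resulting interval excludes zero, the Sharpe difference is distinguishable at roughly the chosen level; if it straddles zero, the data do not support preferring either strategy on Sharpe.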

The Bottom Line

A point estimate says nothing about its own precision. Bootstrap CIs show how much the data actually support. Always read CIs alongside point estimates, and when in doubt, trust the interval, not the point.
