Deflated Sharpe Ratio (DSR) and Multiple-Testing Correction

The Sharpe ratio you see at the end of a parameter sweep is biased upward — sometimes by a lot. The Deflated Sharpe Ratio (DSR) is the correction you apply to recover an honest probability that the strategy has a real edge.

The Multiple-Testing Problem

Each backtest you run is a hypothesis test. Each parameter you sweep, each indicator you swap, each timeframe you try — every choice multiplies the number of hypotheses. With many hypotheses, even strategies with zero true edge will, by chance, throw up impressive Sharpe ratios.

This is the same problem as testing 1,000 coins for fairness. Some will look biased even when none are. The coin that looks most biased was singled out by chance, not because it is actually unfair.

A Concrete Example

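Run a 100-cell parameter sweep on a strategy with no real edge. If the cells are roughly independent, the expected maximum Sharpe across the 100 cells sits about 2.5 standard errors above zero, even though the average is 0.0. With one year of data the standard error of an annualized Sharpe is close to 1.0, so the best cell is expected to show a Sharpe of roughly 2.5. If you report only that maximum, you have reported pure noise as a 2.5 Sharpe.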
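The size of this selection effect can be checked numerically. The sketch below is illustrative only: it assumes the cells are independent and measures each cell's Sharpe in units of its own standard error (a z-score), so the result has to be multiplied by that standard error to get a Sharpe figure. The function names are hypothetical; the closed-form approximation is the one used by Bailey and López de Prado (2014) for the expected maximum of independent normal draws.

```python
import math
import random
from statistics import NormalDist, mean

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_z(n_trials: int) -> float:
    """Approximate expected maximum of n_trials independent N(0, 1) draws."""
    if n_trials < 2:
        return 0.0  # a single draw has expectation zero
    nd = NormalDist()
    return ((1 - EULER_GAMMA) * nd.inv_cdf(1 - 1 / n_trials)
            + EULER_GAMMA * nd.inv_cdf(1 - 1 / (n_trials * math.e)))


def simulated_max_z(n_trials: int, n_sweeps: int = 2000, seed: int = 0) -> float:
    """Monte Carlo check: average the best z-score over many no-edge sweeps."""
    rng = random.Random(seed)
    return mean(max(rng.gauss(0.0, 1.0) for _ in range(n_trials))
                for _ in range(n_sweeps))


if __name__ == "__main__":
    for cells in (10, 100, 1000):
        print(cells, round(expected_max_z(cells), 2), round(simulated_max_z(cells), 2))
```

The expected maximum keeps growing as the sweep gets wider, which is why the correction has to know how many cells were tried.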

What DSR Does

The Deflated Sharpe Ratio, introduced by Bailey and López de Prado (2014), takes the headline Sharpe and discounts it by:

The number of effective trials (how many parameter combinations you tried).
The variance of Sharpe estimates across those trials.
The skewness and kurtosis of the return distribution (negative skew and fat tails make the Sharpe estimate less reliable).
The length of the backtest, measured in return observations (more data means less estimation noise).

The output is a number between 0 and 1: the estimated probability that the strategy's true Sharpe exceeds the best Sharpe you would expect from luck alone, given the search you performed.
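A minimal sketch of the calculation, following the formula in Bailey and López de Prado (2014): the benchmark is the expected maximum Sharpe under no edge, and the DSR is the normal CDF of the gap between the observed Sharpe and that benchmark, scaled by sample size and adjusted for skewness and kurtosis. The function and parameter names are assumptions made for illustration, not any particular library's API. The sketch also assumes that the Sharpe is expressed per period (not annualized), that kurtosis is the raw value (3 for a normal distribution), and that the trials are independent.

```python
import math
from statistics import NormalDist

_NORMAL = NormalDist()
EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials: int, sr_variance: float) -> float:
    """Expected best per-period Sharpe among n_trials no-edge strategies."""
    if n_trials < 2:
        return 0.0  # no search, so the benchmark is simply zero
    z = ((1 - EULER_GAMMA) * _NORMAL.inv_cdf(1 - 1 / n_trials)
         + EULER_GAMMA * _NORMAL.inv_cdf(1 - 1 / (n_trials * math.e)))
    return math.sqrt(sr_variance) * z


def deflated_sharpe_ratio(sr: float, n_obs: int, n_trials: int,
                          sr_variance: float,
                          skew: float = 0.0, kurt: float = 3.0) -> float:
    """Probability-like score in [0, 1] for a per-period Sharpe `sr`.

    sr_variance is the variance of per-period Sharpe estimates across trials.
    """
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    # Non-normality adjustment to the sampling variance of the Sharpe estimate.
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return _NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


if __name__ == "__main__":
    # Hypothetical inputs: daily Sharpe 0.1, two years of daily data, 50 trials.
    print(deflated_sharpe_ratio(0.1, n_obs=504, n_trials=50, sr_variance=1 / 504))
```

With a single trial the benchmark drops to zero and the score reduces to the probability that the true Sharpe is positive.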

How to Read a DSR

DSR Calibration

DSR > 0.95: strong evidence of a real edge after accounting for the search.
DSR 0.80–0.95: probable edge, but borderline. Validate further with walk-forward.
DSR 0.50–0.80: possibly noise. Treat the strategy as a hypothesis, not a finding.
DSR < 0.50: the headline Sharpe is most likely an artifact of the search.
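The bands above can be encoded as a small helper, for example to label sweep results in a report. This is a sketch: the function name is hypothetical, and assigning the exact boundary values 0.95, 0.80 and 0.50 to the band that lists them as its lower or upper end is an assumption, since the calibration does not say which side a boundary falls on.

```python
def read_dsr(dsr: float) -> str:
    """Map a DSR value to the calibration bands listed above."""
    if not 0.0 <= dsr <= 1.0:
        raise ValueError("DSR is a probability and must lie in [0, 1]")
    if dsr > 0.95:
        return "strong evidence of a real edge"
    if dsr >= 0.80:  # assumption: 0.80 itself counts as 'probable edge'
        return "probable edge, borderline: validate with walk-forward"
    if dsr >= 0.50:  # assumption: 0.50 itself counts as 'possibly noise'
        return "possibly noise: treat as a hypothesis"
    return "likely an artifact of the search"


if __name__ == "__main__":
    for value in (0.97, 0.85, 0.60, 0.40):
        print(value, "->", read_dsr(value))
```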

Why Headline Sharpe Lies

A 100-cell sweep producing Sharpe 2.6 at the best cell might have a DSR of 0.4, meaning there is only about a 40% chance that the result reflects a genuine edge rather than the luck of selecting the best of many. The headline looks excellent; the verdict does not.

Conversely, a 1-cell strategy (no sweep) with Sharpe 1.5 may have DSR 0.85 — fewer trials, less correction, more believable.
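To see how the verdict on one fixed headline Sharpe changes with the size of the search, the sketch below recomputes the DSR for 1, 10, 100 and 1,000 trials. It rests on simplifying assumptions that real sweeps rarely satisfy exactly: normally distributed daily returns, independent trials, and a cross-trial Sharpe variance equal to the no-edge sampling variance (1 / number of observations). The function name and the example inputs are hypothetical.

```python
import math
from statistics import NormalDist

NORMAL = NormalDist()
GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def dsr_normal(sr_annual: float, years: float, n_trials: int,
               periods_per_year: int = 252) -> float:
    """DSR under normal returns and independent, no-edge-variance trials."""
    n_obs = int(years * periods_per_year)
    sr = sr_annual / math.sqrt(periods_per_year)  # per-period Sharpe
    if n_trials > 1:
        z_max = ((1 - GAMMA) * NORMAL.inv_cdf(1 - 1 / n_trials)
                 + GAMMA * NORMAL.inv_cdf(1 - 1 / (n_trials * math.e)))
    else:
        z_max = 0.0  # a single trial needs no selection correction
    sr0 = z_max / math.sqrt(n_obs)  # expected best Sharpe from luck alone
    # For normal returns (skew 0, kurtosis 3) the adjustment is 1 + sr^2 / 2.
    return NORMAL.cdf((sr - sr0) * math.sqrt(n_obs - 1)
                      / math.sqrt(1 + 0.5 * sr ** 2))


if __name__ == "__main__":
    for trials in (1, 10, 100, 1000):
        print(trials, round(dsr_normal(1.5, years=2, n_trials=trials), 3))
```

The Sharpe never changes across the loop; only the number of trials does, and the score falls as the search widens.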

The Practical Workflow

Run a robustness sweep — broad enough to map the landscape, not so wide that you grade your entire library at once.
Read the DSR alongside the headline Sharpe. The Sharpe says "what was best"; the DSR says "whether the best is meaningful."
If DSR is high, trust the plateau and proceed to walk-forward.
If DSR is low, you do not yet have evidence of an edge. Gather more data, narrow your search, or treat the strategy as exploratory.

The Bottom Line

DSR is the difference between "the best of 100 things" and "a thing that is genuinely good." Always look at DSR after a sweep. The headline Sharpe is what your eye sees; the DSR is what your money should listen to.

Bailey, D. H. & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5), 94–107.
Lo, A. W. (2002). The Statistics of Sharpe Ratios. Financial Analysts Journal, 58(4), 36–52.
Sharpe, W. F. (1994). The Sharpe Ratio. Journal of Portfolio Management, 21(1), 49–58.

Textbook references

López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.

Try it in QuanterLab

After any robustness sweep in SC001STCB, the DSR appears alongside the verdict. A DSR > 0.95 means the strategy survives multiple-testing correction; under that, treat the result as a hypothesis worth walk-forwarding, not a finding.