Cookbook: Interpreting Walk-Forward Results

A walk-forward run produces a lot of numbers — composite Sharpe, per-fold breakdown, parameter stability, decay ratio. This cookbook covers how to read the output, in priority order, and what each finding means for whether you should trade the strategy.

The Headline: Composite OOS Sharpe

The first number to look at is the composite Sharpe ratio across all OOS slices stitched together. This is the closest thing to "how the strategy would have performed in real trading."

Composite Sharpe > 1.5: strong evidence of edge. Worth deploying with appropriate sizing.
Composite Sharpe 1.0–1.5: moderate edge. Tradable with conservative sizing.
Composite Sharpe 0.5–1.0: marginal. May or may not work after costs. Treat as exploratory.
Composite Sharpe < 0.5: the strategy didn't survive walk-forward. Don't trade.
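
As a minimal sketch of the headline number, the snippet below stitches per-fold OOS returns into one series and maps the resulting Sharpe onto the bands above. The function names, the zero risk-free rate, and the 252-period annualization are assumptions made for illustration, not QuanterLab's API or its exact calculation.

```python
import math
from statistics import mean, stdev


def composite_sharpe(fold_returns, periods_per_year=252):
    """Annualized Sharpe of all OOS slices stitched end to end.

    Assumes per-period simple returns and a zero risk-free rate.
    """
    stitched = [r for fold in fold_returns for r in fold]
    volatility = stdev(stitched)  # sample standard deviation
    if volatility == 0:
        raise ValueError("returns have zero volatility")
    return mean(stitched) / volatility * math.sqrt(periods_per_year)


def sharpe_band(sharpe):
    """Map a composite OOS Sharpe onto the bands listed above."""
    if sharpe > 1.5:
        return "strong"
    if sharpe >= 1.0:
        return "moderate"
    if sharpe >= 0.5:
        return "marginal"
    return "failed"


folds = [[0.01, -0.005], [0.02, 0.0]]  # toy data: two OOS folds
print(sharpe_band(composite_sharpe(folds)))
```

Note that the Sharpe is computed once on the stitched series rather than by averaging per-fold Sharpes, which matches the "stitched together" definition above.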

The Decay Ratio

Decay ratio = composite OOS Sharpe / average IS Sharpe. It tells you how much of the edge survived the move from the in-sample fit to out-of-sample reality.

Decay 0.7–1.0: healthy. Strategy generalizes well; in-sample performance was honest.
Decay 0.5–0.7: typical. Some overfit, but the underlying edge is real.
Decay 0.3–0.5: significant overfit. The strategy has some edge, but the IS Sharpe was misleading.
Decay < 0.3: almost all of the IS Sharpe was noise. The walk-forward result is the truth; the IS result was a mirage.
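
The ratio and its bands can be sketched in a few lines. The helper names are hypothetical, and the handling of ratios above 1.0 is an assumption, since the bands above stop at 1.0.

```python
from statistics import mean


def decay_ratio(composite_oos_sharpe, is_sharpes):
    """Composite OOS Sharpe divided by the average in-sample Sharpe."""
    average_is = mean(is_sharpes)
    if average_is <= 0:
        raise ValueError("average IS Sharpe must be positive for the ratio to be meaningful")
    return composite_oos_sharpe / average_is


def decay_band(ratio):
    """Map a decay ratio onto the bands listed above."""
    if ratio > 1.0:
        return "above range"  # assumption: not covered by the bands above
    if ratio >= 0.7:
        return "healthy"
    if ratio >= 0.5:
        return "typical"
    if ratio >= 0.3:
        return "significant overfit"
    return "mostly noise"


ratio = decay_ratio(2.0, [4.2, 3.8, 4.0])  # average IS Sharpe is 4.0
print(ratio, decay_band(ratio))
```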

Low Decay Ratio ≠ Bad Strategy

A strategy with IS Sharpe 4.0 and OOS Sharpe 2.0 has a decay ratio of 0.5: half of the in-sample Sharpe did not survive. That sits on the boundary between "typical" and "significant overfit," yet the OOS Sharpe is still excellent. Always evaluate the OOS Sharpe in absolute terms first, then use the decay ratio to judge how much the IS number overstated the edge.

Per-Fold Variance

The composite Sharpe pools every fold into a single number. The per-fold breakdown tells you whether that number is misleading.

All folds positive: strong consistency. The strategy works across the regimes covered.
One fold dominates: warning. If 4 of 5 folds are mediocre and one is excellent, you have one good period — not five. The composite Sharpe overstates true reliability.
Mostly positive, one disaster: the strategy works in most regimes and fails in one. If you can identify the failure regime (high vol? bear market?), you may be able to add a regime filter to disable the strategy in that regime.
Random sign: no real edge. Even though the composite may be positive, you're looking at noise.
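
One rough way to turn the four patterns above into a first-pass label is sketched below. The function name and the 50% dominance threshold are illustrative assumptions, and summing per-fold Sharpes is a crude heuristic, so treat the label as a prompt to look at the folds rather than as a verdict.

```python
def fold_pattern(fold_sharpes, dominance_share=0.5):
    """Label the per-fold Sharpe breakdown with one of the patterns above.

    dominance_share is an assumed threshold: if the best fold supplies more
    than this share of the summed Sharpes, it is treated as dominating.
    """
    if not fold_sharpes:
        raise ValueError("need at least one fold")
    losing_folds = sum(1 for s in fold_sharpes if s <= 0)
    if losing_folds == 0:
        best_share = max(fold_sharpes) / sum(fold_sharpes)
        if best_share > dominance_share:
            return "one fold dominates"
        return "all folds positive"
    if losing_folds == 1:
        return "mostly positive, one failure"
    return "mixed signs"


print(fold_pattern([1.2, 0.9, 1.1, 1.0, 1.3]))
print(fold_pattern([0.2, 0.1, 3.0, 0.2, 0.1]))
```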

Parameter Stability

Each fold re-optimizes parameters from scratch. The choices the optimizer makes across folds are very informative:

Stable across folds: the optimal parameters are robust. Your walk-forward result is on the same plateau in every fold.
Slowly drifting: the regime is changing slightly. Acceptable; consider rolling-mode WF and live re-tuning.
Wildly different per fold: the optimizer is chasing noise. The strategy doesn't have stable parameters across regimes — meaning it doesn't really work.
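
The three cases above can be approximated for a single numeric parameter by comparing its overall range with its largest fold-to-fold jump. This is a sketch under assumed thresholds (10% range, 25% step) and a hypothetical function name; it is not how QuanterLab scores stability.

```python
from statistics import mean


def parameter_stability(values, stable_range=0.10, max_step=0.25):
    """Classify one parameter's optimized values across consecutive folds.

    Both thresholds are assumptions, expressed relative to the mean value.
    """
    centre = abs(mean(values))
    if centre == 0:
        raise ValueError("mean parameter value is zero; relative measures undefined")
    relative_range = (max(values) - min(values)) / centre
    if relative_range <= stable_range:
        return "stable"
    # Small consecutive steps that add up to a wide range look like drift;
    # large jumps between neighbouring folds look like noise chasing.
    largest_step = max(abs(b - a) for a, b in zip(values, values[1:])) / centre
    return "drifting" if largest_step <= max_step else "erratic"


print(parameter_stability([20, 20, 21, 20, 20]))  # lookback chosen in each fold
print(parameter_stability([20, 22, 24, 27, 30]))
print(parameter_stability([10, 50, 15, 80, 5]))
```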

Composite Equity Curve Shape

Look at the stitched OOS equity curve as a single picture:

Steady upward drift with shallow drawdowns: tradable. This is what you want.
Stair-step pattern (gain → flat → gain): common in regime-conditional strategies. The flat periods are the unfavorable regimes; the gains are when the strategy is "on."
Big spike followed by long flat: dangerous. The strategy worked in one moment of history; OOS evidence beyond that moment is weak.
Steady drift with one massive drawdown: the strategy works most of the time but has one regime where it fails badly. Quantify whether you can tolerate that drawdown.
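
To quantify the drawdown in the last case, a maximum-drawdown calculation on the stitched OOS returns is a reasonable starting point. The sketch below assumes per-period simple returns compounded from a starting equity of 1.0; the function name is illustrative.

```python
def max_drawdown(returns):
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity = 1.0
    peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


stitched_oos = [0.10, -0.50, 0.20]  # toy data
print(max_drawdown(stitched_oos))
```

Comparing this figure with the loss you could actually sit through, at the size you intend to trade, is what turns the picture into a decision.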

The DSR on the Composite

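QuanterLab computes the Deflated Sharpe Ratio (DSR; Bailey & López de Prado, 2014) on the composite OOS equity curve, accounting for the parameter search performed within each fold. The DSR is a probability between 0 and 1, and it is the most rigorous single-number answer to "is this real?"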

Composite DSR > 0.95: very strong evidence. Trade with confidence.
Composite DSR 0.8–0.95: probable edge. Trade with conservative sizing.
Composite DSR < 0.8: not yet established. Either gather more data or treat as exploratory.
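
For readers who want to see what goes into the number, here is a sketch of the DSR formula from Bailey & López de Prado (2014). How QuanterLab counts trials and estimates the variance of trial Sharpes is not stated here, so those inputs, along with the function names, are assumptions. All Sharpe inputs are per-period (not annualized), and kurtosis is the raw value (3 for a normal distribution).

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sharpe_variance):
    """Expected best Sharpe among n_trials strategies with no true edge."""
    if n_trials < 2:
        raise ValueError("need at least two trials to deflate")
    z = NormalDist()
    return math.sqrt(sharpe_variance) * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sharpe, n_obs, skew, kurtosis, n_trials, sharpe_variance):
    """Probability that the observed Sharpe exceeds the expected best-of-N fluke."""
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    # Standard error term adjusts for non-normal returns (skew, fat tails).
    spread = math.sqrt(1 - skew * sharpe + (kurtosis - 1) / 4 * sharpe ** 2)
    return NormalDist().cdf((sharpe - benchmark) * math.sqrt(n_obs - 1) / spread)


# Toy inputs: daily Sharpe 0.1, 1000 OOS days, normal-looking returns,
# 10 parameter sets tried, variance of trial Sharpes 0.0025.
print(deflated_sharpe(0.1, 1000, 0.0, 3.0, 10, 0.0025))
```

The same observed Sharpe yields a lower DSR as the number of trials grows, which is the sense in which the metric accounts for the parameter search.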

The Decision Tree

Composite OOS Sharpe > 1.0? If no, stop. The edge is marginal at best (0.5–1.0) or absent (< 0.5).
All folds positive? If no, identify the failure regime and consider conditional deployment.
Decay ratio > 0.5? If no, the original IS Sharpe was severely misleading.
Parameters stable across folds? If no, walk away.
Composite DSR > 0.8? If no, treat as exploratory.
If all yes: save, paper trade, deploy with sizing appropriate to the OOS Sharpe.
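
The checklist above can be written down as a small function, which is handy for logging the outcome of every run in the same form. The function name, argument names, and verdict strings are hypothetical; the thresholds are the ones stated above.

```python
def walk_forward_verdict(composite_sharpe, fold_sharpes, decay_ratio,
                         params_stable, composite_dsr):
    """Walk the checklist in order and return (verdict, notes)."""
    if composite_sharpe <= 1.0:
        return "stop", ["composite OOS Sharpe is not above 1.0"]
    notes = []
    if not all(s > 0 for s in fold_sharpes):
        notes.append("identify the failure regime; consider conditional deployment")
    if decay_ratio <= 0.5:
        notes.append("IS Sharpe was severely misleading")
    if not params_stable:
        return "walk away", notes + ["parameters are not stable across folds"]
    if composite_dsr <= 0.8:
        notes.append("treat as exploratory")
    # No notes means every check passed: save, paper trade, then size the deployment.
    return ("proceed" if not notes else "review"), notes


print(walk_forward_verdict(1.6, [1.0, 2.0, 1.5], 0.8, True, 0.97))
print(walk_forward_verdict(1.3, [1.0, -0.4, 1.5], 0.6, True, 0.85))
```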

The Bottom Line

Walk-forward results are a profile, not a number. The composite Sharpe is the headline, but per-fold variance and parameter stability are equally important. A strategy that passes all the checks is rare, and worth the discipline of getting through them. A failed check is information too: it says "not yet" or "not at all."

References

Pardo, R. (2008). The Evaluation and Optimization of Trading Strategies (2nd ed.). Wiley.
Bailey, D. H. & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5), 94–107.
Bailey, D. H., Borwein, J. M., López de Prado, M. & Zhu, Q. J. (2014). Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance. Notices of the AMS, 61(5), 458–471.

Textbook references

López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.

Try it in QuanterLab

After every walk-forward run, force yourself to write down: composite OOS Sharpe, decay ratio, per-fold consistency, parameter stability, and DSR. The strategies that pass all five checks are rare; do not deploy strategies that fail any of them.