Drawdown Forensics: Episode-Level Attribution

"Max drawdown was −18%" is a single number. It hides whether the −18% came from one catastrophic event or accumulated from many small ones, which holdings drove it, which sectors were involved, and whether the recovery happened. Drawdown Forensics breaks the equity curve into discrete drawdown episodes and attributes each one to the tickers, sectors, and factors that caused it.

What an episode is

A drawdown episode is defined by three dates: peak (the equity high before the drawdown started), trough (the equity low), and recovery (the first date equity returns to or above the peak). If equity has not recovered by the end of the backtest, the episode is cut off at the last date and treated as unrecovered. The episode's depth is the percentage decline from peak to trough; its duration is the number of days from peak to trough; its recovery period is the number of days from trough to recovery.

FM103 enumerates all episodes deeper than a configurable threshold (default 5%). The sub-pill shows them in order of severity, with the worst on top.
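The episode definition above can be sketched as a single pass over the equity curve. This is a minimal illustration, not the engine's actual code: the names `Episode` and `find_episodes` are hypothetical, the equity series is assumed to hold one value per trading day, and an unrecovered episode is represented with `recovery=None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Episode:
    peak: int                # index of the equity high before the decline
    trough: int              # index of the equity low within the episode
    recovery: Optional[int]  # index where equity regains the peak; None if still underwater
    depth: float             # fractional decline from peak to trough (positive number)


def find_episodes(equity: List[float], threshold: float = 0.05) -> List[Episode]:
    """Split an equity curve into drawdown episodes at least `threshold` deep."""
    episodes: List[Episode] = []
    if not equity:
        return episodes
    peak = trough = 0
    for i in range(1, len(equity)):
        if equity[i] >= equity[peak]:
            # Equity is back at or above the peak: close any open episode.
            if trough > peak:
                depth = 1.0 - equity[trough] / equity[peak]
                if depth >= threshold:
                    episodes.append(Episode(peak, trough, i, depth))
            peak = trough = i
        elif equity[i] < equity[trough]:
            trough = i  # new low inside the current episode
    # An episode still open at the last date has no recovery (censored).
    if trough > peak:
        depth = 1.0 - equity[trough] / equity[peak]
        if depth >= threshold:
            episodes.append(Episode(peak, trough, None, depth))
    # Worst episode first, matching the sub-pill ordering.
    return sorted(episodes, key=lambda e: e.depth, reverse=True)
```

Duration is then `trough - peak` and the recovery period is `recovery - trough`, both in trading days under the daily-series assumption.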

Per-episode attribution

For each episode, the engine identifies:

Bottom tickers: the 5–10 holdings that contributed most to the loss, ranked by their cumulative period contribution (weight × return).
Sector breakdown: per-sector loss contribution. Often a small number of sectors dominate; the rest are noise.
Worst factor: the factor (value / quality / momentum / growth) whose realised period spread was most negative during the episode — the macro reason the strategy underperformed.
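The three outputs listed above can be illustrated with a short sketch. It assumes the simplest reading of "weight × return": each ticker's weight at the episode peak multiplied by its peak-to-trough return. The function names, the input dictionaries, and the factor-spread input are hypothetical and are not taken from the engine.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def attribute_episode(
    start_weights: Dict[str, float],   # ticker -> portfolio weight at the peak (assumed input)
    period_returns: Dict[str, float],  # ticker -> peak-to-trough return (assumed input)
    sectors: Dict[str, str],           # ticker -> sector label (assumed input)
    n_bottom: int = 5,
) -> Tuple[List[Tuple[str, float]], Dict[str, float]]:
    """Return the bottom tickers and the per-sector contribution for one episode."""
    # Contribution of each holding: starting weight times period return.
    contrib = {t: w * period_returns[t] for t, w in start_weights.items()}
    # Most negative contributions first.
    bottom = sorted(contrib.items(), key=lambda kv: kv[1])[:n_bottom]
    by_sector: Dict[str, float] = defaultdict(float)
    for ticker, c in contrib.items():
        by_sector[sectors[ticker]] += c
    return bottom, dict(by_sector)


def worst_factor(spreads: Dict[str, float]) -> str:
    """Pick the factor whose realised period spread was most negative."""
    return min(spreads, key=spreads.get)
```

For example, with weights of 50%, 30%, and 20% and period returns of −12%, +2%, and −5%, the first holding contributes −6 percentage points and dominates the bottom-tickers list.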

Why this matters more than "max drawdown"

Two backtests can have the same −18% max drawdown with very different stories:

Backtest A: −18% from one episode in March 2020, caused by 3 travel-sector names taking heavy losses. Specific event, identifiable cause, addressable with a sector cap.
Backtest B: −18% in the deepest of 4 episodes spread across different regimes, each driven by different factor exposures. Pattern of fragility, not addressable with a sector cap.

The max drawdown number is the same; the implication for risk management is opposite.

Drawdown statistics that complement max-DD

The sub-pill also reports:

Number of episodes ≥ threshold. 1–2 deep episodes is acceptable; 5+ moderate episodes signals systemic fragility.
Average episode depth. A useful counterweight to max-DD.
Time underwater (fraction of days when equity was below a prior peak). Backtests with 60%+ time underwater are psychologically hard to sit through even if max-DD is moderate.
Calmar ratio. CAGR divided by the absolute value of max-DD; a return-to-pain ratio popular in trend-following.
Worst-episode share. Worst episode depth divided by the sum of all episode depths. A share above 70% indicates one event dominates the entire drawdown profile.
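The statistics above follow directly from the equity curve and the list of episode depths. The sketch below is one way to compute them, assuming a daily equity series, depths expressed as positive fractions, and 252 trading days per year; `drawdown_stats` and its parameters are hypothetical names.

```python
from typing import Dict, List


def drawdown_stats(
    equity: List[float],
    depths: List[float],          # episode depths as positive fractions, e.g. 0.15
    periods_per_year: int = 252,  # assumption: daily data
) -> Dict[str, float]:
    """Summary statistics that complement max drawdown."""
    if len(equity) < 2:
        raise ValueError("need at least two equity observations")
    running_peak = equity[0]
    underwater_days = 0
    max_dd = 0.0
    for value in equity:
        running_peak = max(running_peak, value)
        if value < running_peak:
            underwater_days += 1  # below a prior peak on this day
        max_dd = max(max_dd, 1.0 - value / running_peak)
    years = (len(equity) - 1) / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    total_depth = sum(depths)
    return {
        "n_episodes": float(len(depths)),
        "avg_depth": total_depth / len(depths) if depths else 0.0,
        "time_underwater": underwater_days / len(equity),
        # Calmar uses the magnitude of max drawdown in the denominator.
        "calmar": cagr / max_dd if max_dd > 0 else float("inf"),
        "worst_share": max(depths) / total_depth if total_depth > 0 else 0.0,
    }
```

Note that the worst-episode share depends on the episode threshold, because the threshold decides which depths enter the sum.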

Episode shape: V-shape vs. U-shape vs. L-shape

The recovery duration relative to the drawdown duration is informative:

V-shape: Fast drop, fast recovery (e.g., COVID March 2020 → June 2020). Common in equity strategies; manageable.
U-shape: Drop, prolonged trough, gradual recovery (e.g., 2008–2009). Tests psychological discipline more than P&L.
L-shape: Drop with no recovery in sample. Either the strategy permanently lost its edge or the sample ends mid-recovery. Either way, the backtest doesn't prove the strategy survives this episode.
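A rough way to label the three shapes is to compare the recovery period with the drawdown duration. The sketch below does exactly that; the cutoff `v_cutoff=2.0` is an arbitrary illustrative value, not a threshold taken from the engine, and `classify_shape` is a hypothetical name.

```python
from typing import Optional


def classify_shape(
    peak: int,
    trough: int,
    recovery: Optional[int],  # None when the episode has not recovered in sample
    v_cutoff: float = 2.0,    # assumption: illustrative cutoff, tune to taste
) -> str:
    """Label an episode 'V', 'U', or 'L' from its peak, trough, and recovery indices."""
    if recovery is None:
        return "L"  # no recovery within the sample
    drawdown_days = trough - peak
    recovery_days = recovery - trough
    # Recovery that is short relative to the decline reads as a V; otherwise a U.
    return "V" if recovery_days <= v_cutoff * drawdown_days else "U"
```

A ratio-based rule like this cannot tell a permanently broken strategy from a sample that simply ends mid-recovery; both come back as "L".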

Using drawdown forensics to design a stop-loss

The bottom-tickers list per episode is the input data for designing a stop-loss (SL) overlay. If a small number of tickers appear in the bottom-tickers list of several episodes, a per-stock SL is well-targeted. If the bottom tickers vary by episode (different names each time), a portfolio-level SL is more appropriate. The Autopsy mode's Stop-Loss simulator (separate tab) lets you test the overlay quantitatively.
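Checking whether the same names keep reappearing can be done with a simple count across episodes. The sketch below assumes the bottom-tickers lists have already been extracted per episode; `repeat_offenders` and the `min_share` cutoff of 50% are hypothetical choices for illustration.

```python
from collections import Counter
from typing import List


def repeat_offenders(bottom_lists: List[List[str]], min_share: float = 0.5) -> List[str]:
    """Tickers that appear in the bottom list of at least `min_share` of episodes."""
    n_episodes = len(bottom_lists)
    if n_episodes == 0:
        return []
    # Count each ticker at most once per episode.
    counts = Counter(t for tickers in bottom_lists for t in set(tickers))
    return sorted(t for t, c in counts.items() if c / n_episodes >= min_share)
```

A non-empty result suggests testing a per-stock SL on those names; an empty result points toward a portfolio-level SL instead. Either way the overlay still needs to be tested quantitatively.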

Caveats

Episode counting depends on the threshold. A 3% threshold produces many small episodes; a 10% threshold produces a few large ones. Both are valid views; the default 5% is a practitioner compromise.
Recovery date is censored at backtest end. Episodes still underwater at the last date have no recovery duration, so any summary of recovery times omits them. This can flatter the recent past.
Attribution uses the composition at the start of the episode. Holdings rotate during the episode, but the attribution assigns the loss to the starting composition. The longer the episode and the higher the turnover, the less accurate this becomes.

"This drawdown was avoidable" — usually not

The temptation after seeing drawdown forensics is to overlay rules that would have avoided the past drawdowns (skip the travel sector in 2020, skip financials in 2008). This is a textbook case of backtest overfitting (Bailey et al. 2014). The forensics should inform future risk-overlay design at the level of structure (cap single-stock weight, cap sector weight), not at the level of named-stock or named-sector exclusions.

References

Magdon-Ismail, M. & Atiya, A. F. (2004). Maximum Drawdown. Risk Magazine, October 2004, 99–102.
Bailey, D. H., Borwein, J. M., López de Prado, M. & Zhu, Q. J. (2014). Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance. Notices of the AMS, 61(5), 458–471.

Textbook references

Jorion, P. (2007). Value at Risk: The New Benchmark for Managing Financial Risk (3rd ed.). McGraw-Hill.

Try it in QuanterLab

A backtest with one dominant drawdown episode is fundamentally different from one with many moderate episodes. Read both the depth and the number of episodes; never let max-DD alone summarise the risk profile.