The Strategy Health Card

The Strategy Health Card is the entry point to the Post-Mortem view in FM103APSX. It compresses a multi-year, multi-period factor backtest into a single 0–100 composite score, three sub-scores (Factor / Risk / Cost), and a short list of auto-generated warnings. This article explains what it measures, how the score is computed, and how to use it as a triage tool — not as a verdict.

Why a composite score?

A factor backtest produces hundreds of numbers: Sharpe, Sortino, max drawdown, turnover, factor exposures across N periods, regime-conditional performance, IC time series, and so on. Reading these one-by-one is the right thing to do for a final-stage decision — but it is the wrong thing to do at the start. Most strategies fail at least one structural test, and surfacing the failure cheaply lets the analyst skip wasted drill-down.

The composite score is the screening filter. It is intentionally coarse: a 78 and a 71 are not meaningfully different, but a 78 and a 32 are. The user is meant to glance at the headline, read the warnings, and decide whether to keep digging or to discard the strategy.

Composite scores are screening tools, not verdicts

Bailey & López de Prado (2014) make the point bluntly: every composite metric trades sensitivity for specificity. A strategy can score 80 on the health card and still be unfit — for instance, if it relies on a single regime that never recurs. Always look at the sub-scores and warnings before forming a view, and look at the underlying diagnostics before forming a conclusion.

The composite formula

The Strategy Health Card computes:

composite = 0.40 × factor + 0.35 × risk + 0.25 × cost

Each sub-score is on a 0–100 scale and is itself a weighted combination of normalised metrics. The weights reflect a deliberate stance: factor quality is the source of edge, so it gets the largest slice. Risk discipline determines whether the edge survives, so it gets the second slice. Cost erosion determines how much of the edge an investor actually receives, so it gets the third slice. The exact weights are tunable in the engine; the defaults reflect academic norms (Grinold & Kahn 1999 frame active management this way).
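
As a rough illustration of the formula above, here is a minimal Python sketch. The function name composite_score and the dictionary layout are assumptions made for this example, not the engine's actual API; only the three default weights come from this article.

```python
# Default weights as stated in this article; the engine allows tuning them.
DEFAULT_WEIGHTS = {"factor": 0.40, "risk": 0.35, "cost": 0.25}

def composite_score(sub_scores, weights=DEFAULT_WEIGHTS):
    """Weighted sum of 0-100 sub-scores (hypothetical helper, not the engine's API)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    for name in weights:
        if not 0 <= sub_scores[name] <= 100:
            raise ValueError(f"{name} sub-score out of range: {sub_scores[name]}")
    return sum(weights[name] * sub_scores[name] for name in weights)

# Illustrative inputs only: 0.40*45 + 0.35*90 + 0.25*88 is about 71.5
example = composite_score({"factor": 45, "risk": 90, "cost": 88})
```

Because the weights sum to 1, the composite stays on the same 0–100 scale as the sub-scores.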

The three sub-scores

Factor sub-score

Combines three measurements drawn from the schema-v2 fields of the backtest:

  • Mean Information Coefficient across rebalance periods (Spearman rank correlation between each factor's score and realised forward return)
  • Hit rate — fraction of periods where the top-quintile beat the bottom-quintile by more than 0%
  • Specific (idiosyncratic) return — the share of strategy return not explained by the named factors

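An IC of 0.05 averaged across 20 periods can be statistically meaningful, depending on how noisy the per-period ICs are. The t-statistic of the mean IC is mean(IC) / std(IC) × sqrt(N); with a period-to-period IC standard deviation of about 0.10, that gives 0.05 / 0.10 × sqrt(20) ≈ 2.2. (The bare product IC × sqrt(N) is the information-ratio form of Grinold's 1989 fundamental law, and here equals only about 0.22.) An IC of 0.10 with a hit rate of 65% is a strong factor; below 0.03 with hit rate near 50% indicates noise.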

Risk sub-score

Penalises portfolios that achieved good returns through hidden concentration or regime dependence:

  • Average concentration HHI — Herfindahl–Hirschman Index on portfolio weights (higher = more concentrated)
  • Maximum drawdown normalised by annual volatility
  • Regime breadth — fraction of macro regimes in which the strategy posted positive Sharpe
  • Sortino ratio: excess return over a target divided by downside deviation, so only below-target volatility is penalised (Sortino & van der Meer 1991)
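
Two of the inputs above are simple enough to sketch directly. The helpers below are hypothetical names chosen for illustration, and they assume long-only weights and a per-regime Sharpe value that has already been computed. Effective N is taken as 1 / HHI, the usual reading of the index.

```python
def hhi(weights):
    """Herfindahl-Hirschman Index on portfolio weights (assumes long-only weights)."""
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)

def effective_n(weights):
    """Number of equal-weight names that would produce the same HHI."""
    return 1.0 / hhi(weights)

def regime_breadth(sharpe_by_regime):
    """Fraction of regimes in which the strategy posted a positive Sharpe."""
    positive = sum(1 for s in sharpe_by_regime.values() if s > 0)
    return positive / len(sharpe_by_regime)

# Five equal positions: HHI = 0.20, effective N = 5
equal_five = [0.2] * 5
# Hypothetical regime labels, for illustration only
breadth = regime_breadth({"expansion": 1.1, "slowdown": -0.2,
                          "recession": -0.6, "recovery": -0.1})
```

An equal-weight portfolio of n names has HHI = 1/n, so a higher HHI always means fewer effective names.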

Cost sub-score

Estimates how much of the gross return survives realistic implementation costs:

  • Turnover-driven bps drag using configurable spread + impact bps (default 5/5 — see Transaction Cost)
  • Tax drag from realised gain mix (ST vs. LT — see Tax Drag)
  • Capacity ceiling derived from average daily volume (see Capacity & Liquidity)
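
A minimal sketch of the turnover-driven drag follows. It assumes that turnover is expressed as traded notional divided by portfolio value per year and that each unit traded pays spread plus impact once. The engine's exact convention (one-way versus two-way turnover, for example) is not specified in this article, so treat the function names and the formula as illustrative. Only the 5/5 bps defaults come from the text above.

```python
def annual_cost_drag_bps(annual_turnover, spread_bps=5.0, impact_bps=5.0):
    """Cost drag in bps per year, assuming each unit of turnover pays spread + impact once."""
    if annual_turnover < 0:
        raise ValueError("turnover cannot be negative")
    return annual_turnover * (spread_bps + impact_bps)

def net_return(gross_return, annual_turnover, spread_bps=5.0, impact_bps=5.0):
    """Gross annual return (as a decimal) minus the turnover-driven drag."""
    drag = annual_cost_drag_bps(annual_turnover, spread_bps, impact_bps)
    return gross_return - drag / 10_000

# Illustrative numbers: 400% annual turnover at the default 5/5 bps costs 40 bps a year
drag = annual_cost_drag_bps(4.0)
```

Tax drag and the capacity ceiling are left out of this sketch because they depend on inputs (realised gain mix, average daily volume) whose layout is not described here.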

Verdict bands

The composite is bucketed into four verdicts. The bands are explained in detail in Verdict Thresholds.

  • HEALTHY (≥ 70): Structurally sound across factor / risk / cost.
  • MONITOR (50–69): Tradable with caveats; watch the lowest sub-score.
  • REVIEW (30–49): Two or more pillars weak; needs redesign.
  • RECONSIDER (< 30): Likely not deployable in current form.
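
The bucketing can be expressed as a short function. This is a sketch with a hypothetical name; it treats the bands as lower bounds, so a fractional composite such as 69.6 falls in MONITOR. Whether the engine rounds before bucketing is not stated here.

```python
def verdict(composite):
    """Map a 0-100 composite score to one of the four verdict bands."""
    if not 0 <= composite <= 100:
        raise ValueError(f"composite out of range: {composite}")
    if composite >= 70:
        return "HEALTHY"
    if composite >= 50:
        return "MONITOR"
    if composite >= 30:
        return "REVIEW"
    return "RECONSIDER"
```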

The warning list

Auto-warnings convert numerical thresholds into plain-English action items. Each warning carries a severity (info / warning / critical) and a suggested action. Examples:

  • "Factor half-life shorter than rebalance cadence — your factor edge dies before you trade it." (Critical; action: rebalance more frequently or change factors)
  • "Strategy posted positive Sharpe in only 1 of 4 macro regimes — regime dependence is severe." (Warning; action: investigate regime-conditional sub-panel)
  • "Average concentration HHI > 0.20 — effective N below 5 names; performance attributable to a few holdings." (Warning)

The 4 highest-severity warnings are pinned to the health card; the rest live in the relevant sub-panel.
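
One way to model the warning list and the pinning rule is sketched below. The HealthWarning class, its field names, and split_pinned are assumptions for illustration. The three severity levels and the limit of four pinned warnings come from this article, while the tie-breaking rule (original order within a severity) is a guess.

```python
from dataclasses import dataclass

# Lower rank means more severe.
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}

@dataclass(frozen=True)
class HealthWarning:
    message: str
    severity: str      # "info", "warning", or "critical"
    action: str = ""   # suggested action; may be empty

def split_pinned(warnings, max_pinned=4):
    """Return (pinned, rest); sorted() is stable, so ties keep their input order."""
    ordered = sorted(warnings, key=lambda w: SEVERITY_RANK[w.severity])
    return ordered[:max_pinned], ordered[max_pinned:]

# Placeholder messages, not real engine output
sample = [
    HealthWarning("placeholder note A", "info"),
    HealthWarning("Factor half-life shorter than rebalance cadence", "critical",
                  "rebalance more frequently or change factors"),
    HealthWarning("placeholder note B", "info"),
    HealthWarning("Positive Sharpe in only 1 of 4 macro regimes", "warning",
                  "investigate regime-conditional sub-panel"),
    HealthWarning("Average concentration HHI > 0.20", "warning"),
    HealthWarning("placeholder note C", "info"),
]
pinned, rest = split_pinned(sample)
```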

Interpretation guide

  • HEALTHY with weak Factor sub-score (e.g., 45) and high Risk + Cost. The strategy works, but the source of return is not what the model claims. Common cause: a tight universe with mechanical rebalancing that captures market beta dressed up as a factor bet. Run Factor Attribution to confirm.
  • MONITOR with strong Factor + weak Cost. Real edge but turnover too high to realise. Try a slower cadence in Counterfactual Sweep.
  • REVIEW with strong Factor + weak Risk. Concentrated bet or regime-dependent. Style-box and regime-conditional panels are mandatory before any tuning.
  • RECONSIDER but headline CAGR looks great. Almost always a single fortunate sub-period. Drawdown forensics + regime-conditional will show it.

Schema requirements

The full card requires three fields added to FM101/FM102 backtests in May 2026 (schema v2): per-period full_universe_factor_scores, factor_return_series, and regime_classification. Older saved backtests render the card in degraded mode: the Factor sub-score is replaced with a "re-run for full diagnostics" placeholder. The Risk + Cost sub-scores still compute, so the composite is partial but informative.
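
A saved backtest can be checked for the schema-v2 fields before it is opened in Post-Mortem. The sketch below assumes the backtest has been loaded as a dictionary with the three fields as top-level keys. That layout is an assumption (the real files may nest them per period), and the helper names are hypothetical.

```python
# Field names as listed in this article.
SCHEMA_V2_FIELDS = (
    "full_universe_factor_scores",
    "factor_return_series",
    "regime_classification",
)

def missing_v2_fields(backtest):
    """List the schema-v2 fields that are absent or empty in a loaded backtest dict."""
    return [name for name in SCHEMA_V2_FIELDS if not backtest.get(name)]

def card_mode(backtest):
    """'full' if all schema-v2 fields are present, otherwise 'degraded'."""
    return "degraded" if missing_v2_fields(backtest) else "full"

# Made-up payloads for illustration
old_backtest = {"returns": [0.01, -0.02]}
new_backtest = {
    "returns": [0.01, -0.02],
    "full_universe_factor_scores": [{"AAA": 0.3}],
    "factor_return_series": [0.002],
    "regime_classification": ["expansion"],
}
```

A check like this makes it easy to batch-identify which older backtests need a re-run.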

Further Reading

Foundational papers

  • Bailey, D. H. & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5), 94–107.
  • Bailey, D. H., Borwein, J. M., López de Prado, M. & Zhu, Q. J. (2014). Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance. Notices of the AMS, 61(5), 458–471.
  • Harvey, C. R. & Liu, Y. (2015). Backtesting. Journal of Portfolio Management, 42(1), 13–28.

Textbook references

  • Grinold, R. C. & Kahn, R. N. (1999). Active Portfolio Management: A Quantitative Approach for Producing Superior Returns and Controlling Risk (2nd ed.). McGraw-Hill.
  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.

Try it in QuanterLab

Run a saved FM101 or FM102 backtest through Post-Mortem and read only the Strategy Health Card. If the verdict is HEALTHY, jump to Stress Tests and Counterfactual Sweep. Otherwise, drill into the lowest sub-score first.
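
The triage rule in this walkthrough can be written down as a tiny helper. The function name and return format are hypothetical; the panel names are the ones mentioned above.

```python
def next_step(verdict, sub_scores):
    """Suggest where to look next, following the triage rule described above."""
    if verdict == "HEALTHY":
        return ["Stress Tests", "Counterfactual Sweep"]
    # Otherwise start with the weakest pillar (first one wins on a tie).
    lowest = min(sub_scores, key=sub_scores.get)
    return [f"{lowest} sub-score"]

# Illustrative values only
suggestion = next_step("MONITOR", {"Factor": 72, "Risk": 61, "Cost": 38})
```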
