Counterfactual Sweep: Top-N and Stride Robustness

The Counterfactual Sweep re-runs the same strategy across a grid of parameter values, varying either top-N (number of holdings) or rebalance stride (number of periods between rebalances), and plots how the Sharpe ratio responds. A robust strategy shows a flat Sharpe-vs-parameter curve. A fragile strategy shows a sharp peak at the chosen value and a collapse elsewhere. This is one of the cleanest tests for parameter overfitting.

What "counterfactual" means

The original backtest selected a single (top_n, frequency) pair. The counterfactual asks what would have happened under the realistic alternative choices. If Sharpe is similar across all alternatives, the chosen configuration is not load-bearing. If Sharpe varies wildly, the chosen configuration is the dominant determinant of the result. In that case you likely cannot say whether it was picked for a sound reason or simply because it worked in the backtest.

The two sweep variables

Top-N sweep

Varies the number of holdings selected per period. Default values: 15, 25, 50, 75, 100. The sweep keeps factor weights, rebalance frequency, and universe constant; only top_n changes.
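A minimal Python sketch of the one-variable-at-a-time sweep described above. `StrategyConfig`, its fields, and `run_backtest` are hypothetical stand-ins, not QuanterLab's API. The only point is that each sweep point is a copy of the base configuration with a single field changed.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class StrategyConfig:
    # Hypothetical configuration; field names are illustrative only.
    top_n: int = 25
    stride: int = 1
    factor_weights: tuple = (("value", 0.5), ("momentum", 0.5))
    universe: str = "example_universe"


def sweep(base: StrategyConfig, field: str, values,
          run_backtest: Callable[[StrategyConfig], float]) -> dict:
    """Re-run the (hypothetical) backtest once per value, changing only `field`."""
    results = {}
    for v in values:
        # replace() copies every other field from `base` unchanged.
        cfg = replace(base, **{field: v})
        results[v] = run_backtest(cfg)  # assumed to return a Sharpe ratio
    return results
```

For the default top-N grid this would be called as `sweep(base, "top_n", [15, 25, 50, 75, 100], run_backtest)`. The stride sweep uses the same call with `"stride"` and `[1, 2, 4, 8]`. A frozen dataclass combined with `dataclasses.replace` makes it hard to accidentally mutate the shared base configuration between runs.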

Interpretation:

Sharpe flat across top_n: Strategy works across reasonable concentration choices. Robust.
Sharpe peaks at small top_n and collapses for large: Strategy concentrates alpha in the top few names. Concentrated bet dressed up as a factor strategy.
Sharpe peaks at large top_n and collapses for small: Strategy benefits from broad diversification — alpha is in the cross-section, not in the top names.
Sharpe peaks sharply at one intermediate top_n and falls on both sides: Likely parameter overfit. The chosen top_n is the one that happened to work in-sample.
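One way to turn the readings above into code, offered as a rough heuristic only. The curve is treated as flat when its Sharpe range is below 0.30. Borrowing that number from the robustness verdict thresholds is an assumption. Otherwise the function reports where the peak sits. The function name and labels are hypothetical, and the function locates the peak without separately testing for a collapse.

```python
def describe_topn_curve(sharpe_by_topn: dict, flat_range: float = 0.30) -> str:
    """Hypothetical heuristic mapping a Sharpe-vs-top_n curve to the readings above."""
    ns = sorted(sharpe_by_topn)
    values = [sharpe_by_topn[n] for n in ns]
    # Flat curve: the spread of Sharpe across the grid is small.
    if max(values) - min(values) < flat_range:
        return "flat: robust across concentration choices"
    # Otherwise classify by the position of the best grid point.
    peak = ns[values.index(max(values))]
    if peak == ns[0]:
        return "peaks at small top_n: concentrated bet"
    if peak == ns[-1]:
        return "peaks at large top_n: alpha is in the cross-section"
    return f"peaks at top_n={peak}: possible parameter overfit"
```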

Stride sweep

Varies how many periods pass between rebalances. Default values: 1, 2, 4, 8. Stride=1 is the original cadence, and a quarterly backtest with stride=2 effectively rebalances semi-annually. Tests whether the original rebalance frequency was well chosen.
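A sketch of what stride means mechanically. It assumes holdings are recomputed only on every `stride`-th period and carried forward in between. The helper names are hypothetical, and QuanterLab's own implementation may differ.

```python
def rebalance_periods(n_periods: int, stride: int) -> list:
    """Period indices at which the portfolio is rebuilt; positions are held in between."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, n_periods, stride))


def apply_stride(target_holdings: list, stride: int) -> list:
    """Carry each rebalance's selection forward until the next rebalance."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    held = []
    current = None
    for t, target in enumerate(target_holdings):
        if t % stride == 0:
            current = target  # rebalance: adopt this period's selection
        held.append(current)  # otherwise keep the previous selection
    return held
```

With 8 quarterly periods, `rebalance_periods(8, 2)` gives `[0, 2, 4, 6]`. That is one rebalance every two quarters, the semi-annual cadence mentioned above.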

Interpretation:

Sharpe flat across strides: Rebalance cadence is not critical — the signal is slow enough to survive holding longer or shorter.
Sharpe peaks at stride=1 (original cadence) and falls at longer strides: Signal is fast-decaying. Holding longer loses material edge. Cross-check with Factor Decay.
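Sharpe rises at longer strides: Strategy was over-rebalancing. Holding longer loses no signal and may capture more of it. Because the sweep uses gross returns, this improvement is measured before costs. The slower cadence also reduces turnover and tax drag, so the net advantage should be larger still.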

The robustness verdict

FM103 summarises the sweep with a Sharpe range metric:

range = max(Sharpe_sweep) − min(Sharpe_sweep)

Range < 0.30: High robustness. The strategy is parameter-insensitive.
Range 0.30–0.70: Moderate. The strategy has parameter sensitivity but no collapse.
Range > 0.70: Low. The strategy is parameter-fragile; the chosen configuration is doing material work.
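The range metric and its bands can be written as a small function. This is a sketch, not FM103's code. The text does not say which band an exact 0.30 or 0.70 falls into, so the boundary handling here is an assumption: both 0.30 and 0.70 count as Moderate.

```python
def robustness_verdict(sharpes) -> tuple:
    """Sharpe range across sweep points, mapped to the robustness bands above."""
    values = list(sharpes)
    rng = max(values) - min(values)
    if rng < 0.30:
        label = "High"      # parameter-insensitive
    elif rng <= 0.70:
        label = "Moderate"  # sensitive but no collapse (boundaries assumed inclusive)
    else:
        label = "Low"       # parameter-fragile
    return rng, label
```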

Why this matters

Bailey, Borwein, López de Prado & Zhu (2014) and Pardo (2008) both emphasise that backtest overfitting most often manifests as parameter brittleness. A strategy that works at exactly top_n = 30 with quarterly rebalancing, but breaks at 25 or 35 holdings or at a monthly or semi-annual cadence, is overfit to the search procedure that found it. Checking that the Sharpe response stays flat across a sweep is a cheap structural test for this pathology.

Reading the cumulative-return overlay

The sub-pill plots both Sharpe and cumulative return per sweep point. A pattern to watch: Sharpe similar across sweep points but cumulative return varying widely. This usually indicates the configurations have different variance levels — the Sharpe ratio normalises this out, but the absolute return matters for funding decisions. A more concentrated configuration may have similar Sharpe with much higher cumulative return at the cost of much higher drawdowns.
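The "similar Sharpe, different cumulative return" pattern is easiest to see with leverage. Doubling every period's return doubles both the mean and the standard deviation, so the Sharpe ratio is unchanged, while compounded return and drawdown both grow. This holds under a zero risk-free rate, which is an assumption here. The return series below is illustrative, not backtest output, and quarterly annualisation (`periods_per_year=4`) is also assumed.

```python
import math
import statistics


def sharpe(returns, periods_per_year: int = 4) -> float:
    """Annualised Sharpe ratio from per-period returns, assuming a zero risk-free rate."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def cumulative_return(returns) -> float:
    """Compounded total return over the whole series."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def max_drawdown(returns) -> float:
    """Largest peak-to-trough fall of the compounded equity curve, as a positive fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


base = [0.03, -0.01, 0.04, 0.02, -0.02, 0.05, 0.01, 0.03]  # illustrative quarterly returns
scaled = [2 * r for r in base]                               # same bets, twice the volatility

for name, rets in (("base", base), ("2x", scaled)):
    print(f"{name}: Sharpe {sharpe(rets):.2f}, "
          f"cumulative {cumulative_return(rets):.1%}, max DD {max_drawdown(rets):.1%}")
```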

Best configuration label

The sub-pill highlights the configuration with the highest Sharpe within the sweep. This is informative but should not be used as a tuning target. Picking the best sweep point and treating it as a new chosen configuration is the start of an in-sample optimisation loop — the exact pattern Bailey et al. warn against. The label is a description of where Sharpe peaks, not a recommendation to switch.

Stride sweep + cost overlay

Stride sweep Sharpe ignores transaction costs because it uses gross returns. A stride that delivers higher Sharpe at higher turnover will lose more to costs. Cross-reference with the Cost sub-pill. After costs, the high-turnover stride may have a lower net Sharpe than the lower-turnover stride even when its gross Sharpe is higher.
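A sketch of how a cost overlay can reverse a gross-Sharpe ranking. It assumes a simple linear cost model (cost = turnover × a fixed rate per unit of turnover), which may not match the Cost sub-pill's model. The return and turnover series are made up, one for a high-turnover stride and one for a low-turnover stride. The cost rate is deliberately large so the effect is visible.

```python
import math
import statistics


def sharpe(returns, periods_per_year: int = 4) -> float:
    """Annualised Sharpe ratio, assuming a zero risk-free rate."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def net_returns(gross, turnover, cost_rate):
    """Linear cost model: subtract turnover (fraction of portfolio traded) times cost_rate."""
    return [g - t * cost_rate for g, t in zip(gross, turnover)]


# Illustrative per-period gross returns and turnover, not real backtest output.
fast_gross, fast_turnover = [0.085, -0.035, 0.060, -0.010], [0.8] * 4  # e.g. stride=1
slow_gross, slow_turnover = [0.080, -0.040, 0.055, -0.015], [0.1] * 4  # e.g. stride=4
cost_rate = 0.01  # 1% of traded value, deliberately high

fast_net = net_returns(fast_gross, fast_turnover, cost_rate)
slow_net = net_returns(slow_gross, slow_turnover, cost_rate)

print(f"gross Sharpe: fast {sharpe(fast_gross):.2f}, slow {sharpe(slow_gross):.2f}")
print(f"net Sharpe:   fast {sharpe(fast_net):.2f}, slow {sharpe(slow_net):.2f}")
```

Both series have the same volatility. The high-turnover stride has the higher gross mean (2.5% vs 2.0% per period) but the lower net mean (1.7% vs 1.9%), because it pays 0.8% per period in costs against 0.1%. That is enough to flip the Sharpe ranking.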

Caveats

Sweep is in-sample. All sweep points use the same historical data, so a different out-of-sample period may produce a different best top_n or stride.
Sweep is not exhaustive. The default grids cover reasonable ranges but not all conceivable values. Edge cases (top_n = 5, stride = 16) are not tested.
Sweep keeps everything else constant. Joint variation (e.g., top_n × sectors_excluded) is not tested. The sweep measures only the marginal effect of one variable at a time.
