The easiest person to fool is yourself, and a backtest is one of the most efficient self-fooling machines ever built. Run an idea over twenty years of history, twist the knobs until the equity curve points up, and the result feels earned. Most of the time it isn't. The single most common reason a backtest lies is look-ahead bias: somewhere in the pipeline, the strategy quietly used information it could not have had at the moment it was supposedly deciding.
In Primitives, the defence against this is not a checkbox or a warning you can ignore. It is the timeline — the in-sample, out-of-sample and forward windows arranged around a single date called the anchor. The timeline is not decoration under your circuit. It is a leakage detector, and on a bad day it will refuse to let you press Run. This article explains what it is detecting and why we built it to say no.
What look-ahead bias actually is
Look-ahead bias is using, at time t, any information that was not yet available at time t. It hides in places that feel innocent. You optimise a parameter over 2015–2024 and then "test" it on 2020, but 2020 was inside the data you optimised on, so the test is rigged. You screen the universe using today's S&P 500 list and backtest it through 2008, but several of those companies weren't in the index then, and some that were have since gone bankrupt and vanished from the data entirely. That second case has its own name, survivorship bias, and it is just look-ahead bias wearing a different coat: you used membership knowledge from the future.
The honest standard is point-in-time (PIT): every decision may only see data that genuinely existed on or before the date of that decision. This is easy to state and very hard to enforce by discipline alone, because the leaks are small, plausible, and invisible in the final number. López de Prado (2018) makes the argument bluntly across Advances in Financial Machine Learning: most backtests fail not because the idea is bad but because the evaluation is contaminated, and a clean number you can't reproduce point-in-time is worth less than an honest one that's lower.
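To make the point-in-time rule concrete, here is a minimal Python sketch. It is not Primitives code: the `Observation` record and the `visible_as_of` helper are hypothetical names invented for illustration, and the figures are made up. The idea it shows is that a decision should filter on the date a value became available, not on the period the value describes.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    symbol: str
    period_end: date     # the period the figure describes
    available_on: date   # the day the figure actually became public
    value: float


def visible_as_of(observations, decision_date):
    """Keep only observations published on or before decision_date."""
    return [o for o in observations if o.available_on <= decision_date]


# Illustrative data: two quarterly figures for a fictional ticker.
obs = [
    Observation("XYZ", date(2019, 12, 31), date(2020, 2, 14), 1.25),
    Observation("XYZ", date(2020, 3, 31), date(2020, 5, 12), 0.40),
]

# Deciding on 15 April 2020: the March quarter has ended but is not yet published.
visible = visible_as_of(obs, date(2020, 4, 15))
```

Filtering on `period_end` instead would let the March-quarter figure into an April decision, almost four weeks before anyone could have read it.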
The anchor: a "today" you would have deployed from
Everything on the Primitives timeline pivots on one date, the anchor. Read it as a virtual today: the day you imagine standing on, with no knowledge of anything that came after. Everything to the left of the anchor is where the strategy is allowed to fit, tune and validate. Everything to the right is genuine out-of-sample — data the strategy never touched while it was being built, used only to ask "and then what actually happened?"
This is enforced in the engine, not just drawn on screen. When a run starts, the anchor is stored as a per-run context (a thread-local as_of date) that every price fetch and every universe lookup in that run inherits automatically. The comment in the executor states the design intent plainly: when an anchor is set, no bar dated after it can ever enter the circuit — both index membership and price history are clipped — so look-ahead is structurally impossible rather than merely discouraged. The anchor is also clamped to the last twenty years up to today, so you can't anchor into a data desert or, more importantly, into the future.
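The mechanism described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the Primitives executor: `run_as_of`, `clamp_anchor` and `fetch_bars` are hypothetical names, and treating "twenty years" as a calendar-year subtraction is our guess at the clamping rule.

```python
import threading
from contextlib import contextmanager
from datetime import date

_run_context = threading.local()   # per-thread storage for the run's as_of date
MAX_LOOKBACK_YEARS = 20            # assumed reading of "the last twenty years"


def clamp_anchor(anchor, today):
    """Clamp the anchor into [today - 20 years, today]."""
    try:
        earliest = today.replace(year=today.year - MAX_LOOKBACK_YEARS)
    except ValueError:  # today is 29 Feb and the target year is not a leap year
        earliest = today.replace(year=today.year - MAX_LOOKBACK_YEARS, day=28)
    return max(earliest, min(anchor, today))


@contextmanager
def run_as_of(anchor, today):
    """Set the as_of date for the duration of one run, then restore the old value."""
    previous = getattr(_run_context, "as_of", None)
    _run_context.as_of = clamp_anchor(anchor, today)
    try:
        yield _run_context.as_of
    finally:
        _run_context.as_of = previous


def fetch_bars(bars):
    """Return (date, close) bars, dropping any bar dated after the active as_of."""
    as_of = getattr(_run_context, "as_of", None)
    if as_of is None:
        return list(bars)
    return [(d, close) for d, close in bars if d <= as_of]


bars = [(date(2021, 1, 4), 100.0), (date(2022, 1, 3), 110.0), (date(2023, 1, 3), 105.0)]
with run_as_of(date(2022, 6, 30), today=date(2024, 1, 2)):
    clipped = fetch_bars(bars)   # the 2023 bar is not returned inside the run
```

The caller of `fetch_bars` never passes the anchor explicitly. That is the design point: a fetch inside the run cannot forget to clip, because the clipping does not depend on the caller remembering anything.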
The anchor turns a vague intention ("don't cheat") into a hard line on a chart. Left of it: in-sample fitting and walk-forward validation. Right of it: the forward test, the only part of the run that can surprise you. A strategy that looks brilliant left of the anchor and falls apart right of it has just told you the truth about itself.
The leakage guard: the platform refuses to run
Here is the part that matters most. Primitives knows which stages of your circuit fit parameters — the signal definition and the various optimizers (static, per-regime, dynamic-mean, regression). It calls those train windows. It also knows which stages judge — a walk-forward's out-of-sample slices, and the forward backtest sitting to the right of the anchor. Those are test windows.
Before every run, the timeline computes the overlap between every train window and every test window. If a train window bleeds even one day into a test window, that overlap is leakage: the model fitted on data it is later "tested" on. The platform collects these into findings, and if any exist, the Run button is disabled. The tooltip is deliberately unambiguous: Run is blocked — a train window bleeds into a test window (look-ahead leakage). The Prospectus button, which generates the written write-up of your circuit, is blocked on the same condition. You cannot quietly export a contaminated result.
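The overlap check itself is simple interval arithmetic. The sketch below is a hedged illustration, not the platform's implementation: the `Window` type, the `leakage_findings` and `run_enabled` functions, the inclusive end dates, and the example date ranges are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Window:
    name: str
    kind: str     # "train" or "test"
    start: date
    end: date     # inclusive (an assumption of this sketch)


def overlap_days(a, b):
    """Number of calendar days shared by two inclusive date ranges."""
    start = max(a.start, b.start)
    end = min(a.end, b.end)
    return max(0, (end - start).days + 1)


def leakage_findings(windows):
    """Every (train, test, days) triple where a train window touches a test window."""
    trains = [w for w in windows if w.kind == "train"]
    tests = [w for w in windows if w.kind == "test"]
    return [
        (tr.name, te.name, overlap_days(tr, te))
        for tr in trains
        for te in tests
        if overlap_days(tr, te) > 0
    ]


def run_enabled(windows):
    """Run is allowed only when there are no leakage findings."""
    return not leakage_findings(windows)


oos = Window("walk_forward_oos", "test", date(2019, 1, 1), date(2020, 12, 31))
forward = Window("forward_test", "test", date(2021, 1, 1), date(2022, 12, 31))

# Honest layout: the optimizer ends the day before the out-of-sample slice begins.
honest = [Window("optimizer", "train", date(2015, 1, 1), date(2018, 12, 31)), oos, forward]

# The same optimizer bar dragged forward by a single day.
dragged = [Window("optimizer", "train", date(2015, 1, 1), date(2019, 1, 1)), oos, forward]
```

With the honest layout `run_enabled` returns `True`. With the dragged layout, `leakage_findings` reports a one-day overlap between the optimizer and the walk-forward out-of-sample slice, and `run_enabled` returns `False`.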
By default this almost never fires, because the timeline lays your research chain out in the only honest order: signal definition, then optimizer, then walk-forward, then the anchor, then the forward test — each stage acting on data strictly older than the next. The leakage guard only trips when you deliberately drag or resize a train bar forward, into territory it is supposed to be judged on. That is the whole point: the resize handles and the guard ship together. The platform lets you move the windows — it just refuses to pretend the result is clean when you've moved them somewhere dishonest.
There is an escape hatch, and it is honest about being one. A "Run anyway" control lets you acknowledge the leakage and proceed with known look-ahead bias. That acknowledgement resets the moment you change the circuit's structure, so it can never silently carry over into a different experiment. The default is to stop you; overriding is a deliberate, logged choice, not a shrug.
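One way to get an acknowledgement that cannot outlive the circuit it was given for is to tie it to a fingerprint of the circuit's structure. The sketch below assumes that approach; we are not claiming Primitives does it this way. `structure_fingerprint` and `LeakageOverride` are hypothetical names, and "structure" is taken here to mean the set of nodes and the edges between them.

```python
import hashlib
import json


def structure_fingerprint(nodes, edges):
    """Stable hash of the circuit's shape: node names plus (source, target) edges."""
    payload = json.dumps({"nodes": sorted(nodes), "edges": sorted(edges)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class LeakageOverride:
    """Remembers which exact structure the user acknowledged leakage for."""

    def __init__(self):
        self._acknowledged_for = None

    def acknowledge(self, nodes, edges):
        self._acknowledged_for = structure_fingerprint(nodes, edges)

    def is_active(self, nodes, edges):
        # Valid only while the structure is unchanged since the acknowledgement.
        return self._acknowledged_for == structure_fingerprint(nodes, edges)


nodes = ["universe", "signal", "optimizer", "walk_forward"]
edges = [("universe", "signal"), ("signal", "optimizer"), ("optimizer", "walk_forward")]

override = LeakageOverride()
override.acknowledge(nodes, edges)
still_valid = override.is_active(nodes, edges)                      # same circuit
after_change = override.is_active(nodes + ["screen"], edges)        # a node was added
```

Because validity is recomputed from the current structure, there is no separate "reset" step that could be forgotten: a changed circuit simply no longer matches.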
Point-in-time re-selection: no hindsight in the basket
A timeline only protects you if the data flowing through it is also point-in-time. Two places in Primitives are easy to get wrong, and both are handled in the engine.
Universe membership. When you screen the S&P 500 as of some past anchor, Primitives reconstructs the index's membership for that date by reverse-applying the historical change-log, rather than handing you today's list. The result carries a pit_resolved flag, and when a true point-in-time list isn't available it tells you so and labels the run with the survivorship caveat instead of hiding it. Honesty about a limitation beats a clean-looking number you can't trust.
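Reverse-applying a change-log means starting from today's list and undoing every change that took effect after the target date, newest first. A minimal sketch follows. The function name, the `(effective_date, added, removed)` log format, the `log_start` parameter and the tickers are all invented for illustration; only the `pit_resolved` flag name comes from the description above.

```python
from datetime import date


def membership_as_of(current_members, change_log, as_of, log_start):
    """Rebuild index membership for as_of by undoing later changes.

    change_log entries are (effective_date, added, removed) tuples.
    log_start is the earliest date the change-log is known to cover.
    """
    if as_of < log_start:
        # No true point-in-time list: say so rather than pretend.
        return {"members": set(current_members), "pit_resolved": False}

    members = set(current_members)
    for effective, added, removed in sorted(change_log, key=lambda c: c[0], reverse=True):
        if effective <= as_of:
            break                  # this change and all older ones were already in force
        members -= set(added)      # undo the additions
        members |= set(removed)    # restore the removals
    return {"members": members, "pit_resolved": True}


today_list = {"AAA", "BBB", "DDD"}
log = [
    (date(2015, 6, 1), {"BBB"}, {"CCC"}),
    (date(2021, 3, 1), {"DDD"}, {"EEE"}),
]

in_2018 = membership_as_of(today_list, log, date(2018, 1, 1), log_start=date(2012, 1, 1))
in_2010 = membership_as_of(today_list, log, date(2010, 1, 1), log_start=date(2012, 1, 1))
```

For 2018 only the 2021 change is undone, so `DDD` drops out and `EEE` comes back. For 2010 the log does not reach far enough, so the sketch returns today's list with `pit_resolved` set to `False`, which is the cue to attach the survivorship caveat.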
Re-selection at each rebalance. In a portfolio forward test, the basket isn't chosen once with the benefit of hindsight and then held. At each rebalance date the selection sub-graph is re-run as of that date — the same screen, re-evaluated on what was knowable then — and intersected with the index membership for that date, so a name that had been dropped from the index by the rebalance can't be held through it. The forward signal tester splits the price series once at the anchor, fetches a warm-up window before the anchor so the indicators are valid the moment you cross it, and treats the entire post-anchor leg as the out-of-sample test. None of this asks for your trust. It is built so that trusting it isn't required.
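The re-selection loop can be sketched as follows. This is an illustration, not the engine's code: `reselect_baskets` is a hypothetical name, and `screen` and `membership_on` stand in for the selection sub-graph and the point-in-time membership lookup, each taking a date and returning tickers knowable on that date.

```python
from datetime import date


def reselect_baskets(rebalance_dates, screen, membership_on):
    """Re-run the screen as of each rebalance and keep only current index members."""
    baskets = {}
    for d in rebalance_dates:
        candidates = set(screen(d))        # the same screen, evaluated as of d
        members = set(membership_on(d))    # index membership as of d
        baskets[d] = sorted(candidates & members)
    return baskets


# Toy stand-ins: the screen always likes AAA and BBB,
# but BBB leaves the index on 1 June 2021.
def toy_screen(as_of):
    return ["AAA", "BBB"]


def toy_membership(as_of):
    return {"AAA", "BBB"} if as_of < date(2021, 6, 1) else {"AAA", "CCC"}


baskets = reselect_baskets(
    [date(2021, 1, 4), date(2021, 7, 1)], toy_screen, toy_membership
)
```

At the January rebalance both names are held. By July `BBB` has been dropped from the index, so the intersection leaves only `AAA`, even though the screen still ranks `BBB` highly.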
Why this is worth the friction
It would be easier to ship a builder that always runs and never argues. We didn't, because the statistics are unforgiving. Bailey, Borwein, López de Prado and Zhu (2017) showed how readily backtest overfitting manufactures impressive-looking strategies out of noise: try enough configurations and a spectacular in-sample Sharpe ratio is essentially guaranteed, and it means nothing. Harvey and Liu (2015) made the parallel point for the published factor literature, where most "discoveries" don't clear the bar once you account for how many ideas were tested. Leakage is the same disease at the level of a single run: it inflates the in-sample number and tells you nothing about the future.
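The "try enough configurations" effect is easy to reproduce. The sketch below is our own toy simulation, not a method taken from the cited papers: it generates 500 "strategies" whose daily returns are pure Gaussian noise with zero mean, computes each one's annualised Sharpe ratio over one year, and compares the median with the best. The parameter choices (500 strategies, 252 days, 1% daily volatility, the seed) are arbitrary.

```python
import random
import statistics


def annualised_sharpe(daily_returns, periods_per_year=252):
    """Sample mean over sample standard deviation, scaled to a yearly horizon."""
    mean = statistics.fmean(daily_returns)
    stdev = statistics.stdev(daily_returns)
    return mean / stdev * periods_per_year ** 0.5


def noise_sharpes(n_strategies, n_days, seed):
    """Sharpe ratios of strategies whose returns are zero-mean noise (no real edge)."""
    rng = random.Random(seed)
    return [
        annualised_sharpe([rng.gauss(0.0, 0.01) for _ in range(n_days)])
        for _ in range(n_strategies)
    ]


sharpes = noise_sharpes(n_strategies=500, n_days=252, seed=7)
best = max(sharpes)
typical = statistics.median(sharpes)
print(f"median Sharpe: {typical:.2f}, best of {len(sharpes)}: {best:.2f}")
```

The median sits close to zero, as it should for noise, while the best of the 500 will usually show a Sharpe ratio well above 2. Picking that winner and reporting its number is exactly the selection effect the papers warn about.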
The timeline can't make your idea good. Nothing can do that. What it can do is stop the most common way a perfectly ordinary idea gets dressed up as a great one — by letting the future leak backwards into the past. When the platform blocks a run, it isn't being difficult. It's being the colleague who looks at your chart and asks the one question you didn't want to hear.
- The anchor is a virtual "today." Left of it the strategy may fit and validate; right of it is genuine out-of-sample. The engine clips both index membership and price history to the anchor, so post-anchor data structurally cannot enter the run.
- The leakage guard blocks the run. If a train window (signal definition, any optimizer) overlaps a test window (walk-forward OOS slices, the forward backtest), Run and Prospectus are disabled with an explicit look-ahead message.
- Run anyway is honest about itself. You can override with acknowledged bias, and the acknowledgement resets on any structural change to the circuit.
- Point-in-time everywhere. S&P 500 membership is reconstructed for the anchor date (with a survivorship flag when it can't be), and baskets are re-screened as of each rebalance, never with hindsight.
- It can't make an idea good. It can only stop the easiest way to fool yourself into thinking a mediocre one is good.
None of this guarantees a strategy will work. It guarantees something smaller and more useful: that the number you're looking at wasn't quietly built from the future. That's not a promise about markets. It's a promise about your own evidence — and it's the one we can actually keep.
Further Reading
Foundational papers
- Bailey, D. H., Borwein, J. M., López de Prado, M. & Zhu, Q. J. (2017). The Probability of Backtest Overfitting. Journal of Computational Finance, 20(4), 39–69.
- Harvey, C. R. & Liu, Y. (2015). Backtesting. Journal of Portfolio Management, 42(1), 13–28.
Textbook references
- López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
Try it in QuanterLab
Open Primitives and build a small circuit: a Universe, a screen, a Signal, an Optimizer and a Walk-Forward, with the anchor set a couple of years back. Run it once with the default honest sequencing: everything left of the anchor, the forward test on the right. Then grab the Optimizer's train bar and drag it forward into the walk-forward's out-of-sample window. Watch the leakage finding appear and the Run button lock with the look-ahead message. That red band is the platform showing you, in one gesture, exactly how a backtest learns to lie, and refusing to print the result until you fix it or knowingly override it.