A reading list for the honest quant

Every idea behind QuanterLab was borrowed. The pre-registered hypothesis, the zero-edge null test, the refusal to redistribute raw prices, the insistence that a walk-forward is only one kind of validation and a held basket is another — none of that is ours. We read it somewhere, found it convincing, and wired it into code. This article is the bibliography for that work: the books and papers that shaped how the platform thinks, grouped, with a sentence or two on why each one earned its place, and a note on where in QuanterLab the idea shows up.

We cite these works because we stand on them. The through-line is the one Richard Feynman gave us: the easiest person to fool is yourself. A reading list cannot fix that — but knowing the failure modes by name, in the words of the people who first described them, makes them a little harder to walk into.

Why a reading list is the capstone

A tool can only encode the judgment of the people who built it. If you want to understand why a forward test refuses to start before the data exists, or why a Sharpe ratio is treated with suspicion, the honest answer is "go read the source." This is that source list — not a credential, a map of where the ideas came from.

Philosophy of science: how not to fool yourself

Karl Popper — Conjectures and Refutations (1963). Popper argued that science advances not by piling up confirmations but by stating a claim sharp enough that the world could prove it wrong, then trying hard to do exactly that. A strategy that "works on everything you tried" has told you nothing; a claim you committed to before seeing the out-of-sample result can actually fail. This is why Primitives lets you freeze a deterministic, citable prospectus and register it as a hypothesis before the forward test runs — the claim is on the record before the evidence arrives.
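
One way to picture "on the record before the evidence arrives" is a content hash taken over a canonical serialization of the claim. The sketch below is an illustration under assumptions, not QuanterLab's implementation: `freeze_hypothesis` and the field names in `claim` are hypothetical.

```python
import hashlib
import json


def freeze_hypothesis(prospectus: dict) -> str:
    """Return a SHA-256 digest of a canonical JSON form of the prospectus.

    Sorting keys and fixing separators makes the serialization deterministic,
    so the same claim always produces the same digest.
    """
    canonical = json.dumps(prospectus, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical claim, written down before any out-of-sample data exists.
claim = {
    "strategy": "example-momentum",
    "metric": "annualized_sharpe",
    "threshold": 0.5,
    "window": "next 250 bars",
}
digest = freeze_hypothesis(claim)

# Key order does not change the digest; any edit to the claim does.
reordered = dict(reversed(list(claim.items())))
edited = {**claim, "threshold": 0.3}
```

Recording the digest before the forward test starts gives anyone a way to check later that the claim was not quietly edited after the results came in.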

Richard Feynman — "Cargo Cult Science" (1974). Feynman's Caltech commencement talk is the one we keep coming back to: the form of rigor without the substance, and his warning that you are the easiest person to fool, so you must bend over backwards to show how you might be wrong. Every honesty mechanism in the platform is an attempt to make self-deception cost something.

Philip Tetlock & Dan Gardner — Superforecasting (2015). Tetlock's long study of forecasters found the good ones share habits more than genius: they think in calibrated probabilities, update in small steps, and treat their own confidence as a number to be checked, not a feeling to be trusted. The forward-test verdict and the Monte-Carlo cone are there to push you toward a probability and a range, not a single hopeful point estimate.

John Ioannidis — "Why Most Published Research Findings Are False" (2005). Ioannidis showed that when many hypotheses are tested, few of them are true to begin with, and bias and analytic flexibility creep in, most "significant" findings will be false positives. Backtesting is exactly that environment: thousands of configurations, one that looks great. Naming this problem is the first step to building guards against it.
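
The arithmetic behind the claim is short. In the bias-free case of Ioannidis's paper, the chance that a "significant" finding is true is PPV = (1 − β)R / ((1 − β)R + α), where R is the pre-study odds that a tested relationship is real, α is the significance level, and 1 − β is the power. The sketch below evaluates it; the function name and the example odds are illustrative assumptions, not figures from the paper or from QuanterLab.

```python
def positive_predictive_value(prior_odds, alpha=0.05, power=0.8):
    """Probability that a significant finding is true, ignoring bias.

    prior_odds: ratio of true to false hypotheses among those tested (R).
    alpha:      false-positive rate of the test.
    power:      1 - beta, the chance of detecting a real effect.
    """
    if prior_odds <= 0:
        raise ValueError("prior_odds must be positive")
    true_positives = power * prior_odds
    return true_positives / (true_positives + alpha)


# One real effect for every false one.
even_odds = positive_predictive_value(1.0)
# Hypothetical parameter sweep: one real edge per thousand configurations.
grid_search = positive_predictive_value(1 / 1000)
```

At even odds most significant results are real. At the long odds a large parameter sweep implies, nearly all of them are false positives, even though the test itself has not changed.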

Markets, luck and self-deception

Nassim Taleb — Fooled by Randomness (2001). Taleb's argument is that a track record can be almost entirely noise, and survivorship makes the noise look like skill — we see the winners and never count the silent graveyard of the ruined. The zero-edge null test in the Forward-Test Autopsy exists because of this: it ranks your realized result against a no-edge process built from your own demeaned returns, asking whether the edge is distinguishable from luck at all.
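
A minimal sketch of that kind of test, assuming a simple i.i.d. bootstrap and the Sharpe ratio as the statistic. The autopsy's actual procedure is not spelled out in this article, so `zero_edge_rank` and its parameters are hypothetical.

```python
import math
import random


def sharpe(returns):
    """Per-period Sharpe ratio (mean over sample standard deviation)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) if var > 0 else 0.0


def zero_edge_rank(returns, n_sims=2000, seed=0):
    """Fraction of zero-edge resamples whose Sharpe falls below the realized one.

    The null keeps the volatility and shape of the realized returns but
    removes their mean, so any Sharpe it produces is luck by construction.
    """
    rng = random.Random(seed)
    realized = sharpe(returns)
    mean = sum(returns) / len(returns)
    demeaned = [r - mean for r in returns]
    below = 0
    for _ in range(n_sims):
        sample = rng.choices(demeaned, k=len(returns))
        if sharpe(sample) < realized:
            below += 1
    return below / n_sims


rng = random.Random(42)
raw = [rng.gauss(0.0, 0.01) for _ in range(250)]
centre = sum(raw) / len(raw)
no_edge = [r - centre for r in raw]        # sample mean removed
with_edge = [r + 0.002 for r in no_edge]   # same noise plus a small drift

rank_no_edge = zero_edge_rank(no_edge)
rank_with_edge = zero_edge_rank(with_edge)
```

A rank near one half says the result is indistinguishable from luck; a rank close to one says few no-edge histories did as well. An i.i.d. bootstrap ignores serial dependence, so a block bootstrap would be the more careful choice for autocorrelated returns.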

David Aronson — Evidence-Based Technical Analysis (2006). Aronson took technical analysis seriously enough to subject it to statistics, and his central contribution for our purposes is data-mining bias: when you select the best rule from a large set, its in-sample performance is upward-biased by the selection itself. The platform's bias toward point-in-time data and out-of-sample disposition labels is a direct response to the trap he documented.

López de Prado: the methodological anchors

If one body of work sits underneath the platform's validation machinery, it is this one. We treat these four as the load-bearing references.

Marcos López de Prado — Advances in Financial Machine Learning (2018). The book that connects machine-learning rigor to finance's specific pathologies — leakage, non-IID samples, the difference between a backtest and a research process. Its discipline around not letting future information touch a model informs why the executor reads regimes as-of each bar and admits filings only by their SEC acceptance date.
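
The acceptance-date rule is easy to state in code. This is a sketch of the principle only: the `Filing` record, its fields, and `latest_visible` are hypothetical names, and the dates and values are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Filing:
    period_end: datetime   # fiscal period the numbers describe
    accepted_at: datetime  # moment the filing became public
    eps: float


def latest_visible(filings, bar_time):
    """Most recently accepted filing that was already public at bar_time.

    Filtering on accepted_at, not period_end, is what keeps numbers that
    describe the past but were published later out of the bar's view.
    """
    visible = [f for f in filings if f.accepted_at <= bar_time]
    return max(visible, key=lambda f: f.accepted_at, default=None)


filings = [
    Filing(datetime(2023, 12, 31), datetime(2024, 2, 15, 16, 30), eps=1.10),
    Filing(datetime(2024, 3, 31), datetime(2024, 5, 10, 16, 30), eps=1.25),
]

# On 2024-04-15 the March quarter has ended but its filing is not yet public,
# so the bar can only see the December-quarter numbers.
seen = latest_visible(filings, datetime(2024, 4, 15))
```

Keying the lookup on `period_end` instead would hand the April bar a number nobody could have known until May, which is exactly the leakage the book warns about.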

Bailey, Borwein, López de Prado & Zhu — "Pseudo-Mathematics and Financial Charlatanism" (2014). Their result is blunt: with enough trials, you can almost always find a backtest with any Sharpe ratio you like, so an unadjusted backtest Sharpe is nearly meaningless without knowing how many configurations were tried. This is why a single good-looking number is never the verdict here.
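
The effect is easy to reproduce. The simulation below is our own illustration, not code from the paper: it builds strategies that have no edge by construction and reports the best annualized Sharpe found among the first 1, 10, 100, and 1000 of them. The volatility, track length, and seed are arbitrary assumptions.

```python
import math
import random


def annualized_sharpe(returns, periods_per_year=252):
    """Sharpe ratio scaled by the square root of periods per year."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def noise_sharpes(n_trials, n_obs=252, seed=0):
    """Annualized Sharpe ratios of strategies whose true edge is zero."""
    rng = random.Random(seed)
    return [
        annualized_sharpe([rng.gauss(0.0, 0.01) for _ in range(n_obs)])
        for _ in range(n_trials)
    ]


sharpes = noise_sharpes(1000)

# Best Sharpe among the first n trials: it can only rise as n grows.
best_of = {n: max(sharpes[:n]) for n in (1, 10, 100, 1000)}
```

Every strategy here is pure noise, yet the best of a thousand will typically show a Sharpe that would look excellent in isolation. The number of trials is part of the result.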

Bailey & López de Prado — "The Deflated Sharpe Ratio" (2014). The constructive companion: a way to discount a Sharpe ratio for the number of trials, the track length, and the non-normality of returns. The skill-band reads in the autopsy — weak, plausible, strong, very strong — are our plain-language gesture at the same idea, ranking the result against a null rather than reporting it raw.
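
For reference, the core of the published statistic fits in a few lines. This sketch follows the formulas in the paper as we read them, with the Sharpe ratio expressed per period (not annualized) and kurtosis in its raw form (3 for a normal distribution). The function names and example inputs are our own assumptions, and nothing here is claimed to be the code behind the skill bands.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(n_trials, sharpe_std):
    """Approximate expected best Sharpe among n_trials zero-edge strategies."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    return sharpe_std * (
        (1 - EULER_GAMMA) * _STD_NORMAL.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * _STD_NORMAL.inv_cdf(1 - 1 / (n_trials * math.e))
    )


def deflated_sharpe(sr, n_obs, n_trials, sharpe_std, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds the luck-only benchmark.

    sr:         observed per-period Sharpe of the selected strategy.
    n_obs:      number of return observations behind sr.
    n_trials:   number of configurations tried before selecting this one.
    sharpe_std: standard deviation of Sharpe ratios across those trials.
    """
    benchmark = expected_max_sharpe(n_trials, sharpe_std)
    denom = math.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr ** 2)
    return _STD_NORMAL.cdf((sr - benchmark) * math.sqrt(n_obs - 1) / denom)


# Same observed Sharpe and track length, different amounts of searching.
few_trials = deflated_sharpe(sr=0.1, n_obs=1250, n_trials=10, sharpe_std=0.0283)
many_trials = deflated_sharpe(sr=0.1, n_obs=1250, n_trials=1000, sharpe_std=0.0283)
```

The observed Sharpe is identical in both calls. Only the count of trials differs, and that alone is enough to move the result from convincing to close to a coin flip.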

Bailey, Borwein, López de Prado & Zhu — "The Probability of Backtest Overfitting" (2015/2017). Their combinatorially-symmetric cross-validation framework quantifies how often the in-sample best is out-of-sample mediocre — the precise failure a walk-forward is meant to expose. The platform's insistence on genuine out-of-sample windows, and its disposition vocabulary (walk-forward over forward-OOS over held-OOS), is a practical answer to the question they formalized.
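
A compact sketch of the combinatorially-symmetric idea, written from the paper's description and using the Sharpe ratio as the performance measure. The function name, the block count, and the synthetic data are assumptions made for illustration; this is not QuanterLab code.

```python
import itertools
import math
import random


def sharpe(returns):
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) if var > 0 else 0.0


def probability_of_backtest_overfitting(matrix, n_blocks=8):
    """Share of splits where the in-sample winner ranks at or below the
    out-of-sample median. matrix[t][n] is configuration n's return in period t.
    """
    n_obs, n_cfg = len(matrix), len(matrix[0])
    size = n_obs // n_blocks  # rows beyond n_blocks * size are ignored
    blocks = [range(i * size, (i + 1) * size) for i in range(n_blocks)]
    overfit = total = 0
    for chosen in itertools.combinations(range(n_blocks), n_blocks // 2):
        in_rows = [t for b in chosen for t in blocks[b]]
        out_rows = [t for b in range(n_blocks) if b not in chosen
                    for t in blocks[b]]
        in_perf = [sharpe([matrix[t][n] for t in in_rows]) for n in range(n_cfg)]
        out_perf = [sharpe([matrix[t][n] for t in out_rows]) for n in range(n_cfg)]
        best = max(range(n_cfg), key=in_perf.__getitem__)
        # Relative out-of-sample rank of the in-sample winner, kept inside (0, 1).
        rank = sum(p <= out_perf[best] for p in out_perf)
        omega = rank / (n_cfg + 1)
        logit = math.log(omega / (1 - omega))
        overfit += logit <= 0
        total += 1
    return overfit / total


rng = random.Random(7)
noise = [[rng.gauss(0.0, 0.01) for _ in range(20)] for _ in range(240)]
pbo_noise = probability_of_backtest_overfitting(noise)

# Give configuration 0 a large, persistent edge in every period.
with_edge = [[row[0] + 0.01] + row[1:] for row in noise]
pbo_edge = probability_of_backtest_overfitting(with_edge)
```

When one configuration has a real and persistent edge, it wins on both halves of every split and the estimate falls to zero. When every configuration is noise, the in-sample winner has no reason to rank well out of sample.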

Four more that belong with this group

Harvey, Liu & Zhu, "…and the Cross-Section of Expected Returns" (2016), made the multiple-testing problem concrete for factor research: most published factors don't clear a multiple-testing-adjusted bar. Lo, "The Statistics of Sharpe Ratios" (2002), showed how serial correlation and annualization distort the Sharpe you report. Pardo's The Evaluation and Optimization of Trading Strategies (2008) is the practitioner's manual for walk-forward analysis itself, and Sullivan, Timmermann & White (1999) applied White's reality-check bootstrap to data-snooping in technical trading rules. We lean on all of them.
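
Lo's correction for serial correlation can be written down directly. For stationary returns, the Sharpe ratio over q periods is the one-period Sharpe times q / sqrt(q + 2 Σ (q − k) ρ_k), with the sum over lags k = 1 to q − 1 and ρ_k the lag-k autocorrelation. It reduces to the familiar sqrt(q) only when every ρ_k is zero. The function name and example autocorrelations below are illustrative assumptions.

```python
import math


def annualization_factor(autocorrs, q):
    """Scaling factor from one-period to q-period Sharpe under autocorrelation.

    autocorrs[k - 1] is the lag-k autocorrelation; missing lags count as zero.
    """
    rho = (list(autocorrs) + [0.0] * q)[: q - 1]
    weighted = sum((q - k) * rho[k - 1] for k in range(1, q))
    return q / math.sqrt(q + 2 * weighted)


naive = math.sqrt(12)                          # the usual monthly-to-annual rule
uncorrelated = annualization_factor([], 12)    # matches the naive rule
trending = annualization_factor([0.2], 12)     # positive lag-1 autocorrelation
reverting = annualization_factor([-0.2], 12)   # negative lag-1 autocorrelation
```

With positive autocorrelation the correct factor is smaller than sqrt(12), so the naive rule overstates the annual Sharpe. With negative autocorrelation the naive rule understates it.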

Tools for thought: the instrument vision

The other half of QuanterLab is not statistics — it is the belief that the interface changes what you can think. Primitives is a node-graph canvas, closer to a video editor for research than a spreadsheet, and that choice has a lineage.

Bret Victor — "Inventing on Principle" (2012), "Media for Thinking the Unthinkable" (2013), "Up and Down the Ladder of Abstraction" (2011). Victor's argument is that creators need an immediate, visible connection to what they are making — that ideas you cannot see or manipulate directly are ideas you cannot really reason about. The whole point of wiring a strategy as a visible DAG, watching results flow through it, and being able to dock an autopsy onto any forward tester is to make the research seeable.

Seymour Papert — Mindstorms (1980). Papert's constructionism holds that people learn deepest by building things they care about and debugging them in the open. A canvas where you assemble a strategy block by block, and a "bug" is a wire that produces a result you can inspect, is constructionism applied to quantitative research.

The craft of good work

Robert Pirsig — Zen and the Art of Motorcycle Maintenance (1974). Pirsig's meditation on Quality — the care that is visible in work done well versus work merely completed — is why the platform sweats details that no benchmark rewards: deterministic outputs, reproducible seeds, a prospectus that re-derives byte-for-byte from its circuit.

Christopher Alexander — The Timeless Way of Building (1979). Alexander distinguished structures that are merely functional from ones that are alive — that fit their use so well they feel inevitable. It is an aspiration more than a metric, and an honest one: we are not there, but it is the direction we point the tool.

Key points
  • None of QuanterLab's ideas are original — this list credits where they came from, and the platform's mechanisms are implementations of them.
  • The philosophy group (Popper, Feynman, Tetlock, Ioannidis) is about the universal failure mode: fooling yourself.
  • The López de Prado anchors (AFML, Pseudo-Mathematics, Deflated Sharpe, Backtest Overfitting) underpin why a single good backtest number is never trusted on its own.
  • The tools-for-thought group (Victor, Papert) explains why Primitives is a visible node-graph rather than a black box.
  • The craft group (Pirsig, Alexander) is the quieter standard: care and aliveness, not just "it runs."

A closing note

We did not invent these ideas, and we want to be honest about that. What QuanterLab adds is plumbing — turning a pre-registration habit into a button, a multiple-testing warning into a skill band, a point-in-time discipline into an executor that physically cannot peek at the future. The thinking is theirs. If you read only one item on this list, read Feynman's 1974 talk; if you read two, add López de Prado's 2018 book. The rest will make more sense once those two have done their work on you. And if some passage here makes you trust a result less than you did this morning, the list has done its job.

Further Reading

Foundational papers

  • Popper, K. R. (1963). Conjectures and Refutations: The Growth of Scientific Knowledge. Routledge & Kegan Paul.
  • Feynman, R. P. (1974). Cargo Cult Science (Caltech commencement address). Engineering and Science, 37(7), 10–13.
  • Tetlock, P. E. & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.
  • Ioannidis, J. P. A. (2005). Why Most Published Research Findings Are False. PLoS Medicine, 2(8), e124.
  • Bailey, D. H., Borwein, J. M., López de Prado, M. & Zhu, Q. J. (2014). Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance. Notices of the AMS, 61(5), 458–471.
  • Bailey, D. H. & López de Prado, M. (2014). The Deflated Sharpe Ratio: Correcting for Selection Bias, Backtest Overfitting, and Non-Normality. Journal of Portfolio Management, 40(5), 94–107.
  • Bailey, D. H., Borwein, J. M., López de Prado, M. & Zhu, Q. J. (2017). The Probability of Backtest Overfitting. Journal of Computational Finance, 20(4), 39–69.
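  • Harvey, C. R. & Liu, Y. (2015). Backtesting. Journal of Portfolio Management, 42(1), 13–28.
  • Harvey, C. R., Liu, Y. & Zhu, H. (2016). …and the Cross-Section of Expected Returns. Review of Financial Studies, 29(1), 5–68.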
  • Lo, A. W. (2002). The Statistics of Sharpe Ratios. Financial Analysts Journal, 58(4), 36–52.
  • Sullivan, R., Timmermann, A. & White, H. (1999). Data-Snooping, Technical Trading Rule Performance, and the Bootstrap. Journal of Finance, 54(5), 1647–1691.
  • Victor, B. (2012). Inventing on Principle (talk, CUSEC 2012). worrydream.com.
  • Victor, B. (2013). Media for Thinking the Unthinkable (talk). worrydream.com.
  • Victor, B. (2011). Up and Down the Ladder of Abstraction. worrydream.com.

Textbook references

  • López de Prado, M. (2018). Advances in Financial Machine Learning. Wiley.
  • Pardo, R. (2008). The Evaluation and Optimization of Trading Strategies (2nd ed.). Wiley.
  • Aronson, D. R. (2006). Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals. Wiley.
  • Taleb, N. N. (2001). Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets. Random House.
  • Papert, S. (1980). Mindstorms: Children, Computers, and Powerful Ideas. Basic Books.
  • Pirsig, R. M. (1974). Zen and the Art of Motorcycle Maintenance: An Inquiry into Values. William Morrow.
  • Alexander, C. (1979). The Timeless Way of Building. Oxford University Press.

Try it in QuanterLab

Open Primitives, wire any small strategy, and before you run the forward test, generate its prospectus and freeze it as a hypothesis: Popper's idea as a button. Then dock a Forward-Test Autopsy onto the result and read the skill band. It ranks your number against a zero-edge null, which is López de Prado's and Taleb's idea as a panel. Two clicks, two centuries of borrowed thinking.
