The Information Coefficient (see IC) and most cross-sectional factor measurements use Spearman rank correlation rather than Pearson. This article explains why — the distinction matters more than it first seems — and documents the math behind the choice.
Pearson vs. Spearman
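Pearson correlation measures linear association between two raw variables:
$$ r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} $$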
Spearman correlation is Pearson applied to the ranks of the two variables (Spearman 1904). Replace X with the rank of X within its sample, Y with the rank of Y, then compute Pearson on the ranks.
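For reference, when neither variable has ties, Pearson on ranks reduces to a well-known shortcut. The symbol d_i (the rank difference for name i) is introduced here for illustration and does not appear elsewhere in this article:

```latex
\rho_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},
\qquad d_i = \operatorname{rank}(X_i) - \operatorname{rank}(Y_i)
```

With ties and average ranks the shortcut is no longer exact, so computing Pearson directly on the ranks is the safer general route.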
Why ranks in cross-sectional finance
Three properties of factor scores and returns make raw Pearson unreliable in the cross-section:
- Heavy tails. A handful of names with huge return moves (Tesla in 2020) can dominate Pearson. Ranking caps their influence: an extreme mover occupies a single rank position, no matter how large the move.
- Non-linearity. Factor scores often have a monotone but non-linear relationship to returns (top-quintile outperforms, but the relationship within the top quintile is flat). Pearson penalises this; Spearman captures the monotone signal.
- Scale differences across factors. Value scores live on different scales than momentum scores. Ranking puts all factors on the same 1-to-N scale, enabling direct comparison.
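The heavy-tail point can be illustrated with a small from-scratch sketch. The data is hypothetical and the helper names (average_ranks, pearson, spearman) are made up for this example; it uses only the standard library so the ranking mechanics stay visible.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical cross-section: returns rise monotonically with the score,
# but the top name has an outsized move (a heavy-tailed outlier).
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
returns = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 7.00]

pearson_ic = pearson(scores, returns)
spearman_ic = spearman(scores, returns)
print(f"Pearson:  {pearson_ic:.3f}")
print(f"Spearman: {spearman_ic:.3f}")
```

The ordering of returns matches the ordering of scores exactly, so Spearman is 1, while the single outsized return pulls Pearson well below 1.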
Computation
For factor scores X1, ..., Xn and forward returns Y1, ..., Yn:
- Compute rank(Xi) within {X1, ..., Xn}, breaking ties by average rank.
- Compute rank(Yi) within {Y1, ..., Yn}, same tie-breaking.
- Compute Pearson correlation on the rank pairs.
FM103's implementation uses pandas Series rank with default average tie-breaking, then numpy corrcoef.
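A minimal sketch of that recipe, assuming scores and forward returns arrive as pandas Series indexed by ticker. The function name spearman_ic, the tickers, and the align-and-drop-missing step are assumptions made for illustration; this is not FM103's actual code.

```python
import numpy as np
import pandas as pd


def spearman_ic(scores: pd.Series, fwd_returns: pd.Series) -> float:
    """Rank IC for one cross-section (hypothetical helper)."""
    # Align on the index and drop names missing either value.
    # This cleaning step is an assumption; the article does not describe it.
    df = pd.concat({"score": scores, "ret": fwd_returns}, axis=1).dropna()
    score_ranks = df["score"].rank()  # default method="average" for ties
    ret_ranks = df["ret"].rank()
    # Pearson correlation of the two rank vectors.
    return float(np.corrcoef(score_ranks, ret_ranks)[0, 1])


# Hypothetical five-name cross-section; CCC and DDD tie on the score.
scores = pd.Series({"AAA": 0.9, "BBB": 0.1, "CCC": 0.5, "DDD": 0.5, "EEE": -0.3})
fwd_returns = pd.Series(
    {"AAA": 0.04, "BBB": -0.01, "CCC": 0.02, "DDD": 0.00, "EEE": -0.02}
)

ic = spearman_ic(scores, fwd_returns)
print(f"Rank IC: {ic:.3f}")
```

The two tied scores each receive rank 3.5, which is the average-rank convention described in the steps above.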
When Spearman and Pearson disagree
A large gap (Spearman >> Pearson) indicates that the relationship is monotone but that outliers or non-linearity are dragging the Pearson result down. A large gap in the other direction (Pearson >> Spearman) is rarer and indicates that the relationship is driven by a few extreme values, whose influence ranking removes.
In factor analysis, the typical pattern is Spearman ~ 0.05, Pearson ~ 0.02–0.04 — Spearman moderately higher because the cross-section has outliers. If Pearson and Spearman match closely, the factor is acting linearly across the universe.
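The rarer Pearson >> Spearman case is easy to reproduce with toy data. The sketch below uses a hypothetical ten-name cross-section in which nine names show an inverse relationship and one name is extreme on both axes; the numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-section: nine names with an inverse score/return
# relationship, plus one name that is extreme on both score and return.
scores = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)
returns = pd.Series([9, 8, 7, 6, 5, 4, 3, 2, 1, 100], dtype=float)

# Pearson on raw values versus Pearson on average ranks (Spearman).
pearson_ic = float(np.corrcoef(scores, returns)[0, 1])
spearman_ic = float(np.corrcoef(scores.rank(), returns.rank())[0, 1])
gap = spearman_ic - pearson_ic

print(f"Pearson:  {pearson_ic:.3f}")
print(f"Spearman: {spearman_ic:.3f}")
print(f"Gap (Spearman - Pearson): {gap:.3f}")
```

Here the single extreme name makes Pearson strongly positive, while Spearman is negative because the other nine names are ordered inversely. A strongly negative gap like this is a cue to inspect the extreme names before trusting either number.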
Significance of small IC values
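A mean IC of 0.05 with a per-period standard deviation of 0.10 over T = 20 periods has t-statistic:
$$ t = \frac{\overline{IC}}{\sigma_{IC} / \sqrt{T}} = \frac{0.05}{0.10 / \sqrt{20}} \approx 2.24 $$
This is just over the conventional 2-sigma significance threshold. Over T = 10 periods the same numbers give t ≈ 1.58, which is below significance. Always include the sample size in any IC discussion.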
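The arithmetic can be checked with a few lines of Python. The helper name ic_t_stat is made up for this sketch, and the calculation assumes the per-period ICs are independent (no autocorrelation), which the article does not state explicitly.

```python
import math


def ic_t_stat(mean_ic: float, ic_std: float, n_periods: int) -> float:
    """t-statistic of the mean IC: mean divided by its standard error.

    Assumes independent per-period ICs, so the standard error is
    ic_std / sqrt(n_periods).
    """
    return mean_ic / (ic_std / math.sqrt(n_periods))


t_20 = ic_t_stat(0.05, 0.10, 20)
t_10 = ic_t_stat(0.05, 0.10, 10)
print(f"T=20: t = {t_20:.2f}")
print(f"T=10: t = {t_10:.2f}")
```

Because t grows with the square root of T, halving the number of periods shrinks the statistic by a factor of about 1.41, which is enough to move this example from one side of the threshold to the other.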
Limitations
- Spearman discards magnitude information. The top-ranked name might have an enormous factor score; Spearman treats it as just "rank 1." For some applications this loss is undesirable.
- Tie handling matters with discrete scores. If the factor produces integer scores (e.g., F-Score from 0 to 9), many ties exist; the average-rank handling is conventional but other choices (min-rank, max-rank, ordinal) produce slightly different results.
- Sample size limits. The sample Spearman coefficient is a dependable estimate only for moderate-to-large cross-sections. For small N (< 10), the IC estimate is noisy regardless of the metric.
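The tie-handling point can be made concrete with a short sketch. The integer scores and returns below are hypothetical, and pandas names the ordinal scheme "first" (ties broken by order of appearance).

```python
import numpy as np
import pandas as pd

# Hypothetical integer scores in the F-Score range (0 to 9) for eight names.
scores = pd.Series([7, 7, 7, 5, 5, 3, 8, 2], dtype=float)
returns = pd.Series([0.03, -0.01, 0.02, 0.01, 0.04, -0.02, 0.05, 0.00])

ret_ranks = returns.rank()  # the returns have no ties in this example

# Rank IC under each tie-breaking rule for the scores.
ic_by_method = {
    method: float(np.corrcoef(scores.rank(method=method), ret_ranks)[0, 1])
    for method in ("average", "min", "max", "first")
}

for method, ic in ic_by_method.items():
    print(f"{method:>7}: {ic:.3f}")
```

The four values differ even though the underlying data is identical, so the tie rule should be fixed and reported alongside any IC computed from discrete scores.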
Kendall's tau as an alternative
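Kendall's tau is another rank-based correlation. It counts concordant and discordant pairs:
$$ \tau = \frac{C - D}{n(n-1)/2} $$
where C is the number of concordant pairs (two names ordered the same way by X and by Y), D is the number of discordant pairs, and n(n-1)/2 is the total number of pairs.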
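Kendall's tau is easy to interpret: it is the difference between the probability that a randomly chosen pair is concordant and the probability that it is discordant, and its tau-b variant adjusts explicitly for ties. Spearman is nonetheless the convention in the factor literature, partly because it is simply Pearson computed on ranks, which makes the two directly comparable. Kendall's tau also lies between -1 and +1 but is typically smaller in magnitude on the same data. FM103 uses Spearman to match the literature.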
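A minimal sketch of the pair-counting definition, using the simplest variant (often called tau-a), which leaves tied pairs out of both counts. The function name kendall_tau_a and the five-name data are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables order the pair the same way
        elif direction < 0:
            discordant += 1  # the variables order the pair in opposite ways
        # direction == 0 means a tie in x or y; tau-a counts it as neither
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical five-name cross-section with two swapped neighbours.
scores = [1, 2, 3, 4, 5]
returns = [0.01, 0.03, 0.02, 0.05, 0.04]

tau = kendall_tau_a(scores, returns)
print(f"Kendall tau-a: {tau:.2f}")
```

Of the ten pairs, eight are concordant and two are discordant, giving tau = 0.6. Spearman on the same data is 0.8, a small instance of tau coming out lower in magnitude. The pairwise loop is quadratic in N, which is fine for illustration but slow for large universes.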
Further Reading
Foundational papers
- Spearman, C. (1904). The Proof and Measurement of Association between Two Things. American Journal of Psychology, 15(1), 72–101.
- Cont, R. (2001). Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues. Quantitative Finance, 1(2), 223–236.
Textbook references
- Tsay, R. S. (2010). Analysis of Financial Time Series (3rd ed.). Wiley.
- Campbell, J. Y., Lo, A. W. & MacKinlay, A. C. (1997). The Econometrics of Financial Markets. Princeton University Press.
Try it in QuanterLab
Compute both Spearman and Pearson IC for your strategy. A large Spearman-vs-Pearson gap signals that outliers are driving Pearson down — Spearman is the more honest read.