Spearman Rank Correlation in IC

The Information Coefficient (see IC) and most cross-sectional factor measurements use Spearman rank correlation rather than Pearson. This article explains why, since the distinction matters more than it first seems, and documents the math behind the choice.

Pearson vs. Spearman

Pearson correlation measures linear association between two raw variables:

ρP(X, Y) = cov(X, Y) / (σX · σY)

Spearman correlation is Pearson applied to the ranks of the two variables (Spearman 1904). Replace X with the rank of X within its sample, Y with the rank of Y, then compute Pearson on the ranks.

Why ranks in cross-sectional finance

Three properties of factor scores and returns make raw Pearson unreliable in the cross-section:

  1. Heavy tails. A handful of names with huge return moves (Tesla in 2020) can dominate the Pearson estimate. Ranking caps their influence: an extreme name sits just one rank beyond its nearest neighbour, however large its move.
  2. Non-linearity. Factor scores often have a monotone but non-linear relationship to returns (top-quintile outperforms, but the relationship within the top quintile is flat). Pearson penalises this; Spearman captures the monotone signal.
  3. Scale differences across factors. Value scores live on different scales than momentum scores. Ranking puts all factors on the same 1-to-N scale, enabling direct comparison.
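
As a rough illustration of these properties, the sketch below builds a hypothetical ten-name cross-section in which returns rise monotonically with the score but the top name has an extreme return. The data and variable names are invented for illustration and are not FM103 output.

```python
import numpy as np
import pandas as pd

# Hypothetical cross-section: returns increase with the score, but the
# top-scored name has an extreme (heavy-tailed) return.
scores = pd.Series(np.arange(1.0, 11.0))
returns = pd.Series([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 5.00])

# Pearson on raw values: the single outlier bends the relationship away from a line.
pearson_ic = float(np.corrcoef(scores, returns)[0, 1])

# Spearman: Pearson on ranks, so the outlier is simply the highest rank.
spearman_ic = float(np.corrcoef(scores.rank(), returns.rank())[0, 1])

# Ranks are unchanged by any strictly increasing rescaling of the scores.
rescaled_ic = float(np.corrcoef((1000 * scores).rank(), returns.rank())[0, 1])

print(f"Pearson:  {pearson_ic:.2f}")
print(f"Spearman: {spearman_ic:.2f}")
```

On this toy data Spearman is 1 because the ordering is perfectly preserved, while Pearson comes out near 0.54. Rescaling the scores leaves the Spearman value unchanged.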

Computation

For factor scores X1, ..., Xn and forward returns Y1, ..., Yn:

  1. Compute rank(Xi) within {X1, ..., Xn}, breaking ties by average rank.
  2. Compute rank(Yi) within {Y1, ..., Yn}, same tie-breaking.
  3. Compute Pearson correlation on the rank pairs.

FM103's implementation uses pandas Series.rank with its default average tie-breaking, then numpy.corrcoef on the two rank series.
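
The three steps can be sketched as follows. This is a minimal illustration built on pandas rank and numpy corrcoef, not FM103's actual source; the function name spearman_ic and the five-name sample data are assumptions made for the example. It also cross-checks the result against the classic closed form, which applies only when there are no ties.

```python
import numpy as np
import pandas as pd

def spearman_ic(scores: pd.Series, fwd_returns: pd.Series) -> float:
    """Spearman rank correlation between factor scores and forward returns.

    Assumes both series are already aligned on the same names and contain no NaNs.
    """
    # Steps 1 and 2: rank each variable within the cross-section (ties get the average rank).
    score_ranks = scores.rank(method="average")
    return_ranks = fwd_returns.rank(method="average")
    # Step 3: Pearson correlation on the rank pairs.
    return float(np.corrcoef(score_ranks, return_ranks)[0, 1])

# Hypothetical five-name cross-section (illustrative values only).
names = list("ABCDE")
scores = pd.Series([0.8, -1.2, 0.3, 2.1, -0.4], index=names)
fwd_returns = pd.Series([0.02, -0.01, 0.03, 0.01, -0.03], index=names)

ic = spearman_ic(scores, fwd_returns)

# With no ties, the result matches 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
# where d is the per-name difference in ranks.
d = scores.rank() - fwd_returns.rank()
n = len(scores)
closed_form = 1 - 6 * float((d ** 2).sum()) / (n * (n ** 2 - 1))

print(ic, closed_form)
```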

When Spearman and Pearson disagree

A large gap (Spearman >> Pearson) indicates a monotone relationship in which outliers or curvature pull the Pearson result down. A large gap in the other direction (Pearson >> Spearman) is rarer and indicates that a few extreme values drive the linear fit; ranking compresses those values, so most of their contribution disappears from Spearman.

In factor analysis, the typical pattern is Spearman ~ 0.05 and Pearson ~ 0.02–0.04, with Spearman moderately higher because the cross-section has outliers. If Pearson and Spearman match closely, the relationship between factor and returns is close to linear across the universe.

Significance of small IC values

Mean IC of 0.05 with per-period standard deviation of 0.10 over T=20 periods has t-stat:

t = 0.05 / (0.10 / sqrt(20)) = 0.05 / 0.0224 ≈ 2.24

This is just over the conventional 2-sigma significance threshold. Over T=10 periods the same numbers give t = 0.05 / (0.10 / sqrt(10)) ≈ 1.58, which is below significance. Always include sample size in any IC discussion.
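
The same arithmetic as a small helper, for checking other combinations of mean IC, IC volatility and sample length. The function name ic_t_stat is an assumption made for this sketch, and it treats per-period ICs as independent, as the formula above does.

```python
import math

def ic_t_stat(mean_ic: float, ic_std: float, n_periods: int) -> float:
    """t-statistic of the mean IC: mean divided by its standard error."""
    standard_error = ic_std / math.sqrt(n_periods)
    return mean_ic / standard_error

t_20 = ic_t_stat(0.05, 0.10, 20)
t_10 = ic_t_stat(0.05, 0.10, 10)

print(f"T=20: t = {t_20:.2f}")  # 2.24
print(f"T=10: t = {t_10:.2f}")  # 1.58
```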

Limitations

  • Spearman discards magnitude information. The top-ranked name might have an enormous factor score; Spearman treats it as just "rank 1." For some applications this loss is undesirable.
  • Tie handling matters with discrete scores. If the factor produces integer scores (e.g., F-Score from 0 to 9), many ties exist; the average-rank handling is conventional but other choices (min-rank, max-rank, ordinal) produce slightly different results.
  • Sample size limits. The sample Spearman coefficient is only approximately unbiased, and the approximation is good only for moderate-to-large samples. For small N (< 10), the IC estimate is noisy regardless of the metric.
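
To see how much tie handling can matter with discrete scores, the sketch below ranks a hypothetical set of integer F-Score-style values with several pandas tie-breaking methods and recomputes the IC each time. The eight data points are invented for illustration; pandas' "first" method corresponds to ordinal ranking in order of appearance.

```python
import numpy as np
import pandas as pd

# Hypothetical integer scores with many ties, plus forward returns for the same names.
f_scores = pd.Series([7, 7, 5, 8, 5, 5, 9, 3])
fwd_returns = pd.Series([0.02, 0.04, -0.01, 0.03, 0.00, 0.01, 0.05, -0.02])

# Returns have no ties here, so only the score ranking depends on the method.
return_ranks = fwd_returns.rank()

ic_by_method = {
    method: float(np.corrcoef(f_scores.rank(method=method), return_ranks)[0, 1])
    for method in ("average", "min", "max", "first")
}

for method, ic in ic_by_method.items():
    print(f"{method:>8}: {ic:.4f}")
```

With few names and many ties, as here, the spread between methods is visible, so the tie-breaking rule should be fixed and reported when comparing results.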

Kendall's tau as an alternative

Kendall's tau is another rank-based correlation. It counts concordant and discordant pairs:

τ = (concordant − discordant) / (n choose 2)

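Kendall is often considered easier to interpret (it is a probability difference: the chance that a random pair is concordant minus the chance that it is discordant), and its tie-adjusted variants handle ties explicitly. Both measures range from -1 to +1, but Spearman is the convention in factor literature because it is simply Pearson computed on ranks, so its values compare more directly with Pearson; Kendall's tau is usually smaller in magnitude on the same data. FM103 uses Spearman to match the literature.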
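
A direct, unoptimised sketch of the pair-counting formula follows. It implements the no-ties form shown above (often called tau-a), where tied pairs count as neither concordant nor discordant. The function name kendall_tau_a and the sample data are assumptions made for this example.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n choose 2)."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # the pair is ordered the same way in x and y
        elif sign < 0:
            discordant += 1   # the pair is ordered in opposite ways
        # sign == 0 means a tie in x or y; tau-a counts it in neither bucket
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical five-name cross-section (illustrative values only).
scores = [0.8, -1.2, 0.3, 2.1, -0.4]
fwd_returns = [0.02, -0.01, 0.03, 0.01, -0.03]

tau = kendall_tau_a(scores, fwd_returns)
print(tau)  # 0.2: 6 concordant pairs, 4 discordant, out of 10
```

The loop is O(n²), which is fine for a sketch but slow for large universes.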

Further Reading

Foundational papers

  • Spearman, C. (1904). The Proof and Measurement of Association between Two Things. American Journal of Psychology, 15(1), 72–101.
  • Cont, R. (2001). Empirical Properties of Asset Returns: Stylized Facts and Statistical Issues. Quantitative Finance, 1(2), 223–236.

Textbook references

  • Tsay, R. S. (2010). Analysis of Financial Time Series (3rd ed.). Wiley.
  • Campbell, J. Y., Lo, A. W. & MacKinlay, A. C. (1997). The Econometrics of Financial Markets. Princeton University Press.

Try it in QuanterLab

Compute both Spearman and Pearson IC for your strategy. A large Spearman-vs-Pearson gap signals that outliers are driving Pearson down; in that case Spearman is the more honest read.
