We Backtested Our Upset Model Against 40 Years of March Madness
2,428 games. 695 upsets. Full transparency.
Most upset prediction models never show you proof. They claim accuracy rates, sell you picks, and hope you don’t check the math. We did the opposite: we ran BANE Score against every NCAA Tournament game from 1985 to 2024 - and we’re publishing every number, including what we get wrong.
The Setup
We took the complete NCAA tournament historical dataset: 2,428 tournament games across 39 played seasons (2020 was cancelled). For each game, we reconstructed what the BANE Score would have been using only pre-tournament regular season data - the same data available before tipoff.
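As a rough illustration of what "only pre-tournament data" means in practice, here is a minimal Python sketch of a date cutoff filter. The field names, the sample game log, and the cutoff date are all hypothetical and chosen for illustration; this is not the actual BANE pipeline.

```python
from datetime import date

def pre_tournament_games(games, tournament_start):
    """Keep only games played strictly before the tournament began."""
    return [g for g in games if g["date"] < tournament_start]

# Hypothetical game log for one team (field names are illustrative).
games = [
    {"date": date(2023, 11, 10), "points_for": 81, "points_against": 70},
    {"date": date(2024, 2, 17), "points_for": 66, "points_against": 71},
    {"date": date(2024, 3, 22), "points_for": 75, "points_against": 60},  # tournament game
]

# Illustrative cutoff; a real backtest would use each season's actual start date.
cutoff = date(2024, 3, 19)
usable = pre_tournament_games(games, cutoff)
print(len(usable), "of", len(games), "games are usable for scoring")
```

Applying the cutoff per season, before any statistic is computed, is one simple way to keep tournament results from leaking into the inputs.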
For games from 2003-2024, we used the full 7-factor model with detailed box score statistics (3PT rate, turnovers, tempo, defensive efficiency). For 1985-2002, we used a simplified model based on win margin, conference strength, and Pythagorean expectation - the same approach we’d use if we only had basic stats.
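For readers unfamiliar with Pythagorean expectation, the sketch below shows the general formula in Python. The exponent is an assumption: basketball analysts use values well above baseball's 2, and the 11.5 used here is illustrative, not necessarily what BANE uses. The season totals are made up.

```python
def pythagorean_expectation(points_for, points_against, exponent=11.5):
    """Expected win fraction from points scored and allowed.

    The default exponent is an illustrative assumption, not BANE's value.
    """
    pf = float(points_for) ** exponent
    pa = float(points_against) ** exponent
    return pf / (pf + pa)

# Hypothetical season totals for one team.
expected = pythagorean_expectation(points_for=2400, points_against=2200)
print(f"Expected win fraction: {expected:.3f}")
```

A team that outscores its opponents gets an expected win fraction above 0.5, and the gap between that expectation and the actual record is one common way to spot teams whose record flatters them.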
Overall Results
Across all 2,428 games, 695 were upsets (the lower-seeded team, meaning the one with the higher seed number, won) - a 28.6% base upset rate. Here’s how BANE Score performed at different thresholds:
The pattern is clear: as BANE Score increases, upset probability increases. At BANE ≥65, nearly half of all flagged games resulted in upsets - 1.6× the base rate.
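The threshold analysis can be sketched in a few lines of Python. The base rate uses the article's own totals (695 upsets in 2,428 games); the scored games are a small hypothetical sample, and the function name is ours, not part of the backtest code.

```python
def threshold_summary(scored_games, threshold):
    """Count flagged games and their upset rate at a given score threshold.

    scored_games: list of (bane_score, was_upset) pairs.
    """
    flagged = [was_upset for score, was_upset in scored_games if score >= threshold]
    if not flagged:
        return {"flagged": 0, "hit_rate": None}
    return {"flagged": len(flagged), "hit_rate": sum(flagged) / len(flagged)}

# Base rate from the figures reported in the article.
overall_base_rate = 695 / 2428
print(f"Base upset rate: {overall_base_rate:.1%}")  # 28.6%

# Hypothetical scored games, for illustration only.
sample = [(70, True), (68, False), (66, True), (61, False),
          (58, False), (55, True), (40, False), (35, False)]

for t in (55, 65):
    print(t, threshold_summary(sample, t))
```

Reporting the number of flagged games alongside each hit rate matters: a high hit rate on a handful of games is much weaker evidence than the same rate on hundreds.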
Where It Really Shines: Round of 64
The first round is where upsets matter most for brackets - and where BANE performs best.
At BANE ≥65 in the Round of 64, the hit rate was 49.5%: roughly 1 in 2 flagged games produced an upset, about 1.9× the 26.1% first-round base rate.
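The "lift" figures quoted here are simple ratios of hit rate to base rate. A quick check in Python, using only the percentages reported in this article (the helper name is ours):

```python
def lift(hit_rate, base_rate):
    """How many times more often flagged games are upsets than games overall."""
    return hit_rate / base_rate

# Round of 64: 49.5% hit rate at BANE >= 65 versus a 26.1% base rate.
round_of_64_lift = lift(0.495, 0.261)
print(f"Round of 64 lift: {round_of_64_lift:.2f}x")  # 1.90x

# All rounds: a 1.6x lift on the 28.6% base rate implies a hit rate near 46%,
# subject to rounding in the published figures.
implied_overall_hit_rate = 1.6 * 0.286
print(f"Implied overall hit rate: {implied_overall_hit_rate:.1%}")
```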
The Sweet Spot: Seed Matchups
BANE Score excels at identifying competitive underdogs in the 5-12 through 8-9 range. These are the matchups where underlying fundamentals (defense, tempo, turnovers) provide genuine predictive signal:
The 6-vs-11 matchup is BANE’s best: a 47.9% upset rate when BANE flags ≥60, versus the 39% base rate for that matchup. That’s because 11-seeds often include power conference teams that were underseeded - exactly the kind of mismatch BANE’s conference strength and efficiency factors detect.
The Modern Era Advantage
BANE performs significantly better on 2003-2024 data, where we have full box score statistics:
For 2026, we have full box score data and adjusted efficiency ratings for every tournament team, so the model will run in its full 7-factor configuration, the one that performed best in the backtest.
What We Get Wrong
Transparency means showing failures too. Here are the famous upsets BANE missed:
The pattern is obvious: BANE cannot reliably predict 15-over-2 and 16-over-1 upsets. Historically these happen at rates of roughly 7.1% and 1.3%, respectively - they’re driven by single-game variance (hot 3PT shooting, foul trouble, one player going nuclear) that no pre-game model can capture.
We could inflate these scores to “catch” them retroactively, but that would mean flagging hundreds of false positives. We’d rather be honest: these are black swan events, and any model claiming to predict them reliably is overfitting to history.
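The trade-off is easy to see with a toy example. The Python sketch below counts caught upsets, false positives, and misses at two thresholds; the scores are hypothetical and only meant to show the shape of the problem, not actual BANE output.

```python
def confusion_at(scored_games, threshold):
    """Tally outcomes for games flagged at or above a score threshold.

    scored_games: list of (bane_score, was_upset) pairs.
    """
    caught = false_positives = missed = 0
    for score, was_upset in scored_games:
        flagged = score >= threshold
        if flagged and was_upset:
            caught += 1
        elif flagged:
            false_positives += 1
        elif was_upset:
            missed += 1
    return {"caught": caught, "false_positives": false_positives, "missed": missed}

# Hypothetical scores: mostly lopsided matchups with low scores, one of which
# ended in an upset, plus two closer matchups with high scores.
sample = [(22, True), (25, False), (24, False), (30, False), (18, False),
          (27, False), (21, False), (64, True), (66, False)]

print("threshold 60:", confusion_at(sample, 60))
print("threshold 20:", confusion_at(sample, 20))
```

In this sample, dropping the threshold from 60 to 20 catches the one extra upset but turns one false positive into six, which is the cost the paragraph above describes.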
What BANE does catch:
How To Use BANE Score
BANE is a filter, not a betting system. Here’s our framework:
The Honest Bottom Line
BANE Score identifies upset candidates at nearly 2× the base rate in first-round matchups. It excels at the 5-12 through 8-9 seed matchups where defensive efficiency, tempo control, and turnover vulnerability provide genuine predictive signal.
It does not - and cannot - reliably predict once-a-decade Cinderella runs. No model can. But it can tell you which 11-seed has the defensive profile to grind out a win over a 6-seed, which 12-seed plays at a tempo that compresses possessions against a fast 5-seed, and which favorites are running hot on luck.
That’s not a crystal ball. It’s decision support - and it’s what separates informed bracket strategy from coin flips.
Data source: NCAA tournament historical dataset. 2,428 games from 1985-2024 (excluding 2020). BANE Score computed using only pre-tournament regular season statistics available before each game. 2003+ games use the full 7-factor model with box score data. 1985-2002 games use a simplified model with win margin, conference strength, and Pythagorean expectation. No lookahead bias - each game scored using only data available before tipoff.
Full backtest code and raw results are available for review. We believe in showing our work.