LiveModel

F1 Race Predictor

Three forecasters, one honest scorecard. The deliverable is the evaluation, not the prediction.

What this shows: I can test an AI system honestly, with no peeking at future data, probabilities that mean what they say, and the result published even when the AI loses.

Outcome

A calibrated probability for every driver, every race, graded against the real finishing order once it exists.

Proof

77 races tested. When it says 70%, it happens about 70% of the time. Claude scored 24% worse than the baseline, and that result stays in rather than being hidden.

Takeaway

Most of F1’s predictable signal is just where you start. The scorecard mattered more than the model.

Predicted order vs actual

Monaco GP · projected order vs actual finish · LIVE

Monaco 2026, as it crossed the line

real result

At Monaco, overtaking is nearly impossible, so grid position usually decides the finish; the model leaned on that and gave Antonelli's pole-to-win a 76% chance. Six cars retired, including three drivers from our own top eight (Verstappen, our P3 pick, plus Leclerc and Norris): externalities that no pre-race data can see.

[Figure: Monaco circuit map (casino, climb, hairpin, tunnel, piscine) showing each car's predicted and actual finish.]
Top 3 vs our pre-race odds: P1 Antonelli (Mercedes), our P1 pick, 76% to win · P2 Hamilton (Ferrari), our P2 pick, 54% podium odds · P3 Hadjar (Red Bull), our P6 pick, 13%.
Rest of the top ten: P4 Piastri (we said P7) · P5 Lawson (P10) · P6 Lindblad (P15) · P7 Gasly (P9) · P8 Albon (P11) · P9 Ocon (P17) · P10 Alonso (P21).
Retirements, externalities we can't predict: Verstappen (we said P3; anti-stall at lights out forced him to retire) · Leclerc (we said P4; crashed at the final corner, bringing out the safety car and red flag) · Norris (we said P8; power-unit failure).
6 DNFs in total: 2 crashes (safety car and red flag), 4 mechanical.

Next race

projection · pre-qualifying

Round 8 · 28 Jun 2026

Austrian Grand Prix

Red Bull Ring, Spielberg

PROJECTED PODIUM

P1 Hamilton · Ferrari · 47% to win
P2 Antonelli · Mercedes · 50% podium
P3 Russell · Mercedes · 46% podium

These percentages are not odds for this race. They are the model’s track record over 72 of the 77 past races (the earliest five only train it, so they aren’t scored): how often its pre-qualifying P1 pick went on to win, and how often its P2 and P3 picks reached the podium.
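A minimal sketch of how a slot track record like this could be computed, assuming the past races are stored in chronological order and the first five (77 minus 72) are warm-up only. `RaceRecord` and `slot_track_record` are hypothetical names, not the project's code.

```python
from dataclasses import dataclass


@dataclass
class RaceRecord:
    """One past race (hypothetical layout): pre-qualifying pick order and real top three."""
    race_id: str
    picks: list[str]   # model's pre-qualifying order, best first
    winner: str
    podium: set[str]   # drivers who actually finished P1 to P3


def slot_track_record(races: list[RaceRecord], warmup: int) -> dict[str, float]:
    """Hit rates of the P1, P2 and P3 pick slots over the scored races.

    `races` must be in chronological order; the first `warmup` races only
    train the model, so they are excluded from the score.
    """
    scored = races[warmup:]
    if not scored:
        raise ValueError("no races left to score after the warm-up window")
    n = len(scored)
    return {
        "P1 win": sum(r.picks[0] == r.winner for r in scored) / n,       # P1 pick must win
        "P2 podium": sum(r.picks[1] in r.podium for r in scored) / n,    # P2 pick must reach top 3
        "P3 podium": sum(r.picks[2] in r.podium for r in scored) / n,    # P3 pick must reach top 3
    }
```

With 77 races, `slot_track_record(races, warmup=5)` would grade the remaining 72. Note the P1 slot is held to a stricter bar (win) than P2 and P3 (podium), which is why a P2 figure can sit above the P1 figure.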

Why these three

Hamilton won the last round at Barcelona, and the form-only model reads recent form and team pace, so it now makes him the pick for the win, even though Antonelli still leads the championship on 143 points (Hamilton 104, Russell 85). All three have run at the front all season.

What changes Saturday

Grid position is the model's strongest feature, and qualifying has not happened yet. Once the real grid exists, all three forecasters rerun and the win and podium probabilities update before lights out. The form model likes Hamilton, but qualifying pace has belonged to Mercedes all year.

Qualifying call

On this season's one-lap form, the front row belongs to Mercedes: Antonelli (four of the seven poles) and Russell (the other three, including Barcelona). Mercedes has taken every pole in 2026, so Hamilton's win case rests on race pace rather than starting position.

ON THE HORIZON

R9 Britain · 5 Jul · Hamilton (form pick)
R10 Belgium · 19 Jul · Hamilton (form pick)

The season so far

stats model · graded

Every 2026 round, the model’s predicted podium against what actually happened, locked in after qualifying and before the race. "Landed" counts the podium picks that finished anywhere on the actual podium.

Round          Our podium     Actual podium   Landed   Winner call
R1 Australia   RUS ANT LEC    RUS ANT LEC     3 of 3   ✓
R2 China       RUS ANT LEC    ANT RUS HAM     2 of 3   ×
R3 Japan       ANT RUS LEC    ANT PIA LEC     2 of 3   ✓
R4 Miami       ANT RUS LEC    ANT NOR PIA     1 of 3   ✓
R5 Canada      ANT RUS NOR    ANT HAM VER     1 of 3   ✓
R6 Monaco      ANT HAM VER    ANT HAM GAS     2 of 3   ✓
R7 Barcelona   ANT HAM RUS    HAM RUS NOR     2 of 3   ×

Called the winner in 5 of 7 · landed 13 of 21 podium picks. The wins are easy; the third step is where it’s hard.
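Both counts can be re-derived from the season table. The sketch below assumes a podium pick "lands" when that driver finishes anywhere in the top three, not necessarily on the predicted step; that reading is the one that gives 13 of 21 (an exact-step reading would give 9). `SEASON_2026` and `tally` are illustrative names.

```python
# 2026 season table: (round, our podium pick, actual podium), each in P1, P2, P3 order.
SEASON_2026 = [
    ("R1 Australia", ["RUS", "ANT", "LEC"], ["RUS", "ANT", "LEC"]),
    ("R2 China",     ["RUS", "ANT", "LEC"], ["ANT", "RUS", "HAM"]),
    ("R3 Japan",     ["ANT", "RUS", "LEC"], ["ANT", "PIA", "LEC"]),
    ("R4 Miami",     ["ANT", "RUS", "LEC"], ["ANT", "NOR", "PIA"]),
    ("R5 Canada",    ["ANT", "RUS", "NOR"], ["ANT", "HAM", "VER"]),
    ("R6 Monaco",    ["ANT", "HAM", "VER"], ["ANT", "HAM", "GAS"]),
    ("R7 Barcelona", ["ANT", "HAM", "RUS"], ["HAM", "RUS", "NOR"]),
]


def tally(season):
    """Return (winners called, podium picks landed, podium picks made)."""
    winners_called = sum(ours[0] == actual[0] for _, ours, actual in season)
    # A pick lands if the driver is anywhere in the actual top three.
    podium_hits = sum(len(set(ours) & set(actual)) for _, ours, actual in season)
    return winners_called, podium_hits, 3 * len(season)


wins, hits, picks = tally(SEASON_2026)
print(f"Called the winner in {wins} of {len(SEASON_2026)} · landed {hits} of {picks} podium picks")
# prints: Called the winner in 5 of 7 · landed 13 of 21 podium picks
```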

At a glance

77 · races tested (2023-26)
3 · forecasters compared
24% worse · Claude's odds vs. just using the grid order
2.78 · places off per driver, on average

How it works

  1. Gather

    Race and qualifying results for 2023 to 2026 become nine pre-race clues per driver, among them grid slot, quali gap, recent form, team pace and track history. Strictly nothing from the race being predicted.

  2. Predict

    Three forecasters fill in the same form: a naive baseline (you finish where you start), a statistical model trained only on past races, and Claude reasoning over a written pre-race brief.

  3. Grade

    Proper scoring rules (Brier score, log loss, skill vs the baseline) plus calibration curves: when it says 70%, does that happen 70% of the time?

  4. Track

    Every forecaster is graded race by race across the season, and the next race is always called before lights out, so the prediction is locked in before the result exists.
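The "strictly nothing from the race being predicted" rule in the Gather step comes down to a date filter applied before any feature is computed. A minimal sketch for two of the nine clues (recent form and track history), assuming a flat list of past results; the `Result` layout, the window `k=5`, and `pre_race_features` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class Result:
    """One driver's classified finish in one race (hypothetical layout)."""
    race_date: date
    circuit: str
    driver: str
    finish: int


def pre_race_features(history: list[Result], driver: str, circuit: str,
                      race_date: date, k: int = 5) -> dict[str, float | None]:
    """Two illustrative clues for `driver` ahead of the race on `race_date`.

    Every feature uses results strictly before `race_date`, so the race being
    predicted, and anything after it, is invisible here.
    """
    past = sorted(
        (r for r in history if r.driver == driver and r.race_date < race_date),
        key=lambda r: r.race_date,
    )
    recent = [r.finish for r in past[-k:]]                  # last k races before this one
    at_track = [r.finish for r in past if r.circuit == circuit]
    return {
        "recent_form": mean(recent) if recent else None,       # average finish, lower is better
        "track_history": mean(at_track) if at_track else None,  # None for a debut at this circuit
    }
```

Returning `None` rather than a made-up default for a driver with no history keeps the gap visible to whatever model consumes the features.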

Honest about it

A prediction is only worth the eval behind it. So I keep score: three forecasters, every race, graded against what actually happened.

How often do we call the winner?

2026 · 7 rounds
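[Chart: 2026 races, places off the winner (0 = called the winner), naive baseline vs our model. Winners: AUS Russell, CHN Antonelli, JPN Antonelli, MIA Antonelli, CAN Antonelli, MON Antonelli, BAR Hamilton. Annotations: "we picked Russell", "Russell led, then DNF", "HAM won from P2".]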
How far each forecaster's winner pick was from the actual winner, race by race. The winner is usually the pole-sitter, so both call it five times in seven. Both miss Barcelona, where Hamilton won from P2; our model also slips in China, and the baseline in Canada, where its pick led and then retired. Calling the winner is the easy part.
real output

Problem

Prediction posts are easy to fake after the fact, and LLMs make it worse: past seasons sit in their training data, so a strong backtest proves memory, not skill. I wanted calls put on the record before each race, and an evaluation I could actually trust.

Approach

Three forecasters emit the same output, so they compete like for like. The statistical model only sees earlier races, automated tests prove no future data leaks in, and Claude is graded only on races after its training cutoff. The rest of 2026 is the live test: every pick is locked in after qualifying, before the race.
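One way to make "automated tests prove no future data leaks in" concrete is a scramble test: overwrite every result from the target race onward and assert that the features do not move. This is a sketch of that idea under assumed names (`recent_form`, `assert_no_leakage`), not the project's actual test suite.

```python
from dataclasses import dataclass, replace
from datetime import date


@dataclass(frozen=True)
class Result:
    race_date: date
    driver: str
    finish: int


def recent_form(history, driver, race_date, k=3):
    """Hypothetical feature: mean finish over the driver's last k races before race_date."""
    past = sorted(
        (r for r in history if r.driver == driver and r.race_date < race_date),
        key=lambda r: r.race_date,
    )[-k:]
    return sum(r.finish for r in past) / len(past) if past else None


def assert_no_leakage(feature_fn, history, driver, race_date):
    """Scramble every result on or after race_date; the feature must not change.

    A change means the feature is reading the race being predicted or a later one.
    """
    before = feature_fn(history, driver, race_date)
    scrambled = [replace(r, finish=99) if r.race_date >= race_date else r for r in history]
    after = feature_fn(scrambled, driver, race_date)
    assert before == after, f"future data leaked into the feature: {before!r} != {after!r}"


history = [
    Result(date(2026, 3, 8), "A", 2),
    Result(date(2026, 4, 12), "A", 4),
    Result(date(2026, 5, 24), "A", 1),   # the race being predicted
]
assert_no_leakage(recent_form, history, "A", date(2026, 5, 24))  # passes
```

A feature that accidentally reads race-day results (for example `<=` instead of `<` on the date) fails this check at once, because the scrambled finish of 99 changes its output.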

Eval results

Two honest findings. First, the baseline just predicts the starting grid order; the stats model beat it by about 12% on its win probabilities, while Claude scored 24% worse than it, the kind of negative result most write-ups quietly drop. Second, almost all the signal is one thing: where you start. Remove grid position and podium error jumps about 20%; remove anything else and nothing moves. On ranking, the stats model's predicted finishing positions are off by 2.78 places per driver on average, under three spots. The plot below is the part I trust most: when the model says 70%, it happens about 70% of the time.
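A sketch of the scoring arithmetic behind these numbers, assuming "12% better" and "24% worse" are relative skill scores of the form 1 minus model error over baseline error (the page does not say which error metric is used; the Brier score is the usual choice for probabilities). The `ablation` helper takes a hypothetical `fit_and_score` callable that retrains on a feature subset and returns its held-out error.

```python
import math


def brier(probs, outcomes):
    """Mean squared gap between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; clipping keeps a confident miss finite."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for p, o in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= o * math.log(p) + (1 - o) * math.log(1 - p)
    return total / len(probs)


def skill(model_error, baseline_error):
    """Relative skill vs the baseline: 0.12 means 12% less error, -0.24 means 24% more."""
    return 1 - model_error / baseline_error


def mean_places_off(predicted, actual):
    """Average absolute gap between predicted and actual finishing positions."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def ablation(features, fit_and_score):
    """Drop one feature at a time; return each drop's relative change in error.

    `fit_and_score` is hypothetical: it retrains on the given feature list and
    returns a held-out error such as the podium Brier score.
    """
    full = fit_and_score(features)
    return {f: fit_and_score([g for g in features if g != f]) / full - 1 for f in features}
```

Under this definition the stats model's result corresponds to a skill of about 0.12 and Claude's to about -0.24, and a grid-position ablation entry near 0.2 would correspond to the 20% jump in podium error.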

Can you trust the probabilities?

every race, 2023-26
[Calibration plot, every race 2023-26: what the model said (x axis, 0-100%) vs what actually happened (y axis, 0-100%); points on the diagonal are trustworthy. Markers: spot on, close, missed.]

Said   Happened   Cases   95% range
1%     2%         1002    1 to 3%
14%    13%        80      7 to 22%
24%    20%        65      12 to 31%
34%    37%        67      27 to 49%
45%    33%        42      21 to 48%
54%    48%        48      34 to 62%
65%    74%        50      60 to 84%
75%    56%        27      37 to 72%
85%    83%        47      70 to 91%
94%    92%        24      74 to 98%
When the model says a driver has a 70% shot at the podium, it lands there about 70% of the time. Points on the diagonal mean the numbers can be taken at face value; the 45% and 75% bins sit well below it.
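A sketch of how a reliability table like this can be built: ten equal-width probability bins (an assumption that fits the chart's bin centres) and a Wilson score interval for each bin's hit rate. The Wilson interval is also an assumption, though it does reproduce the chart's top bin: 92% of 24 cases is 22 hits, which gives 74 to 98%.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion hits / n."""
    if n == 0:
        raise ValueError("empty bin")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - margin, centre + margin


def calibration_table(probs, outcomes, n_bins=10):
    """Group forecasts into equal-width bins and compare 'said' with 'happened'."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # p == 1.0 goes into the top bin
        bins[idx].append((p, o))
    rows = []
    for members in bins:
        if not members:
            continue                              # skip empty bins instead of dividing by zero
        n = len(members)
        hits = sum(o for _, o in members)
        low, high = wilson_interval(hits, n)
        rows.append({
            "said": sum(p for p, _ in members) / n,   # mean forecast in the bin
            "happened": hits / n,                      # observed frequency
            "cases": n,
            "low": low,
            "high": high,
        })
    return rows
```

A bin is "on the line" when its `said` value falls inside `low` to `high`; by that test the 75% bin (37 to 72%) is a genuine miss, while the 45% bin (21 to 48%) only just passes.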

What broke

A free data API silently returned four empty races after rate-limiting; validation caught them, because no error was ever raised. Grid position 0 means a pit-lane start, which a model reads as better than pole. And the LLM sometimes returns duplicate finishing positions, so the schema rejects them loudly and a deterministic repair re-ranks the field. The lesson that stuck: the eval design mattered more than the model, and most of the work was keeping the test fair.
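A minimal sketch of the three guards described above, under assumed names (`Entry`, `check_not_empty`, `fix_pit_lane`, `repair_positions`): fail on an empty race, move pit-lane starters (grid 0) behind the last grid slot, and re-rank duplicate positions with a fixed tie-break. The tie-break order (LLM position, then grid, then driver code) is hypothetical; any fixed order makes the repair deterministic.

```python
import logging
from dataclasses import dataclass


@dataclass
class Entry:
    """One driver in one race (hypothetical layout)."""
    driver: str
    grid: int                # raw feed value; 0 means a pit-lane start
    predicted_position: int  # as returned by the LLM forecaster


def check_not_empty(race_id, rows):
    """A rate-limited API can return an empty race instead of an error, so fail loudly."""
    if not rows:
        raise ValueError(f"{race_id}: no results returned (rate-limited?)")


def fix_pit_lane(entries):
    """Move grid-0 (pit-lane) starters behind the last real grid slot, not ahead of pole."""
    back = max((e.grid for e in entries if e.grid > 0), default=0) + 1
    for e in entries:
        if e.grid == 0:
            e.grid = back


def positions_valid(entries):
    """Schema check: predicted positions must be exactly 1..N with no duplicates."""
    return sorted(e.predicted_position for e in entries) == list(range(1, len(entries) + 1))


def repair_positions(entries):
    """Return entries in finishing order, re-ranking deterministically if the schema fails."""
    if positions_valid(entries):
        return sorted(entries, key=lambda e: e.predicted_position)
    logging.warning("LLM returned duplicate or missing positions; re-ranking")
    ranked = sorted(entries, key=lambda e: (e.predicted_position, e.grid, e.driver))
    for pos, e in enumerate(ranked, start=1):
        e.predicted_position = pos
    return ranked
```

Running `fix_pit_lane` before the repair matters: with its grid still 0, a pit-lane starter would win every tie-break, which is exactly the "better than pole" bug.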
