F1 Race Predictor
Three forecasters, one honest scorecard. The deliverable is the evaluation, not the prediction.
What this shows: I can test an AI system honestly, with no peeking at future data, probabilities that mean what they say, and the result published even when the AI loses.
A calibrated probability for every driver, every race, graded against the real finishing order once it exists.
77 races tested. When it says 70%, it lands about 70%. Claude ran 24% worse than the baseline, and that result is kept in, not hidden.
Most of F1’s predictable signal is just where you start. The scorecard mattered more than the model.
Predicted order vs actual
Monaco 2026, as it crossed the line
real result
At Monaco, overtaking is nearly impossible, so grid position usually decides the finish; the model leaned on that and called Antonelli's pole-to-win at 76%. Six cars retired, three of them our own podium picks (orange): externalities no pre-race data can see.
Next race
projection · pre-qualifying
Round 8 · 28 Jun 2026
Austrian Grand Prix
Red Bull Ring, Spielberg
PROJECTED PODIUM
- P1: Hamilton (Ferrari)
- P2: Antonelli (Mercedes)
- P3: Russell (Mercedes)
The percentages are the model’s track record over 72 of the 77 past races (the earliest races only train it, so they aren’t scored): how often its pre-qualifying P1 pick went on to win, and how often its P2 and P3 picks reached the podium.
Hamilton won the last round at Barcelona, and the form-only model reads recent form and team pace, so it now makes him the pick for the win, even though Antonelli still leads the championship on 143 points (Hamilton 104, Russell 85). All three have run at the front all season.
Grid position is the model's strongest feature, and qualifying has not happened yet. Once the real grid exists, all three forecasters rerun and the win and podium probabilities update before lights out. The form model likes Hamilton, but qualifying pace has belonged to Mercedes all year.
On this season's one-lap form, the front of the grid is Mercedes: Antonelli (four poles from seven) and Russell (the other three, including Barcelona). Mercedes has taken every pole in 2026, so Hamilton's win case rests on race pace rather than starting position.
The season so far
stats model · graded
Every 2026 round, the model’s predicted podium against what actually happened. Green means a podium pick landed. Locked in after qualifying, before the race.
| Round | Our podium (P1, P2, P3) | Actual podium (P1, P2, P3) | Winner called |
|---|---|---|---|
| R1 Australia | RUS, ANT, LEC | RUS, ANT, LEC | ✓ |
| R2 China | RUS, ANT, LEC | ANT, RUS, HAM | × |
| R3 Japan | ANT, RUS, LEC | ANT, PIA, LEC | ✓ |
| R4 Miami | ANT, RUS, LEC | ANT, NOR, PIA | ✓ |
| R5 Canada | ANT, RUS, NOR | ANT, HAM, VER | ✓ |
| R6 Monaco | ANT, HAM, VER | ANT, HAM, GAS | ✓ |
| R7 Barcelona | ANT, HAM, RUS | HAM, RUS, NOR | × |
Called the winner in 5 of 7 · landed 13 of 21 podium picks. The wins are easy; the third step is where it’s hard.
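The tallies above can be re-derived from the table. The sketch below is not the project's code; the names `ROUNDS` and `tally` are made up for illustration, and it assumes a podium pick "lands" when the driver finishes anywhere on the podium, regardless of which step.

```python
# (our podium, actual podium) per round, copied from the season table.
ROUNDS = {
    "R1 Australia": (("RUS", "ANT", "LEC"), ("RUS", "ANT", "LEC")),
    "R2 China":     (("RUS", "ANT", "LEC"), ("ANT", "RUS", "HAM")),
    "R3 Japan":     (("ANT", "RUS", "LEC"), ("ANT", "PIA", "LEC")),
    "R4 Miami":     (("ANT", "RUS", "LEC"), ("ANT", "NOR", "PIA")),
    "R5 Canada":    (("ANT", "RUS", "NOR"), ("ANT", "HAM", "VER")),
    "R6 Monaco":    (("ANT", "HAM", "VER"), ("ANT", "HAM", "GAS")),
    "R7 Barcelona": (("ANT", "HAM", "RUS"), ("HAM", "RUS", "NOR")),
}


def tally(rounds):
    """Return (winners called, podium picks landed, podium picks made)."""
    # A winner is "called" when our P1 pick is the actual P1.
    winners = sum(pred[0] == actual[0] for pred, actual in rounds.values())
    # Assumption: a pick lands if the driver is on the podium in any position.
    landed = sum(len(set(pred) & set(actual)) for pred, actual in rounds.values())
    return winners, landed, 3 * len(rounds)


winners, landed, picks = tally(ROUNDS)
print(f"winner called in {winners} of {len(ROUNDS)}, {landed} of {picks} podium picks landed")
```

Under that reading of "landed", the counts match the line above: 5 of 7 winners and 13 of 21 podium picks.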
At a glance
- 77: races tested (2023-26)
- 3: forecasters compared
- 24% worse: Claude's odds vs. just using the grid order
- 2.78: places off per driver, on average
How it works
1. Gather: Race and qualifying results for 2023 to 2026 become nine pre-race clues per driver, including grid slot, quali gap, recent form, team pace, and track history. Strictly nothing from the race being predicted.
2. Predict: Three forecasters fill in the same form: a naive baseline (you finish where you start), a statistical model trained only on past races, and Claude reasoning over a written pre-race brief.
3. Grade: Proper scoring rules (Brier score, log loss, skill vs the baseline) plus calibration curves: when it says 70%, does that happen 70% of the time?
4. Track: Every forecaster is graded race by race across the season, and the next race is always called before lights out, so the prediction is locked in before the result exists.
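The scoring rules named in the Grade step are short enough to write out. This is a minimal sketch of the standard definitions for a binary event such as "driver wins", not the project's own code. The function names and the toy numbers are illustrative, and the skill score uses the common convention of one minus the ratio of model score to baseline score, which may not be the exact convention behind the percentages quoted on this page.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def skill_score(model_score, baseline_score):
    """Fractional improvement over the baseline (positive = better, lower scores win)."""
    return 1.0 - model_score / baseline_score


# Toy example: win probabilities for three drivers; the first one wins.
outcomes = [1, 0, 0]
model = [0.7, 0.2, 0.1]
baseline = [0.5, 0.3, 0.2]

skill = skill_score(brier_score(model, outcomes), brier_score(baseline, outcomes))
print(f"Brier skill vs baseline: {skill:+.2f}")
```

With this convention, a positive skill means the forecaster beat the baseline and a negative one means it did worse.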
Honest about it
A prediction is only worth the eval behind it. So I keep score: three forecasters, every race, graded against what actually happened.
How often do we call the winner?
2026 · 7 rounds
Problem
Prediction posts are easy to fake after the fact, and LLMs make it worse: past seasons sit in their training data, so a strong backtest proves memory, not skill. I wanted calls put on the record before each race, and an evaluation I could actually trust.
Approach
Three forecasters emit the same output, so they compete like for like. The statistical model only sees earlier races, automated tests check that no future data leaks in, and Claude is graded only on races after its training cutoff. The rest of 2026 is the live test: every pick is locked in after qualifying, before the race.
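One way such a leak test can work is to tamper with the race being predicted, and everything after it, then confirm the features do not change. The sketch below illustrates that idea under stated assumptions; `Result`, `recent_form`, and the three-race window are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    round_no: int  # chronological index of the race
    driver: str
    finish: int


def recent_form(history, driver, before_round, window=3):
    """Mean finish over the driver's last `window` races strictly before `before_round`."""
    past = [
        r.finish
        for r in sorted(history, key=lambda r: r.round_no)
        if r.driver == driver and r.round_no < before_round
    ]
    recent = past[-window:]
    return sum(recent) / len(recent) if recent else None


def test_no_future_leak():
    history = [Result(1, "ANT", 2), Result(2, "ANT", 1), Result(3, "ANT", 5)]
    expected = recent_form(history, "ANT", before_round=3)

    # Overwrite round 3 (the race being predicted) and add a later round.
    tampered = [r if r.round_no < 3 else Result(r.round_no, r.driver, 20) for r in history]
    tampered.append(Result(4, "ANT", 20))

    # If the feature changed, it must have read data it should not see.
    assert recent_form(tampered, "ANT", before_round=3) == expected


test_no_future_leak()
```

The strict `<` on `round_no` is the whole point: a `<=` would let the result being predicted into its own features, and this test would fail.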
Eval results
Two honest findings. First, against a baseline that just predicts the starting grid order, the stats model scored about 12% better on win probability, while Claude scored 24% worse than the baseline: the kind of negative result most write-ups quietly drop. Second, almost all the signal is one thing: where you start. Remove grid position and podium error jumps about 20%; remove anything else and nothing moves. On ranking, the stats model lands within 2.78 places of each driver's real finish on average, under three spots off. The plot below is the part I trust most: when the model says 70%, it happens about 70% of the time.
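The calibration check behind that last claim can be sketched as a binning exercise: group forecasts by predicted probability, then compare each group's average forecast with how often the event actually happened. The code below is an illustration with made-up numbers, not the project's data; `calibration_table` and the ten equal-width bins are assumptions.

```python
def calibration_table(probs, outcomes, n_bins=10):
    """Return (mean forecast, observed frequency, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))

    rows = []
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        rows.append((mean_p, freq, len(members)))
    return rows


# Toy forecasts: ten calls at 72% (seven came true), twenty at 15% (three came true).
probs = [0.72] * 10 + [0.15] * 20
outcomes = [1] * 7 + [0] * 3 + [1] * 3 + [0] * 17

for mean_p, freq, n in calibration_table(probs, outcomes):
    print(f"forecast ~{mean_p:.0%}  happened {freq:.0%}  (n={n})")
```

A well-calibrated forecaster has the two percentages close in every bin; plotting them against each other gives the calibration curve, with the diagonal as the ideal.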
Can you trust the probabilities?
every race, 2023-26
What broke
A free data API silently returned four empty races after rate-limiting; validation caught it, because the API itself raised no error. Grid position 0 means a pit-lane start, which a model reads as better than pole. And the LLM sometimes returns duplicate finishing positions, so the schema rejects loudly and a deterministic repair re-ranks. The lesson that stuck: the eval design mattered more than the model, and most of the work was keeping the test fair.
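Each of the three failures has a small defensive pattern. The sketch below shows one possible shape for each, not the project's actual fixes: the function names are hypothetical, placing a pit-lane start one slot behind the field is an assumption, and so is breaking ties in the repair by grid slot.

```python
def check_race_payload(race_name, rows):
    """Fail loudly on an empty response instead of storing a race with no results."""
    if not rows:
        raise ValueError(f"{race_name}: API returned no rows (rate-limited?)")
    return rows


def encode_grid(grid_position, field_size):
    """Map the raw grid value to (grid_slot, pit_lane_start).

    Assumption: 0 marks a pit-lane start, re-coded here as one slot behind the field.
    """
    if grid_position == 0:
        return field_size + 1, True
    return grid_position, False


def validate_ranking(positions):
    """Raise unless the predicted positions are exactly 1..n, each used once."""
    if sorted(positions.values()) != list(range(1, len(positions) + 1)):
        raise ValueError(f"invalid ranking: {positions}")


def repair_ranking(positions, grid):
    """Deterministic re-rank: predicted position, then grid slot, then driver code."""
    ordered = sorted(positions, key=lambda d: (positions[d], grid[d], d))
    return {driver: rank for rank, driver in enumerate(ordered, start=1)}


# Toy LLM output with a duplicated P2.
llm_positions = {"ANT": 1, "HAM": 2, "RUS": 2, "NOR": 4}
grid = {"ANT": 1, "RUS": 2, "HAM": 3, "NOR": 4}

try:
    validate_ranking(llm_positions)
except ValueError:
    llm_positions = repair_ranking(llm_positions, grid)

print(llm_positions)
```

Keeping the validator and the repair separate means the rejection is still logged every time, while the repair gives the same answer for the same input.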