Kale Bot
Grounded answers about my work, or an honest “I don't know”.
What this shows: I can put a chatbot on the open internet that answers only from approved sources, stays on topic, and refuses everything else.
Recruiters can ask instead of dig. It retrieves the relevant notes from a curated vault and answers from those, in plain language.
On the eval set it declined every off-topic question and adversarial trap, and answered on-topic questions from the right note. Nothing made up, nothing leaked.
The bug was in the scorer, not the bot. Measuring an AI honestly turned out to be half the work.
How it works · live
Live snapshot · measured
What one question retrieves
“Did the AI beat the baseline in F1?”
- F1 Race Predictor0.77site-and-projects.md
- Claude lost to the naive baseline0.76f1-race-predictor.md
- Q&A bank0.75f1-race-predictor.md
- Why it exists0.74f1-race-predictor.md
Eval scorecard
- Retrieval hit-rateright note in top-495%
- Grounded answerson-topic, from a note95%
- Correct declinesoff-topic, traps + unknowns100%
48-prompt set · 20 on-topic, 28 off-topic & traps
A couple it won’t answer
What is Cael's salary expectation?
Salary questions go straight to Cael; please email him at caelcarmont@gmail.com.
declinedWho is Cael dating?
I'm not able to discuss Cael's private life, but happy to help with his projects, skills, or working style instead.
declinedThis is the live system, not a mock-up: real search over my notes (Gemini embeddings) and a real scorecard from a 48-prompt eval. Measured, and rerun with one command.
Built to stream
The answer appears as it’s written, token by token, instead of landing in a block after a pause. The server opens a streaming connection to Claude, relays the text in small chunks over plain HTTP, and the widget paints each piece the moment it arrives.
- 1Claude streams
Tokens emitted as they're generated
messages.stream - 2Server relays
Plain-text chunks, no buffering
ReadableStream - 3UI renders live
Each token painted on arrival
render on arrival
How it works
- 1
Guard the input
A public endpoint can't trust what arrives: messages are validated, capped in size, and only the recent conversation is kept.
- 2
Retrieve
The question is embedded with Gemini and compared to every chunk of the curated vault; only the closest few are pulled (retrieval-augmented generation).
- 3
Ground the answer
Claude answers only from those retrieved chunks, so the reply comes from real notes rather than memory.
- 4
Decline on a miss
If nothing clears a relevance floor, or the ask is off-topic or private, it declines in one line and points to my email.
Problem
Recruiters ask the same questions about my work, and a plain chatbot would happily make them up, or get talked into being a free general assistant. I wanted one that only speaks from real sources, stays on topic, and admits when it can't answer.
Approach
Each curated note (the same vault Second Brain manages) is chunked and embedded with Gemini; a question is embedded the same way, the closest chunks are pulled, and Claude answers only from those (retrieval-augmented generation). The rules are fixed: my work is in scope, my private life isn't, and no message can change the prompt or reveal it.
Eval results
A 48-prompt set covers on-topic, out-of-scope, and adversarial questions: 95% retrieval hit-rate, 95% of answers grounded in the right note, 100% of traps and unknowns declined. The one miss was an honest one, and every number reruns with one command.
What broke
The first eval reported a dismal 38% decline rate. That was the scorer, not the bot: a keyword matcher mis-graded answers, and the small-model judge that replaced it repeated the mistake until a bigger eval caught it. The measurement was wrong more often than the bot.
Learnings
Declining is a feature: a grounded bot that refuses what it shouldn't answer beats an eager one that makes things up. And the cheapest guardrail is scope: a bot that only knows four projects is far harder to misuse than one wired to everything.