The receipts — a real sample of picks the agents made
Companion to "I benched the betting agent." A randomized sample of actual decisions, with reasoning, sizing, and outcomes — and the reason a losing W/L record still finished at +128%.
This is the appendix to "I benched the betting agent. Here's what replaced it."
If you want to see what the agents actually do on a game-by-game basis — what they pick, how big they go, what they pass on — this page has the receipts.
Every game on this page is a real MLB game on a real date. The teams, dates, odds, and outcomes can be verified on any standard sports source — baseball-reference, ESPN, MLB.com, the bookmakers themselves. Nothing here is composite, illustrative, or rounded for narrative effect. If something looks off, you can pull up the box score and check.
Worth saying out loud why I'm being borderline-pedantic about this. Fabricating returns by backdating trades to convenient prices is a real thing that has happened in finance: Bernie Madoff ran an entire floor of people whose actual job was picking, after the fact, moments when a stock had been cheap to "buy" and moments when it had been high to "sell," manufacturing decades of fictional profit out of thin air. "Go pull the public box score and verify the outcome yourself" is the absolute floor of trustworthy reporting on returns. I'd rather clear that floor cleanly than barely scrape over it.
Dates covered in the tables below: April 14, 15, 17, 18, 19, 20, 21, 23, 24, and 25, 2026. Each row's date is a clickable footnote (the small superscript letter) that links to the ESPN MLB scoreboard for that day — every box score from that date is on the linked page, so you can find the specific game and verify the outcome yourself. I haven't linked individual game pages because that would require fabricating game IDs I don't have; the scoreboard URLs are real and stable.
A few things to know before the tables:
- Stake is the dollar amount the agent put down on the pick. Each agent started with a $1,000 paper-money balance.
- Edge is the gap between the engine's win-probability estimate and the bookmaker's implied probability. That's the agent's reason to act.
- Profile / strategy details are intentionally omitted. This isn't an attempt to hide the math — the engine is one shared model — but per-agent lever configurations are part of what makes the game a game.
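To make the Edge definition concrete, here is a minimal sketch in Python. It assumes the simplest possible conversion from decimal odds to implied probability, 1 / odds, with no vig removal. The function names are illustrative, not the engine's, and the engine's own conversion may differ: the Market % values in the tables on this page sit slightly below 1 / odds.

```python
def implied_probability(decimal_odds: float) -> float:
    """Naive implied win probability from decimal odds (assumes no vig removal)."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def edge(model_probability: float, market_probability: float) -> float:
    """Edge = model win-probability estimate minus the market's implied probability."""
    return model_probability - market_probability


# Hypothetical inputs: a 55% model estimate against decimal odds of 2.00.
market = implied_probability(2.00)
print(f"edge = {edge(0.55, market):.1%}")   # edge = 5.0%
```

A positive result means the model rates the side higher than the price does; whether that is enough to act on depends on the agent's minimum.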
The headline you should remember from the main post
Over fourteen days, Big Jake placed 88 bets. He won 40 and lost 48. That's a losing W/L record — 45.5%, worse than a coin flip.
His paper-money bankroll finished at +$1,280.74. That's +128.1% on a $1,000 starting balance. More than doubled.
Both numbers are real.
The reason those two facts can sit next to each other without contradicting each other is stake sizing. Big Jake's lever set sized up on high-edge plays where the model was very sure, and sized down on lower-conviction ones. When he won, he tended to win bigger; when he lost, he tended to lose smaller. The arithmetic of "40 wins covered 48 losses with $1,280 to spare" only works if the wins were not the same size as the losses.
That sentence — win rate is not how you win at this; sizing is — is enough of its own topic that it gets its own post later in the series. For now, look at the sample below with that lens.
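The arithmetic behind the 40-and-48 record can be sketched in a few lines. The win count, loss count, and net profit are the figures quoted above; the average loss passed in at the end is a hypothetical value chosen only to show the shape of the relationship, not a number from the database.

```python
WINS, LOSSES = 40, 48      # Big Jake's record, from the post
NET_PROFIT = 1280.74       # dollars, from the post


def breakeven_ratio(wins: int, losses: int) -> float:
    """Average win divided by average loss needed just to break even."""
    return losses / wins


def required_avg_win(avg_loss: float) -> float:
    """Average win implied by the record and net profit, for a given average loss."""
    return (NET_PROFIT + LOSSES * avg_loss) / WINS


print(breakeven_ratio(WINS, LOSSES))   # 1.2

# Hypothetical average loss of $20; the real figure is not given on this page.
print(f"{required_avg_win(20.0):.2f}")
```

With this record, the average win has to be more than 1.2 times the average loss just to break even, and roughly $32 larger still to land on +$1,280.74.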
A randomized sample across all eleven agents
Below are ten picks pulled at random from across all agents in the window. They're a flavor of what the agents do — not a full record, not cherry-picked, just a snapshot.
| # | Date | Agent | Game | Pick | Odds | Stake | Edge | Winner | P&L | Reason to pick |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2026-04-25 j | Tommy-B | Seattle @ St. Louis | St. Louis (home) | 2.34 | $17.58 | 15.1% | Seattle | ❌ −$17.58 | Edge 15.1% ≥ min; Kelly 2.00% |
| 2 | 2026-04-17 c | Jenny Bases | Kansas City @ NY Yankees | Kansas City (away) | 2.61 | $18.47 | 3.8% | NY Yankees | ❌ −$18.47 | Edge 3.8% ≥ min (posture +25%); Kelly 1.37% |
| 3 | 2026-04-20 f | Big Jake | Philadelphia @ Chi Cubs | Chi Cubs (home) | 1.95 | $100.00 | 6.9% | Chi Cubs | ✅ +$93.05 | Edge 6.9% ≥ min (posture +25%); Kelly 3.70% |
| 4 | 2026-04-19 e | Tommy Four-Seam | San Francisco @ Washington | Washington (home) | 2.28 | $23.01 | 16.3% | Washington | ✅ +$28.40 | Edge 16.3% ≥ min; Kelly 2.00% |
| 5 | 2026-04-18 d | Maria Cleanup | Kansas City @ NY Yankees | Kansas City (away) | 2.45 | $34.83 | 5.3% | NY Yankees | ❌ −$34.83 | Edge 5.3% ≥ min (posture +25%); Kelly 2.15% |
| 6 | 2026-04-21 g | Tommy-B | Cincinnati @ Tampa Bay | Tampa Bay (home) | 2.02 | $20.15 | 21.3% | Cincinnati | ❌ −$20.15 | Edge 21.3% ≥ min; RAG ↓ sizing; Kelly 2.00% |
| 7 | 2026-04-21 g | Jenny Bases | Milwaukee @ Detroit | Detroit (home) | 1.95 | $20.23 | 4.9% | Milwaukee | ❌ −$20.23 | Edge 4.9% ≥ min; Kelly 1.82% |
| 8 | 2026-04-24 i | Calibrated | Detroit @ Cincinnati | Cincinnati (home) | 2.21 | $89.79 | 6.6% | Cincinnati | ✅ +$104.68 | Edge 14.3% post-RAG ≥ min; Kelly 5.00% |
| 9 | 2026-04-15 b | Jenny Bases | Miami @ Atlanta | Miami (away) | 2.52 | $47.20 | 8.5% | Atlanta | ❌ −$47.20 | Edge 8.5% ≥ min (posture +25%); RAG ↓ sizing; Kelly 3.52% |
| 10 | 2026-04-23 h | Sofia Curveball | Milwaukee @ Detroit | Milwaukee (away) | 3.08 | $66.46 | 11.9% | Detroit | ❌ −$66.46 | Edge 11.9% ≥ min; Kelly 4.13% |
Look at the stake column before anything else
This is where the sizing story shows up at the row level.
- Pick #3, Big Jake: $100 on a 6.9% edge. The largest stake in the sample, on a moderate edge. Won.
- Pick #8, Calibrated: $89.79 on a 6.6% base edge (14.3% after the RAG adjustment). Big stake, similar base conviction. Won.
- Pick #6, Tommy-B: $20.15 on a 21.3% edge. The highest-edge play in the whole sample, sized at one-fifth of Big Jake's #3. Lost.
The edge column tells you whether the agent should play. The stake column tells you how hard. Different agents — and different lever sets within the same agent type — answer the second question very differently. That's where the +128% vs. −13% spread on the main post's leaderboard comes from.
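The ten-row sample shows the same effect in miniature. The sketch below tallies the P&L column of the sample table, using only the figures printed there. Ten picks is far too few to prove anything; it only illustrates how a losing record and a positive total can coexist.

```python
from decimal import Decimal

# P&L column of the ten-pick sample table, in row order (dollars).
SAMPLE_PNL = ["-17.58", "-18.47", "93.05", "28.40", "-34.83",
              "-20.15", "-20.23", "104.68", "-47.20", "-66.46"]

pnl = [Decimal(x) for x in SAMPLE_PNL]   # Decimal avoids float rounding on cents
wins = [x for x in pnl if x > 0]
losses = [x for x in pnl if x < 0]

print(f"record: {len(wins)}-{len(losses)}")   # record: 3-7
print(f"won:  {sum(wins)}")                   # won:  226.13
print(f"lost: {sum(losses)}")                 # lost: -224.92
print(f"net:  {sum(pnl)}")                    # net:  1.21
```

Three wins against seven losses, and the sample still nets out at +$1.21, because the two largest stakes in the table both won.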
Pick categories
The reasons fall into a small set of buckets, the same way the skip reasons do further down. The agent compares two probabilities, the engine's model % estimate and the bookmaker's implied market %, and the difference between them is the edge. An edge above the agent's personal minimum is the trigger to act; sizing modifiers (posture, RAG) decide how hard.
- Edge above threshold. The model thinks the side wins more often than the market implies, by enough to clear that agent's personal minimum. Different agents have different bars. This is the baseline reason on every pick.
- Posture-modified edge. Some lever sets carry a daily-volume posture — e.g. "+25% daily limit" — that nudges the agent to take more positions when conditions are favorable. Picks 2, 3, 5, and 9 in the sample fired with that modifier.
- RAG-adjusted sizing. When the unstructured-news layer surfaces something material — bullpen injury, lineup return, ballpark weather — the agent adjusts stake size on top of the base Kelly fraction. Picks 6 and 9 were sized down by 1% off the news read; pick 8's edge was raised to 14.3% from a base 6.6% by RAG context confirming the model's view.
Decision-time numbers, pick-by-pick (model % vs. market %, the gap the agent acted on):
| # | Pick | Model % | Market % | Edge |
|---|---|---|---|---|
| 1 | SEA @ STL — home | 57.6% | 42.5% | 15.1% |
| 2 | KC @ NYY — away | 42.0% | 38.1% | 3.8% |
| 3 | PHI @ CHC — home | 58.0% | 51.1% | 6.9% |
| 4 | SF @ WSH — home | 60.0% | 43.7% | 16.3% |
| 5 | KC @ NYY — away | 46.0% | 40.7% | 5.3% |
| 6 | CIN @ TB — home | 70.3% | 49.0% | 21.3% |
| 7 | MIL @ DET — home | 55.8% | 50.9% | 4.9% |
| 8 | DET @ CIN — home | 51.5% | 44.9% | 6.6% (→ 14.3% post-RAG) |
| 9 | MIA @ ATL — away | 47.9% | 39.4% | 8.5% |
| 10 | MIL @ DET — away | 44.2% | 32.3% | 11.9% |
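For readers who want to relate the Model % column to the "Kelly" figures in the sample table, here is the textbook Kelly formula for a single bet at decimal odds. This is the standard formula, not the engine's code; how the agents scale or cap it is not described on this page, and the function name is illustrative.

```python
def full_kelly(win_probability: float, decimal_odds: float) -> float:
    """Textbook Kelly fraction of bankroll for one bet; 0.0 when there is no edge."""
    if decimal_odds <= 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    b = decimal_odds - 1.0                  # net payout per unit staked
    q = 1.0 - win_probability               # probability of losing
    fraction = (b * win_probability - q) / b
    return max(fraction, 0.0)


# Pick 3 from the tables: model 58.0%, odds 1.95.
print(f"{full_kelly(0.58, 1.95):.1%}")   # 13.8%
```

The logged figure for that pick is Kelly 3.70%, well below the full-Kelly value, so the agents appear to apply some fractional multiplier or cap on top of the raw formula. That reading is an inference from the numbers, not something the decision log states.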
The two RAG context excerpts the agents logged at decision time, for completeness:
- Pick 6: "The Rays have lost two pitchers from their bullpen depth with Englert's new injury…"
- Pick 9: "The return of Michael Harris II from the paternity list is the most significant…"
These aren't post-hoc explanations written for a blog post — they're the actual rationale the agent's brain wrote into the decision log at the moment it acted.
Ten the agents passed on
These are decisions where an agent looked at a game, considered a side, and decided not to act. The skips are the half of the story that doesn't usually get shown.
| # | Date | Agent | Game | Considered | Odds | Edge | Reason for skip |
|---|---|---|---|---|---|---|---|
| 1 | 2026-04-21 g | Big Jake | NY Yankees @ Boston | Home | 1.84 | 0.8% | Edge 0.8% below min 3.0% |
| 2 | 2026-04-15 b | Sofia Curveball | LA Angels @ NY Yankees | Home | 1.59 | −11.0% | Edge −11.0% below min 3.0% |
| 3 | 2026-04-18 d | Jake-B | Cincinnati @ Minnesota | Home | 1.63 | −1.7% | Edge −1.7% below min 3.0% |
| 4 | 2026-04-15 b | Coach Cal | Texas @ Athletics | Away | 1.93 | −9.6% | Edge −9.6% below min 3.0% |
| 5 | 2026-04-19 e | Sofia Curveball | Atlanta @ Philadelphia | Away | 2.08 | −1.6% | Edge −1.6% below min 3.0% |
| 6 | 2026-04-24 i | Sofia Curveball | Minnesota @ Tampa Bay | Home | 1.04 | −37.7% | Edge −37.7% below min 3.0% |
| 7 | 2026-04-19 e | Tommy-B | Detroit @ Boston | Home | 1.74 | 0.7% | Edge 0.7% below min 4.0% |
| 8 | 2026-04-18 d | Calibrated | Texas @ Seattle | Home | 1.83 | 1.5% | Daily exposure 19.2% ≥ limit 18.8% |
| 9 | 2026-04-14 a | Jenny Bases | NY Mets @ LA Dodgers | Away | 2.46 | −14.4% | Edge −14.4% below min 3.0% |
| 10 | 2026-04-14 a | Rosa Longshot | San Francisco @ Cincinnati | Home | 1.97 | 3.5% | Edge 3.5% below min 5.0% |
The reasons fall into four buckets that show up over and over:
- Edge below threshold. The agent saw value, but the gap between model and market wasn't wide enough to clear its personal bar.
- Negative edge. The model gives the side a lower win probability than the market implies: either the price is fair or the value is on the other side. No bet.
- Daily exposure cap. The agent was already at its risk limit for the day. Even a good edge gets skipped.
- Timing strategy. Some agents wait until late in the day for closing-line value, so a price available at noon may be gone by the time the agent acts. None of the ten skips in the table above fall into this bucket.
A betting agent that takes every game with positive edge is a different animal from one that only takes 5%+ edges with daily exposure caps. The picks above are the yes answers; these are the no answers. Both are decisions, both come from the same brain, both are part of what makes one agent +128% and another −13%.
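The edge and exposure buckets amount to a small gate in front of every candidate pick. The sketch below is one way such a gate could look; the function name, argument names, and check order are assumptions. Exposure is checked first here because skip #8 logs the exposure reason on a pick with only a 1.5% edge. Timing strategy is not modeled.

```python
from typing import Optional


def skip_reason(edge: float, min_edge: float,
                daily_exposure: float, exposure_limit: float) -> Optional[str]:
    """Return a skip reason, or None if the pick clears both gates.

    All arguments are fractions (0.03 means 3%). Check order is an assumption.
    """
    if daily_exposure >= exposure_limit:
        return f"Daily exposure {daily_exposure:.1%} ≥ limit {exposure_limit:.1%}"
    if edge < min_edge:   # covers both small positive and negative edges
        return f"Edge {edge:.1%} below min {min_edge:.1%}"
    return None


# Skip #1: edge 0.8% against a 3.0% minimum (exposure values are hypothetical).
print(skip_reason(0.008, 0.03, 0.0, 0.188))    # Edge 0.8% below min 3.0%
# Skip #8: exposure 19.2% against an 18.8% limit (the 3.0% minimum is hypothetical).
print(skip_reason(0.015, 0.03, 0.192, 0.188))  # Daily exposure 19.2% ≥ limit 18.8%
```

A pick that returns None here would go on to sizing; everything else lands in the skip log with its reason attached.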
Coming up
Stake sizing — why a 40-of-88 record and a +128.1% bankroll can be true on the same line — is its own post later in the series. So is the A/B testing methodology behind the variants. Both deserve a deeper look than they're getting here. This page is the receipts; those posts will be the why.
Sources for the result column
Each superscript letter in the tables above links to the ESPN MLB scoreboard for that day. Every box score for the date is on the linked page; find the matching team matchup to verify the outcome. Listed here for completeness:
- a — April 14, 2026 · ESPN scoreboard
- b — April 15, 2026 · ESPN scoreboard
- c — April 17, 2026 · ESPN scoreboard
- d — April 18, 2026 · ESPN scoreboard
- e — April 19, 2026 · ESPN scoreboard
- f — April 20, 2026 · ESPN scoreboard
- g — April 21, 2026 · ESPN scoreboard
- h — April 23, 2026 · ESPN scoreboard
- i — April 24, 2026 · ESPN scoreboard
- j — April 25, 2026 · ESPN scoreboard
If you'd rather use a different source, baseball-reference also publishes day-by-day boxes (example for April 25) and MLB.com publishes the same (example for April 25). I'm citing ESPN because it's the most familiar landing page; the underlying outcomes are the same on any of the three.
Notes
- All numbers are real, pulled from the live paper-trading database on 2026-04-27.
- Stakes are real Kelly-sized positions on the agent's paper-money balance at the time of each pick. That balance started at $1,000 and moves with every settled result.
- Sportsbook names visible in the source data (Pinnacle, Betfair, FanDuel, etc.) are real bookmakers offering the listed odds at placement time. They don't appear here because the focus is on the agent's decision, not the book.
- The "RAG context" lines on a couple of picks are excerpts from the unstructured-news layer — what the agent was reading at the moment of the call.
- This is paper money. None of these picks moved real dollars in real accounts. They moved numbers in a database. That's the whole point of paper trading.
These are personal notes from a side project I'm pursuing on my own time with my own resources. The views here are my own and are not connected to, endorsed by, or representative of my employer or any of my professional work.