Architecture Spec · Confidential

Kalshi Farmer

A multi-strategy prediction market trading system built for edge detection, speed, and disciplined risk management. Designed for Vishnu — funded by sushi.

Prepared for
Vishnu Nath
Prepared by
Sean Hsieh + Emila
Date
March 11, 2026
Version
v0.4 — Field Report + Auth Spec
Status
Repo Live · Infra Deployed
01 — Thesis

Compete where the quants aren't

91% of Kalshi's volume is sports. That's where Susquehanna, the quant shops, and every degenerate with a model are fighting. The edge is thin and the competition is fierce.

Our thesis: Build probabilistic models on structurally underserved markets — weather, economic data, travel volumes, esoteric events — where free public data gives us a 5-15% better estimate than the crowd. Use sports as a secondary channel for high-conviction plays with live data feeds.

v0.2 upgrade: After deep competitive recon (X threads, open-source repos, blog posts, HN), we've identified exactly where the field is crowded and where the alpha gaps remain. Every builder is using the same GFS-only playbook. We're going multi-model + market making + 3-venue arb.

91%

Volume in Sports

Where all the smart money fights. Hard to find edge without $10K+/mo data feeds.

9%

Everything Else

Weather, econ, travel, politics. Free data, fewer competitors, structural patterns.

1.75¢

Max Fee at 50¢

Fees are parabolic — near 0 at probability extremes. Farm the tails.
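
A rough sketch of why the tails are cheap. It assumes the per-contract fee has the form rate × P × (1 − P) with a rate of 0.07, which is an assumption inferred from the 1.75¢ peak quoted above. It ignores rounding, so check it against Kalshi's current fee schedule before relying on it.

```python
def fee_per_contract_cents(price_cents: float, rate: float = 0.07) -> float:
    """Unrounded fee in cents for one contract at the given price.

    Assumes fee = rate * P * (1 - P) dollars, where P is the price in dollars.
    The 0.07 rate is an assumption chosen to reproduce the 1.75 cent peak.
    """
    p = price_cents / 100.0
    return rate * p * (1.0 - p) * 100.0


# The fee peaks at 50c and shrinks symmetrically toward both tails.
for price in (5, 25, 50, 75, 95):
    print(f"{price:>2}c contract -> {fee_per_contract_cents(price):.2f}c fee")
```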

02 — Architecture

Three languages, one system

Each layer uses the right tool for the job. Rust for execution speed (order placement, WebSocket, arb detection). Go for data ingestion and model serving (concurrent API scraping, signal aggregation). Python for model development and research (backtesting, feature engineering, visualization).

Rust
Execution Engine
Order management, WebSocket feed handler, cross-platform arb detector, position tracking. RSA-PSS authenticated API client. Sub-millisecond order placement.
Go
Signal Aggregator
Concurrent data ingestion from 15+ APIs. Model inference server (ONNX runtime). Signal scoring and probability estimation. gRPC to Rust layer.
Python
Research & Models
Model training (XGBoost, LightGBM, neural nets). Backtesting engine. Feature engineering. Export to ONNX for Go serving. Jupyter notebooks.
Infra
Platform
TimescaleDB (time-series), Redis (real-time state + pub/sub), Grafana dashboard, Docker Compose. Deploy on pve-sea-01 or Hetzner.

Repository Structure

kalshi-farmer/
├── rust/
│   ├── src/
│   │   ├── main.rs              # Entry point, runtime orchestrator
│   │   ├── exchange/
│   │   │   ├── kalshi.rs        # Kalshi REST + WebSocket client (RSA-PSS auth)
│   │   │   ├── polymarket.rs    # Polymarket CLOB client (arb)
│   │   │   ├── forecastex.rs    # ForecastEx/IBKR client (3-venue arb)
│   │   │   └── types.rs         # Shared order/market types
│   │   ├── execution/
│   │   │   ├── oms.rs           # Order management system
│   │   │   ├── arb_detector.rs  # 3-venue spread scanner (Kalshi×Poly×ForecastEx)
│   │   │   ├── market_maker.rs  # Avellaneda-Stoikov in log-odds space
│   │   │   └── position.rs      # Position tracking + limits
│   │   └── risk/
│   │       ├── kelly.rs         # Fractional Kelly sizing
│   │       ├── limits.rs        # Hard position/loss limits
│   │       └── circuit.rs       # Circuit breakers (daily loss, drawdown)
│   └── Cargo.toml
├── go/
│   ├── cmd/
│   │   └── aggregator/main.go   # Signal aggregator entry point
│   ├── internal/
│   │   ├── sources/
│   │   │   ├── tsa.go           # TSA checkpoint scraper
│   │   │   ├── hrrr.go          # HRRR hourly rapid-refresh (key edge)
│   │   │   ├── noaa.go          # NOAA GFS ensemble + NWS forecasts
│   │   │   ├── fred.go          # FRED economic data
│   │   │   ├── bls.go           # BLS CPI/jobs data
│   │   │   ├── clevfed.go       # Cleveland Fed inflation nowcast
│   │   │   ├── atlfed.go        # Atlanta Fed GDPNow
│   │   │   ├── fedwatch.go      # CME FedWatch probabilities
│   │   │   ├── odds.go          # The Odds API (sports)
│   │   │   ├── espn.go          # ESPN hidden API
│   │   │   ├── trends.go        # Google Trends proxy
│   │   │   ├── reddit.go        # Reddit sentiment (Reddit API)
│   │   │   └── polymarket.go    # Polymarket prices (arb ref)
│   │   ├── scorer/
│   │   │   ├── probability.go   # Model inference (ONNX)
│   │   │   └── edge.go          # Edge calculator (model vs market)
│   │   └── server/
│   │       └── grpc.go          # gRPC server → Rust client
│   └── go.mod
├── python/
│   ├── models/
│   │   ├── tsa_model.py         # TSA volume predictor
│   │   ├── weather_model.py     # Temperature forecast ensemble
│   │   ├── cpi_model.py         # CPI nowcast model
│   │   ├── fed_model.py         # Fed rate decision model
│   │   ├── sports_model.py      # Sports outcome model
│   │   └── export_onnx.py       # Export trained models → ONNX
│   ├── backtest/
│   │   ├── engine.py            # Historical simulation engine
│   │   ├── data_loader.py       # Load historical Kalshi data
│   │   └── metrics.py           # Sharpe, max drawdown, win rate
│   ├── notebooks/
│   │   ├── eda.ipynb            # Exploratory data analysis
│   │   └── signal_research.ipynb
│   └── requirements.txt
├── infra/
│   ├── docker-compose.yml       # TimescaleDB + Redis + Grafana
│   ├── grafana/dashboards/      # P&L, positions, signals
│   └── migrations/              # TimescaleDB schema
└── config/
    ├── strategies.toml          # Strategy params, enabled markets
    ├── risk.toml                # Position limits, Kelly fraction
    └── secrets.toml.example     # API keys template
03 — Signal Sources

15+ data feeds, mostly free

The edge comes from combining multiple weak signals into a strong probability estimate. Every signal below feeds the Go aggregator, which scores markets in real-time.

Source | Data | Markets | Cost | Edge
TSA.gov | Daily checkpoint passenger volumes | Travel volume | Free | Seasonal regression beats crowd by 5-10%
NOAA HRRR | Hourly 3km resolution rapid-refresh forecasts | Temperature (<24h) | Free | KEY EDGE: Updates hourly vs GFS 4x/day. 3km vs 25km resolution. Most bots don't use it.
NOAA GFS | 31-member ensemble, 4x daily | Temperature (2-7 day) | Free | Standard model, table stakes. Everyone uses this.
ECMWF | European model, gold standard 2-5 day | Temperature, hurricane | Free* | Best 2-5 day accuracy globally. *Via Open-Meteo
Open-Meteo | 13-model blend (GFS+HRRR+ECMWF+NAM+ICON+GEM+JMA+more) | All weather | Free | One API for all models. Ensemble spread = uncertainty quantification
NWS Forecasts | Official city forecasts (resolution source) | Temperature | Free | Kalshi settles against NWS. This IS the ground truth.
Cleveland Fed | Daily CPI/PCE nowcast | CPI, inflation | Free | Daily updates vs monthly Kalshi settlement
Atlanta Fed GDPNow | Real-time GDP nowcast | GDP growth | Free | Front-run revision after each data release
NY Fed Nowcast | GDP + macro nowcast | GDP, employment | Free | Cross-validate with GDPNow
FRED API | 840K+ economic time series | All econ markets | Free | Feature inputs for every macro model
BLS API | CPI, jobs, wages (raw data) | CPI, unemployment | Free | Build model from components before headline drops
ESPN API | Live scores, stats, schedules | Sports outcomes | Free | Real-time game state for live markets
Polymarket | Orderbook, prices, trades | Cross-platform arb | Free | Price discrepancies = risk-free profit
Google Trends | Search interest over time | All markets | Free | 24-72hr leading indicator for event markets
SEC EDGAR | Filings, insider trades | Company events | Free | Insider trading signals policy/merger markets
CME FedWatch | Rate change probabilities | Fed rate decisions | $25/mo | Deepest rate market implied probs vs Kalshi
The Odds API | 40+ bookmaker odds | Sports | $59-119/mo | Sharp line movement, reverse line detection
X / Twitter | Real-time sentiment stream | Breaking events | $100/mo | News hits X 15-30 min before other platforms
Reddit / PRAW | Community analysis | All markets | Free | r/sportsbook reverse-engineers sharp money
FlightAware | Real-time flight data | Travel, weather disruption | Usage | Cancellation rates predict travel volume
Finnhub | Congressional stock trades | Political/policy | Free | Congress trades around policy decisions

Total cost for full signal stack: $184-244/mo. The free tier alone covers 80% of market categories.

04 — Strategy Matrix

Five strategies, ranked by expected edge

🌡️ S1: Weather Farming (v0.2 — Multi-Model Ensemble)

HIGHEST EDGE · OUR PRIMARY ADVANTAGE

Most open-source bots (suislanchez, AipublishiPRO, etc.) use GFS-only — a single 31-member ensemble from Open-Meteo. That's table stakes. Our edge: 13-model weighted ensemble with HRRR dominance for short-term forecasts.

  • HRRR (key advantage): Updates every hour (vs GFS 4x/day), 3km resolution (vs GFS 25km). Whoever ingests the latest HRRR run first gets mispriced contracts before market adjusts
  • Model stack: HRRR + GFS + ECMWF + NAM + ICON + GEM + JMA + NWS + Tomorrow.io + OpenWeatherMap + WeatherAPI + PirateWeather + Open-Meteo blend (mirrors Degen Doppler's 13-model approach)
  • Weighting: HRRR dominates <24h, ECMWF dominates 2-5 day, GFS fills gaps. Normal distribution with forecast error → probability
  • Edge threshold: Dynamic — 5% for high ensemble agreement (12+ models agree), 12% for low agreement
  • City selection (Chris Dodds lesson): Score cities by forecast variance. Skip stable-weather cities (Miami). Focus on volatile cities (Chicago, Denver, NYC) where mispricing is common
  • Calibration: Brier score + CRPS (Continuous Ranked Probability Score) for ensemble spread quality
  • Markets: KXHIGH series (NY, CHI, MIA, LAX, DEN) + Polymarket temp ranges — ⚠️ v0.4 note: KXHIGH daily markets are NOT always active. When unavailable, pivot to high-volume long-dated markets (politics, climate, financials) for engine validation
  • Fee optimization: Weather markets often at extremes (90c+) = near-zero fees
  • Win rate target: 65%+ with the multi-model ensemble (up from 62%)
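
A minimal sketch of the weighting, probability, and edge-threshold bullets above. The forecast values, weights, and the 2°F error standard deviation are illustrative assumptions, not tuned parameters. Treating anything under 12 agreeing models as "low agreement" is also an assumption, since the spec only names the two endpoints.

```python
from statistics import NormalDist


def weighted_mean(forecasts: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-model daily-high forecasts (deg F)."""
    total_weight = sum(weights[model] for model in forecasts)
    return sum(forecasts[model] * weights[model] for model in forecasts) / total_weight


def prob_high_at_least(threshold_f: float, mean_f: float, error_std_f: float) -> float:
    """P(daily high >= threshold) under a normal forecast-error model."""
    return 1.0 - NormalDist(mu=mean_f, sigma=error_std_f).cdf(threshold_f)


def edge_threshold(models_agreeing: int) -> float:
    """5% when 12+ models agree, 12% otherwise (two-level rule, an assumption)."""
    return 0.05 if models_agreeing >= 12 else 0.12


def has_edge(model_prob: float, market_price: float, models_agreeing: int) -> bool:
    """True when model and market disagree by at least the dynamic threshold."""
    return abs(model_prob - market_price) >= edge_threshold(models_agreeing)


# Illustrative inputs only: three of the 13 models, HRRR-heavy for a <24h market.
forecasts = {"HRRR": 71.0, "GFS": 69.0, "ECMWF": 70.0}
weights = {"HRRR": 0.5, "GFS": 0.2, "ECMWF": 0.3}
mean_f = weighted_mean(forecasts, weights)
p = prob_high_at_least(72.0, mean_f, error_std_f=2.0)
print(round(mean_f, 1), round(p, 3), has_edge(p, market_price=0.30, models_agreeing=12))
```

In practice the error standard deviation would come from each model's historical forecast error at that lead time, which is what the Brier/CRPS calibration step is meant to check.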

📊 S2: Economic Data Sniping

HIGH EDGE · MODERATE COMPETITION

Build nowcast models using the same methodology as Fed economists — but trade the Kalshi market before the number drops. Cleveland Fed daily CPI nowcast updates = daily edge refresh. Key: be faster at processing underlying component data (energy prices, shelter costs, used cars) than the crowd.

  • Inputs: Cleveland/Atlanta/NY Fed nowcasts, BLS components, FRED series, CME FedWatch
  • Markets: CPI, Fed rate decisions, GDP, unemployment
  • Edge window: 1-7 days before settlement

🔄 S3: Three-Venue Arb (v0.2 — ForecastEx Added)

LOW RISK · SPEED-DEPENDENT · NOW 3 VENUES

Identical events on Kalshi, Polymarket, and ForecastEx (Interactive Brokers) frequently show 2-5% price spreads. Three venues for the same underlying means more arb surface. BTC arb bots are already proven on Kalshi×Polymarket, and the same pattern applies to weather temperature markets.

  • Inputs: Kalshi orderbook (WebSocket), Polymarket CLOB API, ForecastEx (IBKR API)
  • ForecastEx: Same daily high temp markets on Interactive Brokers. Third price source = triangular arb opportunities
  • Execution: Rust arb detector scans every 100ms across all 3 venues, fires simultaneously
  • Risk: Partial fills (one leg fills, other doesn't), market definition mismatch, settlement time differences
  • Capital: Separate arb bankroll, higher allocation (lower risk per trade)
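
A sketch of the core check the arb detector runs: buy YES on one venue and NO on another, and the pair pays $1 if both markets resolve identically. The quote format and the flat `fees_per_pair` parameter are hypothetical simplifications. Real venue fees, partial fills, and settlement mismatches are not modelled here.

```python
from itertools import permutations


def best_arb(quotes: dict[str, tuple[float, float]], fees_per_pair: float = 0.0):
    """Find the most profitable YES/NO pair across venues.

    quotes maps venue -> (yes_ask, no_ask) in dollars per contract.
    Returns (profit, yes_venue, no_venue) or None when no pair is profitable.
    """
    best = None
    for yes_venue, no_venue in permutations(quotes, 2):
        cost = quotes[yes_venue][0] + quotes[no_venue][1] + fees_per_pair
        profit = 1.0 - cost  # the pair pays exactly $1 at settlement
        if profit > 0 and (best is None or profit > best[0]):
            best = (profit, yes_venue, no_venue)
    return best


# Illustrative quotes, not live data.
quotes = {
    "kalshi": (0.62, 0.40),
    "polymarket": (0.57, 0.45),
    "forecastex": (0.60, 0.42),
}
print(best_arb(quotes))
print(best_arb(quotes, fees_per_pair=0.04))
```

Three venues give six ordered YES/NO pairs to scan instead of two, which is the "more arb surface" argument in code form.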

🏈 S4: Sports Value Plays

SELECTIVE · HIGH CONVICTION ONLY

Don't market-make sports. Cherry-pick value when the model disagrees with the market by >8%. Use The Odds API to detect reverse line movement (public on one side, sharp money on other). Add weather-at-venue and referee tendency data for edge. Vishnu's sports knowledge is the qualitative overlay.

  • Inputs: The Odds API (40+ books), ESPN, referee data, venue weather
  • Strategy: Only bet when model + sharp money + Vishnu all align
  • Sizing: Smaller positions, higher conviction bar
  • Markets: NFL, NBA, MLB — focus on props and totals (less efficient than spreads)

📈 S5: Market Making (v0.2 — New Strategy)

PASSIVE INCOME · CONSISTENT SMALL WINS · REQUIRES CAPITAL

Instead of only taking directional bets, post two-sided quotes and earn the spread. Adapted Avellaneda-Stoikov model for binary markets. Key insight from HN quant traders: work in log-odds space. A move from 2c→1c halves the price (huge in log-odds terms), while 50c→49c is noise. Most bots ignore this and get crushed at the extremes.

  • Model: Avellaneda-Stoikov (2008) adapted for binary settlement. γ risk aversion parameter controls aggressiveness
  • Log-odds pricing: Transform probabilities to logit space before computing bid-ask spreads. Prevents getting picked off at tails
  • Inventory management: Skew quotes based on current position to avoid accumulating one-sided risk
  • Best markets: Weather (stable flow, daily settlement, wide spreads) and econ (low competition on order books)
  • Capital requirement: Higher ($15K+ recommended). Makes money on volume, not individual trade edge
  • Reference: KalshiMarketMaker (rodlaf/KalshiMarketMaker on GitHub), USC QuantSC paper (20.3% return, 51 trades/day)
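
A sketch of quoting in log-odds space using the standard Avellaneda-Stoikov reservation price and spread formulas. It assumes σ is the volatility of the log-odds process and that `horizon` is time to settlement in the same units. The parameter values are illustrative, not calibrated.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def quotes_log_odds(mid_prob: float, inventory: float, gamma: float,
                    sigma: float, kappa: float, horizon: float) -> tuple[float, float]:
    """Bid/ask probabilities from Avellaneda-Stoikov applied to the logit of the mid."""
    s = logit(mid_prob)
    # Reservation price: skew away from the side we are already holding.
    reservation = s - inventory * gamma * sigma ** 2 * horizon
    # Optimal total spread from the A-S closed form.
    spread = gamma * sigma ** 2 * horizon + (2.0 / gamma) * math.log(1.0 + gamma / kappa)
    return sigmoid(reservation - spread / 2.0), sigmoid(reservation + spread / 2.0)


# Why log-odds: a one-cent move at the tail is far larger than one at the middle.
tail_move = logit(0.02) - logit(0.01)
mid_move = logit(0.50) - logit(0.49)
print(round(tail_move, 3), round(mid_move, 3))

flat_bid, flat_ask = quotes_log_odds(0.50, inventory=0, gamma=0.1, sigma=0.5, kappa=20, horizon=1.0)
long_bid, long_ask = quotes_log_odds(0.50, inventory=10, gamma=0.1, sigma=0.5, kappa=20, horizon=1.0)
print(round(flat_bid, 3), round(flat_ask, 3), round(long_bid, 3), round(long_ask, 3))
```

Because the quotes are mapped back through the sigmoid, they always stay strictly between 0 and 1, and a fixed log-odds spread narrows in cents near the tails.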
04b — Competitive Landscape (v0.2)

Know what everyone else is building

Deep recon across X threads, open-source GitHub repos, blog posts, and Hacker News reveals a crowded but shallow field. Most builders use the exact same playbook. Here's the map.

What the Field Looks Like

Builder/Tool | Approach | Weakness
suislanchez/weather-bot | GFS 31-member ensemble, Kelly sizing, 8% edge threshold, React dashboard | Single model (GFS only), no HRRR, no market making
Degen Doppler | 13 weather models, normal distribution probability, YES edge calculator | Tool only (no execution), no arb, manual trading
Polyforecast.io | 5-model ensemble, real trades on Polymarket, on-chain verifiable | Polymarket only, no Kalshi, subscription signal service
AipublishiPRO | 18-city scanner, NWS data, "locked wins" approach | Python only, no speed advantage, no cross-platform
cpratim/Jump Tank | LSTM neural nets, multi-source error weighting | Research project, not production-hardened
ryanfrigo/AI bot | Grok-4 + multi-agent LLM decision making | LLM latency kills edge, token costs eat profits
Sentinel_Algo | Elo-based sports signals, regime awareness | Sports-focused (crowded), sells Fiverr gigs
$24K Polymarket bot | Simple NOAA vs market price, multi-city, 24/7 | Single model, no ensemble, no risk management

Where We Beat Them

Multi-Model Ensemble

13 models vs their 1. HRRR hourly updates vs GFS 4x/day. We see a fresh forecast every hour, while GFS-only bots wait up to 6 hours between runs.

Three-Venue Arb

Everyone does Kalshi OR Polymarket. We do Kalshi + Polymarket + ForecastEx. Triangular arb = more surface area.

Market Making + Directional

Nobody combines both. We earn spread income (market making) AND take directional bets (model signals). Two revenue streams.

Rust Execution

Field is 95% Python. Our Rust layer is 10-100x faster for arb detection and order placement. Speed is alpha.

Cautionary Intel

Chris Dodds' Hard-Won Lessons

Built a Kalshi weather bot, documented everything. Key findings:

  • Cheap contracts (<10c) lose 60%+ of invested money. Longshot bias is real and brutal. Farm the tails where YOU'RE the favorite, not the longshot.
  • City selection matters enormously. Some cities have stable weather (low mispricing). Volatile cities = more edge.
  • Bot was "profitable-ish" only after blacklisting cities and adding filters. Undisciplined scanning = death by a thousand paper cuts.
  • "The dangerous part is the framing." Prediction markets get sold as rational — the practical reality is harder than it looks.

Sources: suislanchez/weather-bot · Degen Doppler · Polyforecast.io · Chris Dodds analysis · $24K bot guide · HN market making · KalshiMarketMaker

05 — Risk Management

The bot that survives wins

Most prediction market bots blow up because of position sizing, not bad models. We use fractional Kelly criterion with hard circuit breakers.

Kelly Sizing

Quarter Kelly (0.25x) for new strategies. Graduate to half Kelly (0.5x) after 200+ profitable trades with realized Sharpe >1.0. Never full Kelly — probability estimation error makes it suicidal.

Position Limits

Max 5% of bankroll per market (hard cap). Kalshi enforces $25K per market at standard tier. Separate bankroll: 60% arb capital, 40% edge capital.
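
A sketch of the sizing rule, combining fractional Kelly with the 5% hard cap. For a binary contract bought at price c with model probability p, the full-Kelly fraction works out to (p − c) / (1 − c). Fees are ignored and the function names are illustrative.

```python
def kelly_fraction_yes(model_prob: float, price: float) -> float:
    """Full-Kelly bankroll fraction for buying YES at `price` (dollars, 0-1).

    Derived from f* = (b*p - q) / b with net odds b = (1 - price) / price.
    Returns 0 when the model sees no edge.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return max(0.0, (model_prob - price) / (1.0 - price))


def position_dollars(bankroll: float, model_prob: float, price: float,
                     kelly_multiplier: float = 0.25, max_fraction: float = 0.05) -> float:
    """Quarter-Kelly stake in dollars, capped at 5% of bankroll per market."""
    fraction = kelly_multiplier * kelly_fraction_yes(model_prob, price)
    return bankroll * min(fraction, max_fraction)


# Illustrative: $10K bankroll, contract priced at 50c.
print(position_dollars(10_000, model_prob=0.55, price=0.50))  # small edge, below the cap
print(position_dollars(10_000, model_prob=0.90, price=0.50))  # large edge, capped
```

The cap matters most when the model is overconfident: a 90% estimate against a 50c price would ask for 20% of bankroll even at quarter Kelly.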

Circuit Breakers

Daily loss limit: 3% of bankroll → halt all trading for 24h. Weekly drawdown: 8% → halt + review models. Monthly drawdown: 15% → shut down, manual review required.
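
A sketch of the breaker logic with the three thresholds above. Inputs are losses expressed as positive fractions of bankroll. The action names and the choice of "at or above the limit trips the breaker" are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BreakerLimits:
    daily_loss: float = 0.03        # halt all trading for 24h
    weekly_drawdown: float = 0.08   # halt and review models
    monthly_drawdown: float = 0.15  # shut down, manual review required


def breaker_action(daily_loss: float, weekly_drawdown: float, monthly_drawdown: float,
                   limits: BreakerLimits = BreakerLimits()) -> str:
    """Return the most severe action triggered; losses are positive fractions."""
    if monthly_drawdown >= limits.monthly_drawdown:
        return "shutdown_manual_review"
    if weekly_drawdown >= limits.weekly_drawdown:
        return "halt_and_review_models"
    if daily_loss >= limits.daily_loss:
        return "halt_24h"
    return "ok"


print(breaker_action(daily_loss=0.035, weekly_drawdown=0.05, monthly_drawdown=0.06))
```

Checking the most severe condition first means a bad day inside a bad month escalates to the monthly action rather than a 24h pause.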

Kill Conditions (auto-halt, no override)

06 — Why This Wins

Compounding advantages

🧠

13-Model Weather Ensemble

Field uses GFS-only (1 model). We fuse HRRR + GFS + ECMWF + NAM + ICON + GEM + JMA + 6 more. HRRR's hourly updates give us a fresh forecast every hour, while GFS-only bots wait up to 6 hours between runs.

Rust execution speed

Arb windows last seconds. Most Kalshi bots are Python. Our order placement is 10-100x faster. On arb: speed IS the edge.

🎯

Fee optimization

Kalshi's parabolic fee structure means tail bets (90c+) cost almost nothing. Our weather and econ models naturally produce high-confidence (extreme probability) predictions. We trade where fees approach zero.

🔄

Daily compounding

Weather and TSA markets settle daily. That's 365 opportunities/year per market type. Sports is seasonal. Our core strategies never go dormant.

📊

Market making + directional

Two revenue streams. Earn spreads passively (Avellaneda-Stoikov) while taking directional bets on model signals. No other open-source bot combines both.

🔺

Three-venue arb surface

Kalshi + Polymarket + ForecastEx. Triangular arb across 3 regulated venues. Everyone else does 1 or 2 platforms max.

07 — Roadmap

From spec to live trading in 6 weeks

Phase | Timeline | Deliverable | Milestone
M0: Sandbox | Week 1 | Rust Kalshi client + Go signal aggregator (TSA, NOAA, FRED) | Place orders on Kalshi sandbox environment
M1: Models | Week 2-3 | Python weather + CPI models trained on historical data. Export to ONNX. | Backtest shows >55% win rate, Sharpe >1.0
M2: Arb Engine | Week 3-4 | Rust cross-platform arb detector (Kalshi × Polymarket). Full execution loop. | Detect and log arb opportunities in sandbox
M3: Paper Trade | Week 4-5 | Full system running on live data, paper trading (no real money) | 2 weeks of simulated P&L tracking, all 4 strategies
M4: Go Live | Week 6 | Real money, minimum sizing (quarter Kelly). Grafana dashboard. | First profitable week with positive Sharpe
M5: Scale | Week 8+ | Add sports models, increase sizing on proven strategies, apply for Advanced API tier | Consistent monthly returns, upgrade Kelly fraction
M6: Evolution | Week 10+ | Deploy Darwinian loop. 5 strategy agents + meta-allocator. Weekly mutation cycles begin. | First successful mutation (Sharpe improvement post-mutation)
M7: Convergence | Week 24+ | ~180 days of evolution. Surviving params are "evolutionary products." System is self-tuning. | Agent params stable. Allocation auto-optimized. Hands-off operation.
08 — What Vishnu Needs to Do

Prerequisites

1. Kalshi Account + API Keys

Sign up at kalshi.com. Fund with initial capital. Generate API keys in account settings. Apply for Advanced tier (30/s rate limit).

2. Polymarket Wallet

For cross-platform arb. Needs a Polygon wallet funded with USDC. Polymarket CLOB API access is free.

3. Define Bankroll

How much capital to deploy? Recommend starting with $5-10K. Separate arb capital (60%) from edge capital (40%). Never trade with money you can't lose.

4. API Keys

Free: FRED API key (instant), BLS API v2 registration (instant). Paid: The Odds API key ($59/mo for 40+ books). Optional: CME FedWatch ($25/mo).

09 — Field Report (v0.4)

What we learned building it

After scaffolding the full repo, deploying infra, scanning 4,000+ live markets, and wiring up authenticated trading — here's what the spec didn't predict.

API Endpoint Migration

Kalshi moved their API from trading-api.kalshi.com to api.elections.kalshi.com. The old domain returns a 401 with a migration notice. All clients must target the new domain.

PROD REST: https://api.elections.kalshi.com/trade-api/v2
PROD WS:   wss://api.elections.kalshi.com/trade-api/ws/v2

RSA-PSS Auth (NOT PKCS1v15)

Kalshi signs requests with RSA-PSS, not PKCS1v15. This is a critical detail that most example code gets wrong, and a PKCS1v15 signature will not verify as PSS. The signing mechanism:

Sign string: {timestamp_ms}{METHOD}{path_without_query}
Algorithm:  RSA-PSS with SHA-256
MGF:        MGF1(SHA-256)
Salt length: DIGEST_LENGTH (32 bytes)

Headers:
  KALSHI-ACCESS-KEY: {api_key_id}
  KALSHI-ACCESS-SIGNATURE: {base64(signature)}
  KALSHI-ACCESS-TIMESTAMP: {timestamp_ms}

Market Landscape (Live Scan, March 2026)

Scanned all open markets via the public API. The landscape is very different from what Kalshi's marketing suggests.

~4,000

Open Markets

Total markets listed on the exchange right now.

209

Actively Quoted

Markets with both bid AND ask — where you can actually trade. ~5% of total.

0

KXHIGH Weather

Daily temperature markets were NOT active during our scan. Seasonal or on-demand listing — cannot rely on them year-round.

Top Markets by Volume (Live Data)

Ticker | Volume | Spread | Category | Notes
ELON-TRILLIONAIRE | 92K+ |  | Financials | Highest volume on platform. Long-dated, very liquid.
US climate goal markets | 42K+ | 2-4¢ | Climate | Government policy outcomes. Good MM candidates.
OpenAI/Anthropic IPO | 40K+ | 2-3¢ | Tech/Financials | High interest, event-driven. Our kind of market.
Political markets | 20-50K | 1-5¢ | Politics | Cabinet confirmations, legislation. News-driven edge.
Science/space markets | 5-20K | 3-8¢ | Science | Wider spreads = better for market making.

Key insight: The thesis holds — the best opportunities are NOT in sports. High-volume markets with 2-5¢ spreads exist across politics, financials, climate, and science. Weather markets are seasonal, but the market-making and arb strategies apply across all categories.

Infrastructure (Deployed)

GitHub Repo

envisean/kalshi-farmer (private). Rust execution engine + Go aggregator + Python ML + Docker infra. 4 commits on main.

pve-sea-01 Stack

TimescaleDB (port 5433, 6 hypertables), Redis (port 6380), Grafana (port 3100 — 10-panel trading overview dashboard). All healthy.

Kalshi API Access

Account: kalshifarm. RSA keypair generated and deployed to Mac Studio + pve-sea-01. Ready for paper trading.

Grafana Dashboard

Cumulative P&L, P&L by strategy, fills table, signal accuracy, open orders, today's P&L, win rate gauge, circuit breaker status, market volume chart, Sharpe ratio by strategy.

Revised Strategy Priority

Given that daily weather markets are intermittent, the priority order shifts:

# | Strategy | Why Now
1 | Market Making | Works on ANY liquid market. High-volume political/financial markets have 2-5¢ spreads. Start here for consistent income.
2 | Weather Farming | Still highest edge when KXHIGH markets are active. Seasonal, so ramp up in spring/summer.
3 | Cross-venue Arb | Polymarket + Kalshi price discrepancies exist on political markets. Validate with paper trades first.
4 | Econ Sniping | CPI/GDP markets have moderate volume. Cleveland Fed nowcast gives daily edge refresh.
5 | Sports | Deferred until core engine is validated. Crowded, requires paid data feeds.
10 — Darwinian Strategy Evolution (v0.3)

Prompts are the weights. Sharpe is the loss function.

Inspired by @Chris_Worsey's autoresearch loop (Karpathy's method applied to markets: 25 AI agents debating daily, worst agent by rolling Sharpe gets its prompt rewritten, +22% in 173 days with real capital). We apply the same evolutionary pressure to our 5 strategy agents — but prediction markets give us an unfair advantage: daily binary settlement = 365 feedback cycles/year vs stocks' ~50-100 meaningful signals.

The Architecture: Strategy Agents + Meta-Allocator

Meta
Portfolio Allocator
Darwinian-weighted capital allocation across all strategy agents. Rebalances daily based on rolling 30-day Sharpe. The "fund manager" agent.
Agent 1
Weather Farmer
Evolves: model weight ratios (HRRR vs GFS vs ECMWF), edge thresholds, city selection, time-of-day bias, ensemble agreement minimums.
Agent 2
Econ Sniper
Evolves: which nowcast sources to trust, component weighting (energy vs shelter vs used cars), entry timing relative to data releases.
Agent 3
Arb Scanner
Evolves: minimum spread thresholds, venue preferences, timing windows, partial fill tolerance, market category selection.
Agent 4
Sports Picker
Evolves: which sharp book signals matter, sport/league preferences, prop vs total vs spread selection, conviction thresholds.
Agent 5
Market Maker
Evolves: A-S parameters (γ, σ, κ) per market category, inventory limits, spread floors, time-decay aggressiveness, log-odds transform parameters.

The Evolution Loop

DAILY CYCLE (automated, no human intervention):

  1. PROPOSE  — Each agent generates trade recommendations with
               confidence scores based on current parameters
  2. ALLOCATE — Meta-agent weights proposals by rolling Sharpe:
               capital_i = bankroll × (sharpe_i / Σ sharpe_all)
               Floor: 5% minimum per active agent (no starvation)
               Cap: 40% maximum (no concentration)
  3. EXECUTE  — Rust engine places orders per weighted allocation
  4. SETTLE   — Daily: weather + TSA resolve. Weekly: econ. Ongoing: sports
  5. SCORE    — Compute per-agent metrics:
               • Rolling 30-day Sharpe ratio
               • Win rate (last 100 trades)
               • Brier score (probability calibration)
               • Max drawdown (last 30 days)
  6. EVOLVE   — Worst agent by rolling Sharpe triggers mutation:

MUTATION PROTOCOL:
  a) Snapshot current agent config (the "genome")
  b) Generate N candidate mutations (parameter perturbations):
     - Continuous params: ±10-25% random walk
     - Categorical params: swap one selection
     - Threshold params: widen or tighten by 1-3%
  c) Backtest each mutation on last 60 days of data
  d) Select best mutation by simulated Sharpe
  e) Deploy mutated config for 7-day trial period
  f) Compare trial Sharpe vs pre-mutation Sharpe:
     KEEP if trial Sharpe > pre-mutation (evolution succeeded)
     REVERT if trial Sharpe ≤ pre-mutation (mutation failed)
  g) Log everything to evolution_history table
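
A sketch of the ALLOCATE step. The raw formula breaks when a Sharpe is negative or when floors and caps conflict, so this version gives every agent the 5% floor first, then splits the rest in proportion to Sharpe with the 40% cap enforced. Treating negative Sharpe as zero and the floor-first ordering are both assumptions, not something the spec pins down.

```python
def allocate(sharpes: dict[str, float], floor: float = 0.05, cap: float = 0.40) -> dict[str, float]:
    """Bankroll weights per agent: floor first, then Sharpe-proportional up to the cap."""
    n = len(sharpes)
    if n == 0 or n * floor > 1.0 or n * cap < 1.0:
        raise ValueError("floor/cap are infeasible for this number of agents")
    scores = {agent: max(s, 0.0) for agent, s in sharpes.items()}  # negative Sharpe -> 0
    weights = {agent: floor for agent in scores}
    open_agents = set(scores)
    budget = 1.0 - n * floor
    while budget > 1e-12 and open_agents:
        total = sum(scores[a] for a in open_agents)
        spent = 0.0
        for agent in list(open_agents):
            share = scores[agent] / total if total > 0 else 1.0 / len(open_agents)
            add = min(budget * share, cap - weights[agent])
            weights[agent] += add
            spent += add
            if cap - weights[agent] <= 1e-12:
                open_agents.discard(agent)  # capped out, excess is re-split next round
        budget -= spent
    return weights


# Illustrative rolling Sharpes, not measured values.
weights = allocate({"weather": 2.0, "econ": 1.0, "arb": 1.0, "sports": -0.5, "mm": 0.0})
print({agent: round(w, 3) for agent, w in weights.items()})
```

Each pass either spends the whole remaining budget or caps out at least one agent, so the loop finishes in at most one pass per agent plus one.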

What Evolves (the "Genome" per Agent)

Agent | Evolvable Parameters | Mutation Range
Weather | Model weights (13 floats), edge threshold, city blacklist, min ensemble agreement, time-of-day bias | Weights: ±15%. Threshold: ±2%. Cities: add/drop 1.
Econ | Nowcast source weights, component importance, entry timing offset, data staleness tolerance | Weights: ±20%. Timing: ±6h. Staleness: ±1 day.
Arb | Min spread (¢), scan interval (ms), venue priority order, max partial fill %, category filter | Spread: ±0.5¢. Interval: ±50ms. Categories: swap 1.
Sports | Sharp book list, reverse-line threshold, conviction minimum, league weights, prop type preferences | Threshold: ±3%. League weights: ±25%. Props: swap 1.
Market Maker | γ (risk aversion), σ (volatility est), κ (arrival rate), inventory cap, spread floor, logit clamp range | γ: ±0.02. σ: ±0.01. κ: ±0.3. Cap: ±5 contracts.
Meta (Allocator) | Sharpe lookback window, floor/cap %, rebalance frequency, drawdown penalty weight | Window: ±5 days. Floor: ±1%. Frequency: ±1 day.
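
A sketch of mutating one continuous part of a genome (the weather model weights) and applying the keep/revert rule. The `backtest` callable is a hypothetical stand-in for the 60-day backtest. Renormalizing the weights to sum to 1 after perturbation is an assumption.

```python
import random
from typing import Callable


def mutate_weights(weights: list[float], max_rel_change: float, rng: random.Random) -> list[float]:
    """Perturb each weight by up to +/- max_rel_change, then renormalize to sum to 1."""
    perturbed = [w * (1.0 + rng.uniform(-max_rel_change, max_rel_change)) for w in weights]
    total = sum(perturbed)
    return [w / total for w in perturbed]


def best_candidate(genome: list[float], n_candidates: int,
                   backtest: Callable[[list[float]], float],
                   rng: random.Random, max_rel_change: float = 0.15) -> list[float]:
    """Generate N mutations and pick the one with the highest simulated Sharpe."""
    candidates = [mutate_weights(genome, max_rel_change, rng) for _ in range(n_candidates)]
    return max(candidates, key=backtest)


def keep_or_revert(trial_sharpe: float, baseline_sharpe: float) -> str:
    """KEEP only on strict improvement; ties revert to the pre-mutation config."""
    return "KEEP" if trial_sharpe > baseline_sharpe else "REVERT"


rng = random.Random(0)  # seeded so a mutation run can be replayed from the log
genome = [0.4, 0.3, 0.3]  # illustrative 3-model weights instead of the full 13
# Placeholder scoring function; a real one would replay 60 days of markets.
candidate = best_candidate(genome, n_candidates=5, backtest=lambda g: g[0], rng=rng)
print([round(w, 3) for w in candidate], keep_or_revert(trial_sharpe=1.2, baseline_sharpe=1.0))
```

Selecting the best of N candidates on a backtest and then judging it on a 7-day live trial is what keeps an overfit mutation from sticking: it has to win twice on different data.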

Why This Works Better on Prediction Markets Than Stocks

365×

Feedback Cycles / Year

Weather settles daily. Worsey gets ~250 trading days/year for stock signals. We get 365 clean binary outcomes per city per year — 5 cities = 1,825 data points.

Binary

Clean Signal

Stocks: noisy, continuous, unrealized. Prediction markets: $0 or $1, settled, final. No ambiguity in the loss function. Sharpe is clean.

5 Agents

Diverse Gene Pool

Weather, econ, arb, sports, MM — uncorrelated strategies. Evolution can shift capital to whichever regime is working. Agents compete for allocation, not just accuracy.

Evolution Timeline

Phase | Period | What Happens
Seeding | Day 1-30 | All agents run with hand-tuned initial params. No mutations yet. Collecting baseline Sharpe data.
First Mutations | Day 31-60 | Worst agent mutated weekly. Conservative mutations (±10%). Meta-allocator starts shifting capital.
Full Evolution | Day 61-180 | All agents eligible for mutation. Wider mutation range (±25%). Expect ~20-30 mutations, 8-12 survive (Worsey's ratio).
Convergence | Day 180+ | Surviving params are "evolutionary products," shaped by market feedback. Mutations slow as Sharpe stabilizes. System is self-tuning.

The Punchline

Worsey's system discovered that its own portfolio manager was the weakest link before the humans did. Ours will do the same, but faster, because daily settlement means we're running the evolutionary loop at 7x the speed of a stock-based system. 378 iterations took Worsey 18 months. On Kalshi weather, we hit 378 iterations in ~54 days (7 iterations/day × 54 days).

Inspiration: @Chris_Worsey — Karpathy autoresearch loop applied to markets. 575 likes, 1,061 bookmarks. +22% in 173 days. The prompts are the weights, Sharpe is the loss function.