A multi-strategy prediction market trading system built for edge detection, speed, and disciplined risk management. Designed for Vishnu — funded by sushi.
91% of Kalshi's volume is sports. That's where Susquehanna, the quant shops, and every degenerate with a model are fighting. The edge is thin and the competition is fierce.
Our thesis: Build probabilistic models on structurally underserved markets — weather, economic data, travel volumes, esoteric events — where free public data gives us a 5-15% better estimate than the crowd. Use sports as a secondary channel for high-conviction plays with live data feeds.
v0.2 upgrade: After deep competitive recon (X threads, open-source repos, blog posts, HN), we've identified exactly where the field is crowded and where the alpha gaps remain. Every builder is using the same GFS-only playbook. We're going multi-model + market making + 3-venue arb.
Where all the smart money fights. Hard to find edge without $10K+/mo data feeds.
Weather, econ, travel, politics. Free data, fewer competitors, structural patterns.
Fees are parabolic — near 0 at probability extremes. Farm the tails.
Each layer uses the right tool for the job. Rust for execution speed (order placement, WebSocket, arb detection). Go for data ingestion and model serving (concurrent API scraping, signal aggregation). Python for model development and research (backtesting, feature engineering, visualization).
kalshi-farmer/
├── rust/
│   ├── src/
│   │   ├── main.rs  # Entry point, runtime orchestrator
│   │   ├── exchange/
│   │   │   ├── kalshi.rs  # Kalshi REST + WebSocket client (RSA-PSS auth)
│   │   │   ├── polymarket.rs  # Polymarket CLOB client (arb)
│   │   │   ├── forecastex.rs  # ForecastEx/IBKR client (3-venue arb)
│   │   │   └── types.rs  # Shared order/market types
│   │   ├── execution/
│   │   │   ├── oms.rs  # Order management system
│   │   │   ├── arb_detector.rs  # 3-venue spread scanner (Kalshi×Poly×ForecastEx)
│   │   │   ├── market_maker.rs  # Avellaneda-Stoikov in log-odds space
│   │   │   └── position.rs  # Position tracking + limits
│   │   └── risk/
│   │       ├── kelly.rs  # Fractional Kelly sizing
│   │       ├── limits.rs  # Hard position/loss limits
│   │       └── circuit.rs  # Circuit breakers (daily loss, drawdown)
│   └── Cargo.toml
├── go/
│   ├── cmd/
│   │   └── aggregator/main.go  # Signal aggregator entry point
│   ├── internal/
│   │   ├── sources/
│   │   │   ├── tsa.go  # TSA checkpoint scraper
│   │   │   ├── hrrr.go  # HRRR hourly rapid-refresh (key edge)
│   │   │   ├── noaa.go  # NOAA GFS ensemble + NWS forecasts
│   │   │   ├── fred.go  # FRED economic data
│   │   │   ├── bls.go  # BLS CPI/jobs data
│   │   │   ├── clevfed.go  # Cleveland Fed inflation nowcast
│   │   │   ├── atlfed.go  # Atlanta Fed GDPNow
│   │   │   ├── fedwatch.go  # CME FedWatch probabilities
│   │   │   ├── odds.go  # The Odds API (sports)
│   │   │   ├── espn.go  # ESPN hidden API
│   │   │   ├── trends.go  # Google Trends proxy
│   │   │   ├── reddit.go  # Reddit sentiment (PRAW)
│   │   │   └── polymarket.go  # Polymarket prices (arb ref)
│   │   ├── scorer/
│   │   │   ├── probability.go  # Model inference (ONNX)
│   │   │   └── edge.go  # Edge calculator (model vs market)
│   │   └── server/
│   │       └── grpc.go  # gRPC server → Rust client
│   └── go.mod
├── python/
│   ├── models/
│   │   ├── tsa_model.py  # TSA volume predictor
│   │   ├── weather_model.py  # Temperature forecast ensemble
│   │   ├── cpi_model.py  # CPI nowcast model
│   │   ├── fed_model.py  # Fed rate decision model
│   │   ├── sports_model.py  # Sports outcome model
│   │   └── export_onnx.py  # Export trained models → ONNX
│   ├── backtest/
│   │   ├── engine.py  # Historical simulation engine
│   │   ├── data_loader.py  # Load historical Kalshi data
│   │   └── metrics.py  # Sharpe, max drawdown, win rate
│   ├── notebooks/
│   │   ├── eda.ipynb  # Exploratory data analysis
│   │   └── signal_research.ipynb
│   └── requirements.txt
├── infra/
│   ├── docker-compose.yml  # TimescaleDB + Redis + Grafana
│   ├── grafana/dashboards/  # P&L, positions, signals
│   └── migrations/  # TimescaleDB schema
└── config/
    ├── strategies.toml  # Strategy params, enabled markets
    ├── risk.toml  # Position limits, Kelly fraction
    └── secrets.toml.example  # API keys template
The edge comes from combining multiple weak signals into a strong probability estimate. Every signal below feeds the Go aggregator, which scores markets in real time.
| Source | Data | Markets | Cost | Edge |
|---|---|---|---|---|
| TSA.gov | Daily checkpoint passenger volumes | Travel volume | Free | Seasonal regression beats crowd by 5-10% |
| NOAA HRRR | Hourly 3km resolution rapid-refresh forecasts | Temperature (<24h) | Free | KEY EDGE: Updates hourly vs GFS 4x/day. 3km vs 25km resolution. Most bots don't use it. |
| NOAA GFS | 31-member ensemble, 4x daily | Temperature (2-7 day) | Free | Standard model — table stakes. Everyone uses this. |
| ECMWF | European model, gold standard 2-5 day | Temperature, hurricane | Free* | Best 2-5 day accuracy globally. *Via Open-Meteo |
| Open-Meteo | 13-model blend (GFS+HRRR+ECMWF+NAM+ICON+GEM+JMA+more) | All weather | Free | One API for all models. Ensemble spread = uncertainty quantification |
| NWS Forecasts | Official city forecasts (resolution source) | Temperature | Free | Kalshi settles against NWS. This IS the ground truth. |
| Cleveland Fed | Daily CPI/PCE nowcast | CPI, inflation | Free | Daily updates vs monthly Kalshi settlement |
| Atlanta Fed GDPNow | Real-time GDP nowcast | GDP growth | Free | Front-run revision after each data release |
| NY Fed Nowcast | GDP + macro nowcast | GDP, employment | Free | Cross-validate with GDPNow |
| FRED API | 840K+ economic time series | All econ markets | Free | Feature inputs for every macro model |
| BLS API | CPI, jobs, wages (raw data) | CPI, unemployment | Free | Build model from components before headline drops |
| ESPN API | Live scores, stats, schedules | Sports outcomes | Free | Real-time game state for live markets |
| Polymarket | Orderbook, prices, trades | Cross-platform arb | Free | Price discrepancies = risk-free profit |
| Google Trends | Search interest over time | All markets | Free | 24-72hr leading indicator for event markets |
| SEC EDGAR | Filings, insider trades | Company events | Free | Insider-trade filings as a signal for policy/merger markets |
| CME FedWatch | Rate change probabilities | Fed rate decisions | $25/mo | Deepest rate market implied probs vs Kalshi |
| The Odds API | 40+ bookmaker odds | Sports | $59-119/mo | Sharp line movement, reverse line detection |
| X / Twitter | Real-time sentiment stream | Breaking events | $100/mo | News hits X 15-30 min before other platforms |
| Reddit / PRAW | Community analysis | All markets | Free | r/sportsbook reverse-engineers sharp money |
| FlightAware | Real-time flight data | Travel, weather disruption | Usage | Cancellation rates predict travel volume |
| Finnhub | Congressional stock trades | Political/policy | Free | Congress trades around policy decisions |
Total cost for the full signal stack: $184-244/mo (CME FedWatch + The Odds API + X), excluding FlightAware's usage-based pricing. The free tier alone covers 80% of market categories.
HIGHEST EDGE · OUR PRIMARY ADVANTAGE
Most open-source bots (suislanchez, AipublishiPRO, etc.) use GFS-only — a single 31-member ensemble from Open-Meteo. That's table stakes. Our edge: 13-model weighted ensemble with HRRR dominance for short-term forecasts.
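One way to turn a weighted multi-model ensemble into a tradable probability is sketched below. It is a minimal illustration, not the project's model: the function name, the example weights, the normal-distribution assumption, and the `min_sigma` floor are all hypothetical choices made for the example.

```python
import math

def ensemble_prob_above(forecasts, weights, threshold_f, min_sigma=1.0):
    """Estimate P(daily high > threshold_f) from several model forecasts.

    forecasts: {model_name: forecast high in deg F}
    weights:   {model_name: relative weight}, e.g. HRRR weighted highest
               for short horizons (weights here are illustrative only).
    Assumes forecast error is roughly normal around the weighted mean.
    """
    total_w = sum(weights[m] for m in forecasts)
    mean = sum(weights[m] * t for m, t in forecasts.items()) / total_w
    # Weighted ensemble spread is used as the uncertainty estimate.
    var = sum(weights[m] * (t - mean) ** 2 for m, t in forecasts.items()) / total_w
    # Floor sigma so tight model agreement never implies false certainty.
    sigma = max(math.sqrt(var), min_sigma)
    z = (threshold_f - mean) / sigma
    # 1 - Phi(z), written with the error function.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

forecasts = {"hrrr": 84.0, "gfs": 82.0, "ecmwf": 83.0}
weights = {"hrrr": 0.5, "gfs": 0.2, "ecmwf": 0.3}
print(ensemble_prob_above(forecasts, weights, threshold_f=85.0))
```

The resulting probability can be compared directly with the market's YES price to get an edge estimate. In practice the weights and the sigma floor would need to be calibrated against historical forecast errors per city and lead time.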
HIGH EDGE · MODERATE COMPETITION
Build nowcast models using the same methodology as Fed economists — but trade the Kalshi market before the number drops. Cleveland Fed daily CPI nowcast updates = daily edge refresh. Key: be faster at processing underlying component data (energy prices, shelter costs, used cars) than the crowd.
RISK-FREE · SPEED-DEPENDENT · NOW 3 VENUES
Identical events on Kalshi vs Polymarket vs ForecastEx (Interactive Brokers) frequently show 2-5% price spreads. Three venues for the same underlying = more arb surface. BTC arb bots have already proven the pattern on Kalshi×Polymarket; the same approach applies to weather temperature markets.
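The cross-venue check reduces to a simple inequality: buying YES on one venue and NO on another locks in $1 at settlement, so the pair is profitable when the two asks plus fees sum to less than $1. The sketch below assumes the contracts settle on identical terms and uses a flat, hypothetical per-contract fee; `find_arbs` and the quote layout are illustrative names, not the Rust detector's interface.

```python
from itertools import permutations

def find_arbs(quotes, fee_per_contract=0.01, min_edge=0.0):
    """Scan every ordered venue pair for a locked-in profit.

    quotes: {venue: {"yes_ask": dollars, "no_ask": dollars}} for the SAME
            event, per $1 contract.
    fee_per_contract: flat fee per leg (a placeholder; real fees vary by
            venue and price).
    Returns [(yes_venue, no_venue, edge_per_contract)], best first.
    """
    found = []
    for yes_venue, no_venue in permutations(quotes, 2):
        cost = quotes[yes_venue]["yes_ask"] + quotes[no_venue]["no_ask"]
        # One of the two legs pays $1 whatever the outcome.
        edge = 1.0 - cost - 2 * fee_per_contract
        if edge > min_edge:
            found.append((yes_venue, no_venue, round(edge, 4)))
    return sorted(found, key=lambda row: -row[2])

quotes = {
    "kalshi": {"yes_ask": 0.55, "no_ask": 0.47},
    "polymarket": {"yes_ask": 0.50, "no_ask": 0.52},
    "forecastex": {"yes_ask": 0.53, "no_ask": 0.49},
}
print(find_arbs(quotes))  # [('polymarket', 'kalshi', 0.01)]
```

A production scanner would also need to account for available size at each ask, leg risk (one fill without the other), and any difference in settlement sources between venues.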
SELECTIVE · HIGH CONVICTION ONLY
Don't market-make sports. Cherry-pick value when the model disagrees with the market by >8%. Use The Odds API to detect reverse line movement (public on one side, sharp money on the other). Add weather-at-venue and referee tendency data for edge. Vishnu's sports knowledge is the qualitative overlay.
PASSIVE INCOME · CONSISTENT SMALL WINS · REQUIRES CAPITAL
Instead of only taking directional bets, post two-sided quotes and earn the spread. Adapted Avellaneda-Stoikov model for binary markets. Key insight from HN quant traders: work in log-odds space. A move from 2c→1c halves the price and shifts log-odds by about 0.70 (huge), while 50c→49c shifts log-odds by about 0.04 (noise). Most bots ignore this and get crushed at the extremes.
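The log-odds idea can be illustrated with a deliberately simplified quoter: a fixed half-spread and a linear inventory skew, both applied in logit space and then mapped back to prices. This is not the full Avellaneda-Stoikov model; `quote`, `half_spread_logit`, and `skew_per_contract` are hypothetical names and values chosen for the example.

```python
import math

def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-x))

def quote(fair_prob, half_spread_logit=0.10, inventory=0, skew_per_contract=0.005):
    """Return (bid, ask) prices in dollars for a binary contract.

    The spread is constant in log-odds, so it narrows automatically in
    price terms near 0 and 1, where a one-cent move matters most.
    Positive inventory (long YES) shifts both quotes down to shed risk.
    """
    center = logit(fair_prob) - inventory * skew_per_contract
    return expit(center - half_spread_logit), expit(center + half_spread_logit)

for p in (0.50, 0.10, 0.02):
    bid, ask = quote(p)
    print(f"fair={p:.2f} bid={bid:.4f} ask={ask:.4f} width={ask - bid:.4f}")
```

Quotes would still have to be rounded to the venue's tick size, which can dominate the computed width at the extremes.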
Deep recon across X threads, open-source GitHub repos, blog posts, and Hacker News reveals a crowded but shallow field. Most builders use the exact same playbook. Here's the map.
| Builder/Tool | Approach | Weakness |
|---|---|---|
| suislanchez/weather-bot | GFS 31-member ensemble, Kelly sizing, 8% edge threshold, React dashboard | Single model (GFS only), no HRRR, no market making |
| Degen Doppler | 13 weather models, normal distribution probability, YES edge calculator | Tool only (no execution), no arb, manual trading |
| Polyforecast.io | 5-model ensemble, real trades on Polymarket, on-chain verifiable | Polymarket only, no Kalshi, subscription signal service |
| AipublishiPRO | 18-city scanner, NWS data, "locked wins" approach | Python only, no speed advantage, no cross-platform |
| cpratim/Jump Tank | LSTM neural nets, multi-source error weighting | Research project, not production-hardened |
| ryanfrigo/AI bot | Grok-4 + multi-agent LLM decision making | LLM latency kills edge, token costs eat profits |
| Sentinel_Algo | Elo-based sports signals, regime awareness | Sports-focused (crowded), sells Fiverr gigs |
| $24K Polymarket bot | Simple NOAA vs market price, multi-city, 24/7 | Single model, no ensemble, no risk management |
13 models vs their 1. HRRR hourly updates vs GFS 4x/day. We see the latest forecast 6-23 hours before GFS-only bots.
Everyone does Kalshi OR Polymarket. We do Kalshi + Polymarket + ForecastEx. Triangular arb = more surface area.
Nobody combines both. We earn spread income (market making) AND take directional bets (model signals). Two revenue streams.
Field is 95% Python. Our Rust layer is 10-100x faster for arb detection and order placement. Speed is alpha.
Built a Kalshi weather bot, documented everything. Key findings:
Sources: suislanchez/weather-bot · Degen Doppler · Polyforecast.io · Chris Dodds analysis · $24K bot guide · HN market making · KalshiMarketMaker
Most prediction market bots blow up because of position sizing, not bad models. We use fractional Kelly criterion with hard circuit breakers.
Quarter Kelly (0.25x) for new strategies. Graduate to half Kelly (0.5x) after 200+ profitable trades with realized Sharpe >1.0. Never full Kelly — probability estimation error makes it suicidal.
Max 5% of bankroll per market (hard cap). Kalshi enforces $25K per market at standard tier. Split the bankroll: 60% arb capital, 40% edge capital.
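For a binary contract bought at price c with model probability p, the full Kelly fraction works out to (p - c) / (1 - c). The sketch below applies the quarter-Kelly multiplier and the 5% per-market cap described above; it ignores fees and slippage, and the function names are illustrative rather than taken from `kelly.rs`.

```python
def kelly_fraction(p, price):
    """Full Kelly fraction of bankroll for buying YES at `price` dollars.

    Winning pays (1 - price) per `price` staked, so net odds are
    b = (1 - price) / price and f* = (b*p - (1 - p)) / b = (p - price) / (1 - price).
    Returns 0 when the model sees no edge.
    """
    if not 0.0 < price < 1.0:
        raise ValueError("price must be strictly between 0 and 1")
    return max((p - price) / (1.0 - price), 0.0)

def position_size(bankroll, p, price, kelly_mult=0.25, max_frac=0.05):
    """Return (fraction_of_bankroll, whole_contracts) after scaling and capping."""
    frac = min(kelly_mult * kelly_fraction(p, price), max_frac)
    contracts = int((bankroll * frac) // price)
    return frac, contracts

# Model says 70%, market asks 60c: full Kelly is 25%, quarter Kelly is
# 6.25%, and the hard cap trims that to 5% of bankroll.
print(position_size(10_000, p=0.70, price=0.60))  # (0.05, 833)
```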
Daily loss limit: 3% of bankroll → halt all trading for 24h. Weekly drawdown: 8% → halt + review models. Monthly drawdown: 15% → shut down, manual review required.
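The three thresholds above can be expressed as a single check that runs before any order is placed. This is a minimal sketch: the action labels and the `Limits` container are hypothetical, and losses are assumed to be passed in as positive fractions of bankroll.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Limits:
    daily: float = 0.03    # 3% daily loss -> halt for 24h
    weekly: float = 0.08   # 8% weekly drawdown -> halt + model review
    monthly: float = 0.15  # 15% monthly drawdown -> shut down

def breaker_action(daily_loss, weekly_dd, monthly_dd, limits=Limits()):
    """Return the most severe action triggered by current losses.

    All inputs are positive fractions of bankroll (0.04 means down 4%).
    The most severe condition is checked first so it always wins.
    """
    if monthly_dd >= limits.monthly:
        return "SHUTDOWN_MANUAL_REVIEW"
    if weekly_dd >= limits.weekly:
        return "HALT_REVIEW_MODELS"
    if daily_loss >= limits.daily:
        return "HALT_24H"
    return "TRADE"

print(breaker_action(daily_loss=0.035, weekly_dd=0.05, monthly_dd=0.06))  # HALT_24H
```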
Field uses GFS-only (1 model). We fuse HRRR + GFS + ECMWF + NAM + ICON + GEM + JMA + 6 more. HRRR's hourly updates give us a 6-23 hour information advantage over GFS-only bots.
Arb windows last seconds. Most Kalshi bots are Python. Our order placement is 10-100x faster. On arb: speed IS the edge.
Kalshi's parabolic fee structure means tail bets (90c+) cost almost nothing. Our weather and econ models naturally produce high-confidence (extreme probability) predictions. We trade where fees approach zero.
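To make the parabolic shape concrete, the sketch below assumes the commonly cited Kalshi taker formula, fee = round up to the next cent of 0.07 × contracts × P × (1 − P). The 7% rate and the rounding rule are assumptions here and should be checked against Kalshi's current fee schedule; `taker_fee_cents` is an illustrative name.

```python
def taker_fee_cents(contracts, price_cents, rate_pct=7):
    """Fee in whole cents for `contracts` at `price_cents` (1-99).

    Assumed formula: ceil(rate * C * P * (1 - P)) with P in dollars.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if not 1 <= price_cents <= 99:
        raise ValueError("price_cents must be between 1 and 99")
    numerator = rate_pct * contracts * price_cents * (100 - price_cents)
    # Ceiling division: the formula yields cents after dividing by 10,000.
    return -(-numerator // 10_000)

for cents in (50, 90, 95, 99):
    print(cents, taker_fee_cents(100, cents))
# 50 -> 175, 90 -> 63, 95 -> 34, 99 -> 7 (cents per 100 contracts)
```

Under these assumptions a 100-lot costs $1.75 in fees at 50c but only $0.07 at 99c, which is the effect the tail-farming strategy relies on.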
Weather and TSA markets settle daily. That's 365 opportunities/year per market type. Sports is seasonal. Our core strategies never go dormant.
Two revenue streams. Earn spreads passively (Avellaneda-Stoikov) while taking directional bets on model signals. No other open-source bot combines both.
Kalshi + Polymarket + ForecastEx. Triangular arb across 3 regulated venues. Everyone else does 1 or 2 platforms max.
| Phase | Timeline | Deliverable | Milestone |
|---|---|---|---|
| M0: Sandbox | Week 1 | Rust Kalshi client + Go signal aggregator (TSA, NOAA, FRED) | Place orders on Kalshi sandbox environment |
| M1: Models | Week 2-3 | Python weather + CPI models trained on historical data. Export to ONNX. | Backtest shows >55% win rate, Sharpe >1.0 |
| M2: Arb Engine | Week 3-4 | Rust cross-platform arb detector (Kalshi × Polymarket). Full execution loop. | Detect and log arb opportunities in sandbox |
| M3: Paper Trade | Week 4-5 | Full system running on live data, paper trading (no real money) | 2 weeks of simulated P&L tracking, all 4 strategies |
| M4: Go Live | Week 6 | Real money, minimum sizing (quarter Kelly). Grafana dashboard. | First profitable week with positive Sharpe |
| M5: Scale | Week 8+ | Add sports models, increase sizing on proven strategies, apply for Advanced API tier | Consistent monthly returns, upgrade Kelly fraction |
| M6: Evolution | Week 10+ | Deploy Darwinian loop. 5 strategy agents + meta-allocator. Weekly mutation cycles begin. | First successful mutation (Sharpe improvement post-mutation) |
| M7: Convergence | Week 24+ | ~180 days of evolution. Surviving params are "evolutionary products." System is self-tuning. | Agent params stable. Allocation auto-optimized. Hands-off operation. |
Sign up at kalshi.com. Fund with initial capital. Generate API keys in account settings. Apply for Advanced tier (30/s rate limit).
For cross-platform arb. Needs a Polygon wallet funded with USDC. Polymarket CLOB API access is free.
How much capital to deploy? Recommend starting with $5-10K. Separate arb capital (60%) from edge capital (40%). Never trade with money you can't afford to lose.
FRED API key (instant), BLS API v2 registration (instant), The Odds API key ($59/mo for 40+ books). Optional: CME FedWatch ($25/mo).
After scaffolding the full repo, deploying infra, scanning 4,000+ live markets, and wiring up authenticated trading — here's what the spec didn't predict.
Kalshi moved their API from trading-api.kalshi.com to api.elections.kalshi.com. The old domain returns a 401 with a migration notice. All clients must target the new domain.
PROD REST: https://api.elections.kalshi.com/trade-api/v2
PROD WS: wss://api.elections.kalshi.com/trade-api/ws/v2
Kalshi uses RSA-PSS signing, not PKCS#1 v1.5 (PKCS1v15), the padding that many RSA signing examples default to. This is a critical detail that most example code gets wrong. The signing mechanism:
Sign string: {timestamp_ms}{METHOD}{path_without_query}
Algorithm: RSA-PSS with SHA-256
MGF: MGF1(SHA-256)
Salt length: DIGEST_LENGTH (32 bytes)
Headers:
KALSHI-ACCESS-KEY: {api_key_id}
KALSHI-ACCESS-SIGNATURE: {base64(signature)}
KALSHI-ACCESS-TIMESTAMP: {timestamp_ms}
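The scheme above can be sketched in Python with the third-party `cryptography` package. This is an illustration under stated assumptions, not the Rust client's code: the helper names are hypothetical, the example path is only a placeholder, and the `cryptography` import is deferred so the message-building helper works without that dependency installed.

```python
import base64
from urllib.parse import urlsplit

def signing_message(timestamp_ms, method, path):
    """Build {timestamp_ms}{METHOD}{path_without_query} as bytes."""
    path_only = urlsplit(path).path  # drop any ?query=string
    return f"{timestamp_ms}{method.upper()}{path_only}".encode("utf-8")

def sign_pss(private_key, message):
    """RSA-PSS / SHA-256 / MGF1(SHA-256), salt length = digest length."""
    # Imported lazily: requires `pip install cryptography`.
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding

    signature = private_key.sign(
        message,
        padding.PSS(
            mgf=padding.MGF1(hashes.SHA256()),
            salt_length=padding.PSS.DIGEST_LENGTH,  # 32 bytes for SHA-256
        ),
        hashes.SHA256(),
    )
    return base64.b64encode(signature).decode("ascii")

def auth_headers(api_key_id, private_key, timestamp_ms, method, path, signer=sign_pss):
    """Assemble the three KALSHI-ACCESS-* headers for one request."""
    message = signing_message(timestamp_ms, method, path)
    return {
        "KALSHI-ACCESS-KEY": api_key_id,
        "KALSHI-ACCESS-SIGNATURE": signer(private_key, message),
        "KALSHI-ACCESS-TIMESTAMP": str(timestamp_ms),
    }

print(signing_message(1700000000000, "get", "/trade-api/v2/markets?limit=5"))
```

`private_key` is expected to be an RSA key object, for example one loaded with `cryptography.hazmat.primitives.serialization.load_pem_private_key`. The `signer` parameter exists only so the header assembly can be exercised with a stub.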
Scanned all open markets via the public API. The landscape is very different from what Kalshi's marketing suggests.
Total markets listed on the exchange right now.
Markets with both bid AND ask — where you can actually trade. ~5% of total.
Daily temperature markets were NOT active during our scan. Seasonal or on-demand listing — cannot rely on them year-round.
| Ticker | Volume | Spread | Category | Notes |
|---|---|---|---|---|
| ELON-TRILLIONAIRE | 92K+ | 2¢ | Financials | Highest volume on platform. Long-dated, very liquid. |
| US climate goal markets | 42K+ | 2-4¢ | Climate | Government policy outcomes. Good MM candidates. |
| OpenAI/Anthropic IPO | 40K+ | 2-3¢ | Tech/Financials | High interest, event-driven. Our kind of market. |
| Political markets | 20-50K | 1-5¢ | Politics | Cabinet confirmations, legislation. News-driven edge. |
| Science/space markets | 5-20K | 3-8¢ | Science | Wider spreads = better for market making. |
Key insight: The thesis holds — the best opportunities are NOT in sports. High-volume markets with 2-5¢ spreads exist across politics, financials, climate, and science. Weather markets are seasonal, but the market-making and arb strategies apply across all categories.
envisean/kalshi-farmer (private). Rust execution engine + Go aggregator + Python ML + Docker infra. 4 commits on main.
TimescaleDB (port 5433, 6 hypertables), Redis (port 6380), Grafana (port 3100 — 10-panel trading overview dashboard). All healthy.
Account: kalshifarm. RSA keypair generated and deployed to Mac Studio + pve-sea-01. Ready for paper trading.
Cumulative P&L, P&L by strategy, fills table, signal accuracy, open orders, today's P&L, win rate gauge, circuit breaker status, market volume chart, Sharpe ratio by strategy.
Given that daily weather markets are intermittent, the priority order shifts:
| # | Strategy | Why Now |
|---|---|---|
| 1 | Market Making | Works on ANY liquid market. High-volume political/financial markets have 2-5¢ spreads. Start here for consistent income. |
| 2 | Weather Farming | Still highest edge when KXHIGH markets are active. Seasonal — ramp up in spring/summer. |
| 3 | Cross-venue Arb | Polymarket + Kalshi price discrepancies exist on political markets. Validate with paper trades first. |
| 4 | Econ Sniping | CPI/GDP markets have moderate volume. Cleveland Fed nowcast gives daily edge refresh. |
| 5 | Sports | Deferred until core engine is validated. Crowded, requires paid data feeds. |
Inspired by @Chris_Worsey's autoresearch loop (Karpathy's method applied to markets: 25 AI agents debating daily, worst agent by rolling Sharpe gets its prompt rewritten, +22% in 173 days with real capital). We apply the same evolutionary pressure to our 5 strategy agents — but prediction markets give us an unfair advantage: daily binary settlement = 365 feedback cycles/year vs stocks' ~50-100 meaningful signals.
DAILY CYCLE (automated, no human intervention):
1. PROPOSE — Each agent generates trade recommendations with
confidence scores based on current parameters
2. ALLOCATE — Meta-agent weights proposals by rolling Sharpe:
capital_i = bankroll × (sharpe_i / Σ sharpe_all)
Floor: 5% minimum per active agent (no starvation)
Cap: 40% maximum (no concentration)
3. EXECUTE — Rust engine places orders per weighted allocation
4. SETTLE — Daily: weather + TSA resolve. Weekly: econ. Ongoing: sports
5. SCORE — Compute per-agent metrics:
• Rolling 30-day Sharpe ratio
• Win rate (last 100 trades)
• Brier score (probability calibration)
• Max drawdown (last 30 days)
6. EVOLVE — Worst agent by rolling Sharpe triggers mutation:
MUTATION PROTOCOL:
a) Snapshot current agent config (the "genome")
b) Generate N candidate mutations (parameter perturbations):
- Continuous params: ±10-25% random walk
- Categorical params: swap one selection
- Threshold params: widen or tighten by 1-3%
c) Backtest each mutation on last 60 days of data
d) Select best mutation by simulated Sharpe
e) Deploy mutated config for 7-day trial period
f) Compare trial Sharpe vs pre-mutation Sharpe:
KEEP if trial Sharpe > pre-mutation (evolution succeeded)
REVERT if trial Sharpe ≤ pre-mutation (mutation failed)
g) Log everything to evolution_history table
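The ALLOCATE step leaves open how the Sharpe-proportional formula interacts with the 5% floor and 40% cap. One possible reconciliation is sketched below: give every agent the floor, split the rest in proportion to non-negative Sharpe, then move any excess above the cap to the agents that still have headroom. The function name and the treatment of negative Sharpe (clipped to zero) are assumptions for the example.

```python
def allocate(sharpes, floor=0.05, cap=0.40):
    """Return {agent: fraction of bankroll}; fractions sum to 1.

    sharpes: {agent: rolling Sharpe}. Negative values are treated as 0,
    so a losing agent falls back to the floor rather than going short.
    """
    n = len(sharpes)
    if n == 0 or n * floor > 1.0 or n * cap < 1.0:
        raise ValueError("floor/cap are infeasible for this many agents")
    raw = {k: max(s, 0.0) for k, s in sharpes.items()}
    total = sum(raw.values())
    share = {k: (raw[k] / total if total > 0 else 1.0 / n) for k in raw}
    # Everyone gets the floor; the remainder is split by Sharpe share.
    weights = {k: floor + (1.0 - n * floor) * share[k] for k in raw}
    excess = sum(max(w - cap, 0.0) for w in weights.values())
    if excess > 0:
        # Hand the excess to uncapped agents in proportion to their
        # headroom, which can never push any of them past the cap.
        headroom = sum(max(cap - w, 0.0) for w in weights.values())
        for k, w in weights.items():
            weights[k] = cap if w >= cap else w + (cap - w) * excess / headroom
    return weights

sharpes = {"weather": 2.0, "econ": 1.0, "arb": 1.0, "sports": -0.5, "mm": 0.0}
for agent, w in allocate(sharpes).items():
    print(f"{agent:8s} {w:.3f}")
```

Multiplying each fraction by the bankroll gives `capital_i`. Other tie-breaking rules are equally defensible; the point is that the floor and cap need an explicit redistribution rule to keep the fractions summing to 1.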
| Agent | Evolvable Parameters | Mutation Range |
|---|---|---|
| Weather | Model weights (13 floats), edge threshold, city blacklist, min ensemble agreement, time-of-day bias | Weights: ±15%. Threshold: ±2%. Cities: add/drop 1. |
| Econ | Nowcast source weights, component importance, entry timing offset, data staleness tolerance | Weights: ±20%. Timing: ±6h. Staleness: ±1 day. |
| Arb | Min spread (¢), scan interval (ms), venue priority order, max partial fill %, category filter | Spread: ±0.5¢. Interval: ±50ms. Categories: swap 1. |
| Sports | Sharp book list, reverse-line threshold, conviction minimum, league weights, prop type preferences | Threshold: ±3%. League weights: ±25%. Props: swap 1. |
| Market Maker | γ (risk aversion), σ (volatility est), κ (arrival rate), inventory cap, spread floor, logit clamp range | γ: ±0.02. σ: ±0.01. κ: ±0.3. Cap: ±5 contracts. |
| Meta (Allocator) | Sharpe lookback window, floor/cap %, rebalance frequency, drawdown penalty weight | Window: ±5 days. Floor: ±1%. Frequency: ±1 day. |
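The continuous-parameter part of the mutation protocol can be sketched as follows. The interpretation of "±10-25%" as a step whose magnitude is drawn uniformly between 10% and 25% with a random sign is an assumption, as are the function names; `backtest_fn` stands in for whatever 60-day backtest the system provides.

```python
import random

def mutate_continuous(value, rng, lo=0.10, hi=0.25):
    """Perturb one parameter by a random step of 10-25% in either direction."""
    step = rng.uniform(lo, hi) * rng.choice((-1, 1))
    return value * (1.0 + step)

def candidate_genomes(genome, n, rng):
    """Step (b): generate n mutated copies of an agent's config."""
    return [{k: mutate_continuous(v, rng) for k, v in genome.items()} for _ in range(n)]

def select_best(candidates, backtest_fn):
    """Steps (c)-(d): pick the candidate with the best simulated Sharpe.

    backtest_fn is a hypothetical callable: genome -> Sharpe ratio.
    """
    return max(candidates, key=backtest_fn)

def keep_or_revert(trial_sharpe, baseline_sharpe):
    """Step (f): keep only a strict improvement over the pre-mutation Sharpe."""
    return "KEEP" if trial_sharpe > baseline_sharpe else "REVERT"

rng = random.Random(42)  # seeded so a mutation run can be replayed from the log
genome = {"edge_threshold": 0.08, "kelly_mult": 0.25}
candidates = candidate_genomes(genome, n=5, rng=rng)
# Toy objective for demonstration only: prefer thresholds close to 0.07.
best = select_best(candidates, lambda g: -abs(g["edge_threshold"] - 0.07))
print(best, keep_or_revert(trial_sharpe=1.3, baseline_sharpe=1.1))
```

Selecting the best of N candidates on the same 60 days used to score them invites overfitting, which is why the 7-day live trial and the revert rule matter.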
Weather settles daily. Worsey gets ~250 trading days/year for stock signals. We get 365 clean binary outcomes per city per year — 5 cities = 1,825 data points.
Stocks: noisy, continuous, unrealized. Prediction markets: $0 or $1, settled, final. No ambiguity in the loss function. Sharpe is clean.
Weather, econ, arb, sports, MM — uncorrelated strategies. Evolution can shift capital to whichever regime is working. Agents compete for allocation, not just accuracy.
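Because every contract settles at exactly $0 or $1, probability calibration can be scored directly. The Brier score listed in the SCORE step is the mean squared error between predicted probabilities and settled outcomes; the sketch below is the standard definition, with an illustrative function name.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted P(YES) and settled outcome.

    probs:    predicted probabilities in [0, 1]
    outcomes: settled results, 1 for YES and 0 for NO
    Lower is better: 0.0 is perfect, 0.25 is what always saying 50% scores.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("need equal-length, non-empty inputs")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))
```

An agent can have a positive Sharpe with a poor Brier score (lucky sizing) or the reverse, so tracking both helps separate calibration problems from sizing problems.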
| Phase | Period | What Happens |
|---|---|---|
| Seeding | Day 1-30 | All agents run with hand-tuned initial params. No mutations yet. Collecting baseline Sharpe data. |
| First Mutations | Day 31-60 | Worst agent mutated weekly. Conservative mutations (±10%). Meta-allocator starts shifting capital. |
| Full Evolution | Day 61-180 | All agents eligible for mutation. Wider mutation range (±25%). Expect ~20-30 mutations, 8-12 survive (Worsey's ratio). |
| Convergence | Day 180+ | Surviving params are "evolutionary products" — shaped by market feedback. Mutations slow as Sharpe stabilizes. System is self-tuning. |
Worsey's system discovered that its own portfolio manager was the weakest link before the humans did. Ours will do the same, but faster, because daily settlement means we're running the evolutionary loop at 7x the speed of a stock-based system. 378 iterations took Worsey 18 months. On Kalshi weather, we hit 378 iterations in ~54 days at 7 iterations per day (378 / 7 = 54); at 7 mutations per week, 8 weeks would yield only 56.
Inspiration: @Chris_Worsey — Karpathy autoresearch loop applied to markets. 575 likes, 1,061 bookmarks. +22% in 173 days. The prompts are the weights, Sharpe is the loss function.