AI infrastructure, tools, and open research.
Sparkco is an open-source research project on the post-AGI stack — the runtime containers agents live in, the harnessing (glue code) inside them, and the messaging between them. It's built by the team behind SimpleFunctions, where we're exploring how live prediction-market probabilities can serve as a real-time world state for AI agents. The site is our public log of that work: a live feed of AI and prediction-market signals, plus the setups and tools we recommend for agent builders.
We ship tools as CLIs first, not MCP — 0 tokens to expose, ~100% reliable, pipe-composable.
Parametric memory: replacing the context window with weights.
Today's chat models remember by re-reading the entire conversation on every turn. Compaction loses information, retrieval crowds the window, and a new session starts blank. We're testing whether the facts, preferences, and behavior in a dialogue can be encoded directly into model weights — leaving the context free for what's actually being said now.
Want to collaborate? patrick@simplefunctions.dev
The context window is a finite token sequence, fully recomputed on every turn. Every existing workaround — summarization memory, vector retrieval, KV caching — moves the cost without solving it: long context drifts, compaction discards information, retrieval crowds the same window it pulls from. If conversational state could live in weight deltas instead of tokens, the window would only need to hold the current turn.
- Test-time training. ByteDance In-Place TTT (ICLR 2026 oral) and Stanford/NVIDIA TTT-E2E update MLP projection weights online during inference, compressing long context into fast weights. All published work targets long-document throughput; nobody has tested whether the fast weights survive once the document is dropped from context.
- Hypernetwork → adapter. Sakana's Doc-to-LoRA (Feb 2026) and P2P (Oct 2025) train a hypernet that emits a LoRA from raw text or a user profile in under a second. Validates "text → weights" as a tractable mapping — but neither was designed for accumulating dialogue history.
- Dialogue-direct fine-tuning. PLUM (Nov 2024) fine-tunes a LoRA on dialogue Q/A pairs and matches RAG at 100 turns. MemLoRA trains memory management itself as a LoRA. IBM's Activated LoRA (Dec 2025) solves multi-LoRA hot-swap without KV recompute — making per-conversation memory modules feasible.
- Knowledge editing. ROME and MEMIT do surgical single-fact edits on weights, but catastrophic forgetting appears past ~1000 edits. Not a candidate at dialogue scale.
These live in disjoint communities — efficient inference, recsys, personalization NLP, on-device, model editing — and have never been compared on the same benchmark. None has been evaluated end-to-end on a real user's multi-hundred-turn history across technical, strategic, philosophical, and personal domains, with the conversation removed from context. Existing benchmarks (RULER, needle-in-haystack, LaMP) are synthetic or shallow.
- TTT fast weights as memory. Ingest a fact-bearing dialogue with In-Place TTT, drop the context, probe. Iterations 1–2 ran on a single A100 with a self-trained checkpoint — full write-up here. Negative: trained fast weights produced perturbation noise, not retrievable encoding, even at small inference-time scales. Joint base+TTT training is the next attack surface.
- Doc-to-LoRA over real dialogues. Same probes, hypernet-generated LoRA instead of TTT. Compare raw-dialogue input against structured-profile input for information retention.
- Modular memory adapters. Decompose dialogue history into facts, preferences, and project context. Train one LoRA per axis; hot-swap with Activated LoRA. Measure single-load vs combined-load interference.
- Capacity and forgetting curves. Stream new facts turn-by-turn; locate the point at which turn N overwrites turn 1. Trace the capacity–fidelity tradeoff.
- A "conversation memory retention" benchmark — three difficulty tiers, six fact dimensions. None currently exists for this scenario.
- First head-to-head comparison of TTT fast weights, Doc-to-LoRA, PLUM-style dialogue-LoRA, and classical summarization memory on the same eval.
- An empirical answer to whether modular per-domain memory adapters can be composed without cross-interference.
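The capacity and forgetting-curve experiment listed above can be prototyped before any model is involved. The sketch below is a minimal harness, assuming a hypothetical memory interface with `write` and `probe` methods. `ToyMemory` is a stand-in with a fixed slot count and oldest-first eviction, included only so the loop runs; it says nothing about how TTT fast weights or LoRA adapters actually forget.

```python
from collections import OrderedDict
from typing import List, Optional, Protocol, Tuple

Fact = Tuple[str, str]


class Memory(Protocol):
    """Hypothetical interface; a real backend would wrap a TTT or LoRA update."""

    def write(self, key: str, value: str) -> None: ...

    def probe(self, key: str) -> Optional[str]: ...


class ToyMemory:
    """Stand-in backend: fixed number of slots, oldest fact evicted first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.slots: "OrderedDict[str, str]" = OrderedDict()

    def write(self, key: str, value: str) -> None:
        self.slots[key] = value
        if len(self.slots) > self.capacity:
            self.slots.popitem(last=False)  # drop the oldest entry

    def probe(self, key: str) -> Optional[str]:
        return self.slots.get(key)


def retention_curve(memory: Memory, facts: List[Fact]) -> List[float]:
    """After each write, the fraction of all facts seen so far still recalled."""
    curve = []
    for turn in range(1, len(facts) + 1):
        key, value = facts[turn - 1]
        memory.write(key, value)
        recalled = sum(memory.probe(k) == v for k, v in facts[:turn])
        curve.append(recalled / turn)
    return curve


def first_overwrite_turn(memory: Memory, facts: List[Fact]) -> Optional[int]:
    """1-based turn at which the turn-1 fact is first lost, or None if it survives."""
    first_key, first_value = facts[0]
    for turn, (key, value) in enumerate(facts, start=1):
        memory.write(key, value)
        if memory.probe(first_key) != first_value:
            return turn
    return None


# Invented facts for the demo.
facts = [(f"fact-{i}", f"value-{i}") for i in range(1, 9)]
print(first_overwrite_turn(ToyMemory(capacity=5), facts))
print(retention_curve(ToyMemory(capacity=5), facts))
```

Swapping `ToyMemory` for a backend that performs a weight update on `write` and a model query on `probe` would leave the two measurement functions unchanged, which is the point of keeping the interface this small.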
Three layers, and what's already out there.
Containers
Sandboxes, microVMs, durable runtimes — where the agent lives.
- e2b: Code-interpreter sandboxes; the default for general-purpose runs.
- Modal: gVisor + GPU-native; sub-1s starts, scales to 50k+ concurrent.
- Daytona: Open source; ~90–200ms cold start, fastest in class.
- Fly.io Sprites: Stateful microVMs with checkpoint/restore and persistent NVMe.
- Vercel Sandbox: Firecracker + idle-billed; the JS-stack default.
SimpleFunctions sits on top: autonomous daemons, scheduler, and risk gates for prediction-market agents.
Harnessing
Glue code inside the container. Context curation, tool routing, the runtime loop.
- Claude Agent SDK: Anthropic's harness; powers Claude Code itself.
- Inspect AI: Eval-grade harness used by METR, Apollo, and government AISIs.
- LangGraph: LangChain's runtime layer, with durable execution, threads, HITL.
- Claude Code / Cursor / Aider: Opinionated harnesses-in-product; not sold separately.
SimpleFunctions ships /api/agent/world as ~800-token markdown context, plus a CLI with --json for deterministic harness mode.
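A `--json` mode is what lets a harness treat a CLI as a deterministic function: run the process, check the exit code, parse stdout. The sketch below shows that pattern in general terms. It does not assume any particular SimpleFunctions command; the child process is a stand-in (the Python interpreter printing JSON), and the field names are invented for the demo.

```python
import json
import subprocess
import sys
from typing import Any, List


def run_json_cli(argv: List[str], timeout: float = 30.0) -> Any:
    """Run a CLI that prints JSON on stdout and return the parsed value.

    Raises subprocess.CalledProcessError on a non-zero exit code and
    json.JSONDecodeError if stdout is not valid JSON.
    """
    completed = subprocess.run(
        argv,
        capture_output=True,  # collect stdout and stderr instead of inheriting
        text=True,            # decode bytes to str
        check=True,           # fail loudly on a non-zero exit code
        timeout=timeout,
    )
    return json.loads(completed.stdout)


# Stand-in for a real tool: a child Python process that emits JSON.
# With a real CLI this list would be its documented command plus --json.
fake_cli = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"market": "demo", "price_cents": 42}))',
]

state = run_json_cli(fake_cli)
print(state["price_cents"])
```

Because the tool is only a process with a JSON contract, nothing about it has to be described in the model's context until the harness decides to call it.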
Messaging
Between containers. Discovery, identity, stateful tasks — not tool-calling.
- A2A: Google's Agent2Agent (Linux Foundation, 2025), the emerging consensus.
- ANP: Peer-to-peer agent network over HTTPS + DIDs for identity.
- Letta: Shared memory blocks + thread-based message passing.
- AutoGen GroupChat: In-process orchestration; supervisor / round-robin patterns.
SimpleFunctions Chatbus: agents DM and broadcast in real time — the messaging substrate for trading agents.
What we ship publicly.
Harness & agents
- harness: Dual pi-agent runtime. Two agents (local + Cloudflare) negotiate, share state, and self-modify via a 5-message protocol.
- Memento: Context-integrity stress testing for Claude. Adversarial harness tampers with memory between sessions and watches whether the agent notices.
- claude-arena: AI vs AI vs AI. Autonomous Claude agents battle in a live CTF arena with trading.
- claude-trading: Autonomous Claude agents trade against each other on a live exchange, maker vs takers.
SimpleFunctions
Curated lists
- awesome-cli-agentic-tools: CLI tools for AI agents, covering prediction markets, agent frameworks, coding agents, browser agents, developer CLIs.
- awesome-prediction-markets: APIs, datasets, and resources for developers and AI agents.
- prediction-markets-reading: 256 articles on Kalshi, Polymarket, market microstructure, calibration, and trading strategies.
Terminal tools
- kalshi-orderbook-viewer: Depth charts for prediction markets, in your terminal.
- kalshi-price-monitor: Alerts on significant Kalshi/Polymarket price changes.
- polymarket-sports-mm: Sports market maker; pre-game and live quoting tuned to the quadratic reward function.
- polymarket-ticker-resolver: Resolve any Polymarket ID format (numeric, conditionId, CLOB token, slug). Zero deps.
Signals & probability
- prediction-market-edge-detector: Detect mispricings across 30,000+ markets.
- prediction-market-regime: Real-time crisis / risk-off / risk-on / complacent classifier.
- prediction-market-uncertainty: Uncertainty index from 30,000+ markets, reported as one number, 0–100.
- causal-tree-decomposition: Standalone causal-tree probability engine; thesis → weighted confidence. Zero deps.
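One way to read "thesis → weighted confidence" is as a tree in which each leaf carries a probability and each parent's confidence is the weight-normalized average of its children. The sketch below implements only that reading. It is an assumption made for illustration, not the algorithm used by causal-tree-decomposition, and the node labels, weights, and probabilities are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A claim in the tree. Leaves carry a probability; parents carry children."""

    label: str
    weight: float = 1.0
    probability: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def confidence(node: Node) -> float:
    """Leaf: its own probability. Parent: weighted mean of child confidences."""
    if not node.children:
        if node.probability is None:
            raise ValueError(f"leaf {node.label!r} has no probability")
        return node.probability
    total_weight = sum(child.weight for child in node.children)
    if total_weight <= 0:
        raise ValueError(f"children of {node.label!r} have no positive weight")
    weighted = sum(child.weight * confidence(child) for child in node.children)
    return weighted / total_weight


# Invented example: a thesis resting on one strong claim and one two-part claim.
thesis = Node(
    label="thesis",
    children=[
        Node(label="supply shock persists", weight=3.0, probability=0.8),
        Node(
            label="policy response is slow",
            weight=1.0,
            children=[
                Node(label="no emergency meeting", probability=0.5),
                Node(label="no fiscal package", probability=0.3),
            ],
        ),
    ],
)

print(round(confidence(thesis), 3))
```

A weighted mean is only one possible combination rule; a tree that models joint conditions would multiply probabilities along a path instead.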
World-state plumbing
SDK adapters
- crewai-prediction-markets: CrewAI tools.
- langchain-prediction-markets: LangChain tools.
- openai-agents-prediction-markets: OpenAI Agents SDK tools.
- vercel-ai-prediction-markets: Vercel AI SDK tools.
- create-prediction-market-agent: Scaffold a project. Works with LangChain, CrewAI, OpenAI Agents SDK, or vanilla TypeScript.
- prediction-market-mcp-example: Minimal MCP server example.
Live feed
Mixed stream from prediction markets, theses, new listings, and the blog.
Dogecoin Up or Down - June 10, 6:25PM-6:30PM ET
Solana Up or Down - June 10, 6:25PM-6:30PM ET
Hyperliquid Up or Down - June 10, 6:25PM-6:30PM ET
Ethereum Up or Down - June 10, 6:25PM-6:30PM ET
BNB Up or Down - June 10, 6:25PM-6:30PM ET
Bitcoin Up or Down - June 10, 6:25PM-6:30PM ET
US freezes Russian assets, sanctions Iran, bombs Iran — each action tells the world the dollar syste
Confidence increased slightly due to constructive orderbook flow in BTC and gold, signaling strong market participation in the thesis despite no fundamental change in the geopolitical trajectory.
Stagflation traps the Fed in an impossible triangle. Powell stays until Warsh confirmation. Trump in
The thesis remains robust; slight confidence increase reflects continued evidence of the Fed's constrained policy path despite cooling inflationary pressures in recent CPI prints. Most market moves are consistent with the existing volatilit
Hormuz blockade disrupts fertilizer supply chains. Fertilizer prices spike, US farm costs surge, foo
Recent data shows a sharp, aggressive surge in Strait of Hormuz transit volumes, suggesting regional friction may be lower than previously assumed, which acts as a drag on the fertilizer disruption thesis. Confidence is adjusted slightly do
US military assets pinned in Middle East — largest deployment since 2003. Pacific theater is exposed
The thesis confidence slightly decreased as reports confirm further deployment of US assets from the Indo-Pacific to the Middle East, complicating the transition toward Pacific re-engagement. While REFORPAC exercises provide a baseline for
Brent crude above $97 on June 12 — buy the taker flow
R8 sits at 29¢ with a regime shift from neutral-to-taker, signaling smart money is accumulating. Iran airspace closure at 100¢ and Hormuz disruption premium being priced in creates a direct causal path to near-term crude spikes. The 15¢ mov
US-Iran peace deal by June 30 is mispriced — sell the hope
Peace deal probability sits at 17¢ (down 6¢) with the 'No Meeting by June 30' contract at 67¢, yet some residual 17¢ remains on the table. Airspace closure at 100¢ is logically inconsistent with any June peace deal — that's a 17-point mispr
Mythos Dec-1 contract massively lags near-term release signal
C7 and C8 both flag a 59¢ contagion gap: the Oct-1 and Jun-15 trigger contracts have moved sharply while the Dec-1 lagging contract (M10) sits at 76¢ — but the raw laggingPrice in the contagion data shows 25¢, implying the Dec-1 YES is deep
Anthropic Nov-1 Mythos at 66¢ offers 130 IY — buy the yield
M17 prices Mythos release before Nov 1 at 66¢ with a 130 implied yield and 145 days to expiry — one of the cleanest yield plays in the dataset. The June-15 contract surging 25¢ to 46¢ means the conditional probability of release by Nov 1 sh
Bitcoin vs gold cross-venue arb: 8-cent gap, 95% confidence
X1 on Kalshi prices Bitcoin outperforming gold in 2026 at 17¢ versus X2 on Polymarket at 25¢ — an 8¢ gap at 95% confidence, clearing the 5¢ arb threshold decisively. Buy X2 on Polymarket at 25¢ and sell X1 on Kalshi at 17¢ for a locked 8¢ s
Congress veto override before 2027: 1,788 IY at 9¢ is extreme
L1 prices a Congress veto override before 2027 at just 9¢ with a 1,788 implied yield — the highest actionable IY in the legislative dataset. L2 (same event, perpetual) sits at 26¢ for 109 IY, implying the near-term contract is dramatically
SpaceX Communication Services reclassification at 75¢ — buy the policy catalyst
M14 prices SpaceX being assigned to Communication Services at 75¢ with a 57 IY and 212 days to expiry, on $22,593 in 24h volume — the highest volume policy-adjacent market in the dataset. The delta of +48 shows momentum is building. A Starl
The United States will launch a ground invasion of Iran. After 5 weeks of airstrikes, the US faces t
Thesis confidence drops as multiple mediation channels (Oman, Pakistan) report breakthroughs, directly contradicting the 'no diplomatic off-ramp' core assumption. Market prices for oil and shipping transit have aggressively corrected, sugge
Putin profits from Iran war oil prices. Russian military budget fully funded. Ukraine peace talks st
The thesis confidence faced a minor downward revision as oil futures markets showed a trend toward stabilizing or retreating from high-end upside bets, contradicting the expectation of an extreme price spike supporting Russia's war budget.
Oil above $100 drives electricity costs up. Data center operating costs surge. AI companies delay or
Recent market signals show a strong retreat in energy price expectations, specifically regarding WTI oil and natural gas benchmarks, which weakens the thesis that electricity costs will surge to the point of impacting data center expansion.
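Several feed items above quote an implied yield (IY) next to a price and a number of days to expiry. The published figures are consistent with a simple annualized return on a contract that pays 100¢ if it resolves YES. That formula is an inference from the numbers in the feed, not a documented definition, and it ignores fees and the probability of resolving NO.

```python
def implied_yield(price_cents: float, days_to_expiry: float) -> float:
    """Annualized simple return, in percent, from buying YES at price_cents
    and collecting 100 cents at expiry.

    Assumed formula, inferred from the feed figures rather than documented.
    """
    if not 0 < price_cents < 100:
        raise ValueError("price_cents must be strictly between 0 and 100")
    if days_to_expiry <= 0:
        raise ValueError("days_to_expiry must be positive")
    gain = (100 - price_cents) / price_cents  # return if the contract pays out
    return 100 * gain * 365 / days_to_expiry  # scale to one year, as a percent


print(round(implied_yield(66, 145)))  # 130, the figure quoted for the Nov-1 item
print(round(implied_yield(75, 212)))  # 57, the figure quoted for the SpaceX item
```

Read this way, a high IY on a cheap contract mostly reflects the low price: the yield is only earned if the event happens, so it is not comparable to a risk-free rate.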
What we'd install on a fresh machine
Three of ours, five from the community we trust.
npm i -g @spfunctions/cli
- @spfunctions/harness (Sparkco): Dual-agent runtime. Two pi-agents (local + Cloudflare) negotiate, share state, and self-modify via a 5-message protocol. $1/day to run. Install: npm i -g @spfunctions/harness
Browse 69+ CLI tools
Taste-curated. Filter by category, sorted by Sparkco-first then stars.
- @spfunctions/prediction-market-mcp (Sparkco): MCP server with 4 tools. Works with Claude, Cursor, VS Code. Install: npx @spfunctions/prediction-market-mcp
- create-prediction-market-agent (Sparkco): Scaffold agent projects: LangChain, CrewAI, OpenAI, vanilla TS. Install: npx create-prediction-market-agent
- @spfunctions/bi (Sparkco): Agent-friendly BI CLI. Query CSV/JSON/Parquet with SQL via DuckDB. 4 commands: head, schema, query, convert. Install: npm i -g @spfunctions/bi
- git clone https://github.com/spfunctions/polymarket-sports-mm
- pip install simplefunctions-ai
- git clone https://github.com/spfunctions/prediction-market-mcp-example
- git clone https://github.com/spfunctions/kalshi-price-monitor
- git clone https://github.com/spfunctions/prediction-market-context
- git clone https://github.com/spfunctions/causal-tree-decomposition
- uses: spfunctions/world-state-action@v1
- npm i langchain-prediction-markets
- npm i openai-agents-prediction-markets
- npm i vercel-ai-prediction-markets
- pip install crewai-prediction-markets
- npm i agent-world-awareness
- git clone https://github.com/spfunctions/prediction-market-edge-detector
- code --install-extension saoudrizwan.claude-dev
- pip install openai-agents
- go install github.com/xo/usql@latest
- brew install stripe/stripe-cli/stripe
- go install github.com/cube2222/octosql/cmd/octosql@latest
- npx @anthropic/playwright-mcp
- git clone https://github.com/nweii/prediction-market-analysis
- pip install sqlite-utils
- brew install supabase/tap/supabase
- git clone https://github.com/Polymarket/agents
- git clone https://github.com/elizaOS/kalshi-ai-trading-bot
- git clone https://github.com/berlinbra/polymarket-mcp-server
- git clone https://github.com/polybot-nexus/polybot
- git clone https://github.com/PredictOS/predictos
- pip install dr-manhattan
- git clone https://github.com/CloddsBot/cloddsbot
- git clone https://github.com/polymarket-pipeline/pipeline
- git clone https://github.com/gnosis/prediction-market-agent
- git clone https://github.com/kalshi-trading/bot-cli
- pip install kalshi-python
- pip install prediction-market-agent-tooling
Latest from the blog
Insights on AI agents, prediction markets, and developer tools.
Automated Prediction Market Trading: CLI Agents on Kalshi
A practical guide for developers and traders on using CLI-based agents to automate order placement on Kalshi prediction markets. Covers thesis-driven trading logic, real tickers, and the agentic runtime behind production-grade automation.
Prediction Market Terminal Dashboard: Bloomberg-Style Monitoring for Kalshi Traders
A practical guide to building a professional-grade terminal dashboard for monitoring Kalshi prediction markets in real time. Covers CLI tooling, agentic scanning, position tracking, and thesis-driven trade execution.
Prediction Market Edge Detection: How to Find Mispriced Contracts on Kalshi
A systematic approach to finding mispriced prediction market contracts using causal models, orderbook analysis, and executable edge calculations.
Thesis-Driven Prediction Market Trading: Why Causal Models Beat Signal Chasing
Signal-based bots react to noise. Thesis-driven agents understand why prices should move. Here's how causal models change prediction market trading.
AI Agents for Prediction Markets: How SimpleFunctions Connects Claude to Kalshi
How to connect your AI agent to prediction market data using SimpleFunctions MCP server — get context, inject signals, and trade on Kalshi.
How to Build a Prediction Market Trading Bot with SimpleFunctions CLI
Build a prediction market bot that scans for edges, monitors thesis confidence, and executes trades on Kalshi — all from the terminal.