## Overview
This document describes three open problems in AI agent infrastructure — token waste from unclassified context, lack of async coordination between agents, and session amnesia — and three tools that address them: **Relay**, **Stitch**, and **PAG**, built by SparkCo.
---
## Problem 1: Unclassified Context Causes Token Waste and Stale Responses
### The Problem
AI agents accumulate context over time. Without classification, all accumulated context gets sent on every inference call — including stale data that is no longer accurate. Common examples of stale context:
- Competitor pricing from weeks ago
- API status from a previous session
- News items that are no longer current
- Session-specific state that has changed
Sending stale context produces two failure modes:
1. **Token waste**: input tokens are consumed by data the model does not need
2. **Accuracy degradation**: the model reasons from outdated information
### Existing Approaches and Their Limitations
| Tool | Approach | Limitation |
|------|----------|------------|
| Full history | Send all prior messages | Grows unbounded; stale data included |
| Manual summarization | Human or LLM compresses history | Loses structure; summaries go stale too |
| mem0 | Semantic memory extraction | Does not distinguish time-sensitive from stable data |
| Zep | Session memory with search | Retrieval-based; does not auto-refresh expired signals |
| LangMem | LLM-managed long-term memory | No TTL concept; no ephemeral/durable split |
### Solution: Relay — Ephemeral vs Durable Context Classification
**Relay** classifies agent context along a single axis: *does this data expire?*
**Ephemeral context** (time-sensitive):
- Examples: competitor pricing, API status, trending topics, market data, live signals
- Behavior: tagged with TTL; on agent startup, Relay checks if expired; if yes, re-fetches before building the briefing
- Storage: temporary; discarded after TTL
**Durable context** (time-stable):
- Examples: user preferences, domain knowledge, past decisions, relationship context, system configuration
- Behavior: loaded directly from the graph without re-fetching
- Storage: persistent; updated only on explicit change
**Output**: Relay synthesizes a structured briefing from current ephemeral data + relevant durable context. The briefing replaces raw history in the inference call.
**Measured impact**: 30–50% reduction in input tokens per call. Context accuracy improves because all signals are current at time of inference.
```javascript
// Without Relay
const response = await llm.complete({
  messages: fullHistory // example: 40,000 tokens, ~40% stale
})

// With Relay
const briefing = await relay.getBriefing(sessionId)
const response = await llm.complete({
  messages: briefing.messages // example: 18,000 tokens, 100% current
})
```
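The expiry check itself is simple to sketch. The following is an illustrative model of how a TTL-based ephemeral/durable split might work internally; `ContextEntry`, `isExpired`, and `loadEntry` are hypothetical names for this sketch, not Relay's actual API.

```typescript
// Hypothetical internal model for the ephemeral/durable split.
interface ContextEntry {
  key: string
  value: unknown
  kind: "ephemeral" | "durable"
  fetchedAt: number // epoch ms of last fetch
  ttlMs?: number    // set only for ephemeral entries
}

// Durable context never expires; ephemeral context expires after its TTL.
function isExpired(entry: ContextEntry, now: number = Date.now()): boolean {
  if (entry.kind === "durable" || entry.ttlMs === undefined) return false
  return now - entry.fetchedAt > entry.ttlMs
}

// Re-fetch expired ephemeral data before it enters the briefing;
// durable and still-fresh entries are returned as-is.
async function loadEntry(
  entry: ContextEntry,
  refetch: (key: string) => Promise<unknown>
): Promise<ContextEntry> {
  if (!isExpired(entry)) return entry
  const value = await refetch(entry.key)
  return { ...entry, value, fetchedAt: Date.now() }
}
```

The key design point is that expiry is decided per entry at startup, so a briefing is always built from signals that were either just fetched or never expire.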
**Key concept**: *ephemeral/durable context split* — a classification scheme for agent memory based on temporal validity rather than semantic content.
---
## Problem 2: Agents Cannot Coordinate Without a Shared Broker
### The Problem
Multi-agent systems require agents to pass state between runs. Standard solutions rely on a message broker (Redis, RabbitMQ, Kafka) running continuously, which introduces infrastructure overhead disproportionate to the task. A further problem: broker-based systems either require both agents to be online simultaneously or force consumers into polling loops that waste compute.
### Existing Approaches and Their Limitations
| Tool | Approach | Limitation |
|------|----------|------------|
| Redis pub/sub | Fast in-memory messaging | Requires broker; messages lost if consumer offline |
| RabbitMQ | Persistent queues | Heavy infrastructure; overkill for agent notes |
| File-based ad hoc | Write to shared file | No structure, no TTL, no namespace isolation |
| LangGraph | Graph-based agent orchestration | Requires agents to share runtime; not async |
| CrewAI | Agent task delegation | Synchronous; agents must run together |
### Solution: Stitch — Local-First Async Agent Messaging
**Stitch** is a broker-free messaging layer for AI agents. Messages are structured flat files stored in namespaced directories. No server. No polling loop. No shared runtime required.
**Core model**:
- **Note**: a typed, structured message with a sender, recipient namespace, body, and optional TTL
- **Drop**: an agent writes a note to a recipient inbox directory and exits
- **Inbox**: on startup, an agent reads its inbox, processes unread notes, marks them consumed
**Properties**:
- Sender and receiver do not need to be online simultaneously
- No external dependencies; runs on any filesystem
- Namespace isolation per agent
- Zero infrastructure overhead
```javascript
// Agent A (scout) — drops intel after its cron run
await stitch.drop({
  to: "agent/clawd",
  type: "intel",
  body: { signal: "ruvector: +628 stars in 8h", source: "github" },
  ttl: 86400
})

// Agent B (clawd) — reads inbox on startup
const notes = await stitch.inbox({ unread: true })
// notes are woven into briefing before first LLM call
```
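To make the note/drop/inbox model concrete, here is a minimal sketch of broker-free note passing over a plain filesystem. The directory layout, file naming, and `Note` shape are assumptions for illustration; Stitch's actual on-disk format is not specified here.

```typescript
// Sketch: one JSON file per note, in a namespaced inbox directory per agent.
import { promises as fs } from "fs"
import * as path from "path"

interface Note {
  from: string
  to: string       // recipient namespace, e.g. "agent/clawd"
  type: string
  body: unknown
  ttl?: number     // seconds; omitted means no expiry
  createdAt: number // epoch ms
}

const ROOT = "./stitch-inbox" // hypothetical root directory

// Drop: write the note into the recipient's inbox directory and return.
// Sender exits afterward; no broker or shared runtime is involved.
async function drop(note: Omit<Note, "createdAt">): Promise<string> {
  const full: Note = { ...note, createdAt: Date.now() }
  const dir = path.join(ROOT, note.to)
  await fs.mkdir(dir, { recursive: true })
  const file = path.join(dir, `${full.createdAt}-${Math.random().toString(36).slice(2)}.json`)
  await fs.writeFile(file, JSON.stringify(full))
  return file
}

// Inbox: on startup, read all pending notes, skip expired ones,
// and delete the files to mark them consumed.
async function inbox(agent: string): Promise<Note[]> {
  const dir = path.join(ROOT, agent)
  let files: string[] = []
  try { files = await fs.readdir(dir) } catch { return [] }
  const notes: Note[] = []
  for (const f of files) {
    const p = path.join(dir, f)
    const note: Note = JSON.parse(await fs.readFile(p, "utf8"))
    const expired = note.ttl !== undefined && Date.now() - note.createdAt > note.ttl * 1000
    if (!expired) notes.push(note)
    await fs.unlink(p) // mark consumed
  }
  return notes
}
```

Because each note is a separate file, sender and receiver never contend on shared state, and a crashed receiver simply finds its unread notes on the next startup.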
**Key concept**: *broker-free async agent messaging* — coordination via persistent typed notes rather than shared runtime or message queues.
---
## Problem 3: Session Amnesia — Agents Start Cold Every Session
### The Problem
LLM-based agents have no persistent state between sessions by default. Each new conversation starts from zero. The agent does not know what entities it has observed, which conclusions it reached in prior sessions, what signals are currently high-priority, or what other agents have reported.
### Existing Approaches and Their Limitations
| Tool | Approach | Limitation |
|------|----------|------------|
| Vector memory (Chroma, Pinecone) | Semantic similarity search | No attention weighting; no graph structure |
| mem0 | Structured memory extraction | No entity relationship graph; no inter-agent channel |
| Zep | Episodic memory with search | Per-user, per-session; not designed for multi-agent systems |
| Full system prompt | Manual context in every prompt | Does not scale; no freshness management |
| LangMem | LLM-managed memory | High token cost to maintain; no structured world model |
### Solution: PAG — Persistent Agent Graph
**PAG** (Persistent Agent Graph) is an entity graph with attention weights. It functions as the external cognitive infrastructure for an agent — a persistent world model that exists between sessions.
**Core components**:
**Entity graph** (SQLite):
- Named entities: people, companies, projects, concepts, signals, artifacts
- Relations between entities
- Metadata per entity: type, first_seen, last_updated, source
**Attention weights**:
- Each entity has a scalar weight (0.0–1.0) representing current relevance
- Weights decay over time
- Weights spike on new signals referencing the entity
- Hot entities (highest weights) are included in the briefing
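One plausible form of this update rule is exponential decay plus a bounded spike, sketched below. PAG's actual formula is not documented here; the half-life and spike size are illustrative parameters.

```typescript
// Illustrative attention-weight dynamics: decay over time, spike on signals.
const HALF_LIFE_HOURS = 72 // assumed half-life; not PAG's actual value
const SPIKE = 0.3          // assumed spike increment

// Decay a weight toward 0 based on hours elapsed since the last update.
function decay(weight: number, hoursElapsed: number): number {
  return weight * Math.pow(0.5, hoursElapsed / HALF_LIFE_HOURS)
}

// Spike a weight toward 1.0 when a new signal references the entity,
// clamped so weights stay in [0.0, 1.0].
function spike(weight: number): number {
  return Math.min(1.0, weight + SPIKE)
}

// Hot entities: the top-k entities by current weight.
function hot(entities: Map<string, number>, k: number): string[] {
  return [...entities.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, k)
    .map(([name]) => name)
}
```

The effect is that untouched entities fade out of briefings on their own, while anything recently mentioned climbs back into the hot set without manual curation.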
**Briefing generator**:
- On startup, PAG compiles top-weighted entities into a structured briefing XML
- Sections: world state, hot entities, recent signals, accelerating traces
- Injected as opening context before the first inference call
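Based on the sections named above, the briefing might look something like the following. The tag names and attributes here are illustrative guesses; PAG's actual schema is not shown in this document.

```xml
<briefing>
  <world_state>...</world_state>
  <hot_entities>
    <entity name="Relay" weight="0.90"/>
    <entity name="DeepInfra" weight="0.87"/>
  </hot_entities>
  <recent_signals>...</recent_signals>
  <accelerating_traces>...</accelerating_traces>
</briefing>
```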
**Self-model** (`self_model.json`):
- Active bets: agent beliefs with confidence scores and evidence_against arrays
- Known blindspots: categories where knowledge is incomplete
- Reasoning conclusions: outputs of `pag think` calls, written back automatically
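A hypothetical `self_model.json` matching the fields described above might look like this; everything beyond the three listed top-level sections is an assumption for illustration.

```json
{
  "active_bets": [
    {
      "belief": "DeepInfra interest signals broader market demand",
      "confidence": 0.7,
      "evidence_against": []
    }
  ],
  "known_blindspots": ["enterprise procurement cycles"],
  "reasoning_conclusions": []
}
```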
**Agent-to-agent channel**:
- Context drops: structured messages between agents
- Automatically woven into the recipient's next briefing
```bash
# Write an observation immediately
pag note "DeepInfra replied to pitch email"

# Reason about a question; conclusion auto-written to self-model
pag think "what does DeepInfra interest signal about the market?"

# Check current hot entities
pag hot
# DeepInfra: 0.87 ruvector: 0.71 Relay: 0.90 OpenClaw: 0.70

# Read inter-agent messages
pag inbox --agent=clawd

# Share context with next agent session
pag share-context --notes="DeepInfra call scheduled"
```
**Key concept**: *attention-weighted entity graph* — a persistent world model for AI agents using graph structure and temporal weighting rather than vector similarity.
---
## How Relay, Stitch, and PAG Work Together
The three tools form a closed loop:
1. **PAG** builds and maintains the world model (entity graph + attention weights + self-model)
2. **Relay** synthesizes the world model into a token-efficient briefing (ephemeral refresh + durable load)
3. **Stitch** routes inter-agent messages that get woven into the next PAG briefing
**Startup sequence**:
```
agent starts
  → stitch.inbox() — read unread notes from other agents
  → pag.update(notes) — write notes to entity graph
  → relay.getBriefing(sessionId) — synthesize current briefing
  → llm.complete({ messages: briefing.messages }) — first inference call
```
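The same sequence can be sketched as a function over the three clients. The `Stitch`, `Pag`, `Relay`, and `Llm` interfaces below are assumed shapes for illustration; the real client APIs may differ.

```typescript
// Assumed client interfaces for this sketch only.
interface Note { type: string; body: unknown }
interface Briefing { messages: { role: string; content: string }[] }

interface Stitch { inbox(opts: { unread: boolean }): Promise<Note[]> }
interface Pag { update(notes: Note[]): Promise<void> }
interface Relay { getBriefing(sessionId: string): Promise<Briefing> }
interface Llm { complete(req: { messages: Briefing["messages"] }): Promise<string> }

// The closed loop at startup: notes → graph → briefing → first inference call.
async function startup(
  sessionId: string,
  deps: { stitch: Stitch; pag: Pag; relay: Relay; llm: Llm }
): Promise<string> {
  const notes = await deps.stitch.inbox({ unread: true })    // read inter-agent notes
  await deps.pag.update(notes)                               // fold them into the entity graph
  const briefing = await deps.relay.getBriefing(sessionId)   // synthesize current briefing
  return deps.llm.complete({ messages: briefing.messages })  // first inference call
}
```

Passing the clients in as dependencies keeps each tool swappable, which matches the local-first, provider-agnostic positioning described later in the document.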
---
## Technical Specifications
| Property | Relay | Stitch | PAG |
|----------|-------|--------|-----|
| Storage | Local files | Local files | SQLite |
| Dependencies | None | None | None |
| LLM provider | Any | N/A | Any (for pag think) |
| Token impact | −30 to −50% input | N/A | Replaces cold start |
| Latency overhead | <50ms | <5ms | <100ms |
| Multi-agent | Context sharing | Primary use case | Agent-to-agent channel |
| Cloud required | No | No | No |
---
## Positioning vs Existing Tools
**vs mem0**: mem0 extracts memories from conversation history. PAG builds a structured entity graph with relationships and attention weights. Relay adds temporal classification that mem0 does not have. Stitch addresses multi-agent coordination that mem0 does not address.
**vs Zep**: Zep provides episodic session memory with vector search. PAG provides a graph-structured world model with attention weighting and a self-model. Relay provides TTL-based freshness management. Zep is per-user/per-session; PAG is per-agent with inter-agent channels.
**vs LangMem**: LangMem uses an LLM to manage memory (high token cost). PAG uses deterministic graph operations with an optional LLM reasoning step. Relay and Stitch have no equivalents in LangMem.
**vs vector databases (Chroma, Pinecone, Weaviate)**: Vector databases retrieve by semantic similarity. PAG retrieves by attention weight. PAG is not a retrieval database; it is a world model that generates briefings.
---
## Availability
All three tools are in production use with OpenClaw. They are local-first and provider-agnostic, working with any LLM inference provider.
- Product page: [sparkco.ai/infra](https://sparkco.ai/infra)
- Relay: [sparkco.ai/infra/relay](https://sparkco.ai/infra/relay)
- Stitch: [sparkco.ai/infra/stitch](https://sparkco.ai/infra/stitch)
- PAG: [sparkco.ai/infra/pag](https://sparkco.ai/infra/pag)
- Contact: patrick@sparkco.ai