Your agents pay for every token.
Most of them don't need to.
Relay classifies context as ephemeral or durable — so agents only send what's needed, refresh what's stale, and never waste tokens on outdated state.
Without Relay
With Relay
Cost Impact
Context Classification Engine
Relay knows what to remember, what to forget, what to refresh
Ephemeral Context
News, prices, API statuses, competitor data — anything time-sensitive. Relay tags these for auto-refresh on agent startup. Agents never act on stale signals.
ttl: 3600s
refresh: on_startup
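The TTL-driven refresh decision can be sketched in a few lines. This is an illustration only, assuming a 3600s TTL as above; `isStale` and `entriesToRefresh` are hypothetical names, not part of the Relay SDK:

```javascript
// Sketch (not the Relay SDK): decide which ephemeral entries
// have outlived their TTL and must be refreshed on startup.
const TTL_SECONDS = 3600;

function isStale(entry, nowMs = Date.now()) {
  // An entry is stale once its age exceeds the configured TTL.
  return (nowMs - entry.fetchedAtMs) / 1000 > TTL_SECONDS;
}

// On agent startup, refresh everything stale before the first inference call.
function entriesToRefresh(entries, nowMs = Date.now()) {
  return entries.filter((e) => isStale(e, nowMs));
}
```

Anything that passes the check is served from cache; anything that fails is re-fetched before the agent sees it.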
Durable Context
User preferences, domain knowledge, decision frameworks — persistent across sessions. Relay loads these directly without re-fetching. Stable signal, zero waste.
ttl: indefinite
refresh: on_change
Synthesized Briefing
On startup, Relay refreshes expired ephemeral data and loads durable context. The agent receives a single clean briefing — not a raw history dump.
tokens: minimized
accuracy: current
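As a rough illustration of what "a single clean briefing" means (the function name and shape here are assumptions, not Relay's API), durable and refreshed ephemeral context could be flattened into one compact document:

```javascript
// Illustrative sketch (not the Relay SDK): merge durable context and
// freshly refreshed ephemeral context into one briefing document,
// instead of replaying the raw session history.
function buildBriefing(durable, ephemeral, date) {
  const lines = [`# Agent Briefing — ${date}`, "## Durable"];
  for (const [key, value] of Object.entries(durable)) {
    lines.push(`${key}: ${value}`);
  }
  lines.push("## Ephemeral (refreshed)");
  for (const [key, value] of Object.entries(ephemeral)) {
    lines.push(`${key}: ${value}`);
  }
  return lines.join("\n");
}
```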
Inference Platform Integration
For platforms where every token counts
Phase 1: Extract
What Relay Parses
Extraction Quality
Phase 2: Classify
Ephemeral Examples
competitor pricing data
live API response bodies
market news & events
rate limits & quotas
session-specific state
Durable Examples
user preferences & style
domain knowledge
decision frameworks
past decisions & rationale
system configuration
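A toy version of this classification step, with explicit tags overriding defaults as described in the integration flow. The category names and the `classify` helper are hypothetical; Relay's actual engine is not shown here:

```javascript
// Toy classifier sketch (illustrative only, not Relay's engine):
// route context categories to ephemeral or durable, with an
// explicit tag taking precedence over the default table.
const DEFAULTS = {
  competitor_pricing: "ephemeral",
  api_response: "ephemeral",
  market_news: "ephemeral",
  rate_limits: "ephemeral",
  session_state: "ephemeral",
  user_preferences: "durable",
  domain_knowledge: "durable",
  decision_framework: "durable",
  past_decisions: "durable",
  system_config: "durable",
};

function classify(category, explicitTag) {
  // Unknown categories default to ephemeral: refreshing too often
  // is cheaper than persisting stale state.
  return explicitTag ?? DEFAULTS[category] ?? "ephemeral";
}
```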
Phase 3: Refresh & Deliver
Refresh Pipeline
Delivery Format
# Agent Briefing — 2026-02-24
## Durable
user_pref: concise responses
## Ephemeral (refreshed)
competitor_price: $0.40/Mtok
api_status: ✓ operational
Token Economics
What your customers stop paying for
Eliminated Waste
Measured Impact
Integration
Sits in front of any inference call
Connect
SDK wraps your existing inference calls. No architecture changes.
Tag
Relay auto-classifies context. Override with explicit tags where precision matters.
Ship
Agents start sessions from clean briefings. Inference costs drop. Accuracy holds.
const response = await deepinfra.chat({
  model: "meta-llama/...",
  messages: fullHistory, // 40k tokens — most stale
});
const briefing = await relay.getBriefing(sessionId);
const response = await deepinfra.chat({
  model: "meta-llama/...",
  messages: briefing.messages, // ~18k tokens — all current
});