Relay — Smart Context for AI Inference

Your agents pay for every token.
Most of them don't need to.

Relay classifies context as ephemeral or durable — so agents only send what's needed, refresh what's stale, and never waste tokens on outdated state.

Without Relay

Context sent per call: Full history
Stale data in window: ~40%
Session startup: Cold start

With Relay

Context sent per call: Relevant only
Stale data in window: <5%
Session startup: Briefed instantly

Cost Impact

Token reduction: 30–50%
Latency improvement: ↓ per call
Context accuracy: Always current

Context Classification Engine

Relay knows what to remember, what to forget, what to refresh

Ephemeral Context

News, prices, API statuses, competitor data — anything time-sensitive. Relay tags these for auto-refresh on agent startup. Agents never act on stale signals.

tag: ephemeral
ttl: 3600s
refresh: on_startup
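The `ttl`/`tag` scheme above drives a simple staleness check. A minimal sketch, assuming a hypothetical entry shape (`tag`, `ttlSeconds`, `fetchedAt`) rather than the actual Relay SDK types:

```javascript
// Hypothetical sketch (not the Relay SDK): decide whether a context entry
// needs a live refresh, using the ephemeral/durable tagging described above.
function isStale(entry, nowMs = Date.now()) {
  if (entry.tag === "durable") return false; // durable entries never expire on time
  const ageSeconds = (nowMs - entry.fetchedAt) / 1000;
  return ageSeconds > entry.ttlSeconds;      // ephemeral: stale once past its TTL
}

// Fetched two hours ago with a one-hour TTL: due for refresh on startup.
const price = { tag: "ephemeral", ttlSeconds: 3600, fetchedAt: Date.now() - 7200000 };
console.log(isStale(price)); // true
```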

Durable Context

User preferences, domain knowledge, decision frameworks — persistent across sessions. Relay loads these directly without re-fetching. Stable signal, zero waste.

tag: durable
ttl: indefinite
refresh: on_change

Synthesized Briefing

On startup, Relay refreshes expired ephemeral data and loads durable context. The agent receives a single clean briefing — not a raw history dump.

output: briefing.md
tokens: minimized
accuracy: current

Inference Platform Integration

For platforms where every token counts

Phase 1: Extract

Automatic
What Relay Parses
Conversation turns: Entities extracted
Tool call outputs: Classified & stored
External sources: Tagged by type
Extraction Quality
Entity detection: Automatic
Relationship mapping: Included
Latency overhead: <50ms
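To make the extraction step concrete, here is a hypothetical sketch of pulling simple entities (prices, URLs) out of a raw conversation turn. Relay's actual parser is not public; the regexes and the `{type, value}` shape are illustration only.

```javascript
// Hypothetical sketch of Phase 1 extraction: turn a raw conversation turn
// into tagged entities ready for classification.
function extractEntities(turn) {
  const entities = [];
  // Prices like $0.40/Mtok — a likely ephemeral signal.
  for (const m of turn.matchAll(/\$\d+(?:\.\d+)?(?:\/\w+)?/g)) {
    entities.push({ type: "pricing", value: m[0] });
  }
  // External sources referenced in the turn.
  for (const m of turn.matchAll(/https?:\/\/\S+/g)) {
    entities.push({ type: "url", value: m[0] });
  }
  return entities;
}

console.log(extractEntities("Competitor X charges $0.40/Mtok, see https://example.com"));
```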

Phase 2: Classify

Temporal Tagging
Ephemeral Examples

competitor pricing data

live API response bodies

market news & events

rate limits & quotas

session-specific state

Durable Examples

user preferences & style

domain knowledge

decision frameworks

past decisions & rationale

system configuration
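The two example lists above can be sketched as a lookup-based classifier. Relay's real classifier is not public; the type names and the fallback-to-ephemeral rule here are assumptions used only to illustrate the temporal split.

```javascript
// Hypothetical sketch of Phase 2 temporal tagging: map an extracted item's
// source type to an ephemeral or durable tag, mirroring the examples above.
const EPHEMERAL_TYPES = new Set(["pricing", "api_response", "news", "quota", "session_state"]);
const DURABLE_TYPES = new Set(["preference", "domain_knowledge", "framework", "decision", "config"]);

function classify(item) {
  if (EPHEMERAL_TYPES.has(item.type)) return { ...item, tag: "ephemeral", ttl: 3600 };
  if (DURABLE_TYPES.has(item.type)) return { ...item, tag: "durable", ttl: "indefinite" };
  // Unknown types: safest to treat as time-sensitive and refresh them.
  return { ...item, tag: "ephemeral", ttl: 3600 };
}

console.log(classify({ type: "pricing", value: "$0.40/Mtok" }).tag);  // "ephemeral"
console.log(classify({ type: "preference", value: "concise" }).tag);  // "durable"
```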

Phase 3: Refresh & Deliver

On Agent Startup
Refresh Pipeline
Expired ephemeral: Re-fetched live
Durable context: Loaded directly
Briefing output: Synthesized
Delivery Format

# Agent Briefing — 2026-02-24

## Durable

user_pref: concise responses

## Ephemeral (refreshed)

competitor_price: $0.40/Mtok

api_status: ✓ operational
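Synthesis of that delivery format can be sketched as a small renderer over classified entries. The field names (`tag`, `key`, `value`) are assumptions, not the Relay schema.

```javascript
// Hypothetical sketch of Phase 3 briefing synthesis: assemble the markdown
// format shown above from durable and refreshed-ephemeral entries.
function renderBriefing(entries, date) {
  const line = (e) => `${e.key}: ${e.value}`;
  const durable = entries.filter((e) => e.tag === "durable").map(line);
  const ephemeral = entries.filter((e) => e.tag === "ephemeral").map(line);
  return [
    `# Agent Briefing — ${date}`,
    "",
    "## Durable",
    ...durable,
    "",
    "## Ephemeral (refreshed)",
    ...ephemeral,
  ].join("\n");
}

console.log(renderBriefing(
  [
    { tag: "durable", key: "user_pref", value: "concise responses" },
    { tag: "ephemeral", key: "api_status", value: "operational" },
  ],
  "2026-02-24"
));
```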

Token Economics

What your customers stop paying for

Eliminated Waste
Stale context re-sent: Eliminated
Full history on every call: Replaced by briefing
Redundant tool calls: Cached when durable
Cold-start re-scaffolding: Gone
Measured Impact
Input token reduction: 30–50%
Response relevance: ↑ (current context)
Time-to-first-token: Faster
Net cost per session: Materially lower
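The savings math is straightforward. A back-of-envelope sketch, assuming this page's illustrative numbers (a $0.40/Mtok input price and a 40k → ~18k token reduction per call) rather than real quotes:

```javascript
// Back-of-envelope input-cost math using this page's example numbers only.
const pricePerMtok = 0.40;
const cost = (tokens) => (tokens / 1e6) * pricePerMtok;

const before = cost(40000); // full history on every call
const after = cost(18000);  // synthesized briefing instead
const saved = ((before - after) / before) * 100;
console.log(`${saved.toFixed(0)}% of input spend saved per call`); // 55%
```

These particular example figures work out to a 55% reduction; actual savings depend on how much of a workload's context window is stale.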

Integration

Sits in front of any inference call

STEP 1

Connect

SDK wraps your existing inference calls. No architecture changes.

<1 day
STEP 2

Tag

Relay auto-classifies context. Override with explicit tags where precision matters.

Automatic
STEP 3

Ship

Agents start sessions from clean briefings. Inference costs drop. Accuracy holds.

Measurable
// Before Relay
const response = await deepinfra.chat({
  model: "meta-llama/...",
  messages: fullHistory, // 40k tokens — most stale
});

// After Relay
const briefing = await relay.getBriefing(sessionId);
const response = await deepinfra.chat({
  model: "meta-llama/...",
  messages: briefing.messages, // ~18k tokens — all current
});

Relay — Smart Context for AI Inference | SparkCo