# RLM: Code as Reasoning
RLM (Recursive Language Model) enables LLMs to solve complex tasks by writing and executing Python code in a sandboxed environment. Instead of processing everything in a single context window, the model can programmatically explore data, make recursive LLM calls, and iteratively refine its approach.
## Why RLM?
Traditional LLM calls struggle with:
- Long context: Performance degrades as context grows
- Complex reasoning: Multi-step logic is error-prone in natural language
- Data exploration: No way to peek, grep, or sample large datasets
RLM solves these by treating the prompt as a Python variable. The model can:
- Peek at data to understand structure
- Grep with regex to narrow search space
- Partition + map chunks with parallel `llm_batch()` for semantic tasks
- Summarize subsets for decision-making
- Process programmatically for deterministic operations
If the best frontier LLM can handle 10M tokens, an RLM can handle 100M tokens through recursive decomposition.
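The decomposition idea can be made concrete with a toy sketch in plain Python. Here `llm_query` is a deterministic stub standing in for the real function that the RLM sandbox provides; the chunking and recursion are the point, not the stub:

```python
# Toy sketch of recursive decomposition: split an oversized input into
# chunks that each fit a model's window, summarize each chunk, then
# recurse on the joined partial summaries until they fit one window.

def llm_query(prompt: str) -> str:
    # Stub: a real call would return model-generated text for `prompt`.
    return f"summary({len(prompt)} chars)"

def summarize(text: str, window: int = 1000) -> str:
    """Recursively summarize text that exceeds a model's window."""
    if len(text) <= window:
        return llm_query(f"Summarize: {text}")
    chunks = [text[i:i + window] for i in range(0, len(text), window)]
    partials = [summarize(c, window) for c in chunks]
    # The partial summaries may still be too long; recurse until they fit.
    return summarize(" ".join(partials), window)

print(summarize("x" * 2500))  # 2500 chars -> 3 chunks -> one combined pass
```

The same shape applies at any scale: a 100M-token input becomes a tree of window-sized subcalls.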
## When to Use RLM
Use RLM when your task requires multi-step reasoning across data, not just finding a single answer:
| Task Type | Best Tool |
|---|---|
| “Find this specific fact” | RAG / embeddings |
| “Answer questions about this document” | Long context |
| “Analyze patterns across many items” | RLM |
| “Aggregate and summarize a dataset” | RLM |
| “Find anomalies or outliers” | RLM |
## Use Cases
### For Developers & Individuals
| Use Case | Example Query | What RLM Does |
|---|---|---|
| Personal finance | “Categorize 2 years of transactions and find forgotten subscriptions” | Parse → categorize → aggregate → find anomalies |
| Codebase analysis | “Find API endpoints missing authentication” | Grep patterns → cross-reference files → trace calls |
| Research synthesis | “Compare methodologies across these 30 papers” | Partition → extract → compare → synthesize |
| Chat/email mining | “Find all commitments I made in Slack this quarter” | Grep promise patterns → filter dates → summarize |
| Log debugging | “Why did my app crash? Here’s 500K lines of logs” | Grep errors → trace backwards → root cause |
### For Enterprise
| Use Case | Example Query | What RLM Does |
|---|---|---|
| Contract intelligence | “Find contracts expiring in Q2 with auto-renewal clauses” | Parse legal text → extract dates/clauses → cross-reference |
| Support ticket analysis | “Top 5 feature requests hidden in 50K tickets” | Categorize → aggregate → rank → extract examples |
| Compliance auditing | “Check 1000 documents for PII that shouldn’t be there” | Systematic scan → flag issues → provide evidence |
| Incident investigation | “Correlate 5 services’ logs to find root cause” | Multi-source grep → timeline → causal analysis |
| Competitive intelligence | “Analyze 200 competitor press releases for trends” | Extract announcements → categorize → timeline |
### What Makes These RLM-Appropriate
These tasks share common traits:
- Volume: Too much data to review manually
- Reasoning: Answer requires connecting dots across items
- Iteration: Need to explore before knowing what to look for
- Aggregation: Final answer synthesizes many pieces
## Quick Start
```bash
curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "query": "Find all messages about project deadlines and summarize them",
    "context": {
      "messages": [
        {"from": "alice", "text": "The API deadline is next Friday"},
        {"from": "bob", "text": "Can we push the deadline?"},
        ...
      ]
    }
  }'
```
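For a Python client, the same request can be built with the standard library. A minimal sketch (the API key is a placeholder, and the actual send via `urllib.request.urlopen(req)` is left out so nothing hits the network here):

```python
import json
import urllib.request

API_URL = "https://api.modelrelay.ai/api/v1/rlm/execute"

def build_execute_request(api_key: str, model: str, query: str,
                          context: dict) -> urllib.request.Request:
    """Build the same POST request the curl example sends."""
    body = {"model": model, "query": query, "context": context}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_execute_request(
    "sk-example",  # placeholder key for illustration
    "claude-sonnet-4-5",
    "Find all messages about project deadlines and summarize them",
    {"messages": [{"from": "alice", "text": "The API deadline is next Friday"}]},
)
# urllib.request.urlopen(req) would perform the call; omitted here.
```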
Response:
```json
{
  "id": "5f3c0f6f-1a77-4d8f-9bd7-3a4f5a1f5e11",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{"type": "text", "text": "{\"deadline_messages\":[...],\"summary\":\"There are 5 messages about deadlines...\"}"}]
    }
  ],
  "stop_reason": "final",
  "model": "claude-sonnet-4-5",
  "answer": {
    "deadline_messages": [...],
    "summary": "There are 5 messages about deadlines..."
  },
  "iterations": 3,
  "subcalls": 2,
  "usage": {
    "input_tokens": 12500,
    "output_tokens": 2800,
    "total_tokens": 15300
  }
}
```
## Large Contexts (Upload Once, Reuse Many)
For large contexts (e.g., document collections, chat histories), upload once and reference by ID:
Why use context handles?
- Faster requests: Skip re-uploading megabytes of data on every call
- Lower bandwidth: Upload once, reference many times
- Multiple queries: Ask different questions about the same data
### Step 1: Upload Context
```bash
curl -X POST https://api.modelrelay.ai/api/v1/rlm/context \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "context": { "messages": [...] },
    "ttl_seconds": 3600
  }'
```
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `context` | any | Yes | — | JSON payload to store (max 10MB) |
| `ttl_seconds` | integer | No | 86400 (24h) | Time-to-live in seconds (max 30 days) |
Response:
```json
{
  "id": "c9bfa8e3-9c0e-4b5d-9d5d-1f3b3fb0b7c8",
  "expires_at": "2025-01-19T11:00:00.000Z",
  "size_bytes": 5242880
}
```
### Step 2: Execute with Context Reference
Use `context_ref` instead of `context`:
```bash
curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "query": "Summarize the key themes",
    "context_ref": "c9bfa8e3-9c0e-4b5d-9d5d-1f3b3fb0b7c8"
  }'
```
You can call `/rlm/execute` multiple times with the same `context_ref` until it expires:
```bash
# First query
curl ... -d '{"model": "...", "query": "Find action items", "context_ref": "c9bfa8e3-..."}'

# Second query (same context, different question)
curl ... -d '{"model": "...", "query": "Who mentioned deadlines?", "context_ref": "c9bfa8e3-..."}'

# Third query
curl ... -d '{"model": "...", "query": "Summarize by topic", "context_ref": "c9bfa8e3-..."}'
```
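In client code, the upload-once/reuse-many flow reduces to one upload body plus several execute bodies that share the handle ID. A minimal sketch of the payload construction only (the handle ID is the one from the example above; sending is omitted):

```python
import json

def upload_payload(context: dict, ttl_seconds: int = 3600) -> dict:
    """Body for POST /rlm/context."""
    return {"context": context, "ttl_seconds": ttl_seconds}

def execute_payload(model: str, query: str, context_ref: str) -> dict:
    """Body for POST /rlm/execute that references an uploaded context."""
    return {"model": model, "query": query, "context_ref": context_ref}

# One upload, three queries against the same handle.
handle_id = "c9bfa8e3-9c0e-4b5d-9d5d-1f3b3fb0b7c8"  # returned by the upload step
bodies = [execute_payload("claude-sonnet-4-5", q, handle_id)
          for q in ("Find action items",
                    "Who mentioned deadlines?",
                    "Summarize by topic")]
print(json.dumps(bodies[0]))
```

Note that each execute body carries `context_ref` and no inline `context`, per the either/or rule below.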
### Important Notes

- Immutable: Context handles cannot be updated. Upload a new one if your data changes.
- Expiration: Requests return `404` after `expires_at`. Upload again with a new TTL.
- Scoped: Customer-scoped tokens can only access their own context handles.
- Either/or: Provide `context` OR `context_ref`, never both.
## How It Works

1. You send a query/context or `/responses`-style `input` to `/rlm/execute`
2. The model receives a system prompt explaining the REPL environment
3. Each iteration, the model writes Python code
4. The sandbox executes the code and returns stdout/stderr
5. The model sees the output and continues (or calls `FINAL()` to finish)
```text
┌─────────────────────────────────────────────────────────┐
│                     Your Request                        │
│  query: "Find messages about deadlines"                 │
│  context: { messages: [...] }                           │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│                      ModelRelay                         │
│  1. Start sandboxed Python REPL                         │
│  2. Inject context as variable                          │
│  3. Run iterative loop:                                 │
│     - LLM generates code                                │
│     - Sandbox executes code                             │
│     - LLM sees output, continues or finishes            │
└─────────────────────────────────────────────────────────┘
                          │
              ┌───────────┴───────────┐
              ▼                       ▼
      Python Sandbox          llm_query() callbacks
      (Firecracker VM)        (recursive LLM calls)
```
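The iterative loop can be sketched as a toy in plain Python. A canned list of code strings stands in for LLM code generation, and `exec` with captured stdout stands in for the isolated sandbox (the real one runs in a Firecracker VM):

```python
import contextlib
import io

def run_rlm(model_steps, context):
    """Toy loop: the 'model' emits code, the 'sandbox' executes it,
    and the transcript would feed the model's next step."""
    final = {}
    env = {
        "context": context,
        "FINAL": lambda value: final.setdefault("answer", value),
    }
    transcript = []
    for code in model_steps:        # stands in for LLM code generation
        out = io.StringIO()
        with contextlib.redirect_stdout(out):
            exec(code, env)         # stands in for sandboxed execution
        transcript.append({"code": code, "stdout": out.getvalue()})
        if "answer" in final:       # the model called FINAL()
            break
    return final.get("answer"), transcript

answer, traj = run_rlm(
    ["print(len(context['messages']))",
     "FINAL({'count': len(context['messages'])})"],
    {"messages": [{"text": "hi"}, {"text": "deadline is Friday"}]},
)
```

The `trajectory` field in the real response records exactly this kind of per-iteration code/stdout history.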
## Available Functions

Inside the sandbox, the model has access to:

| Function | Description |
|---|---|
| `context` | The input data you provided |
| `llm_query(prompt)` | Make a single LLM call (for summarization, classification, etc.) |
| `llm_batch(prompts)` | Make parallel LLM calls (for processing multiple chunks) |
| `FINAL(value)` | Return the answer and stop execution |
### Example: Semantic Search

```python
# Model-generated code (you don't write this)

# 1. Peek at the data structure
print(f"Total messages: {len(context['messages'])}")
print(f"Sample: {context['messages'][0]}")

# 2. Find relevant messages with regex
import re
deadline_msgs = [m for m in context['messages']
                 if re.search(r'deadline|due|by friday', m['text'], re.I)]

# 3. Use LLM to summarize findings
if deadline_msgs:
    summary = llm_query(f"Summarize these deadline-related messages: {deadline_msgs}")
    FINAL({"messages": deadline_msgs, "summary": summary})
else:
    FINAL({"messages": [], "summary": "No deadline messages found"})
```
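The partition + map pattern looks similar, with `llm_batch` fanning chunks out in parallel. A sketch with `llm_batch` stubbed for illustration (inside the sandbox the real function is provided):

```python
# Sketch of partition + map: chunk the data, send one prompt per chunk
# through llm_batch (stubbed here), then reduce the per-chunk results.

def llm_batch(prompts):
    # Stub: the real function runs these as parallel LLM calls.
    return [f"themes for {p[:20]}..." for p in prompts]

def chunk(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]

messages = [{"text": f"message {i}"} for i in range(250)]
chunks = chunk(messages, 100)  # 3 chunks of at most 100 messages

# One prompt per chunk, executed in parallel by llm_batch.
prompts = [f"List the themes in: {[m['text'] for m in c]}" for c in chunks]
per_chunk = llm_batch(prompts)

# A final llm_query-style reduce step would merge per_chunk and
# pass the result to FINAL(); omitted in this sketch.
print(len(per_chunk))
```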
## Request Parameters

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | — | Model for code generation and subcalls |
| `query` | string | No* | — | The task for the LLM to accomplish |
| `input` | array | No* | — | `/responses`-style input items (messages, tool results) |
| `context` | any | No | `null` | Data available as the `context` variable |
| `context_ref` | string | No | — | Context handle ID (UUID). Provide instead of `context`. |
| `max_iterations` | integer | No | 10 | Max code generation cycles |
| `max_subcalls` | integer | No | 50 | Max `llm_query`/`llm_batch` calls |
| `max_depth` | integer | No | 1 | Max recursion depth for nested RLM |
| `timeout_ms` | integer | No | 60000 | Timeout per code execution (ms) |

\*Provide either `query` or `input`. If both are present, the request is rejected.

Note: `input` must be text-only. Non-text content parts (files/images/audio) are rejected.
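These either/or rules can be mirrored client-side before sending. A small sketch (`validate_execute_body` is a hypothetical helper, not part of the API; the server enforces the same constraints with `400` responses):

```python
def validate_execute_body(body: dict) -> None:
    """Reject bodies that violate the documented either/or rules."""
    if ("query" in body) == ("input" in body):
        raise ValueError("provide exactly one of 'query' or 'input'")
    if "context" in body and "context_ref" in body:
        raise ValueError("provide 'context' OR 'context_ref', never both")

# A valid body passes silently:
validate_execute_body({"model": "claude-sonnet-4-5",
                       "query": "Summarize the key themes",
                       "context_ref": "c9bfa8e3-..."})
```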
## Response
```json
{
  "model": "claude-sonnet-4-5",
  "answer": { ... },
  "iterations": 4,
  "subcalls": 12,
  "usage": {
    "input_tokens": 15000,
    "output_tokens": 3000,
    "total_tokens": 18000
  },
  "trajectory": [
    {
      "iteration": 1,
      "code": "print(len(context['messages']))",
      "stdout": "1000\n",
      "stderr": "",
      "exit_code": 0
    },
    ...
  ]
}
```
| Field | Description |
|---|---|
| `id` | Session-scoped response ID for the RLM execution |
| `output` | `/responses`-style output (assistant text). JSON answers are encoded as text. |
| `stop_reason` | `final` (RLM completed) |
| `answer` | The value passed to `FINAL()` |
| `iterations` | Number of code generation cycles used |
| `subcalls` | Total `llm_query` + `llm_batch` calls made |
| `usage` | Aggregated token usage across all LLM calls |
| `trajectory` | Full execution history (code + output per iteration) |
| `progress` | Progress events emitted during execution |
## Streaming

Streaming for `/rlm/execute` is not supported yet. Use the `progress` array in the response to inspect per-iteration status, and subscribe to workflow run events if you are executing RLM via `llm.rlm` nodes in a workflow.
## Limits and Errors
| Condition | HTTP Status | Description |
|---|---|---|
| Invalid input | 400 | Missing required fields, negative limits |
| Max iterations exceeded | 409 | Model didn’t call `FINAL()` in time |
| Max subcalls exceeded | 409 | Too many `llm_query`/`llm_batch` calls |
| Max depth exceeded | 409 | Recursive RLM calls too deep |
| Execution timeout | — | Sandbox restarts, model can recover |
## Best Practices

- Start simple: Let the model explore the data structure first
- Use `llm_batch` for parallelism: Processing chunks in parallel is faster and often cheaper
- Set appropriate limits: Higher `max_iterations` for complex tasks, lower for simple ones
- Monitor usage: RLM can make many subcalls; watch your token consumption
## Compared to Workflows

| Feature | `/rlm/execute` | Workflows |
|---|---|---|
| Complexity | Single call | Multi-node DAG |
| Use case | Exploratory, data-intensive | Structured pipelines |
| State | Ephemeral | Persistent runs |
| Flexibility | Model decides approach | You define the flow |
Use `/rlm/execute` when:
- You don’t know the exact steps needed
- The task requires exploring or searching data
- You want the model to decide how to decompose the problem
Use workflows when:
- You have a known sequence of steps
- You need persistent audit trails
- You want explicit control over the execution flow