RLM: Code as Reasoning
RLM (Recursive Language Model) enables LLMs to solve complex tasks by writing and executing Python code in a sandboxed environment. Instead of processing everything in a single context window, the model can programmatically explore data, make recursive LLM calls, and iteratively refine its approach.
Why RLM?
Traditional LLM calls struggle with:
- Long context: Performance degrades as context grows
- Complex reasoning: Multi-step logic is error-prone in natural language
- Data exploration: No way to peek, grep, or sample large datasets
RLM solves these by treating the prompt as a Python variable. The model can:
- Peek at data to understand structure
- Grep with regex to narrow search space
- Partition + map chunks with parallel `llm_batch()` for semantic tasks
- Summarize subsets for decision-making
- Process programmatically for deterministic operations
If the best frontier LLM can handle 10M tokens, an RLM can handle 100M tokens through recursive decomposition.
Quick Start
curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
-H "Authorization: Bearer $MODELRELAY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet-4-5",
"query": "Find all messages about project deadlines and summarize them",
"context": {
"messages": [
{"from": "alice", "text": "The API deadline is next Friday"},
{"from": "bob", "text": "Can we push the deadline?"},
...
]
}
}'
Response:
{
"id": "5f3c0f6f-1a77-4d8f-9bd7-3a4f5a1f5e11",
"output": [
{
"type": "message",
"role": "assistant",
"content": [{"type": "text", "text": "{\"deadline_messages\":[...],\"summary\":\"There are 5 messages about deadlines...\"}"}]
}
],
"stop_reason": "final",
"model": "claude-sonnet-4-5",
"answer": {
"deadline_messages": [...],
"summary": "There are 5 messages about deadlines..."
},
"iterations": 3,
"subcalls": 2,
"usage": {
"input_tokens": 12500,
"output_tokens": 2800,
"total_tokens": 15300
}
}
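The same request can be sent from Python; the sketch below uses the requests library, with the endpoint, payload, and response shape taken from the example above (the answer fields assume this example's output):
import json
import os
import requests

# Send the same request as the curl example above.
resp = requests.post(
    "https://api.modelrelay.ai/api/v1/rlm/execute",
    headers={"Authorization": f"Bearer {os.environ['MODELRELAY_API_KEY']}"},
    json={
        "model": "claude-sonnet-4-5",
        "query": "Find all messages about project deadlines and summarize them",
        "context": {"messages": [
            {"from": "alice", "text": "The API deadline is next Friday"},
            {"from": "bob", "text": "Can we push the deadline?"},
        ]},
    },
)
resp.raise_for_status()
data = resp.json()

# "answer" is the structured value the model returned via FINAL() (see How It Works below).
print(data["answer"]["summary"])

# The /responses-style "output" carries the same answer encoded as a JSON string.
print(json.loads(data["output"][0]["content"][0]["text"])["summary"])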
How It Works
- You send a query/context or a `/responses`-style `input` to `/rlm/execute`
- The model receives a system prompt explaining the REPL environment
- Each iteration, the model writes Python code
- The sandbox executes the code and returns stdout/stderr
- The model sees the output and continues (or calls `FINAL()` to finish)
┌─────────────────────────────────────────────────────────┐
│ Your Request │
│ query: "Find messages about deadlines" │
│ context: { messages: [...] } │
└─────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ ModelRelay │
│ 1. Start sandboxed Python REPL │
│ 2. Inject context as variable │
│ 3. Run iterative loop: │
│ - LLM generates code │
│ - Sandbox executes code │
│ - LLM sees output, continues or finishes │
└─────────────────────────────────────────────────────────┘
│
┌───────────┴───────────┐
▼ ▼
Python Sandbox llm_query() callbacks
(Firecracker VM) (recursive LLM calls)
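The loop itself can be pictured with a toy, runnable sketch. This is purely illustrative: the scripted_code list stands in for the code an LLM would generate each turn, and exec() stands in for the Firecracker sandbox; it is not ModelRelay's actual implementation.
import contextlib
import io

# Toy illustration of the RLM loop -- not the real server internals.
def run_rlm(context, scripted_code, max_iterations=10):
    final = {}
    env = {"context": context, "FINAL": lambda value: final.setdefault("value", value)}

    for iteration, code in enumerate(scripted_code[:max_iterations], start=1):
        stdout = io.StringIO()
        with contextlib.redirect_stdout(stdout):
            exec(code, env)                      # the "sandbox" executes the model's code
        print(f"[iteration {iteration}] stdout: {stdout.getvalue()!r}")
        if "value" in final:                     # the model called FINAL(value)
            return {"answer": final["value"], "iterations": iteration}

    raise RuntimeError("max iterations exceeded")

# Two "iterations": first the model peeks at the data, then it answers.
print(run_rlm(
    context={"messages": [{"text": "deadline is Friday"}, {"text": "lunch?"}]},
    scripted_code=[
        "print(len(context['messages']))",
        "FINAL([m for m in context['messages'] if 'deadline' in m['text']])",
    ],
))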
Available Functions
Inside the sandbox, the model has access to:
| Function | Description |
|---|---|
| `context` | The input data you provided |
| `llm_query(prompt)` | Make a single LLM call (for summarization, classification, etc.) |
| `llm_batch(prompts)` | Make parallel LLM calls (for processing multiple chunks) |
| `FINAL(value)` | Return the answer and stop execution |
Example: Semantic Search
# Model-generated code (you don't write this)
# 1. Peek at the data structure
print(f"Total messages: {len(context['messages'])}")
print(f"Sample: {context['messages'][0]}")
# 2. Find relevant messages with regex
import re
deadline_msgs = [m for m in context['messages']
if re.search(r'deadline|due|by friday', m['text'], re.I)]
# 3. Use LLM to summarize findings
if deadline_msgs:
summary = llm_query(f"Summarize these deadline-related messages: {deadline_msgs}")
FINAL({"messages": deadline_msgs, "summary": summary})
else:
FINAL({"messages": [], "summary": "No deadline messages found"})
Request Parameters
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | — | Model for code generation and subcalls |
| `query` | string | No* | — | The task for the LLM to accomplish |
| `input` | array | No* | — | `/responses`-style input items (messages, tool results) |
| `context` | any | No | `null` | Data available as the `context` variable |
| `max_iterations` | integer | No | 10 | Max code generation cycles |
| `max_subcalls` | integer | No | 50 | Max `llm_query`/`llm_batch` calls |
| `max_depth` | integer | No | 1 | Max recursion depth for nested RLM |
| `timeout_ms` | integer | No | 60000 | Timeout per code execution (ms) |
*Provide either `query` or `input`. If both are present, the request is rejected.
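For example, a request body that raises the iteration and subcall budgets while tightening the per-execution timeout might look like this (the values are illustrative, not recommendations):
# Illustrative payload tuning the documented limits; send it as the JSON body
# (e.g. with requests.post as in the Quick Start sketch above).
payload = {
    "model": "claude-sonnet-4-5",
    "query": "Cluster these support tickets by root cause",
    "context": {"tickets": [{"id": 1, "text": "app crashes on login"}]},  # your data
    "max_iterations": 20,   # more code-generation cycles for a harder task
    "max_subcalls": 100,    # allow more llm_query/llm_batch calls
    "timeout_ms": 30000,    # fail any single code execution after 30 seconds
}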
Response
{
"model": "claude-sonnet-4-5",
"answer": { ... },
"iterations": 4,
"subcalls": 12,
"usage": {
"input_tokens": 15000,
"output_tokens": 3000,
"total_tokens": 18000
},
"trajectory": [
{
"iteration": 1,
"code": "print(len(context['messages']))",
"stdout": "1000\n",
"stderr": "",
"exit_code": 0
},
...
]
}
| Field | Description |
|---|---|
| `id` | Session-scoped response ID for the RLM execution |
| `output` | `/responses`-style output (assistant text). JSON answers are encoded as text. |
| `stop_reason` | `final` (RLM completed) |
| `answer` | The value passed to `FINAL()` |
| `iterations` | Number of code generation cycles used |
| `subcalls` | Total `llm_query` + `llm_batch` calls made |
| `usage` | Aggregated token usage across all LLM calls |
| `trajectory` | Full execution history (code + output per iteration) |
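The trajectory field is handy for debugging. A small sketch that walks it and flags failed iterations, assuming the response has already been parsed into a dict named data as in the earlier Python example:
# Walk the execution trajectory: what code ran, and where it failed.
for step in data["trajectory"]:
    status = "ok" if step["exit_code"] == 0 else f"exit {step['exit_code']}"
    print(f"iteration {step['iteration']}: {status}")
    if step["stderr"]:
        print("  stderr:", step["stderr"].strip())

print("subcalls:", data["subcalls"], "| total tokens:", data["usage"]["total_tokens"])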
Streaming
For real-time updates, use NDJSON streaming:
curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
-H "Authorization: Bearer $MODELRELAY_API_KEY" \
-H "Content-Type: application/json" \
-H 'Accept: application/x-ndjson; profile="rlm-stream/v1"' \
-d '{"model": "claude-sonnet-4-5", "query": "...", "context": {...}}'
Stream events:
{"type": "iteration", "iteration": 1, "code": "...", "stdout": "...", "stderr": "", "exit_code": 0, "timed_out": false}
{"type": "iteration", "iteration": 2, "code": "...", "stdout": "...", "stderr": "", "exit_code": 0, "timed_out": false}
{"type": "final", "model": "claude-sonnet-4-5", "answer": {...}, "iterations": 2, "subcalls": 5, "usage": {...}}
Errors stream as:
{"type": "error", "message": "max iterations exceeded"}
Notes:
- Subcalls include both `llm_query` and `llm_batch`.
- The streaming payloads include per-iteration `stderr` and `timed_out` fields when present.
Event schema (fields are optional unless noted):
| Type | Required fields | Optional fields |
|---|---|---|
| `iteration` | `iteration` (int), `code` (string) | `stdout`, `stderr`, `exit_code`, `timed_out` |
| `error` | `message` (string) | — |
| `final` | `answer`, `iterations`, `subcalls`, `usage` | `model` |
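A sketch of consuming the stream from Python with requests (the Accept header and event types are as documented above; reconnection and richer error handling are omitted):
import json
import os
import requests

# Stream RLM execution events as NDJSON lines.
with requests.post(
    "https://api.modelrelay.ai/api/v1/rlm/execute",
    headers={
        "Authorization": f"Bearer {os.environ['MODELRELAY_API_KEY']}",
        "Accept": 'application/x-ndjson; profile="rlm-stream/v1"',
    },
    json={"model": "claude-sonnet-4-5", "query": "Find messages about deadlines",
          "context": {"messages": []}},
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        event = json.loads(line)
        if event["type"] == "iteration":
            print(f"iteration {event['iteration']}: exit_code={event.get('exit_code')}")
        elif event["type"] == "final":
            print("answer:", event["answer"])
        elif event["type"] == "error":
            raise RuntimeError(event["message"])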
Limits and Errors
| Condition | HTTP Status | Description |
|---|---|---|
| Invalid input | 400 | Missing required fields, negative limits |
| Max iterations exceeded | 409 | Model didn’t call `FINAL()` within the iteration limit |
| Max subcalls exceeded | 409 | Too many `llm_query`/`llm_batch` calls |
| Max depth exceeded | 409 | Recursive RLM calls too deep |
| Execution timeout | — | Sandbox restarts, model can recover |
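One way to react to the 409 limit errors is to retry once with a larger budget; a hedged sketch (the retry policy, names, and doubling factor are illustrative, and it only helps when the iteration limit was the binding constraint):
import os
import requests

RLM_URL = "https://api.modelrelay.ai/api/v1/rlm/execute"
HEADERS = {"Authorization": f"Bearer {os.environ['MODELRELAY_API_KEY']}"}

def execute_with_retry(payload):
    # Retry once with a doubled iteration budget on a 409 limit error.
    resp = requests.post(RLM_URL, headers=HEADERS, json=payload)
    if resp.status_code == 409:
        payload = {**payload, "max_iterations": payload.get("max_iterations", 10) * 2}
        resp = requests.post(RLM_URL, headers=HEADERS, json=payload)
    resp.raise_for_status()
    return resp.json()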
Best Practices
- Start simple: Let the model explore the data structure first
- Use `llm_batch` for parallelism: Processing chunks in parallel is faster and often cheaper
- Set appropriate limits: Higher `max_iterations` for complex tasks, lower for simple ones
- Monitor usage: RLM can make many subcalls; watch your token consumption
Compared to Workflows
| Feature | `/rlm/execute` | Workflows |
|---|---|---|
| Complexity | Single call | Multi-node DAG |
| Use case | Exploratory, data-intensive | Structured pipelines |
| State | Ephemeral | Persistent runs |
| Flexibility | Model decides approach | You define the flow |
Use /rlm/execute when:
- You don’t know the exact steps needed
- The task requires exploring or searching data
- You want the model to decide how to decompose the problem
Use workflows when:
- You have a known sequence of steps
- You need persistent audit trails
- You want explicit control over the execution flow