RLM: Code as Reasoning

RLM (Recursive Language Model) enables LLMs to solve complex tasks by writing and executing Python code in a sandboxed environment. Instead of processing everything in a single context window, the model can programmatically explore data, make recursive LLM calls, and iteratively refine its approach.

Why RLM?

Traditional LLM calls struggle with:

  • Long context: Performance degrades as context grows
  • Complex reasoning: Multi-step logic is error-prone in natural language
  • Data exploration: No way to peek, grep, or sample large datasets

RLM solves these by treating the prompt as a Python variable. The model can:

  • Peek at data to understand structure
  • Grep with regex to narrow search space
  • Partition + Map chunks with parallel llm_batch() for semantic tasks
  • Summarize subsets for decision-making
  • Process programmatically for deterministic operations

If the best frontier LLM can handle 10M tokens, an RLM can handle 100M tokens through recursive decomposition.
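
For example, inside the sandbox the model can partition an oversized context into chunks, fan the chunks out with parallel llm_batch() calls, and then reduce the partial results with a single llm_query(). A minimal sketch of that map-reduce pattern (illustrative only; the "documents" key, chunk size, and prompt wording are arbitrary):

# Illustrative sketch of recursive decomposition inside the RLM sandbox.
# `context`, `llm_batch`, `llm_query`, and `FINAL` are provided by the sandbox.
docs = context["documents"]          # hypothetical: far more text than fits in one call
chunk_size = 200
chunks = [docs[i:i + chunk_size] for i in range(0, len(docs), chunk_size)]

# Map: summarize each chunk in a parallel sub-LLM call.
partials = llm_batch([f"Summarize the key facts in:\n{chunk}" for chunk in chunks])

# Reduce: combine the partial summaries in one final call.
FINAL(llm_query("Merge these partial summaries into one report:\n" + "\n".join(partials)))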

Quick Start

curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "query": "Find all messages about project deadlines and summarize them",
    "context": {
      "messages": [
        {"from": "alice", "text": "The API deadline is next Friday"},
        {"from": "bob", "text": "Can we push the deadline?"},
        ...
      ]
    }
  }'

Response:

{
  "id": "5f3c0f6f-1a77-4d8f-9bd7-3a4f5a1f5e11",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [{"type": "text", "text": "{\"deadline_messages\":[...],\"summary\":\"There are 5 messages about deadlines...\"}"}]
    }
  ],
  "stop_reason": "final",
  "model": "claude-sonnet-4-5",
  "answer": {
    "deadline_messages": [...],
    "summary": "There are 5 messages about deadlines..."
  },
  "iterations": 3,
  "subcalls": 2,
  "usage": {
    "input_tokens": 12500,
    "output_tokens": 2800,
    "total_tokens": 15300
  }
}
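
The same request from Python, as a sketch using the requests library (endpoint and fields as documented above; error handling kept minimal):

import os
import requests

# Same request as the curl example above.
resp = requests.post(
    "https://api.modelrelay.ai/api/v1/rlm/execute",
    headers={"Authorization": f"Bearer {os.environ['MODELRELAY_API_KEY']}"},
    json={
        "model": "claude-sonnet-4-5",
        "query": "Find all messages about project deadlines and summarize them",
        "context": {"messages": [
            {"from": "alice", "text": "The API deadline is next Friday"},
            {"from": "bob", "text": "Can we push the deadline?"},
        ]},
    },
    timeout=300,
)
resp.raise_for_status()
result = resp.json()
print(result["answer"]["summary"])   # the value the model passed to FINAL()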

How It Works

  1. You send a query with optional context, or a /responses-style input, to /rlm/execute
  2. The model receives a system prompt explaining the REPL environment
  3. Each iteration, the model writes Python code
  4. The sandbox executes the code and returns stdout/stderr
  5. The model sees the output and continues (or calls FINAL() to finish)

┌──────────────────────────────────────────────┐
│                 Your Request                 │
│   query: "Find messages about deadlines"     │
│   context: { messages: [...] }               │
└──────────────────────────────────────────────┘
                       │
                       ▼
┌──────────────────────────────────────────────┐
│                  ModelRelay                  │
│  1. Start sandboxed Python REPL              │
│  2. Inject context as variable               │
│  3. Run iterative loop:                      │
│     - LLM generates code                     │
│     - Sandbox executes code                  │
│     - LLM sees output, continues or finishes │
└──────────────────────────────────────────────┘
                       │
           ┌───────────┴───────────┐
           ▼                       ▼
     Python Sandbox      llm_query() callbacks
    (Firecracker VM)     (recursive LLM calls)
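
A toy, self-contained sketch of that loop is below. It is purely conceptual and not the ModelRelay implementation: fake_model stands in for the code-writing LLM, and plain exec() stands in for the Firecracker sandbox.

import contextlib
import io

def fake_model(history):
    # A real deployment asks the LLM for the next code block; this stub finishes immediately.
    return "FINAL({'answer': 42})"

final_box = {}
def FINAL(value):
    final_box["value"] = value

history = []
for iteration in range(10):                  # max_iterations budget
    code = fake_model(history)               # 1. model writes Python
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):    # 2. "sandbox" executes it
        exec(code, {"FINAL": FINAL})
    history.append((code, buf.getvalue()))   # 3. model would see this output next turn
    if "value" in final_box:                 # 4. FINAL() ends the loop
        break

print(final_box["value"])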

Available Functions

Inside the sandbox, the model has access to:

Function            Description
context             The input data you provided
llm_query(prompt)   Make a single LLM call (for summarization, classification, etc.)
llm_batch(prompts)  Make parallel LLM calls (for processing multiple chunks)
FINAL(value)        Return the answer and stop execution

# Model-generated code (you don't write this)

# 1. Peek at the data structure
print(f"Total messages: {len(context['messages'])}")
print(f"Sample: {context['messages'][0]}")

# 2. Find relevant messages with regex
import re
deadline_msgs = [m for m in context['messages']
                 if re.search(r'deadline|due|by friday', m['text'], re.I)]

# 3. Use LLM to summarize findings
if deadline_msgs:
    summary = llm_query(f"Summarize these deadline-related messages: {deadline_msgs}")
    FINAL({"messages": deadline_msgs, "summary": summary})
else:
    FINAL({"messages": [], "summary": "No deadline messages found"})

Request Parameters

Field           Type     Required  Default  Description
model           string   Yes                Model for code generation and subcalls
query           string   No*                The task for the LLM to accomplish
input           array    No*                /responses-style input items (messages, tool results)
context         any      No        null     Data available as context variable
max_iterations  integer  No        10       Max code generation cycles
max_subcalls    integer  No        50       Max llm_query/llm_batch calls
max_depth       integer  No        1        Max recursion depth for nested RLM
timeout_ms      integer  No        60000    Timeout per code execution (ms)

*Provide either query or input. If both are present, the request is rejected.
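
For instance, a longer-running task might raise the iteration and subcall budgets and allow longer code executions (values here are arbitrary):

# Example request body with the optional limit fields set (values are arbitrary).
payload = {
    "model": "claude-sonnet-4-5",
    "query": "Cross-reference every invoice against the ledger and list mismatches",
    "context": {"invoices": [], "ledger": []},   # your data goes here
    "max_iterations": 20,    # default 10
    "max_subcalls": 100,     # default 50
    "timeout_ms": 120000,    # default 60000 (per code execution)
}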

Response

{
  "model": "claude-sonnet-4-5",
  "answer": { ... },
  "iterations": 4,
  "subcalls": 12,
  "usage": {
    "input_tokens": 15000,
    "output_tokens": 3000,
    "total_tokens": 18000
  },
  "trajectory": [
    {
      "iteration": 1,
      "code": "print(len(context['messages']))",
      "stdout": "1000\n",
      "stderr": "",
      "exit_code": 0
    },
    ...
  ]
}

Field        Description
id           Session-scoped response ID for the RLM execution
output       /responses-style output (assistant text). JSON answers are encoded as text.
stop_reason  final (RLM completed)
answer       The value passed to FINAL()
iterations   Number of code generation cycles used
subcalls     Total llm_query + llm_batch calls made
usage        Aggregated token usage across all LLM calls
trajectory   Full execution history (code + output per iteration)
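
The trajectory is handy for debugging. For example, reusing the parsed response from the Python snippet in Quick Start, you might print what the model ran at each step:

# Inspect what the model executed at each iteration (assumes `result` is the parsed response).
for step in result.get("trajectory", []):
    print(f"--- iteration {step['iteration']} (exit {step['exit_code']}) ---")
    print(step["code"])
    if step["stdout"]:
        print("stdout:", step["stdout"], end="")
    if step["stderr"]:
        print("stderr:", step["stderr"], end="")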

Streaming

For real-time updates, use NDJSON streaming:

curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -H 'Accept: application/x-ndjson; profile="rlm-stream/v1"' \
  -d '{"model": "claude-sonnet-4-5", "query": "...", "context": {...}}'

Stream events:

{"type": "iteration", "iteration": 1, "code": "...", "stdout": "...", "stderr": "", "exit_code": 0, "timed_out": false}
{"type": "iteration", "iteration": 2, "code": "...", "stdout": "...", "stderr": "", "exit_code": 0, "timed_out": false}
{"type": "final", "model": "claude-sonnet-4-5", "answer": {...}, "iterations": 2, "subcalls": 5, "usage": {...}}

Errors stream as:

{"type": "error", "message": "max iterations exceeded"}

Notes:

  • Subcalls include both llm_query and llm_batch.
  • The streaming payloads include per-iteration stderr and timed_out fields when present.

Event schema (fields are optional unless noted):

Type       Required fields                      Optional fields
iteration  iteration (int), code (string)       stdout, stderr, exit_code, timed_out
error      message (string)
final      answer, iterations, subcalls, usage  model
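
A minimal Python consumer for the stream might look like this (a sketch assuming the requests library and the event types listed above):

import json
import os
import requests

# Read RLM events line by line from the NDJSON stream.
with requests.post(
    "https://api.modelrelay.ai/api/v1/rlm/execute",
    headers={
        "Authorization": f"Bearer {os.environ['MODELRELAY_API_KEY']}",
        "Accept": 'application/x-ndjson; profile="rlm-stream/v1"',
    },
    json={
        "model": "claude-sonnet-4-5",
        "query": "Find messages about deadlines",
        "context": {"messages": []},
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue                                   # skip keep-alive blank lines
        event = json.loads(line)
        if event["type"] == "iteration":
            print(f"iteration {event['iteration']} finished (exit {event.get('exit_code')})")
        elif event["type"] == "final":
            print("answer:", event["answer"])
        elif event["type"] == "error":
            raise RuntimeError(event["message"])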

Limits and Errors

Condition                HTTP Status  Description
Invalid input            400          Missing required fields, negative limits
Max iterations exceeded  409          Model didn’t call FINAL() within max_iterations
Max subcalls exceeded    409          Too many llm_query/llm_batch calls
Max depth exceeded       409          Recursive RLM calls too deep
Execution timeout                     Sandbox restarts; the model can recover

Best Practices

  1. Start simple: Let the model explore the data structure first
  2. Use llm_batch for parallelism: Processing chunks in parallel is faster and often cheaper
  3. Set appropriate limits: Higher max_iterations for complex tasks, lower for simple ones
  4. Monitor usage: RLM can make many subcalls; watch your token consumption

Compared to Workflows

Feature      /rlm/execute                 Workflows
Complexity   Single call                  Multi-node DAG
Use case     Exploratory, data-intensive  Structured pipelines
State        Ephemeral                    Persistent runs
Flexibility  Model decides approach       You define the flow

Use /rlm/execute when:

  • You don’t know the exact steps needed
  • The task requires exploring or searching data
  • You want the model to decide how to decompose the problem

Use workflows when:

  • You have a known sequence of steps
  • You need persistent audit trails
  • You want explicit control over the execution flow