Data Source Integration

Connect external data sources (Gmail, Slack, databases, etc.) to RLM. Your wrapper service implements three endpoints; the LLM gets Python functions to search and retrieve your data.

Overview

When you include a data_source in your /rlm/execute request, the Python sandbox automatically gets helper functions:

| Function | Purpose |
| --- | --- |
| data_source_search(query, filters, page) | Search or list items |
| data_source_get(id) | Fetch item metadata |
| data_source_content(id, format, max_bytes) | Fetch item content |

These functions call YOUR wrapper service, which translates requests to your data source (Gmail API, database, etc.).

┌─────────────────┐    ┌──────────────────┐    ┌────────────────┐    ┌────────────┐
│   Your App      │───▶│  ModelRelay RLM  │───▶│  Your Wrapper  │───▶│ Data Source│
│                 │    │  (Python sandbox)│    │  (translates)  │    │ (Gmail, DB)│
└─────────────────┘    └──────────────────┘    └────────────────┘    └────────────┘

Quick Start

  1. Build a wrapper with /search, /get, /content endpoints
  2. Call /rlm/execute with data_source pointing to your wrapper
  3. The LLM writes Python using your data via the helper functions

Configuration

The data_source parameter in /rlm/execute:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Must be "wrapper_v1" |
| base_url | string | Yes | HTTPS URL to your wrapper |
| token | string | Yes | Bearer token for authorization |
| allowed_hosts | string[] | No | Additional hosts for sandbox egress |
| limits | object | No | Request/response limits |

Limits

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| timeout_ms | integer | | Per-request timeout (ms) |
| max_requests | integer | | Max requests per RLM execution |
| max_response_bytes | integer | | Max response size per request |

Example Request

curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "query": "Find emails about project deadlines and summarize them",
    "data_source": {
      "type": "wrapper_v1",
      "base_url": "https://your-wrapper.example.com",
      "token": "user_session_token_xxx",
      "allowed_hosts": ["attachment-cdn.example.com"],
      "limits": {
        "timeout_ms": 15000,
        "max_requests": 20,
        "max_response_bytes": 1048576
      }
    }
  }'

Python Functions

When data_source is configured, these functions become available in the sandbox:

data_source_search(query, filters=None, page=None)

Search or list items from your data source.

# Example: Search for emails about deadlines
results = data_source_search(
    query="project deadline",
    filters={"label": "inbox", "has_attachment": True},
    page={"limit": 25}
)
# Returns: {"items": [...], "next_cursor": "..."}

# The LLM might write:
for item in results["items"]:
    print(f"Found: {item['id']} - {item['title']}")

Calls POST {base_url}/search with:

{"query": "project deadline", "filters": {"label": "inbox", "has_attachment": true}, "page": {"limit": 25}}
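Search results are paginated, so a common pattern is to keep following next_cursor until it is absent. A runnable sketch of that loop; the data_source_search stub here only stands in for the real sandbox helper, which is provided automatically:

```python
# Stand-in for the sandbox helper so this sketch runs outside RLM.
_FAKE_PAGES = {
    None: {"items": [{"id": "msg_1"}, {"id": "msg_2"}], "next_cursor": "c2"},
    "c2": {"items": [{"id": "msg_3"}]},  # no next_cursor: last page
}

def data_source_search(query, filters=None, page=None):
    return _FAKE_PAGES[(page or {}).get("cursor")]

def search_all(query, filters=None, limit=25):
    """Collect every item by following next_cursor until it is absent."""
    items, cursor = [], None
    while True:
        page = {"limit": limit}
        if cursor:
            page["cursor"] = cursor
        result = data_source_search(query, filters=filters, page=page)
        items.extend(result["items"])
        cursor = result.get("next_cursor")
        if not cursor:
            break
    return items
```

Keep max_requests in mind when draining pages: each iteration counts against the per-execution budget.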

data_source_get(id)

Fetch metadata for a single item.

# Example: Get email metadata
email = data_source_get("msg_abc123")
# Returns: {"id": "msg_abc123", "title": "Re: Project Update", "from": "alice@example.com", ...}

print(f"From: {email['from']}, Subject: {email['title']}")

Calls POST {base_url}/get with:

{"id": "msg_abc123"}

data_source_content(id, format="text", max_bytes=None)

Fetch the content/body of an item.

# Example: Get email body as text
body = data_source_content("msg_abc123", format="text", max_bytes=50000)
# Returns: {"id": "msg_abc123", "format": "text", "content": "...", "truncated": false}

# Analyze the content
if "deadline" in body["content"].lower():
    summary = llm_query(f"Summarize this email about deadlines: {body['content']}")

Calls POST {base_url}/content with:

{"id": "msg_abc123", "format": "text", "max_bytes": 50000}

Endpoint Contract

Your wrapper must implement these POST endpoints under your base_url:

POST /search

Query and paginate items.

Request:

{
  "query": "Find contracts",
  "filters": {"type": "document", "date_after": "2024-01-01"},
  "page": {"limit": 25, "cursor": "cursor_abc"}
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query (semantic or keyword) |
| filters | object | No | Domain-specific filters |
| page | object | No | Pagination (limit, cursor) |

Response:

{
  "items": [
    {"id": "msg_1", "title": "Q1 Contract", "from": "legal@company.com", "date": "2024-03-15"},
    {"id": "msg_2", "title": "Contract Renewal", "from": "vendor@partner.com", "date": "2024-03-10"}
  ],
  "next_cursor": "cursor_xyz"
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| items | array | Yes | List of item objects with id and metadata |
| next_cursor | string | No | Cursor for next page (omit if no more results) |

POST /get

Fetch metadata for a single item.

Request:

{"id": "msg_1"}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Item identifier |

Response:

{
  "id": "msg_1",
  "title": "Q1 Contract",
  "from": "legal@company.com",
  "date": "2024-03-15",
  "mime_type": "message/rfc822",
  "has_attachments": true
}

Return whatever metadata is useful for your domain. The id field is required.

POST /content

Fetch or export item content.

Request:

{
  "id": "msg_1",
  "format": "text",
  "max_bytes": 200000
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Item identifier |
| format | string | No | Requested format (e.g., text, html, raw) |
| max_bytes | integer | No | Max content size hint |

Response:

{
  "id": "msg_1",
  "format": "text",
  "content": "Hi team,\n\nPlease review the attached contract by Friday...",
  "truncated": true
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Item identifier |
| format | string | Yes | Actual format returned |
| content | string | Yes | The item content |
| truncated | boolean | No | True if content was cut off at max_bytes |
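On the wrapper side, honoring max_bytes naively can split a multi-byte UTF-8 character at the cut point. One way to truncate safely; this helper is a sketch, not something the contract requires:

```python
def truncate_utf8(text, max_bytes):
    """Cut text at a byte budget without splitting a multi-byte character.

    Returns (content, truncated), matching the /content response fields.
    """
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text, False
    # errors="ignore" drops a trailing partial character, if any.
    return data[:max_bytes].decode("utf-8", errors="ignore"), True
```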

Error Response

All endpoints should return errors as:

{
  "error": {
    "code": "not_found",
    "message": "Item msg_999 not found"
  }
}

Common error codes: not_found, unauthorized, rate_limited, invalid_request.
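A small helper keeps wrapper error payloads consistent with this shape. The HTTP status mapping below is a suggestion; the contract only specifies the JSON body:

```python
# Suggested HTTP statuses per error code (an assumption, not part of the contract).
_STATUS = {
    "invalid_request": 400,
    "unauthorized": 401,
    "not_found": 404,
    "rate_limited": 429,
}

def error_response(code, message):
    """Return (body, http_status) in the contract's error shape."""
    return {"error": {"code": code, "message": message}}, _STATUS.get(code, 500)
```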

Complete Example: Gmail Wrapper

Here’s an end-to-end example showing a Gmail wrapper integration.

1. Your Wrapper Endpoints

Your wrapper translates between the wrapper_v1 contract and the Gmail API:

# Pseudocode for your wrapper service

@app.post("/search")
def search(request):
    # Translate the wrapper_v1 request into a Gmail API query.
    # filters and page are optional, so default them to empty dicts.
    filters = request.filters or {}
    page = request.page or {}
    gmail_query = request.query
    if filters.get("label"):
        gmail_query += f" in:{filters['label']}"

    resp = gmail.users().messages().list(
        userId="me",
        q=gmail_query,
        maxResults=page.get("limit", 25),
        pageToken=page.get("cursor")
    ).execute()

    # messages.list returns {"messages": [...], "nextPageToken": "..."}
    return {
        "items": [{"id": m["id"], "title": get_subject(m)} for m in resp.get("messages", [])],
        "next_cursor": resp.get("nextPageToken")
    }

@app.post("/get")
def get(request):
    msg = gmail.users().messages().get(userId="me", id=request.id).execute()
    return {
        "id": msg["id"],
        "title": get_header(msg, "Subject"),
        "from": get_header(msg, "From"),
        "date": get_header(msg, "Date")
    }

@app.post("/content")
def content(request):
    msg = gmail.users().messages().get(userId="me", id=request.id, format="full").execute()
    body = extract_body_text(msg)
    max_bytes = request.max_bytes
    truncated = bool(max_bytes) and len(body) > max_bytes
    return {
        "id": msg["id"],
        "format": "text",
        "content": body[:max_bytes] if truncated else body,
        "truncated": truncated
    }

2. RLM Execute Call

curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "query": "Find all emails about project deadlines from this week and summarize key dates",
    "data_source": {
      "type": "wrapper_v1",
      "base_url": "https://your-app.example.com/gmail-wrapper",
      "token": "user_oauth_session_token",
      "limits": {
        "max_requests": 50,
        "timeout_ms": 10000
      }
    }
  }'

3. What the LLM Writes

The model generates Python code like:

# Search for deadline-related emails
results = data_source_search(
    query="deadline OR due date",
    filters={"label": "inbox"},
    page={"limit": 50}
)

print(f"Found {len(results['items'])} potential matches")

# Get content for relevant emails
deadline_info = []
for item in results["items"][:10]:  # Process top 10
    content = data_source_content(item["id"], format="text", max_bytes=10000)

    # Use LLM to extract deadline mentions
    extraction = llm_query(f"""
        Extract any project deadlines from this email.
        Return JSON: {{"deadlines": [{{"project": "...", "date": "...", "details": "..."}}]}}

        Email: {content['content']}
    """)

    if extraction.get("deadlines"):
        deadline_info.extend(extraction["deadlines"])

# Summarize findings
summary = llm_query(f"""
    Summarize these project deadlines in a clear format:
    {deadline_info}
""")

FINAL({
    "deadlines": deadline_info,
    "summary": summary,
    "emails_searched": len(results["items"])
})

4. Response

{
  "id": "rlm_abc123",
  "model": "claude-sonnet-4-5",
  "answer": {
    "deadlines": [
      {"project": "Website Redesign", "date": "2025-02-15", "details": "Final mockups due"},
      {"project": "Q1 Report", "date": "2025-02-10", "details": "Submit to finance"}
    ],
    "summary": "You have 2 upcoming deadlines: Website Redesign mockups due Feb 15, and Q1 Report due Feb 10.",
    "emails_searched": 23
  },
  "iterations": 4,
  "subcalls": 12,
  "usage": {"input_tokens": 15000, "output_tokens": 2500, "total_tokens": 17500}
}

Security Requirements

URL restrictions:

  • base_url must use HTTPS (no HTTP)
  • base_url must not be localhost or a private/loopback IP (10.x, 172.16–31.x, 192.168.x, 127.x)
  • base_url must not include credentials

Token handling:

  • token is sent as Authorization: Bearer {token} to your wrapper
  • Your wrapper should validate the token and enforce per-user access
  • Store OAuth tokens securely; consider using short-lived session tokens

Allowed hosts:

  • By default, only base_url’s host is allowed for sandbox egress
  • Use allowed_hosts to enable additional hosts (e.g., CDNs for attachments)
  • Same restrictions apply: HTTPS only, no localhost, no private IPs
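These URL rules can be pre-checked client-side before calling /rlm/execute, using only the standard library. The service performs its own enforcement, so this sketch is purely a fail-fast convenience:

```python
import ipaddress
from urllib.parse import urlsplit

def looks_valid_base_url(url):
    """Mirror the documented restrictions: HTTPS only, no embedded
    credentials, no localhost, no private/loopback IP literals."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    if parts.username or parts.password:
        return False
    host = parts.hostname or ""
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname, not an IP literal
    return not (ip.is_private or ip.is_loopback)
```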

Best Practices

Response design:

  • Keep responses JSON-serializable
  • Stay under max_response_bytes limits
  • Use stable IDs that work across pagination

Pagination:

  • Use cursor-based pagination for consistency
  • Return next_cursor only when more results exist
  • Support reasonable limit values (e.g., 10-100)
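One simple way to implement opaque cursors is to encode the next offset; clients treat the string as a black box. A sketch over an in-memory list — real wrappers may instead embed sort keys or query snapshots for consistency under concurrent writes:

```python
import base64
import json

def encode_cursor(offset):
    # Opaque to clients: here it's just a base64-wrapped offset.
    return base64.urlsafe_b64encode(json.dumps({"o": offset}).encode()).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["o"]

def paginate(all_items, limit, cursor=None):
    """Produce a /search-shaped page, omitting next_cursor on the last page."""
    start = decode_cursor(cursor) if cursor else 0
    page = {"items": all_items[start:start + limit]}
    if start + limit < len(all_items):
        page["next_cursor"] = encode_cursor(start + limit)
    return page
```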

Error handling:

  • Return descriptive error messages
  • Use appropriate error codes (not_found, unauthorized, etc.)
  • Don’t expose internal implementation details in errors

Rate limiting:

  • Enforce rate limits in your wrapper
  • Return rate_limited error code when exceeded
  • Consider per-user quotas
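A minimal in-process token bucket illustrates per-user limiting. This is a sketch; a production wrapper would more likely back this with shared state such as Redis:

```python
import time

class TokenBucket:
    """Per-user token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self._state = {}  # user_id -> (tokens, last_timestamp)

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self._state.get(user_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        self._state[user_id] = (tokens - 1 if allowed else tokens, now)
        return allowed
```

When allow returns False, the wrapper would respond with a rate_limited error body.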

Limits

| Limit | Enforced by | Description |
| --- | --- | --- |
| limits.timeout_ms | Sandbox | Per-request timeout |
| limits.max_requests | Sandbox | Total calls across all data_source_* functions |
| limits.max_response_bytes | Sandbox | Max bytes read per response |
| RLM max_iterations | RLM | Total code generation cycles |
| RLM max_subcalls | RLM | Total llm_query/llm_batch calls |

If any limit is exceeded, the sandbox raises a ValueError with a descriptive message (e.g., "data_source max_requests exceeded").