Data Source Integration

Connect external data sources (Gmail, Slack, databases, etc.) to RLM. Your wrapper service implements three endpoints; the LLM gets Python functions to search and retrieve your data.

Overview

When you include a data_source in your /rlm/execute request, the Python sandbox automatically gets helper functions:

| Function | Purpose |
| --- | --- |
| data_source_search(query, filters, page) | Search or list items |
| data_source_get(id) | Fetch item metadata |
| data_source_content(id, format, max_bytes) | Fetch item content |

These functions call YOUR wrapper service, which translates requests to your data source (Gmail API, database, etc.).

┌─────────────────┐    ┌──────────────────┐    ┌────────────────┐    ┌────────────┐
│   Your App      │───▶│  ModelRelay RLM  │───▶│  Your Wrapper  │───▶│ Data Source│
│                 │    │  (Python sandbox)│    │  (translates)  │    │ (Gmail, DB)│
└─────────────────┘    └──────────────────┘    └────────────────┘    └────────────┘

Quick Start

  1. Build a wrapper with /search, /get, /content endpoints
  2. Call /rlm/execute with data_source pointing to your wrapper
  3. The LLM writes Python using your data via the helper functions

Configuration

The data_source parameter in /rlm/execute:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Must be "wrapper_v1" |
| base_url | string | Yes | HTTPS URL to your wrapper |
| token | string | Yes | Bearer token for authorization |
| allowed_hosts | string[] | No | Additional hosts for sandbox egress |
| limits | object | No | Request/response limits |

Limits

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| timeout_ms | integer | | Per-request timeout (ms) |
| max_requests | integer | | Max requests per RLM execution |
| max_response_bytes | integer | | Max response size per request |

Example Request

curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "query": "Find emails about project deadlines and summarize them",
    "data_source": {
      "type": "wrapper_v1",
      "base_url": "https://your-wrapper.example.com",
      "token": "user_session_token_xxx",
      "allowed_hosts": ["attachment-cdn.example.com"],
      "limits": {
        "timeout_ms": 15000,
        "max_requests": 20,
        "max_response_bytes": 1048576
      }
    }
  }'

Python Functions

When data_source is configured, these functions become available in the sandbox:

data_source_search(query, filters=None, page=None)

Search or list items from your data source.

# Example: Search for emails about deadlines
results = data_source_search(
    query="project deadline",
    filters={"label": "inbox", "has_attachment": True},
    page={"limit": 25}
)
# Returns: {"items": [...], "next_cursor": "..."}

# The LLM might write:
for item in results["items"]:
    print(f"Found: {item['id']} - {item['title']}")

Calls POST {base_url}/search with:

{"query": "project deadline", "filters": {"label": "inbox", "has_attachment": true}, "page": {"limit": 25}}
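Search results are paginated, so a common pattern is to keep following next_cursor until it is absent. A runnable sketch of that loop; the data_source_search stub here only stands in for the real sandbox helper, which is provided automatically:

```python
# Stand-in for the sandbox helper so this sketch runs outside RLM.
_FAKE_PAGES = {
    None: {"items": [{"id": "msg_1"}, {"id": "msg_2"}], "next_cursor": "c2"},
    "c2": {"items": [{"id": "msg_3"}]},  # no next_cursor: last page
}

def data_source_search(query, filters=None, page=None):
    return _FAKE_PAGES[(page or {}).get("cursor")]

def search_all(query, filters=None, limit=25):
    """Collect every item by following next_cursor until it is absent."""
    items, cursor = [], None
    while True:
        page = {"limit": limit}
        if cursor:
            page["cursor"] = cursor
        result = data_source_search(query, filters=filters, page=page)
        items.extend(result["items"])
        cursor = result.get("next_cursor")
        if not cursor:
            break
    return items
```

Keep max_requests in mind when draining pages: each iteration counts against the per-execution budget.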

data_source_get(id)

Fetch metadata for a single item.

# Example: Get email metadata
email = data_source_get("msg_abc123")
# Returns: {"id": "msg_abc123", "title": "Re: Project Update", "from": "alice@example.com", ...}

print(f"From: {email['from']}, Subject: {email['title']}")

Calls POST {base_url}/get with:

{"id": "msg_abc123"}

data_source_content(id, format="text", max_bytes=None)

Fetch the content/body of an item.

# Example: Get email body as text
body = data_source_content("msg_abc123", format="text", max_bytes=50000)
# Returns: {"id": "msg_abc123", "format": "text", "content": "...", "truncated": false}

# Analyze the content
if "deadline" in body["content"].lower():
    summary = llm_query(f"Summarize this email about deadlines: {body['content']}")

Calls POST {base_url}/content with:

{"id": "msg_abc123", "format": "text", "max_bytes": 50000}

Endpoint Contract

Your wrapper must implement these POST endpoints under your base_url:

POST /search

Query and paginate items.

Request:

{
  "query": "Find contracts",
  "filters": {"type": "document", "date_after": "2024-01-01"},
  "page": {"limit": 25, "cursor": "cursor_abc"}
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query (semantic or keyword) |
| filters | object | No | Domain-specific filters |
| page | object | No | Pagination (limit, cursor) |

Response:

{
  "items": [
    {"id": "msg_1", "title": "Q1 Contract", "from": "legal@company.com", "date": "2024-03-15"},
    {"id": "msg_2", "title": "Contract Renewal", "from": "vendor@partner.com", "date": "2024-03-10"}
  ],
  "next_cursor": "cursor_xyz"
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| items | array | Yes | List of item objects with id and metadata |
| next_cursor | string | No | Cursor for next page (omit if no more results) |

POST /get

Fetch metadata for a single item.

Request:

{"id": "msg_1"}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Item identifier |

Response:

{
  "id": "msg_1",
  "title": "Q1 Contract",
  "from": "legal@company.com",
  "date": "2024-03-15",
  "mime_type": "message/rfc822",
  "has_attachments": true
}

Return whatever metadata is useful for your domain. The id field is required.

POST /content

Fetch or export item content.

Request:

{
  "id": "msg_1",
  "format": "text",
  "max_bytes": 200000
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Item identifier |
| format | string | No | Requested format (e.g., text, html, raw) |
| max_bytes | integer | No | Max content size hint |

Response:

{
  "id": "msg_1",
  "format": "text",
  "content": "Hi team,\n\nPlease review the attached contract by Friday...",
  "truncated": true
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Item identifier |
| format | string | Yes | Actual format returned |
| content | string | Yes | The item content |
| truncated | boolean | No | True if content was cut off at max_bytes |
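On the wrapper side, honoring max_bytes naively can split a multi-byte UTF-8 character at the cut point. One way to truncate safely; this helper is a sketch, not something the contract requires:

```python
def truncate_utf8(text, max_bytes):
    """Cut text at a byte budget without splitting a multi-byte character.

    Returns (content, truncated), matching the /content response fields.
    """
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text, False
    # errors="ignore" drops a trailing partial character, if any.
    return data[:max_bytes].decode("utf-8", errors="ignore"), True
```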

Error Response

All endpoints should return errors as:

{
  "error": {
    "code": "not_found",
    "message": "Item msg_999 not found"
  }
}

Common error codes: not_found, unauthorized, rate_limited, invalid_request.
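A small helper keeps wrapper error payloads consistent with this shape. The HTTP status mapping below is a suggestion; the contract only specifies the JSON body:

```python
# Suggested HTTP statuses per error code (an assumption, not part of the contract).
_STATUS = {
    "invalid_request": 400,
    "unauthorized": 401,
    "not_found": 404,
    "rate_limited": 429,
}

def error_response(code, message):
    """Return (body, http_status) in the contract's error shape."""
    return {"error": {"code": code, "message": message}}, _STATUS.get(code, 500)
```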

Complete Example: Gmail Wrapper

Here’s an end-to-end example showing a Gmail wrapper integration.

1. Your Wrapper Endpoints

Your wrapper translates between the wrapper_v1 contract and the Gmail API:

# Pseudocode for your wrapper service

@app.post("/search")
def search(request):
    # Translate the wrapper_v1 request into a Gmail API query.
    # filters and page are optional, so default them to empty dicts.
    filters = request.filters or {}
    page = request.page or {}
    gmail_query = request.query
    if filters.get("label"):
        gmail_query += f" in:{filters['label']}"

    resp = gmail.users().messages().list(
        userId="me",
        q=gmail_query,
        maxResults=page.get("limit", 25),
        pageToken=page.get("cursor")
    ).execute()

    # messages.list returns {"messages": [...], "nextPageToken": "..."}
    return {
        "items": [{"id": m["id"], "title": get_subject(m)} for m in resp.get("messages", [])],
        "next_cursor": resp.get("nextPageToken")
    }

@app.post("/get")
def get(request):
    msg = gmail.users().messages().get(userId="me", id=request.id).execute()
    return {
        "id": msg["id"],
        "title": get_header(msg, "Subject"),
        "from": get_header(msg, "From"),
        "date": get_header(msg, "Date")
    }

@app.post("/content")
def content(request):
    msg = gmail.users().messages().get(userId="me", id=request.id, format="full").execute()
    body = extract_body_text(msg)
    max_bytes = request.max_bytes
    truncated = bool(max_bytes) and len(body) > max_bytes
    return {
        "id": msg["id"],
        "format": "text",
        "content": body[:max_bytes] if truncated else body,
        "truncated": truncated
    }

2. RLM Execute Call

curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "query": "Find all emails about project deadlines from this week and summarize key dates",
    "data_source": {
      "type": "wrapper_v1",
      "base_url": "https://your-app.example.com/gmail-wrapper",
      "token": "user_oauth_session_token",
      "limits": {
        "max_requests": 50,
        "timeout_ms": 10000
      }
    }
  }'

3. What the LLM Writes

The model generates Python code like:

# Search for deadline-related emails
results = data_source_search(
    query="deadline OR due date",
    filters={"label": "inbox"},
    page={"limit": 50}
)

print(f"Found {len(results['items'])} potential matches")

# Get content for relevant emails
deadline_info = []
for item in results["items"][:10]:  # Process top 10
    content = data_source_content(item["id"], format="text", max_bytes=10000)

    # Use LLM to extract deadline mentions
    extraction = llm_query(f"""
        Extract any project deadlines from this email.
        Return JSON: {{"deadlines": [{{"project": "...", "date": "...", "details": "..."}}]}}

        Email: {content['content']}
    """)

    if extraction.get("deadlines"):
        deadline_info.extend(extraction["deadlines"])

# Summarize findings
summary = llm_query(f"""
    Summarize these project deadlines in a clear format:
    {deadline_info}
""")

FINAL({
    "deadlines": deadline_info,
    "summary": summary,
    "emails_searched": len(results["items"])
})

4. Response

{
  "id": "rlm_abc123",
  "model": "claude-sonnet-4-5",
  "answer": {
    "deadlines": [
      {"project": "Website Redesign", "date": "2025-02-15", "details": "Final mockups due"},
      {"project": "Q1 Report", "date": "2025-02-10", "details": "Submit to finance"}
    ],
    "summary": "You have 2 upcoming deadlines: Website Redesign mockups due Feb 15, and Q1 Report due Feb 10.",
    "emails_searched": 23
  },
  "iterations": 4,
  "subcalls": 12,
  "usage": {"input_tokens": 15000, "output_tokens": 2500, "total_tokens": 17500}
}

Security Requirements

URL restrictions:

  • base_url must use HTTPS (no HTTP)
  • base_url must not be localhost or a private/loopback IP (10.x, 172.16–31.x, 192.168.x, 127.x)
  • base_url must not include credentials

Token handling:

  • token is sent as Authorization: Bearer {token} to your wrapper
  • Your wrapper should validate the token and enforce per-user access
  • Store OAuth tokens securely; consider using short-lived session tokens

Allowed hosts:

  • By default, only base_url’s host is allowed for sandbox egress
  • Use allowed_hosts to enable additional hosts (e.g., CDNs for attachments)
  • Same restrictions apply: HTTPS only, no localhost, no private IPs
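These URL rules can be pre-checked client-side before calling /rlm/execute, using only the standard library. The service performs its own enforcement, so this sketch is purely a fail-fast convenience:

```python
import ipaddress
from urllib.parse import urlsplit

def looks_valid_base_url(url):
    """Mirror the documented restrictions: HTTPS only, no embedded
    credentials, no localhost, no private/loopback IP literals."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    if parts.username or parts.password:
        return False
    host = parts.hostname or ""
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname, not an IP literal
    return not (ip.is_private or ip.is_loopback)
```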

Best Practices

Response design:

  • Keep responses JSON-serializable
  • Stay under max_response_bytes limits
  • Use stable IDs that work across pagination

Pagination:

  • Use cursor-based pagination for consistency
  • Return next_cursor only when more results exist
  • Support reasonable limit values (e.g., 10-100)
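One simple way to implement opaque cursors is to encode the next offset; clients treat the string as a black box. A sketch over an in-memory list — real wrappers may instead embed sort keys or query snapshots for consistency under concurrent writes:

```python
import base64
import json

def encode_cursor(offset):
    # Opaque to clients: here it's just a base64-wrapped offset.
    return base64.urlsafe_b64encode(json.dumps({"o": offset}).encode()).decode()

def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["o"]

def paginate(all_items, limit, cursor=None):
    """Produce a /search-shaped page, omitting next_cursor on the last page."""
    start = decode_cursor(cursor) if cursor else 0
    page = {"items": all_items[start:start + limit]}
    if start + limit < len(all_items):
        page["next_cursor"] = encode_cursor(start + limit)
    return page
```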

Error handling:

  • Return descriptive error messages
  • Use appropriate error codes (not_found, unauthorized, etc.)
  • Don’t expose internal implementation details in errors

Rate limiting:

  • Enforce rate limits in your wrapper
  • Return rate_limited error code when exceeded
  • Consider per-user quotas
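A minimal in-process token bucket illustrates per-user limiting. This is a sketch; a production wrapper would more likely back this with shared state such as Redis:

```python
import time

class TokenBucket:
    """Per-user token bucket: refills `rate` tokens/second, bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self._state = {}  # user_id -> (tokens, last_timestamp)

    def allow(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        tokens, last = self._state.get(user_id, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        self._state[user_id] = (tokens - 1 if allowed else tokens, now)
        return allowed
```

When allow returns False, the wrapper would respond with a rate_limited error body.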

Limits

| Limit | Enforced by | Description |
| --- | --- | --- |
| limits.timeout_ms | Sandbox | Per-request timeout |
| limits.max_requests | Sandbox | Total calls across all data_source_* functions |
| limits.max_response_bytes | Sandbox | Max bytes read per response |
| RLM max_iterations | RLM | Total code generation cycles |
| RLM max_subcalls | RLM | Total llm_query/llm_batch calls |

If any limit is exceeded, the sandbox raises a ValueError with a descriptive message (e.g., "data_source max_requests exceeded").