# Data Source Integration

Connect external data sources (Gmail, Slack, databases, etc.) to RLM. Your wrapper service implements three endpoints; the LLM gets Python functions to search and retrieve your data.
## Overview

When you include a `data_source` in your `/rlm/execute` request, the Python sandbox automatically gets helper functions:

| Function | Purpose |
|---|---|
| `data_source_search(query, filters, page)` | Search or list items |
| `data_source_get(id)` | Fetch item metadata |
| `data_source_content(id, format, max_bytes)` | Fetch item content |

These functions call YOUR wrapper service, which translates requests to your data source (Gmail API, database, etc.).
```text
┌─────────────────┐    ┌──────────────────┐    ┌────────────────┐    ┌────────────┐
│    Your App     │───▶│  ModelRelay RLM  │───▶│  Your Wrapper  │───▶│ Data Source│
│                 │    │ (Python sandbox) │    │  (translates)  │    │ (Gmail, DB)│
└─────────────────┘    └──────────────────┘    └────────────────┘    └────────────┘
```
## Quick Start

- Build a wrapper with `/search`, `/get`, and `/content` endpoints
- Call `/rlm/execute` with `data_source` pointing to your wrapper
- The LLM writes Python using your data via the helper functions
## Configuration

The `data_source` parameter in `/rlm/execute`:

| Field | Type | Required | Description |
|---|---|---|---|
| `type` | string | Yes | Must be `"wrapper_v1"` |
| `base_url` | string | Yes | HTTPS URL to your wrapper |
| `token` | string | Yes | Bearer token for authorization |
| `allowed_hosts` | string[] | No | Additional hosts for sandbox egress |
| `limits` | object | No | Request/response limits |
### Limits

| Field | Type | Default | Description |
|---|---|---|---|
| `timeout_ms` | integer | — | Per-request timeout (ms) |
| `max_requests` | integer | — | Max requests per RLM execution |
| `max_response_bytes` | integer | — | Max response size per request |
### Example Request

```bash
curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "query": "Find emails about project deadlines and summarize them",
    "data_source": {
      "type": "wrapper_v1",
      "base_url": "https://your-wrapper.example.com",
      "token": "user_session_token_xxx",
      "allowed_hosts": ["attachment-cdn.example.com"],
      "limits": {
        "timeout_ms": 15000,
        "max_requests": 20,
        "max_response_bytes": 1048576
      }
    }
  }'
```
## Python Functions

When `data_source` is configured, these functions become available in the sandbox:
### `data_source_search(query, filters=None, page=None)`

Search or list items from your data source.

```python
# Example: Search for emails about deadlines
results = data_source_search(
    query="project deadline",
    filters={"label": "inbox", "has_attachment": True},
    page={"limit": 25}
)
# Returns: {"items": [...], "next_cursor": "..."}

# The LLM might write:
for item in results["items"]:
    print(f"Found: {item['id']} - {item['title']}")
```

Calls `POST {base_url}/search` with:

```json
{"query": "project deadline", "filters": {"label": "inbox", "has_attachment": true}, "page": {"limit": 25}}
```
### `data_source_get(id)`

Fetch metadata for a single item.

```python
# Example: Get email metadata
email = data_source_get("msg_abc123")
# Returns: {"id": "msg_abc123", "title": "Re: Project Update", "from": "alice@example.com", ...}

print(f"From: {email['from']}, Subject: {email['title']}")
```

Calls `POST {base_url}/get` with:

```json
{"id": "msg_abc123"}
```
### `data_source_content(id, format="text", max_bytes=None)`

Fetch the content/body of an item.

```python
# Example: Get email body as text
body = data_source_content("msg_abc123", format="text", max_bytes=50000)
# Returns: {"id": "msg_abc123", "format": "text", "content": "...", "truncated": false}

# Analyze the content
if "deadline" in body["content"].lower():
    summary = llm_query(f"Summarize this email about deadlines: {body['content']}")
```

Calls `POST {base_url}/content` with:

```json
{"id": "msg_abc123", "format": "text", "max_bytes": 50000}
```
## Endpoint Contract

Your wrapper must implement these POST endpoints under your `base_url`:
### POST /search

Query and paginate items.

Request:

```json
{
  "query": "Find contracts",
  "filters": {"type": "document", "date_after": "2024-01-01"},
  "page": {"limit": 25, "cursor": "cursor_abc"}
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `query` | string | Yes | Search query (semantic or keyword) |
| `filters` | object | No | Domain-specific filters |
| `page` | object | No | Pagination (`limit`, `cursor`) |
Response:

```json
{
  "items": [
    {"id": "msg_1", "title": "Q1 Contract", "from": "legal@company.com", "date": "2024-03-15"},
    {"id": "msg_2", "title": "Contract Renewal", "from": "vendor@partner.com", "date": "2024-03-10"}
  ],
  "next_cursor": "cursor_xyz"
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `items` | array | Yes | List of item objects with `id` and metadata |
| `next_cursor` | string | No | Cursor for next page (omit if no more results) |
### POST /get

Fetch metadata for a single item.

Request:

```json
{"id": "msg_1"}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Item identifier |
Response:

```json
{
  "id": "msg_1",
  "title": "Q1 Contract",
  "from": "legal@company.com",
  "date": "2024-03-15",
  "mime_type": "message/rfc822",
  "has_attachments": true
}
```

Return whatever metadata is useful for your domain. The `id` field is required.
### POST /content

Fetch or export item content.

Request:

```json
{
  "id": "msg_1",
  "format": "text",
  "max_bytes": 200000
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Item identifier |
| `format` | string | No | Requested format (e.g., `text`, `html`, `raw`) |
| `max_bytes` | integer | No | Max content size hint |
Response:

```json
{
  "id": "msg_1",
  "format": "text",
  "content": "Hi team,\n\nPlease review the attached contract by Friday...",
  "truncated": true
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `id` | string | Yes | Item identifier |
| `format` | string | Yes | Actual format returned |
| `content` | string | Yes | The item content |
| `truncated` | boolean | No | `true` if content was cut off at `max_bytes` |
### Error Response

All endpoints should return errors as:

```json
{
  "error": {
    "code": "not_found",
    "message": "Item msg_999 not found"
  }
}
```

Common error codes: `not_found`, `unauthorized`, `rate_limited`, `invalid_request`.
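A small helper can keep error bodies uniform across all three endpoints. In this sketch, the function name and the HTTP status mapping are assumptions; the contract only specifies the JSON shape:

```python
# Suggested HTTP status for each contract error code (illustrative mapping).
STATUS_BY_CODE = {
    "invalid_request": 400,
    "unauthorized": 401,
    "not_found": 404,
    "rate_limited": 429,
}

def error_response(code: str, message: str) -> tuple:
    """Return (JSON body, HTTP status) in the contract's error shape."""
    return {"error": {"code": code, "message": message}}, STATUS_BY_CODE.get(code, 500)
```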
## Complete Example: Gmail Wrapper

Here's an end-to-end example showing a Gmail wrapper integration.

### 1. Your Wrapper Endpoints

Your wrapper translates between the `wrapper_v1` contract and the Gmail API:
```python
# Pseudocode for your wrapper service

@app.post("/search")
def search(request):
    # Translate to Gmail API query syntax
    gmail_query = request.query
    if request.filters.get("label"):
        gmail_query += f" in:{request.filters['label']}"
    response = gmail.users().messages().list(
        userId="me",
        q=gmail_query,
        maxResults=request.page.get("limit", 25)
    ).execute()
    return {
        "items": [{"id": m["id"], "title": get_subject(m)}
                  for m in response.get("messages", [])],
        "next_cursor": response.get("nextPageToken")
    }

@app.post("/get")
def get(request):
    msg = gmail.users().messages().get(userId="me", id=request.id).execute()
    return {
        "id": msg["id"],
        "title": get_header(msg, "Subject"),
        "from": get_header(msg, "From"),
        "date": get_header(msg, "Date")
    }

@app.post("/content")
def content(request):
    msg = gmail.users().messages().get(userId="me", id=request.id, format="full").execute()
    body = extract_body_text(msg)
    truncated = len(body) > request.max_bytes if request.max_bytes else False
    return {
        "id": msg["id"],
        "format": "text",
        "content": body[:request.max_bytes] if request.max_bytes else body,
        "truncated": truncated
    }
```
### 2. RLM Execute Call

```bash
curl -X POST https://api.modelrelay.ai/api/v1/rlm/execute \
  -H "Authorization: Bearer $MODELRELAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "query": "Find all emails about project deadlines from this week and summarize key dates",
    "data_source": {
      "type": "wrapper_v1",
      "base_url": "https://your-app.example.com/gmail-wrapper",
      "token": "user_oauth_session_token",
      "limits": {
        "max_requests": 50,
        "timeout_ms": 10000
      }
    }
  }'
```
### 3. What the LLM Writes

The model generates Python code like:

```python
# Search for deadline-related emails
results = data_source_search(
    query="deadline OR due date",
    filters={"label": "inbox"},
    page={"limit": 50}
)
print(f"Found {len(results['items'])} potential matches")

# Get content for relevant emails
deadline_info = []
for item in results["items"][:10]:  # Process top 10
    content = data_source_content(item["id"], format="text", max_bytes=10000)
    # Use LLM to extract deadline mentions
    extraction = llm_query(f"""
    Extract any project deadlines from this email.
    Return JSON: {{"deadlines": [{{"project": "...", "date": "...", "details": "..."}}]}}
    Email: {content['content']}
    """)
    if extraction.get("deadlines"):
        deadline_info.extend(extraction["deadlines"])

# Summarize findings
summary = llm_query(f"""
Summarize these project deadlines in a clear format:
{deadline_info}
""")

FINAL({
    "deadlines": deadline_info,
    "summary": summary,
    "emails_searched": len(results["items"])
})
```
### 4. Response

```json
{
  "id": "rlm_abc123",
  "model": "claude-sonnet-4-5",
  "answer": {
    "deadlines": [
      {"project": "Website Redesign", "date": "2025-02-15", "details": "Final mockups due"},
      {"project": "Q1 Report", "date": "2025-02-10", "details": "Submit to finance"}
    ],
    "summary": "You have 2 upcoming deadlines: Website Redesign mockups due Feb 15, and Q1 Report due Feb 10.",
    "emails_searched": 23
  },
  "iterations": 4,
  "subcalls": 12,
  "usage": {"input_tokens": 15000, "output_tokens": 2500, "total_tokens": 17500}
}
```
## Security Requirements

URL restrictions:

- `base_url` must use HTTPS (no HTTP)
- `base_url` must not be localhost or a private IP (10.x, 192.168.x, 127.x)
- `base_url` must not include credentials

Token handling:

- `token` is sent as `Authorization: Bearer {token}` to your wrapper
- Your wrapper should validate the token and enforce per-user access
- Store OAuth tokens securely; consider using short-lived session tokens

Allowed hosts:

- By default, only `base_url`'s host is allowed for sandbox egress
- Use `allowed_hosts` to enable additional hosts (e.g., CDNs for attachments)
- Same restrictions apply: HTTPS only, no localhost, no private IPs
## Best Practices

Response design:

- Keep responses JSON-serializable
- Stay under `max_response_bytes` limits
- Use stable IDs that work across pagination

Pagination:

- Use cursor-based pagination for consistency
- Return `next_cursor` only when more results exist
- Support reasonable `limit` values (e.g., 10-100)

Error handling:

- Return descriptive error messages
- Use appropriate error codes (`not_found`, `unauthorized`, etc.)
- Don't expose internal implementation details in errors

Rate limiting:

- Enforce rate limits in your wrapper
- Return the `rate_limited` error code when exceeded
- Consider per-user quotas
## Limits

| Limit | Enforced by | Description |
|---|---|---|
| `limits.timeout_ms` | Sandbox | Per-request timeout |
| `limits.max_requests` | Sandbox | Total calls across all `data_source_*` functions |
| `limits.max_response_bytes` | Sandbox | Max bytes read per response |
| RLM `max_iterations` | RLM | Total code generation cycles |
| RLM `max_subcalls` | RLM | Total `llm_query`/`llm_batch` calls |

If any limit is exceeded, the sandbox raises a `ValueError` with a descriptive message (e.g., `"data_source max_requests exceeded"`).