How to Monitor OpenAI Agents SDK in Production (2026 Guide)

Running OpenAI Agents SDK in development is very different from running it in production.

In development, you have the OpenAI trace dashboard open in one tab and your terminal in another. You watch each step fire, catch tool failures immediately, and iterate fast. In production, agents run across many users in parallel. You're not watching. When something breaks — or starts costing three times what you expected — you find out from a user complaint, not a dashboard alert.

This guide covers how to set up production monitoring for OpenAI Agents SDK using LumiqTrace: what you'll see, how to track multi-agent handoffs, how to configure cost alerts, and how to use LumiqPilot for ops.

Why OpenAI's Built-In Tracing Isn't Enough for Production

The OpenAI Agents SDK ships with a tracing integration that surfaces data in OpenAI's trace dashboard. It's useful during development. In production, it has meaningful gaps.

What OpenAI's tracing shows you:

OpenAI API calls and their latencies
Token counts per call
Model used per call

What it doesn't show you:

Which agent in your multi-agent graph ran at each step
Which custom tools were invoked and whether they succeeded
The handoff path — which agent delegated to which, what context was passed, how long each downstream agent took
Cost attribution per span (not just per API call, but per tool, per agent, per user)
Evaluation scores for response quality
Anomaly detection across runs over time

OpenAI's trace dashboard captures what the OpenAI API sees. LumiqTrace captures everything at the SDK level: which agent object ran, which Python/TypeScript function handled the tool call, the full handoff chain with context passed between agents, and per-span cost attribution.

For a single-agent prototype with a couple of tool calls, the built-in tracing is fine. For a production system with multiple agents delegating to each other, custom tools, and real users, you need the full execution tree. For a primer on what agent observability covers, see What is AI Agent Observability.

Setup: 2 Lines of Code

LumiqTrace uses auto-discovery. You don't need to wrap every agent or decorate every tool — just initialize the SDK at startup.

Install:

# Python
pip install lumiqtrace

# Node.js / TypeScript
npm install @lumiqtrace/sdk

Initialize:

import lumiqtrace
from agents import Agent, Runner

lumiqtrace.init(api_key="YOUR_API_KEY")

# No other changes — auto-discovery instruments all agents and tools
result = await Runner.run(agent, "What's my account status?")

import { lumiqtrace } from "@lumiqtrace/sdk";

lumiqtrace.init({ apiKey: process.env.LT_KEY });

LumiqTrace auto-patches OpenAI.Completions.create at the provider level on init. Since the OpenAI Agents SDK routes all LLM calls through the OpenAI provider, every agent run — including tool calls and handoffs — is captured automatically. No framework-specific handler is needed for the OpenAI Agents SDK; the provider-level patch covers the entire execution tree.

On first run, LumiqTrace maps your agent graph — every agent, every tool, every handoff path. On every subsequent run, it captures full agentic traces automatically. Your agents continue to work exactly as before.

What You See in the Dashboard

Agentic Traces

Every execution is captured as an agentic trace. Every span carries agent identity — you always know which agent produced which output. For an OpenAI agent with tool calls, a typical trace looks like this:

Agent Run: Support Agent [380ms, $0.014]
├── Tool: search_knowledge_base [45ms]
├── LLM: gpt-4o reasoning [180ms, $0.009]
├── Handoff: → Billing Agent [context passed]
│   ├── LLM: gpt-4o billing reasoning [95ms, $0.004]
│   └── Tool: get_invoice [18ms]
└── LLM: gpt-4o-mini final response [42ms, $0.001]

You can see at a glance where time and cost were spent across every agent and tool. Click any span to see the full input/output for that step, including the exact prompt sent to each model call.

Tool Call Analytics

LumiqTrace aggregates tool call data across all traces:

Which tools are called most frequently
Success and failure rates per tool
Average latency per tool
Cost attribution per tool type

If your search_knowledge_base tool is timing out on 6% of runs, you'll see it in tool analytics before it starts visibly degrading user experience.

Cost Breakdown Per Span

Costs are attributed at the span level, not just at the API call level. For a typical multi-agent run, you might see:

Tool calls: 8% of total cost
Primary agent reasoning: 64% of total cost
Delegated agent execution: 28% of total cost

When you can see cost per agent and per tool, model swap decisions become concrete. If the primary agent's reasoning step accounts for 64% of your cost, that's where a model downgrade has the highest leverage.

Monitoring Multi-Agent Handoffs

Handoffs are one of the most important features in the OpenAI Agents SDK. They're also the most common source of production bugs.

Here's a typical multi-agent setup:

billing_agent = Agent(name="Billing Agent", instructions="Handle billing questions")
support_agent = Agent(
    name="Support Agent",
    instructions="Handle general support. Handoff billing to billing_agent.",
    handoffs=[billing_agent]
)

When support_agent transfers control to billing_agent, you want to know:

Did the handoff happen at all?
What context was passed to the downstream agent?
Did the downstream agent succeed or fail?
How much did the downstream agent cost?

With OpenAI's built-in tracing, a handoff appears as a gap — you see the support agent's API calls and the billing agent's API calls, but not the delegation itself as a traceable event.

With LumiqTrace, handoffs are first-class spans. The trace tree shows the full delegation path with timing and cost for every branch. If your support agent is handing off to billing incorrectly — passing incomplete context, triggering unnecessary delegations, or looping back — you'll see it in the trace before users report it. For a comparison of how LangSmith, Langfuse, and LumiqTrace handle multi-agent tracing, see our observability tools comparison.

Setting Cost Alerts

In the LumiqTrace dashboard, navigate to Alerts → Cost Guardrails.

You can set:

Per-trace budget: alert when a single trace exceeds a cost threshold (e.g., $0.08 per conversation)
Hourly spend rate: alert when spend rate exceeds a rolling average (e.g., $4/hour)
Per-user budget: alert when a single user's cumulative trace cost exceeds a threshold (requires passing user ID in trace context)

To pass user context:

with lumiqtrace.context(user_id="user_123", session_id="sess_456"):
    result = await Runner.run(agent, user_message)

await lumiqtrace.withContext({ userId: "user_123", sessionId: "sess_456" }, async () => {
  const result = await Runner.run(agent, userMessage);
});

Alerts can route to email, Slack, or PagerDuty (Team plan).

Using LumiqPilot

LumiqPilot is a conversational AI ops assistant built into your dashboard. It has three distinct capabilities that work together in a single conversation thread.

Deep data analysis. Ask in plain language and Pilot reads your live traces to find the root cause:

"Which agent is responsible for the latency spike today?" → Pilot surfaces the exact agent, run, and tool call
"Are handoffs to the billing agent completing successfully?"
"What changed between this week and last week in cost per trace?"

Instant action from insight. Continue the same conversation to act on what Pilot just found — no tab-switching:

Create an alert for the pattern Pilot identified
Switch the model on a specific agent to a cheaper option
Adjust handoff thresholds on an agent deployment

Proactive auto-remediation (Scale). On the Scale plan, Pilot surfaces anomalies and cost opportunities before you ask. Define rules and Pilot auto-remediates incidents without human-in-the-loop: a latency spike triggers a model downgrade automatically, a cost threshold breach creates an alert, a tool failure rate crossing a threshold pages the on-call rotation.

LumiqPilot is available on the Pro plan and above. If you're evaluating alternatives, see how LumiqTrace compares to other tools in our LangSmith alternatives breakdown. Teams running LangChain agents alongside OpenAI Agents SDK can follow the LangChain production monitoring guide for framework-specific setup.

Setting Up Automatic Evaluations

LumiqTrace runs LLM-as-judge evaluation on every trace automatically. The 12 built-in templates include:

Faithfulness: Is the response grounded in retrieved context?
Relevance: Does the response address what the user actually asked?
Instruction following: Did the agent follow system prompt constraints?
Handoff appropriateness: Did the agent delegate when it should have, and not delegate when it shouldn't have?
Toxicity: Does the response contain harmful content?

To activate auto-evaluations, navigate to Evaluations → Auto-Eval Settings and select which templates to run on which agent types.

Evaluation scores appear in the trace view alongside every span. When a score drops below your configured threshold, you get an alert. This is how you catch regressions from a system prompt change before they reach users at scale — the score trend surfaces in the dashboard within hours of a deployment.

What a Healthy Production OpenAI Agent Looks Like

After one week of production traces, you'll have enough data to establish baselines:

P50/P95/P99 latency per agent type — what's normal for your workload
Cost per conversation — is it stable or drifting upward?
Handoff rate — what percentage of runs trigger a delegation?
Tool error rates — under 1% is healthy for most tools
Eval score trend — should be stable week over week
Anomaly alerts — zero is the target

LumiqTrace's anomaly detection starts alerting on deviations from these baselines automatically. After the first two weeks, the alerts are almost entirely actionable rather than noise.

Frequently Asked Questions

Does LumiqTrace work with the OpenAI Agents SDK out of the box?

Yes. Two lines of code — pip install lumiqtrace (Python) or npm install @lumiqtrace/sdk (Node.js), then lumiqtrace.init() — auto-discover all agents, tools, and handoffs. No manual instrumentation required. Your existing agent code stays unchanged.

What does OpenAI's built-in tracing miss that LumiqTrace captures?

OpenAI's trace dashboard shows OpenAI API calls. It does not show which agent ran, which custom tools were called, where a handoff happened, what context was passed between agents, or what each span cost. LumiqTrace captures the full execution tree including handoff context and cost per span.

Does LumiqTrace instrument multi-agent handoffs automatically?

Yes. Handoffs in the OpenAI Agents SDK are auto-discovered and appear as first-class spans in LumiqTrace, showing which agent delegated to which, what context was passed, and how much time and cost each downstream agent incurred.

Is there a free tier?

Yes. The free plan includes 10,000 traces per month with no credit card required. Setup takes under 5 minutes.

OpenAI Agents SDK in production without observability is flying blind at exactly the wrong time. The built-in tracing is sufficient for development. For production — real users, real costs, real handoff paths, real failures — you need the full execution tree.

Two lines of code gives you agentic traces, handoff spans, automated evals, and cost tracking. Free tier includes 10,000 traces per month with no credit card required.