Phase 14 - Lesson 23

OpenTelemetry GenAI Semantic Conventions

OpenTelemetry's GenAI SIG (launched April 2024) defines the standard schema for agent telemetry. Span names, attributes, and content-capture rules converge across vendors so agent traces mean the same thing in Datadog, Grafana, Jaeger, and Honeycomb.

Type: Learn + Build Languages: Python (stdlib) Prerequisites: Phase 14 · 13 (LangGraph), Phase 14 · 24 (Observability Platforms) Time: ~60 minutes

Learning Objectives

  • Name the GenAI span categories: model/client, agent, tool.
  • Distinguish invoke_agent CLIENT vs INTERNAL spans and when each applies.
  • List the top-level GenAI attributes: provider name, request model, data-source ID.
  • Explain the content-capture contract: opt-in, OTEL_SEMCONV_STABILITY_OPT_IN, external-reference recommendation.

The Problem

Every vendor invents their own span names. Ops teams end up building per-framework dashboards. OpenTelemetry's GenAI SIG fixes this by defining one standard the whole ecosystem targets.

The Concept

Span categories

  1. Model / client spans. Cover raw LLM calls. Emitted by provider SDKs (Anthropic, OpenAI, Bedrock) and framework model adapters.
  2. Agent spans. create_agent (when the agent is constructed) and invoke_agent (when it runs).
  3. Tool spans. One per tool invocation; connected to the agent span by parent-child relation.

Agent span naming

  • Span name: invoke_agent {gen_ai.agent.name} if named; fallback to invoke_agent.
  • Span kind:
    • CLIENT — for remote agent services (OpenAI Assistants API, Bedrock Agents).
    • INTERNAL — for in-process agent frameworks (LangChain, CrewAI, local ReAct).

Key attributes

  • gen_ai.provider.nameanthropic, openai, aws.bedrock, google.vertex.
  • gen_ai.request.model — the model ID.
  • gen_ai.response.model — the resolved model (may differ from request due to routing).
  • gen_ai.agent.name — agent identifier.
  • gen_ai.operation.namechat, completion, invoke_agent, tool_call.
  • gen_ai.data_source.id — for RAG: which corpus or store was consulted.

Technology-specific conventions exist for Anthropic, Azure AI Inference, AWS Bedrock, OpenAI.

Content capture

The default rule: instrumentations SHOULD NOT capture inputs/outputs by default. Capture is opt-in via:

  • gen_ai.system_instructions
  • gen_ai.input.messages
  • gen_ai.output.messages

Recommended production pattern: store content externally (S3, your log store), record references on spans (pointer IDs, not prose). This is the Lesson 27 content-poisoning defense wired into observability.

Stability

Most conventions are experimental as of March 2026. Opt in to the stable preview with:

OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental

Datadog v1.37+ maps GenAI attributes natively into its LLM Observability schema. Other backends (Grafana, Honeycomb, Jaeger) support the raw attributes.

Where this pattern goes wrong

  • Capturing full prompts in spans. PII, secrets, customer data in traces that ops can read. Store externally.
  • No gen_ai.provider.name. Multi-provider dashboards break when attribution is missing.
  • Spans without parent links. Orphaned tool spans. Always propagate context.
  • Not setting stability opt-in. Your attributes may get renamed on backend upgrade.

Build It

code/main.py implements a stdlib span emitter matching GenAI conventions:

  • Span with GenAI attribute schema.
  • Tracer with start_span, nested contexts.
  • A scripted agent run that emits: create_agent, invoke_agent (INTERNAL), per-tool spans, chat spans for LLM calls.
  • A content-capture mode that stores prompts externally and records IDs on spans.

Run it:

python3 code/main.py

Output: a span tree with all required GenAI attributes, and an "external store" showing the opt-in content references.

Use It

  • Datadog LLM Observability (v1.37+) maps attributes natively.
  • Langfuse / Phoenix / Opik (Lesson 24) — auto-instrument the ecosystem.
  • Jaeger / Honeycomb / Grafana Tempo — raw OTel traces; build dashboards from GenAI attributes.
  • Self-hosted — run the OTel Collector with a GenAI processor.

Ship It

outputs/skill-otel-genai.md wires OTel GenAI spans into an existing agent with content-capture defaults and external-reference storage.

Exercises

  1. Instrument your Lesson 01 ReAct loop with invoke_agent (INTERNAL) + per-tool spans. Send to a Jaeger instance.
  2. Add content capture in "references only" mode: prompts to SQLite, span attributes carry only row IDs.
  3. Read the spec for gen_ai.data_source.id. Wire it into your Lesson 09 Mem0 search.
  4. Set OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental and verify your attributes don't get renamed by the collector.
  5. Build a dashboard: "which tool errors correlate with which models" from GenAI attributes alone.

Key Terms

Term What people say What it actually means
GenAI SIG "OpenTelemetry GenAI group" OTel working group defining the schema
invoke_agent "Agent span" Name of the span representing an agent run
CLIENT span "Remote call" Span for a call to a remote agent service
INTERNAL span "In-process" Span for an in-process agent run
gen_ai.provider.name "Provider" anthropic / openai / aws.bedrock / google.vertex
gen_ai.data_source.id "RAG source" Which corpus/store a retrieval hit
Content capture "Prompt logging" Opt-in capture of messages; store externally in prod
Stability opt-in "Preview mode" Env var to pin experimental conventions

Further Reading

0 lifetime access. Curriculum based on AI Engineering from Scratch by Rohit Ghumare (MIT, used under attribution).