
Extending OpenTelemetry to Pinpoint Code Elements: Our Journey to Close the Gap

Enhancing OpenTelemetry's capabilities to pinpoint specific code elements responsible for system behaviors.

Research
11.10.2024
Anton Grübel, Software Engineer
3 min

When building a modern observability stack, correlating application data to the exact lines of code that generate it is the ultimate goal. Developers want answers fast: which line is throwing errors, which file is causing the latency spike, which component is the culprit?

When the Baz team set out to build comprehensive observability into our applications, we drew on our experience building observability for cloud-deployed applications at companies like Bridgecrew and Prisma Cloud by Palo Alto Networks. The key lesson from that work was how important deep, code-level insights are when debugging.

Using OpenTelemetry (OTel), the open-source standard for observability, brings us closer to that goal, but it falls short in a critical way: in most languages, it doesn't capture the specific details needed to pinpoint code elements critical for debugging. This limitation creates a significant gap in the observability experience. 

So we built a solution that extends OpenTelemetry's functionality: two new open source libraries, one for Python and one for Go. Read on to learn how these libraries work and how to implement them for your own observability needs.

The Problem: A missing correlation detail

OpenTelemetry offers a robust set of tools for tracing, metrics, and logging. However, as we set out to build comprehensive observability into our applications, we quickly discovered that there was a crucial gap in the default implementation for most languages: the lack of automatic collection of file names, function names, and line numbers.

While Rust's OpenTelemetry implementation does include these fields, most other languages like Python and Go do not. This means that, out of the box, OTel offers visibility into operations (e.g., database queries or HTTP routes), but doesn't always allow you to drill down to the exact point in the codebase that is responsible for any given piece of data. Without this level of granularity, debugging becomes guesswork, especially in large, complex systems.

The OpenTelemetry documentation does highlight the flexibility to add custom attributes, but we quickly realized that doing so by hand would be an intensive exercise. Manually adding these details at every point of interest invites errors and inconsistencies. Our goal was observability that is seamless and automatic, giving developers context without requiring extensive instrumentation overhead.

Extending OpenTelemetry: Our Approach

We saw an opportunity to extend OpenTelemetry and make the developer experience significantly better. Our libraries automatically add file name, function name and line number metadata to spans. This closes the gap between tracing data and the code responsible for it, making troubleshooting faster and more effective while also providing a comprehensive view of the codebase flows.
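For comparison, here is a minimal sketch of another way to attach the same metadata automatically in Python: a decorator that reads the location from the function's code object once, at decoration time, rather than inspecting the stack whenever a span starts. `RecordingSpan`, `start_span` and `traced` are hypothetical stand-ins written for this example, not APIs from OpenTelemetry or from our libraries; in a real setup `start_span` would be your tracer's span-creation call.

```python
import functools
import os
from contextlib import contextmanager


class RecordingSpan:
    """Hypothetical stand-in for a real span: it only records attributes."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


finished_spans = []  # collects ended spans so the example can inspect them


@contextmanager
def start_span(name):
    """Hypothetical stand-in for a tracer's span-creation call."""
    span = RecordingSpan(name)
    try:
        yield span
    finally:
        finished_spans.append(span)


def traced(func):
    """Hypothetical decorator: resolve the code location once, when decorating."""
    code = func.__code__
    location = {
        "code.filepath": os.path.relpath(code.co_filename),
        # on Python 3.8+ this is the first decorator's line for decorated functions
        "code.lineno": code.co_firstlineno,
        "code.func": func.__qualname__,
    }

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with start_span(func.__qualname__) as span:
            for key, value in location.items():
                span.set_attribute(key, value)
            return func(*args, **kwargs)

    return wrapper
```

Resolving the location at decoration time avoids any per-call stack walking, but it records only where the decorated function is defined, not the line that called into it.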

Let’s dig into how to implement this with the libraries below.

Implementing the Extension in Python

We aimed to make our library easy to integrate while adding as little overhead as possible. Python's dynamic nature lets us monkey patch the added attributes on top of an existing instrumentation setup with minimal changes.
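To illustrate the monkey-patching idea, here is a minimal sketch that wraps a tracer class's span-creation method so that every span it returns carries the caller's file, line and function. `Tracer`, `Span` and `patch_tracer` are hypothetical stand-ins, not the Datadog or OpenTelemetry APIs, and `sys._getframe` is a CPython implementation detail.

```python
import functools
import sys


class Span:
    """Hypothetical stand-in span that just records the attributes set on it."""

    def __init__(self, name):
        self.name = name
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


class Tracer:
    """Hypothetical stand-in for an existing tracer we do not want to modify."""

    def start_span(self, name):
        return Span(name)


def patch_tracer(tracer_cls):
    """Replace tracer_cls.start_span with a wrapper that adds code attributes."""
    original = tracer_cls.start_span

    @functools.wraps(original)
    def start_span_with_code_attrs(self, name, *args, **kwargs):
        span = original(self, name, *args, **kwargs)
        caller = sys._getframe(1)  # frame 0 is this wrapper; frame 1 called start_span
        span.set_attribute("code.filepath", caller.f_code.co_filename)
        span.set_attribute("code.lineno", caller.f_lineno)
        span.set_attribute("code.func", caller.f_code.co_name)
        return span

    tracer_cls.start_span = start_span_with_code_attrs


patch_tracer(Tracer)  # existing call sites now produce annotated spans unchanged
```

With the patch applied, a function such as `def load_user(): return Tracer().start_span("load_user")` gets a span whose `code.func` is `"load_user"`. Patching the class rather than an instance is what lets this sit on top of existing instrumentation. Note that `sys._getframe(1)` sees only the immediate caller; if spans are created through helper layers, you need to walk further up the stack and filter frames.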

One related note: developers may notice that OpenTelemetry’s documentation references both native OpenTelemetry tracing and Datadog tracing. Both can be leveraged, but if you’re contributing to or building on top of these APIs, keep in mind that there are some inconsistencies between them.

Code Snippet: Adding File and Line Metadata in Python

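import inspect
import os


def normalize_path(path: str, path_prefix: str) -> str:
    # not shown in the original snippet; assumed to make the path project-relative
    return os.path.relpath(path, path_prefix)


def add_code_tags(span) -> None:  # hypothetical name; receives the ddtrace span being created
    work_dir = os.getcwd()
    for frame_info in inspect.stack()[1:]:  # skip the first frame, which is this function itself
        # look for ddtrace's wrapper frame; the wrapped callable is read from its local "f" (or "coro")
        if frame_info.function == "func_wrapper" and "ddtrace" in frame_info.filename:
            if (func := frame_info.frame.f_locals.get("f")) or (func := frame_info.frame.f_locals.get("coro")):
                file_path = inspect.getsourcefile(func) or ""
                if file_path.startswith(work_dir) and "site-packages" not in file_path:  # user code only
                    _, lineno = inspect.getsourcelines(func)
                    span.set_tag("code.filepath", normalize_path(path=file_path, path_prefix=work_dir))
                    span.set_tag("code.lineno", str(lineno + 1))  # reported line is the first decorator's; +1 assumes one decorator line
                    span.set_tag("code.func", func.__name__)
                    return  # stop at the innermost wrapper so outer frames don't overwrite the tags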

We achieved this by hooking into the OTel trace instrumentation process and extending each span with the attributes code.filepath, code.func and code.lineno. The library uses Python’s inspect module to identify the caller context at runtime and automatically attaches these details to the generated spans, giving developers precise visibility without modifying their tracing logic.

Implementation Details:

  1. Caller Context Identification: We used Python's inspect.stack() to retrieve the stack frames of the current execution. This allowed us to walk up the stack and find the appropriate context, ensuring we captured the correct file and line number for each span.
  2. Filtering Out Unnecessary Frames: To avoid capturing irrelevant frames (e.g., those from third-party libraries), we added logic to skip frames whose files live outside the project's working directory or inside directories like site-packages. This ensures that we only capture frames relevant to the user's code.
  3. Efficient Integration with Datadog: By patching Datadog's Tracer with our own logic, we ensured that every span created by the tracer would automatically have file, func and line attributes attached. This integration required minimal changes to existing tracing code.
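The caller identification and filtering steps above can also be sketched without `inspect.stack()`, which builds a `FrameInfo` record (including source context, by default) for every frame on the stack. This minimal sketch instead follows `f_back` links from `sys._getframe()` (a CPython implementation detail) and stops at the first frame inside the project directory. `is_user_code` and `find_user_frame` are hypothetical names, and treating `dist-packages` like `site-packages` is an added assumption for Debian-style installs.

```python
import os
import sys


def is_user_code(path, work_dir):
    """True if path lies under work_dir and not inside an installed-packages directory."""
    path = os.path.abspath(path)
    # trailing separator so that "/application" is not treated as part of "/app"
    root = os.path.join(os.path.abspath(work_dir), "")
    parts = path.split(os.sep)
    return path.startswith(root) and "site-packages" not in parts and "dist-packages" not in parts


def find_user_frame(work_dir=None, skip=1):
    """Return (filepath, lineno, function) of the nearest caller frame in user code, or None."""
    work_dir = work_dir or os.getcwd()
    frame = sys._getframe(skip)  # skip=1 starts at whoever called find_user_frame
    while frame is not None:
        filename = frame.f_code.co_filename
        if is_user_code(filename, work_dir):
            return os.path.relpath(filename, work_dir), frame.f_lineno, frame.f_code.co_name
        frame = frame.f_back  # move one level up the call stack
    return None
```

A patched tracer could call `find_user_frame()` when a span starts and copy the returned tuple into code.filepath, code.lineno and code.func.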

Implementing the Extension in Go

Go, being statically typed and compiled, presented a different set of constraints and opportunities. Go's runtime package exposes stack information at run time, which we used to capture the calling file, function and line number for each span.

Code Snippet: Adding File, Function and Line Metadata in Go

In our Go extension, we add custom attributes to OpenTelemetry spans by wrapping the default span creation with additional logic. Each time a span is started, our wrapper extracts the caller information and adds code.filepath, code.lineno and code.func to the span attributes. Developers immediately know where issues originate, without sacrificing performance.

Implementation Details:

  1. Using the runtime Package: Go's runtime package provides functions like runtime.Caller(), which returns the program counter, file name and line number of a calling function; the function name can then be resolved from the program counter (for example with runtime.FuncForPC). This allowed us to add metadata efficiently, without significant overhead.
  2. Performance Considerations: Go is often used for performance-critical applications, so we focused on minimizing the impact of our extension. It adds an overhead of ~1.5μs per span.
  3. Span Attribute Injection: By extending the Span interface, we could add custom attributes directly during span creation. This approach meant that our changes were fully compatible with existing OpenTelemetry-based tooling, requiring no modifications to downstream systems.

Results and Impact

Since implementing these extensions, we’ve observed significant improvements in our ability to trace issues down to specific code elements. Instead of vague error messages or ambiguous stack traces, developers get actionable insights with pinpoint accuracy. This has led to faster debugging, fewer context switches, and an overall more pleasant developer experience.

We’ve also shared our extensions with the community as open source libraries, and the feedback has been fantastic. We're excited to ship something that helps solve a growing pain point and adds to OTel's already impressive feature set.

Many teams expressed similar frustrations with OpenTelemetry’s default behavior, and we’re thrilled that our work is helping bridge that gap for others.

Conclusion

Observability is about connecting dots—from system behavior all the way back to the source code. By extending OpenTelemetry with automatic file name and line number collection, we’ve taken a big step forward in making observability more meaningful and effortless for developers. We believe that every language should offer this capability by default, and we hope that our work inspires the OpenTelemetry community to make this a standard part of tracing across all implementations.

We’re excited to keep pushing the boundaries of what observability can do. If you’re facing similar challenges or want to contribute to the effort, check out our GitHub repositories and join the conversation!