Correlate Profiling and Distributed Tracing in Async Rust

Correlate profiles with arbitrary application-specific metadata in async Rust applications

April 17, 2025

tl;dr Starting with Parca Agent v0.38.0 and custom-labels v0.4.1, we are introducing async-Rust-aware APIs for attaching arbitrary metadata to CPU profiling data. This makes it possible to correlate, for example, distributed tracing IDs with the CPU time spent on a single request as it traverses multiple services.

The World So Far

Regular readers of this blog will remember the earlier post on Custom Labels for Rust, Go, and C++, a feature that allows Parca (or Polar Signals Cloud) users to annotate sections in CPU profiles with metadata about what is going on in their application; for example, the currently active trace or span ID for distributed tracing.

When that article was written, the API for Rust was based on system threads: a thread-local label is applied, some code is run, and then the label is removed. For example:

custom_labels::with_labels(
    [("myKey", "myValue"), ("myOtherKey", "myOtherValue")],
    || {
        // Do some stuff! ...
    },
);

Whenever the code represented by "Do some stuff!" is running (that is, whenever its thread is running on the CPU), the Parca Agent can see that the thread's values for the myKey and myOtherKey labels are myValue and myOtherValue, respectively.
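To illustrate the "apply, run, remove" pattern described above, here is a minimal std-only sketch. It is not how the custom-labels library is implemented: the LABELS thread-local, the Restore guard, and the with_label_sketch and current_label functions are all hypothetical names invented for this illustration, and a plain HashMap stands in for whatever per-thread storage the profiler actually reads.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

thread_local! {
    // Hypothetical stand-in for the per-thread storage a profiler would read.
    static LABELS: RefCell<HashMap<String, String>> = RefCell::new(HashMap::new());
}

// Guard that puts the previous value back when dropped, so the label is
// restored even if the closure panics.
struct Restore {
    key: String,
    previous: Option<String>,
}

impl Drop for Restore {
    fn drop(&mut self) {
        LABELS.with(|l| {
            let mut labels = l.borrow_mut();
            match self.previous.take() {
                Some(value) => {
                    labels.insert(self.key.clone(), value);
                }
                None => {
                    labels.remove(&self.key);
                }
            }
        });
    }
}

/// Hypothetical helper: set `key` on the current thread while `f` runs.
fn with_label_sketch<R>(key: &str, value: &str, f: impl FnOnce() -> R) -> R {
    let previous = LABELS.with(|l| l.borrow_mut().insert(key.to_string(), value.to_string()));
    let _guard = Restore { key: key.to_string(), previous };
    f()
}

/// What a sampler inspecting this thread would see for `key` right now.
fn current_label(key: &str) -> Option<String> {
    LABELS.with(|l| l.borrow().get(key).cloned())
}
```

Calling with_label_sketch("myKey", "myValue", || current_label("myKey")) observes the label inside the closure, while current_label("myKey") before and after the call sees nothing. The important property is that the label lives and dies on one thread.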

This works well for traditional synchronous code ...

Limitations Of The Former Approach

... but that wasn't good enough! Asynchronous code is increasingly common in the Rust ecosystem.

Imagine the code of interest is asynchronous:

async fn f() {
    // Do some stuff!
}

and we'd like to call it from an asynchronous context:

async fn g() {
    f().await;
}

No matter how we try to wrap this in custom_labels::with_labels, it doesn't work. A first attempt:

async fn f() {
    // Do some stuff!
}

async fn g() {
    custom_labels::with_labels(
        [("myKey", "myValue"), ("myOtherKey", "myOtherValue")],
        || {
            f().await;
        },
    );
}

fails with this error:

    Checking test_rust v0.1.0 (/home/brennan/test_rust)
error[E0728]: `await` is only allowed inside `async` functions and blocks
 --> src/main.rs:9:17
  |
8 |         || {
  |         -- this is not `async`
9 |             f().await;
  |                 ^^^^^ only allowed inside `async` functions and blocks

If we try to force the code into an async context:

async fn f() {
    // Do some stuff!
}

async fn g() {
    custom_labels::with_labels(
        [("myKey", "myValue"), ("myOtherKey", "myOtherValue")],
        || async {
            f().await;
        },
    );
}

the code compiles, but with the following warning:

warning: unused implementer of `Future` that must be used
  --> src/main.rs:6:5
   |
6  | /     custom_labels::with_labels(
7  | |         [("myKey", "myValue"), ("myOtherKey", "myOtherValue")],
8  | |         || async {
9  | |             f().await;
10 | |         },
11 | |     );
   | |_____^
   |
   = note: futures do nothing unless you `.await` or poll them
   = note: `#[warn(unused_must_use)]` on by default

and indeed, the code in f will never run: custom_labels::with_labels knows nothing about futures or how to poll them, so we've just written something that creates a future and then discards it without ever actually running the code it represents.

To summarize: there is no way to wrap asynchronous code in custom_labels::with_labels. This is not a minor implementation detail but a conceptual limitation. Most futures are Send, meaning they can be sent between threads and possibly polled on several different ones; in fact, this is required by popular async runtimes like Tokio or Smol. So there's no way a label scoped to a single thread could possibly apply to the entire lifetime of a future.
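The thread-scoping problem can be reproduced without any profiler at all. The following std-only sketch sets a thread-local "label", creates a future on that thread, and then polls the future either on the same thread or on a different one. Everything here is hypothetical scaffolding for illustration (the LABEL thread-local, the tiny block_on executor, and the two helper functions); none of it is part of custom-labels.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

thread_local! {
    // Hypothetical stand-in for a thread-scoped label.
    static LABEL: RefCell<Option<String>> = RefCell::new(None);
}

// Waker that does nothing; enough for a busy-polling toy executor.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Minimal executor: poll the future on the calling thread until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        thread::yield_now();
    }
}

// The "code of interest": reports the label of whichever thread polls it.
async fn read_label() -> Option<String> {
    LABEL.with(|l| l.borrow().clone())
}

fn label_seen_when_polled_here() -> Option<String> {
    LABEL.with(|l| *l.borrow_mut() = Some("myValue".to_string()));
    let seen = block_on(read_label());
    LABEL.with(|l| *l.borrow_mut() = None);
    seen
}

fn label_seen_when_polled_elsewhere() -> Option<String> {
    LABEL.with(|l| *l.borrow_mut() = Some("myValue".to_string()));
    // The future is created on the labelled thread but polled on another one.
    let fut = read_label();
    let seen = thread::spawn(move || block_on(fut)).join().unwrap();
    LABEL.with(|l| *l.borrow_mut() = None);
    seen
}
```

When the future is polled on the thread that set the label, it sees the label; when it is moved to another thread, it sees nothing. A work-stealing runtime may make that move at any await point, which is why the label has to travel with the future rather than with the thread.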

Introducing Asynchronous Native Custom Labels

As the previous section suggests, we needed a separate API that is aware of futures. We built one in collaboration with engineers at turbopuffer, who are committed users of both asynchronous Rust and Polar Signals Cloud:

use custom_labels::asynchronous::Label;

async fn f() {
    // Do some stuff!
}

async fn g() {
    async {
        f().await;
    }
    .with_labels([("myKey", "myValue"), ("myOtherKey", "myOtherValue")])
    .await;
}

The with_labels trait method is an extension to Future that wraps it in an adapter. This adapter future, whenever it is polled, applies its label set to the current thread, polls its underlying future, and then un-applies the labels. This works because polling is a synchronous operation: whenever Future::poll is called, it, and all the poll calls it makes, run on the same thread until it returns.
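For readers curious what such an adapter might look like, here is a simplified std-only sketch of the idea: apply labels, poll the inner future, un-apply labels. It is an assumption-laden illustration rather than the library's actual code. The Labeled struct, the LABELS thread-local, the YieldOnce helper, and the block_on executor are all hypothetical, and the inner future is boxed purely to avoid unsafe pin projection. Unlike a production adapter, it does not restore labels if the inner poll panics.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    // Hypothetical per-thread label stack; the last matching key wins.
    static LABELS: RefCell<Vec<(String, String)>> = RefCell::new(Vec::new());
}

fn current(key: &str) -> Option<String> {
    LABELS.with(|l| l.borrow().iter().rev().find(|(k, _)| k.as_str() == key).map(|(_, v)| v.clone()))
}

// Hypothetical adapter: owns a label set and the future it decorates.
struct Labeled<F> {
    inner: Pin<Box<F>>,
    labels: Vec<(String, String)>,
}

impl<F: Future> Future for Labeled<F> {
    type Output = F::Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        let this = self.get_mut(); // fine: Labeled<F> is Unpin because `inner` is boxed
        // Apply the labels to whichever thread is polling us right now.
        let before = LABELS.with(|l| {
            let mut l = l.borrow_mut();
            let n = l.len();
            l.extend(this.labels.iter().cloned());
            n
        });
        // Polling is synchronous, so everything below runs on this thread.
        let result = this.inner.as_mut().poll(cx);
        // Un-apply before returning control to the executor.
        LABELS.with(|l| l.borrow_mut().truncate(before));
        result
    }
}

// Future that returns Pending once, to force a second poll.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.0 { Poll::Ready(()) } else { self.0 = true; Poll::Pending }
    }
}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Toy executor that simply re-polls until the future completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

// Returns the label seen before the await, after it, and outside the future.
fn demo() -> (Option<String>, Option<String>, Option<String>) {
    let fut = Labeled {
        inner: Box::pin(async {
            let first = current("myKey");
            YieldOnce(false).await; // suspension point: poll returns, labels are removed
            let second = current("myKey"); // second poll: labels were re-applied
            (first, second)
        }),
        labels: vec![("myKey".to_string(), "myValue".to_string())],
    };
    let (first, second) = block_on(fut);
    (first, second, current("myKey"))
}
```

In this sketch the label is visible on both sides of the await point, yet it never leaks onto the executor thread once the future has returned from poll.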

Usage

The new feature is available in version 0.4.1 of the custom-labels library. As always, the library needs to be both a build dependency and a normal dependency:

[package]
name = "test_rust"
version = "0.1.0"
edition = "2021"

[dependencies]
custom-labels = "0.4.1"

[build-dependencies]
custom-labels = "0.4.1"

and the build instructions for custom labels need to be emitted in build.rs:

fn main() {
    custom_labels::build::emit_build_instructions();
}

Then the extension trait in the custom_labels::asynchronous module can be used, as described above:

use custom_labels::asynchronous::Label;

async fn f() {
    // Do some stuff!
}

async fn g() {
    async {
        f().await;
    }
    .with_labels([("myKey", "myValue"), ("myOtherKey", "myOtherValue")])
    .await;
}

The asynchronous labels feature interacts with the old API in the way one would expect:

use custom_labels::asynchronous::Label;

async fn f() {
    custom_labels::with_label("foo", "bar", || {
        // Do some stuff!
    })
}

async fn g() {
    async {
        f().await;
    }
    .with_labels([("myKey", "myValue"), ("myOtherKey", "myOtherValue")])
    .await;
}

In the synchronous code represented by "Do some stuff!", the labels "foo", "myKey", and "myOtherKey" will be active.

Worked Example

In this example, we have several Markdown files stored in the examples/testdata directory. In response to HTTP queries, we render the files and return them as HTML.

The profiling data for each request is annotated with the filename of the Markdown document currently being rendered. We run the rendering function many times in order to make the process more obviously CPU-bound.

use axum::extract::Path;
use axum::http::StatusCode;
use axum::response::Html;
use axum::routing::get;
use axum::Router;
use custom_labels::asynchronous::Label;
use markdown::Options;
use rand::rng;
use rand::seq::IndexedMutRandom;
use tokio::net::TcpListener;

#[tokio::main]
async fn main() {
    let r = Router::new().route("/{fname}", get(handler));
    let listener = TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, r).await.unwrap();
}

#[cfg(debug_assertions)]
const TRIALS: usize = 100;

#[cfg(not(debug_assertions))]
const TRIALS: usize = 1000;

async fn handler(Path(fname): Path<String>) -> Result<Html<String>, StatusCode> {
    let fname2 = fname.clone();
    async move {
        async fn inner(fname: String) -> Result<Html<String>, anyhow::Error> {
            let contents = tokio::fs::read_to_string(format!("examples/testdata/{fname}")).await?;
            let mut trials = (0..TRIALS)
                .map(|i| {
                    custom_labels::with_label("trial", format!("{i}"), || {
                        markdown::to_html_with_options(&contents, &Options::gfm()).unwrap()
                    })
                })
                .collect::<Vec<_>>();

            Ok(Html(std::mem::take(trials.choose_mut(&mut rng()).unwrap())))
        }
        inner(fname).await.map_err(|e| {
            eprintln!("Error: {e}");
            StatusCode::INTERNAL_SERVER_ERROR
        })
    }
    .with_label("fname", fname2)
    .await
}

Propagation

When new tasks are spawned, one might want the labels from the original future to be automatically copied to the new one.

This doesn't happen automatically, but using the try_clone_from_current API, one can make a custom spawn function that handles label propagation. For example:

use std::future::Future;

use custom_labels::asynchronous::Label;
use custom_labels::with_label;
use custom_labels::Labelset;
use tokio::spawn;
use tokio::task::JoinHandle;

pub fn my_spawn<Fut>(future: Fut) -> JoinHandle<Fut::Output>
where
    Fut: Future + Send + 'static,
    Fut::Output: Send + 'static,
{
    match Labelset::try_clone_from_current() {
        None => spawn(future),
        Some(ls) => spawn(future.with_labelset(ls)),
    }
}

async fn f() {
    // Do some stuff!
}

fn g() {
    with_label("foo", "bar", || my_spawn(f()));
}
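The key detail in a propagating spawn function is that the label snapshot is taken on the spawning side, before the work crosses over to wherever it will run. The following std-only sketch shows that ordering using OS threads instead of Tokio tasks, purely so that it runs without dependencies. The LABELS thread-local and the spawn_with_current_labels, current, and demo functions are hypothetical and do not use the custom-labels API.

```rust
use std::cell::RefCell;
use std::thread::{self, JoinHandle};

thread_local! {
    // Hypothetical per-thread label stack; the last matching key wins.
    static LABELS: RefCell<Vec<(String, String)>> = RefCell::new(Vec::new());
}

fn current(key: &str) -> Option<String> {
    LABELS.with(|l| l.borrow().iter().rev().find(|(k, _)| k.as_str() == key).map(|(_, v)| v.clone()))
}

/// Hypothetical spawn wrapper: copy the caller's labels into the new thread.
fn spawn_with_current_labels<T, F>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    // Snapshot here, on the spawning thread, while the labels are still visible.
    let snapshot = LABELS.with(|l| l.borrow().clone());
    thread::spawn(move || {
        // Install the snapshot on the new thread before running the work.
        LABELS.with(|l| *l.borrow_mut() = snapshot);
        f()
    })
}

// Returns the label seen by a plain spawn and by the propagating spawn.
fn demo() -> (Option<String>, Option<String>) {
    LABELS.with(|l| l.borrow_mut().push(("foo".to_string(), "bar".to_string())));
    let plain = thread::spawn(|| current("foo")).join().unwrap();
    let propagated = spawn_with_current_labels(|| current("foo")).join().unwrap();
    LABELS.with(|l| l.borrow_mut().clear());
    (plain, propagated)
}
```

The plain spawn sees no label, while the wrapper sees "bar". The my_spawn function above follows the same shape: it captures the label set in the caller's context and hands it to the adapter that wraps the spawned future.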

Conclusion

By adding APIs that make Parca's Custom Labels feature work seamlessly with the Rust async ecosystem, we hope to enable exciting new profiling use-cases for all users of Parca and Polar Signals Cloud, enabling correlation of CPU profiles with distributed tracing IDs or any other application-specific metadata. As always, happy profiling!
