
Profiling 101

Understand the basics of CPU profiling in 5 minutes ⏱️!

Example

This guide demonstrates CPU profiling using Go, but the fundamentals apply to any language. Let's walk through an example.

Capturing data

Take the following Go program. Its main function first calls iterateLong, which calls iterate with 9 billion iterations, and then calls iterateShort, which calls iterate with 1 billion iterations.

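package main

// main runs the long loop first, then the short one.
func main() {
	iterateLong()
	iterateShort()
}

// iterateLong calls iterate with 9 billion iterations.
func iterateLong() {
	iterate(9_000_000_000)
}

// iterateShort calls iterate with 1 billion iterations.
func iterateShort() {
	iterate(1_000_000_000)
}

// iterate spins in an empty loop, consuming CPU time proportional to iterations.
func iterate(iterations int) {
	for i := 0; i < iterations; i++ {
	}
}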

Note: Save this snippet to iterate.go and run it with time go run iterate.go.

When executed, this program takes about 5 seconds in total (on an AMD Ryzen 5 3400GE CPU). With profiling we can understand what was executing during those 5 seconds, and for how long. A sampling CPU profiler does this by periodically recording the stack trace that is currently executing. For the sake of simplicity, assume it samples 100 times per second (the sampling rate is typically configurable, but 100 Hz is both common and easy to calculate with).

Data format

A profiler running during the execution of the above program records a profile. Written in the folded stack trace format, that profile contains the following data:

main;iterateLong;iterate 450
main;iterateShort;iterate 50

Parca uses the open pprof format, which is optimized to use as little space as possible; folded stack traces, on the other hand, are much easier for humans to read.

At 100 samples per second over 5 seconds, the profiler observes 500 samples in total. 90% of the time (450 of the 500 samples) was spent in the iterate function called by iterateLong, and 10% (50 of the 500 samples) in the iterate function called by iterateShort.

Visualizing

A popular way to visualize this data is the flame graph. A flame graph drawn with the root at the top is called an icicle graph.

[Figure: Profiling 101 icicle graph]

Recap

In this guide you have learned the fundamentals of CPU profiling:

1) How data is captured: by sampling the executing stack traces at a fixed rate (100 times per second in this example).
2) What the raw data looks like: human-readable folded stack traces, and the space-optimized pprof format.
3) Useful ways to visualize data: flame graphs and icicle graphs.

Congrats and happy profiling! 🎉