In this blog post, we'll dive into the world of performance profiling, flame graphs, and icicle graphs, exploring their impact on optimizing application performance.
But before we get started, I have a fun little tidbit to share with you about the inspiration behind the title of this post. To unravel the connection between the popular book series "A Song of Ice and Fire" and performance visualization, be sure to check out the final section of this post.
Now, without further ado, let's embark on our journey to discover the secrets of flame and icicle graphs!
Shall we begin?
Have you ever encountered performance problems while building or maintaining a software application? Analyzing and visualizing performance data can be demanding, especially when working with intricate systems. That is why data visualization is essential for understanding complex systems and processes: it lets us pick out patterns and trends from a vast array of information.
Icicle and flame graphs are among the various charts and graphs that have attracted significant interest, particularly for their effectiveness in profiling software performance and displaying call stacks.
This blog post examines the structure of these distinct graphical representations, discussing their importance, how they work, and how to use them to improve your software's performance.
Enter the world of ice and fire
Let's start with a quick introduction to icicle and flame graphs.
Icicle and flame graphs are visualization tools for analyzing software performance. They help you identify bottlenecks and optimize your code.
Flame Graphs: Ablaze with Performance Insights
Flame graphs were invented in 2011 by Brendan Gregg, a performance analysis expert and author, as a visualization tool for effectively analyzing and identifying performance issues in software applications. These graphs have become an indispensable tool for profiling software performance.
Brendan Gregg invented flame graphs while working on a MySQL performance issue, seeking a quick and in-depth understanding of CPU usage. Until then, traditional profilers and tracers produced overwhelming text output, so he explored visualizations of CPU function calls, inspired by the work of Neelakanth Nadgir, Roch Bourbonnais, and Jan Boerhout. However, these visualizations had high overhead and were too dense to read when spanning multiple seconds. Gregg switched from a tracing-based approach to timed, sampling-based profiling to reduce overhead, and reordered the samples to maximize frame merging, resulting in a more readable visualization. He chose a warm color palette to indicate busy CPUs, and since the visualization resembled flames, it became known as a flame graph.
Flame graphs have become a widely adopted method for performance analysis, as they provide a clear and intuitive representation of complex data. By displaying the hierarchical relationship between function calls and their resource consumption, flame graphs facilitate the identification of performance bottlenecks and optimization opportunities in various systems, including CPU, memory, and I/O subsystems. This innovative approach has significantly impacted the field of software performance engineering, making it easier for developers to optimize and fine-tune their applications.
Icicle Graphs: Uncovering Performance Treasures from the Frozen Depths
Icicle graphs, a variation of flame graphs, were developed to offer an alternative visualization for performance analysis. While flame graphs display the call stack hierarchy with the root at the bottom and the leaves at the top, icicle graphs invert this layout by placing the root at the top and the leaves at the bottom. Icicle graphs were derived from flame graphs, maintaining the core concepts of aggregating and visualizing call stacks. This layout is more consistent with traditional top-down tree diagrams, which some users may find more intuitive.
I like the symmetry of the two graphs. They visualize the same data using the same transformation but are mirror images of each other. These visualizations can be seen as two complementary aspects of performance analysis, harmoniously working together to provide a comprehensive understanding. Like ice and fire (too much? 🤪)
How to Read Icicle and Flame Graphs
The flame graph uses red to indicate the hottest parts of the graph. And the blue in the icicle graph indicates the coldest parts of the graph. And the leftmost rectangle within a level in the graph represents the first encountered function calls.
Right?! Unfortunately, that's NOT the case.
Let's take a look at what icicle and flame graphs actually show.
Here's a quick summary of how to interpret icicle graphs:
- Each rectangle represents a function call in the stack. The width of each rectangle represents the relative resource usage for a function, including the resource used in its child functions.
- The y-axis shows stack depth (number of frames on the stack). The bottom rectangle of each stack is the function that was running when the sample was taken. Everything above it represents its ancestry: the function directly above a function is its parent (its caller).
- The x-axis spans the samples. It does not show the passing of time, as most time-series graphs do. The stacks are sorted alphabetically to maximize merging.
- The width of the rectangle shows the total resource usage. Functions with wide rectangles may consume more resources per execution than narrow rectangles or be called more often.
The colors do not represent anything significant.
This visualization is called an "icicle graph". The top-to-bottom order of the call stacks looks like icicles. And the color palette is chosen randomly from rather cold colors.
The description provided above should offer a solid foundation for interpreting icicle and flame graphs. However, if you're anything like me, this knowledge alone may not be sufficient to develop a true intuition for deciphering the intricacies of these graphs. To truly grasp their essence, a deeper understanding of the graph's structure and data representation is indispensable. So, let's continue on a journey to unveil the inner workings and structure of these fascinating visualizations.
Anatomy of a single profile
In the simplest terms, an icicle graph or a flame graph shows a program's call stack. But what does that mean?
Structure
The graphs represent the call stack data as a series of horizontally-aligned rectangles. Each rectangle corresponds to a function, and its width represents the resource (time, memory, etc.) used by that function, including the resource used by its child functions.
The rectangles are stacked on top of one another. In flame graphs, the parent function is at the bottom and its child functions are above it; in icicle graphs, the parent function is at the top and its child functions are below it. The hierarchy and the relationship between different functions can be easily understood by following the vertical alignment of the rectangles.
The critical aspects of icicle and flame graphs are:
- The width of each rectangle represents the relative resource usage for a function, including the time spent in its child functions.
- The height of the rectangles does NOT represent anything. The vertical alignment of the rectangles is used to describe the call stack hierarchy.
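To make the width rule concrete, here is a minimal Go sketch. The `Frame` type, its fields, and the function names `parent`, `child`, and `leaf` are all made up for illustration; they are not taken from any profiler. The only point is that the number behind a rectangle's width is the function's own samples plus the samples of everything it calls.

```go
package main

import "fmt"

// Frame is a hypothetical call-tree node: one rectangle in the graph.
type Frame struct {
	Name     string
	Self     int      // samples in which this function itself was running
	Children []*Frame // functions called from this one
}

// Total returns the cumulative sample count that the rectangle's width
// is proportional to: the function's own samples plus all descendants'.
func (f *Frame) Total() int {
	total := f.Self
	for _, child := range f.Children {
		total += child.Total()
	}
	return total
}

func main() {
	// parent -> child -> leaf, where only leaf does any work.
	leaf := &Frame{Name: "leaf", Self: 10}
	child := &Frame{Name: "child", Children: []*Frame{leaf}}
	parent := &Frame{Name: "parent", Children: []*Frame{child}}

	for _, f := range []*Frame{parent, child, leaf} {
		fmt.Printf("%s: self=%d total=%d\n", f.Name, f.Self, f.Total())
	}
}
```

All three frames report the same total even though only `leaf` has any self time, which is why a chain of callers is drawn as a column of equally wide rectangles.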
Easy, right? Let's see how we can visualize a single call stack collected from a program.
Walkthrough
The following program calls a function `a`, which calls a function `b`, which calls a function `c`, which then loops forever.

```go
package main

func c() {
	for {}
}

func b() {
	c()
}

func a() {
	b()
}

func main() {
	a()
}
```
The colors usually do not carry any specific meaning related to performance data. We will get to that later.
When we profile the above program for, let's say, `50ms`, we get the following profile represented as a flame or icicle graph.
Even though, in this example, the majority of time is spent in the function `c`, all the rectangles have the same width.
This is because the width of the rectangle for a function represents the time spent in that function and its child functions.
Let's add more calls to `c` and see how the graph changes.

```diff
 package main

 func c() {
-	for {}
+	for i := 0; i < 1_000_000; i++ {}
 }

 func b() {
 	c()
+	c()
+	c()
 }

 func a() {
 	b()
 }

 func main() {
 	a()
 }
```
It doesn't seem like something has changed, right? Why is that?
This is called `merging`. Even though it takes more time to execute the function calls in total, the width of the rectangle for `c` relative to the other functions remains the same.
It doesn't matter how many times we call the same function at the same level.
We will discuss merging in detail later.
Let's go back to the original program and add calls to two new functions, `d` and `e`, in the function `b`.

```diff
 package main

 func c() {
 	for i := 0; i < 1_000_000; i++ {}
 }

+func d() {
+	for i := 0; i < 1_000_000; i++ {}
+}
+
+func e() {
+	for i := 0; i < 1_000_000; i++ {}
+}
+
 func b() {
+	e()
+	d()
 	c()
 }

 func a() {
 	b()
 }

 func main() {
 	a()
 }
```
You can see that the rectangle for the function `b` now has three rectangles as its children.
And the rectangle for the function `b` covers all the calls.
This is because the width of the rectangle for the function `b` represents the time spent in the function `b` and its child functions.
Also, a keen eye will notice that the rectangles for the functions `e` and `d` are not represented in the graph in their call order.
This is because the graph ordering is NOT based on the call order, or any other metric. They are just sorted alphabetically.
Remember: The x-axis does not show the passing of time, as most time-series graphs do.
Similarly, let's re-arrange the code a bit. We will move the call to `c` from the function `b` to the function `a`.
We will also add a call to a new function `f` in the function `b`, and make `c` call the same functions as `b`.
The code will look like this:

```diff
 package main

 func c() {
-	for i := 0; i < 1_000_000; i++ {}
+	d()
+	e()
+	f()
 }

 func d() {
 	for i := 0; i < 1_000_000; i++ {}
 }

 func e() {
 	for i := 0; i < 1_000_000; i++ {}
 }
+
+func f() {
+	for i := 0; i < 1_000_000; i++ {}
+}
+
 func b() {
-	c()
 	d()
 	e()
+	f()
 }

 func a() {
 	b()
+	c()
 }

 func main() {
 	a()
 }
```
You can see that the rectangle for the function `a` now has two rectangles as its children.
The `b` and `c` rectangles are now children of the `a` rectangle, and the `d`, `e`, and `f` rectangles are now children of the `b` and `c` rectangles.
But the leaf rectangles are not merged.
Hmm, why is that?
It's because even though the functions `d`, `e`, and `f` are the same and are at the same level, they are not called from the same function.
They have different parents/ancestry. So, they are not merged.
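One way to picture this rule is to build the call tree by hand. The sketch below is a simplified illustration, not the data structure of any particular profiler; the `node` type and its `add` method are hypothetical. Each stack is inserted from the root down, so a function is merged with an existing rectangle only when the whole path leading to it is the same.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical call-tree node; children are keyed by function name.
type node struct {
	count    int
	children map[string]*node
}

func newNode() *node { return &node{children: map[string]*node{}} }

// add walks one stack from root to leaf, creating nodes as needed and
// incrementing the count of every frame it passes through.
func (n *node) add(stack []string) {
	n.count++
	cur := n
	for _, fn := range stack {
		child, ok := cur.children[fn]
		if !ok {
			child = newNode()
			cur.children[fn] = child
		}
		child.count++
		cur = child
	}
}

func main() {
	root := newNode()
	for _, s := range []string{
		"a;b;d", "a;b;d", // same ancestry: merged into one node
		"a;c;d", // same leaf name, different parent: a separate node
	} {
		root.add(strings.Split(s, ";"))
	}

	a := root.children["a"]
	fmt.Println("d under b:", a.children["b"].children["d"].count)
	fmt.Println("d under c:", a.children["c"].children["d"].count)
}
```

The two `a;b;d` stacks end up in a single node with a count of 2, while `d` called from `c` gets its own node with a count of 1.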
Through a step-by-step examination of a single, straightforward program, we have unraveled the construction of icicle and flame graphs.
Now, let us venture further and delve into more intricate scenarios to deepen our understanding of these powerful visualizations.
Colors
But wait, how about colors?
In both icicle and flame graphs, colors are used to visually differentiate functions and make the graphs more readable. The colors usually do not carry any specific meaning related to performance data. Instead, they are chosen to make distinguishing between different functions and their relationships within the call stack hierarchy easier.
Typically, colors are assigned randomly or based on a hashing algorithm applied to the function names. This ensures that adjacent rectangles representing different functions have contrasting colors, making it easier to identify and trace individual functions within the graph.
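As a rough illustration of hash-based coloring, the sketch below derives a warm color from a function name using Go's `hash/fnv` package. The `warmColor` function and the exact color ranges are assumptions chosen for this example, not the palette of any specific tool.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// warmColor maps a function name to an RGB color in a red-to-yellow range.
// The same name always gets the same color, and the color says nothing
// about how expensive the function is.
func warmColor(name string) (r, g, b uint8) {
	h := fnv.New32a()
	h.Write([]byte(name)) // Write on a hash.Hash never returns an error
	v := h.Sum32()

	r = 205 + uint8(v%50)     // 205..254: always strongly red
	g = uint8((v >> 8) % 230) // 0..229: shifts the hue from red to yellow
	b = uint8((v >> 16) % 55) // 0..54: very little blue
	return r, g, b
}

func main() {
	for _, name := range []string{"main.a", "main.b", "main.c"} {
		r, g, b := warmColor(name)
		fmt.Printf("%-8s #%02x%02x%02x\n", name, r, g, b)
	}
}
```

Because the color depends only on the name, a function keeps its color across profiles, which makes it easier to spot the same function in two different graphs.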
However, some tools may allow you to customize the color scheme or use colors to represent additional information, such as memory usage or specific categories of functions (e.g., I/O-bound, CPU-bound, etc.).
In such cases, the color assignments and their meanings depend on the specific tool or configuration used to generate the icicle or flame graph. Always refer to the documentation or legend accompanying the graph to understand the color representation if it carries any specific meaning in your particular case.
Conventionally, the top-to-bottom order of the call stacks looks like icicles. And the color palette is chosen from rather cold colors.
On the other hand, the bottom-to-top order of the call stacks looks like flames. And the color palette is chosen from warm colors to represent a flame.
The colors usually do not carry any specific meaning related to performance data. If the rectangles are red, it does NOT mean that the function is taking relatively more resources.
So, where were we?
Now, let us venture further and delve into more intricate scenarios to deepen our understanding of these powerful visualizations.
We've learned how to visualize a single call stack collected from a program. But what happens when we extend our focus to multiple call stacks collected over time on a busy host with thousands of programs running simultaneously?
Let's dive in and find out.
Anatomy of Continuous profiling
A program produces many call stacks during its life cycle. Even within a single second, a program can make thousands of calls, so a profile collected from one program for a minute can contain millions of call stacks. And when you profile a host with multiple programs running, there can be billions of them.
We need to reduce the collected data to a manageable size. To do that, we sample call stacks at a given interval and then aggregate the samples.
There are three major steps in the process of aggregating call stacks and building a profile: sampling, sorting, and merging.
Let's go through them one by one and visualize the process with animations using the graphs.
Sampling
Sampling gives us statistically meaningful data that is sufficient for performance analysis. It provides a representative overview of a program's behavior without the need to collect every single event. By capturing call stacks periodically, we obtain snapshots of the program's execution that reflect the overall trends and patterns.
This approach minimizes the performance overhead and reduces the amount of data to be analyzed, making it more manageable and efficient. While collecting all events might provide a more comprehensive view, it can lead to high overhead and an overwhelming amount of data, often making it impractical and counterproductive for performance optimization purposes. Sampling strikes the right balance between data collection and practicality, ensuring a reliable and efficient method for performance analysis.
You can see below that we take a sample at a fixed interval (e.g. every 50ms), over some period of time (e.g. 1 minute).
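To see why a few samples are enough, here is a small simulation. Everything in it is made up for illustration: `stackAt` is a fake timeline that says which call stack is active at each millisecond of a one-second run, and `sample` looks at it once every 50ms instead of recording every millisecond.

```go
package main

import "fmt"

// stackAt is a made-up timeline: it returns the call stack that is active
// at millisecond t of a one-second run.
func stackAt(t int) string {
	switch {
	case t < 600:
		return "a;b;c" // active 60% of the time
	case t < 900:
		return "a;b;d" // active 30% of the time
	default:
		return "a;b;e" // active 10% of the time
	}
}

// sample records the active stack once every intervalMs milliseconds.
func sample(durationMs, intervalMs int) []string {
	var samples []string
	for t := 0; t < durationMs; t += intervalMs {
		samples = append(samples, stackAt(t))
	}
	return samples
}

func main() {
	samples := sample(1000, 50)
	counts := map[string]int{}
	for _, s := range samples {
		counts[s]++
	}

	fmt.Println(len(samples), "samples")
	for _, s := range []string{"a;b;c", "a;b;d", "a;b;e"} {
		fmt.Printf("%s: %d\n", s, counts[s])
	}
}
```

The 20 samples split 12, 6, and 2, which preserves the 60/30/10 proportions of the full timeline while storing a fraction of the data. Real programs are far less regular, so the proportions are only approximate and improve with more samples.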
Sorting
Stack traces in a graph are sorted alphabetically.
The samples collected during an interval are aggregated anyway, so their original order is not preserved, and we do not need it: our goal is to find the bottlenecks of the programs, not to reconstruct a timeline. Sorting alphabetically places identical stacks next to each other, which is what maximizes merging.
Merging (Grouping)
Lastly, we merge the samples that have the same call stack. We have already seen a similar process in the previous section on a single call stack, when we merged the functions at the same level with the same name and the same ancestry. In the case of continuous profiling, we merge the samples that have the same call stack within a given interval.
We only merge the samples that have the same call stack.
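Here is a minimal sketch of the merging step, assuming the samples are already sorted and each stack is written as function names joined by `;`. The `folded` type and the `merge` function are illustrative names. The output, one stack followed by its sample count per line, resembles the folded-stack text format that flame graph tooling commonly consumes.

```go
package main

import "fmt"

// folded is one merged group: a call stack and how many samples had it.
type folded struct {
	stack string
	count int
}

// merge collapses runs of identical stacks in an already sorted slice.
func merge(sorted []string) []folded {
	var out []folded
	for _, s := range sorted {
		if n := len(out); n > 0 && out[n-1].stack == s {
			out[n-1].count++ // same stack as the previous sample: merge
			continue
		}
		out = append(out, folded{stack: s, count: 1})
	}
	return out
}

func main() {
	sorted := []string{"a;b;c", "a;b;c", "a;b;c", "a;b;d", "a;c;d", "a;c;d"}
	for _, f := range merge(sorted) {
		fmt.Printf("%s %d\n", f.stack, f.count)
	}
}
```

Six samples collapse into three lines: `a;b;c 3`, `a;b;d 1`, and `a;c;d 2`. Note that `a;b;d` and `a;c;d` stay separate even though both end in `d`, because the stacks differ.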
This is how a profile collected over time and represented in icicle graphs or flame graphs can provide valuable insights into the performance of your software, enabling you to identify bottlenecks and optimize code effectively.
Understanding the icicle and flame graphs
As we continue our journey, we will continue building up a deeper understanding of icicle and flame graphs, shedding light on their intricacies and further emphasizing their importance in performance analysis.
Let's dive in and solidify our knowledge by looking at an example profile:
The graph is a tree. The root of the tree is the widest rectangle, but its width is cumulative: it's the sum of all the resources consumed by the function and its children.
The bottom edge of the graph shows the functions that were directly consuming resources when the samples were taken.
In this case, they are `d`, `f`, and `c`.
The vertical axis shows the stack depth.
For example, one of the leaves is the `f` function.
The function above it is `f`'s parent, which is `f`'s caller. In this case, it's `e`. The function above `e` is `b`, and so on.
We can visually compare the width of the rectangles to see which function is consuming more resources.
For example, in this case, `b` is consuming more resources than `c`.
The horizontal axis shows the samples.
It does not show the passing of time, as most of the time-series graphs do.
What to do with this knowledge?
Now that we've delved into the captivating realm of icicle and flame graphs, it's time to turn our attention to applying this newfound knowledge.
Performance optimization is crucial for any software application, as it can greatly impact the user experience and system resources. However, identifying the causes of performance issues can be a daunting task, especially when dealing with large codebases and complex call stacks. This is where icicle and flame graphs come in handy.
- Visualizing complex data: Icicle and flame graphs provide a graphical representation of the call stack data, making it easier to visualize and understand the relationships between different functions and their resource consumption.
- Identifying bottlenecks: By visualizing the call stack data, icicle and flame graphs enable developers to identify performance bottlenecks, such as functions consuming a significant amount of CPU time or memory.
- Comparing different profiles: Icicle and flame graphs can be used to compare different profiles, helping developers understand the impact of changes in the codebase and make informed decisions about potential optimizations.
- Platform-agnostic: Both icicle and flame graphs can be generated from various programming languages and platforms, making them versatile tools for developers working on diverse projects.
Finding Performance Issues
- Look for wide blocks: Start by analyzing the icicle or flame graph to identify functions that consume a significant amount of resources, such as CPU time or memory. These functions are represented by wider rectangles, which indicate higher resource usage or a longer duration. Make a list of these functions as potential targets for optimization.
- Examine the call stack: Examine the call stack hierarchy to understand the relationship between different functions. This will help you identify if the performance issue is caused by a single function or a combination of functions. Understanding the call stack can also reveal opportunities for optimization, such as refactoring or eliminating redundant calls.
- Look for tall stacks: In these graphs, tall stacks indicate deep call hierarchies, which may point to complex and possibly inefficient code.
- Examine recurring patterns: Repeated patterns in the graph may indicate redundant or repetitive code that could be optimized.
Comparing Versions
- Overlay graphs: Overlay two graphs to compare different versions of the same system, making it easy to identify improvements or regressions.
- Analyze differences: Look for changes in block width or stack height to determine if specific functions have become more or less efficient.
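A comparison like this boils down to a per-function difference in width. The sketch below is one simple way to compute it and not how any particular tool implements its diff view. It assumes each profile is a map from a `;`-joined stack to its sample count; `share` and `diff` are hypothetical helpers. Widths are compared as fractions of the total so that profiles with different sample counts remain comparable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// share returns, per function, the fraction of all samples in which the
// function appears anywhere on the stack (its cumulative width).
func share(profile map[string]int) map[string]float64 {
	total := 0
	perFunc := map[string]int{}
	for stack, count := range profile {
		total += count
		seen := map[string]bool{} // count a recursive function once per stack
		for _, fn := range strings.Split(stack, ";") {
			if !seen[fn] {
				seen[fn] = true
				perFunc[fn] += count
			}
		}
	}
	out := map[string]float64{}
	if total == 0 {
		return out
	}
	for fn, n := range perFunc {
		out[fn] = float64(n) / float64(total)
	}
	return out
}

// diff returns the change in each function's share, after minus before.
func diff(before, after map[string]int) map[string]float64 {
	b, a := share(before), share(after)
	out := map[string]float64{}
	for fn, v := range b {
		out[fn] = -v // assume the function vanished; overwritten below if not
	}
	for fn, v := range a {
		out[fn] = v - b[fn]
	}
	return out
}

func main() {
	before := map[string]int{"a;b;c": 60, "a;b;d": 40}
	after := map[string]int{"a;b;c": 30, "a;b;d": 40, "a;b;e": 30}

	delta := diff(before, after)
	names := make([]string, 0, len(delta))
	for fn := range delta {
		names = append(names, fn)
	}
	sort.Strings(names)
	for _, fn := range names {
		fmt.Printf("%s %+.1f%%\n", fn, delta[fn]*100)
	}
}
```

In this made-up example `c` shrinks by 30 percentage points and the new function `e` accounts for 30, while `a`, `b`, and `d` are unchanged. A relative comparison like this can hide an overall speedup or slowdown, so it is worth checking the absolute sample counts as well.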
You're lucky! You can easily compare graphs using Parca and filter by functions. Go and check it out! https://www.youtube.com/watch?v=eN0lA91xwn8
Optimize the code
Once you have identified the problematic functions and analyzed the call stack, start optimizing the code. This may involve:
a. Optimizing algorithms: Replace inefficient algorithms with more efficient ones or use data structures that provide better performance for the specific use case.
b. Reducing function call overhead: Minimize the number of function calls, especially in performance-critical code paths.
c. Parallelizing code: If the performance bottleneck is caused by CPU-bound operations, consider parallelizing the code to take advantage of multiple processor cores.
d. Reducing memory usage: Optimize memory allocation and deallocation, and minimize memory fragmentation.
Measure the impact
After making optimizations, collect new profiling data and generate updated icicle or flame graphs. Compare the new graphs with the previous ones to measure the impact of your changes.
If the optimizations have been successful, you should see a reduction in the width of the problematic functions.
Iterate and refine
Performance optimization is an iterative process. Continue to analyze the updated graphs, identify new bottlenecks, and optimize your code accordingly.
Keep refining your optimizations until you achieve the desired performance level.
Conclusion
By visualizing call stack data clearly and intuitively, these graphs help developers understand the intricate relationships between functions and their resource consumption. Whether you choose icicle graphs or flame graphs, these visualization tools can notably improve your ability to analyze performance data. By identifying bottlenecks, understanding the call stack hierarchy, optimizing code, measuring the impact, and iterating on the process, you can significantly improve the performance of your applications.
Ultimately, this empowers you to elevate your user experience.
As always, we've got you covered. You can easily compare graphs using Parca and Polar Signals Cloud.
If you would like to effortlessly gather performance insights from your production systems, sign up at https://www.polarsignals.com
Special thanks to Seray AI for her amazing work on visualizations and animations.
As we reach the conclusion of our exploration into performance profiling, flame graphs, and icicle graphs, I'd like to share a personal tidbit that inspired the title of this post.
"A Song of Ice and Fire" is a series of epic fantasy novels by the American novelist and screenwriter George R. R. Martin.
As a devoted fan of the "A Song of Ice and Fire" book series (the books, not the TV show), I've immersed myself in each novel and, unfortunately, watched every episode of the show, including the less popular final season. I'm equally passionate about performance profiling, flame graphs, and icicle graphs, and have been using them for years to improve my applications.
Embracing my love for both literature and technology, I decided to name this blog post after the enthralling "A Song of Ice and Fire" series, which held a special place in my heart until I discovered the captivating world of "The Wheel of Time". When it came time to write a blog post on flame and icicle graphs, I couldn't resist blending their story with the series title.
Also, my recent trip to the stunning land of Iceland added to my inspiration, as I fell in love with the country's majestic beauty. Coincidentally, Iceland is often called the "Land of Fire and Ice," and its landscapes were even used for some memorable scenes in the Game of Thrones series. This delightful connection adds another layer to the intricate interplay of ice, fire, and performance analysis that this blog post explores.
If you enjoyed this post and are eager to delve deeper into similar topics, don't hesitate to explore my other blog posts.
Fantastic Symbols and Where to Find Them - Part 1 and Part 2.
Sources
- https://www.brendangregg.com/flamegraphs.html
- https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
- https://queue.acm.org/detail.cfm?id=2927301
- https://youtu.be/6uKZXIwd6M0
- https://www.webperf.tips/tip/understanding-flamegraphs/