Doubling Throughput with Continuous Profiling and Optimization

“68.37% of CPU [...] with a one-line code change [...] went down to 31.82%”

February 11, 2025

Overview

S2 is a serverless API for streaming data, with a mission to make streams a cloud storage primitive. Because S2 is data-intensive cloud infrastructure, the team looked for ways to optimize performance and manage system bottlenecks efficiently. With Polar Signals Cloud, S2 implemented continuous profiling, which led to significant performance improvements and cost savings.

Challenge

S2 needed a way to identify optimization opportunities during load testing and to gain deeper insight into system bottlenecks under normal operations. Inefficient CPU usage directly limited the number of users S2 could serve and raised its operating costs. Without an efficient profiling tool, troubleshooting and optimizing performance was cumbersome and time-consuming.

Solution

Polar Signals Cloud provided S2 with continuous profiling capabilities, allowing the team to easily analyze performance data rather than relying on ad hoc tools like perf. Key features that stood out during the selection process included:

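  • pprof.me: Immutable, long-term snapshots of profiles.
  • Inverting call stacks: A view that starts from the function on-CPU and works back through its callers, which makes it easy to pinpoint the cumulative impact of a specific function across every code path that reaches it.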

With these capabilities, S2 was able to identify and implement optimizations that significantly improved system efficiency.
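
To illustrate why inverted call stacks help, here is a minimal sketch in Rust of the aggregation behind the top level of such a view. It is not Polar Signals' implementation; the `StackSample` type and the function names in the usage note are hypothetical. The idea is that samples are grouped by the leaf frame, so a function reached through many different callers shows up as one total.

```rust
use std::collections::HashMap;

/// One sampled call stack, ordered root first and leaf last, together with
/// the number of samples observed for it. (Hypothetical type for illustration.)
pub struct StackSample {
    pub frames: Vec<&'static str>,
    pub samples: u64,
}

/// Sums samples by leaf frame, i.e. the function that was actually on-CPU,
/// regardless of which chain of callers led to it.
pub fn self_samples_by_function(stacks: &[StackSample]) -> HashMap<&'static str, u64> {
    let mut totals = HashMap::new();
    for stack in stacks {
        // An empty stack carries no leaf, so it is skipped.
        if let Some(leaf) = stack.frames.last() {
            *totals.entry(*leaf).or_insert(0) += stack.samples;
        }
    }
    totals
}
```

For example, if a hypothetical `checksum` function is the leaf of two stacks with 30 and 25 samples, reached through two different writers, a regular flame graph shows two separate, smaller boxes, while this aggregation reports a single total of 55.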

Why Polar Signals

Polar Signals Cloud stood out due to its ability to provide continuous profiling with intuitive slicing and dicing of data. The flexibility in pricing also played a crucial role in addressing S2's unique challenges, as resource usage was influenced not just by CPU cores but also by code deployment frequency.

Results

By utilizing Polar Signals Cloud, S2 achieved measurable performance improvements, including:

  • CPU Optimization: During load testing, S2 discovered that computing SHA256 checksums consumed 68.37% of CPU when writing to three S3 Express directory buckets for regional durability. A one-line code change enabled hardware acceleration on Graviton via the sha2 library, reducing the share of CPU spent on these checksums to 31.82%. According to S2, this allows at least 2x more throughput from the same processes without increasing compute spend.
“68.37% of CPU was spent computing these checksums. With a one-line code change to enable hardware-acceleration on Graviton via the sha2 library, this went down to 31.82%. This improvement allows us to push at least 2x more throughput from these processes without increasing our compute spend.” - Shikhar, CEO of S2

[Figures: profile before and after the change]
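
The "at least 2x" figure can be sanity-checked from the two percentages. The sketch below is a back-of-the-envelope estimate, not S2's own calculation. It assumes the process is CPU-bound and that all non-checksum work costs the same per request before and after the change; the function name `throughput_gain` is made up for this example.

```rust
/// Estimated throughput multiplier when one component's share of CPU time
/// drops from `share_before` to `share_after` (fractions of total CPU in the
/// before and after profiles).
///
/// Assumption: everything other than that component costs the same CPU per
/// request in both profiles, and the process is CPU-bound.
pub fn throughput_gain(share_before: f64, share_after: f64) -> f64 {
    // The unchanged work is (1 - share_before) of the old per-request cost
    // and (1 - share_after) of the new per-request cost, so:
    //   new_cost / old_cost = (1 - share_before) / (1 - share_after)
    let relative_cost = (1.0 - share_before) / (1.0 - share_after);
    // Throughput at a fixed CPU budget is inversely proportional to cost.
    1.0 / relative_cost
}
```

Under these assumptions, `throughput_gain(0.6837, 0.3182)` comes out at roughly 2.16, which is consistent with the quoted "at least 2x".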

  • Checksum Processing Efficiency: The AWS S3 Rust SDK was found to be unnecessarily recomputing CRC32C checksums. Identifying this issue led to the implementation of a simple workaround, further improving efficiency.
  • Memory Allocation Improvements: Profiling also highlighted excessive CPU time spent in reallocating memory, which was resolved by reserving memory upfront.
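
The memory allocation fix follows a common Rust pattern. The sketch below is a generic illustration rather than S2's actual code, and both function names are hypothetical. It contrasts a buffer that grows on demand, which may reallocate and copy several times, with one whose final size is reserved upfront using `Vec::with_capacity`.

```rust
/// Concatenates chunks into a buffer that grows on demand. Each time the
/// capacity is exceeded, the vector reallocates and copies its contents.
pub fn concat_growing(chunks: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for chunk in chunks {
        out.extend_from_slice(chunk);
    }
    out
}

/// Same result, but the total size is computed first and reserved in a single
/// allocation, so the appends below never need to reallocate.
pub fn concat_reserved(chunks: &[&[u8]]) -> Vec<u8> {
    let total: usize = chunks.iter().map(|chunk| chunk.len()).sum();
    let mut out = Vec::with_capacity(total);
    for chunk in chunks {
        out.extend_from_slice(chunk);
    }
    out
}
```

Both functions return identical bytes. The difference only shows up in a profile, where the reserved version spends no time in reallocation while appending.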

By implementing these optimizations, S2 was able to enhance system performance, reduce operational costs, and improve overall efficiency. The insights provided by Polar Signals Cloud were instrumental in achieving these outcomes.

