Skip to main content

Setup Continuous Profiling for GPUs using Kubernetes

Prerequisites:

  • Kubernetes Cluster.
  • Kubernetes Nodes are running Linux 5.4 or newer.
  • A NVIDIA GPU

Get an API token

To be able to send any data to Polar Signals Cloud, we're going to need an API token for the collection mechanism to authenticate with the Polar Signals API.

Please refer to the Generating Tokens documentation that has more details.

Instructions

The Kubernetes manifest below, will deploy the Polar Signals Agent and Polar Signals GPU Metrics Agent as DaemonSet to a Kubernetes cluster. Here's what it does in summary:

  1. Creates a namespace called polarsignals to deploy the agent into.
  2. Creates a secret containing the Polarsignals API token (which you should have created above) for authentication.
  3. Defines a ClusterRole and ClusterRoleBinding to grant the agent permissions to list pods and get node info across the cluster.
  4. Deploys the agent and gpu-metrics-agent as a DaemonSet. This will deploy two pod to each nodes in the cluster. The agent container runs with some privileged settings to enable profiling via eBPF.

The agent will then profile all applications running on the nodes and send the profiling data to the Polar Signals Cloud for queries and analysis.

The Polar Signals and GPU Agents

Download the both agents to YAML files.

curl -o polarsignals-agent.yaml "https://api.polarsignals.com/api/manifests.yaml?token=<your-token>"
curl -o gpu-metrics-agent.yaml "https://api.polarsignals.com/api/manifests.yaml?token=<your-token>&type=gpu-metrics-agent"

You can also Use the command below to apply the manifest directly from the Polar Signals API.

kubectl apply -f "https://api.polarsignals.com/api/manifests.yaml?token=<your-token>"
kubectl apply -f "https://api.polarsignals.com/api/manifests.yaml?token=<your-token>&type=gpu-metrics-agent"