Last updated: May 10, 2025

Overview

GPU performance can vary over time with factors like utilization and temperature. To ensure your application runs smoothly, it’s important to monitor these metrics and take action when they exceed certain thresholds. This can be done with Python, nvidia-smi, and the psutil library. We recommend running the monitor as a background process, independent of your application, so that collecting these metrics does not interfere with your application’s performance.

Getting GPU Stats

To get GPU stats, you can use the nvidia-smi command, which is available by default on all GPU instances. This command provides a wealth of information about the GPU, including utilization, memory usage, temperature, and more. You can see the complete list of query options by running nvidia-smi --help-query-gpu. To get this information in Python, we can use the subprocess module to run the command and capture its output. Here’s an example of how to do this:
import subprocess
import csv

def get_cuda_version():
    try:
        # Run nvidia-smi -q command
        nvidia_output = subprocess.run(
            ["nvidia-smi", "-q"],
            capture_output=True,
            text=True
        ).stdout

        # Filter for lines containing "CUDA Version"
        cuda_lines = [line.strip() for line in nvidia_output.split('\n')
                      if "CUDA Version" in line]

        if cuda_lines:
            # Extract version from the first matching line
            # Typically in format "    CUDA Version    : 12.6"
            return cuda_lines[0].split(':')[1].strip()
        else:
            raise ValueError("CUDA version not found in nvidia-smi output")

    except Exception as e:
        return f"An error occurred: {str(e)}"


def get_gpu_stats(stats_to_query: list):
    """
    Run nvidia-smi with CSV output format and return the parsed stats.
    All options: `nvidia-smi --help-query-gpu`
    """

    # Run nvidia-smi with csv output format
    result = subprocess.run(
        ["nvidia-smi",
            f"--query-gpu={','.join(stats_to_query)}", "--format=csv,nounits"],
        capture_output=True,
        text=True,
        check=True
    )

    cuda_version = get_cuda_version()

    # Process the CSV output
    csv_data = result.stdout.strip().splitlines()
    reader = csv.DictReader(csv_data)
    stats = []
    for row in reader:
        data = {key.strip(): value.strip()
                for key, value in row.items()}
        data["cuda_version"] = cuda_version
        for key, value in data.items():
            # Convert numeric values to appropriate types
            if value.isdigit():
                data[key] = int(value)
            else:
                try:
                    data[key] = float(value)
                except ValueError:
                    pass
        stats.append(data)
    return stats


# Full list of options available with `nvidia-smi --help-query-gpu`
gpu_stats_to_query = [
    "timestamp",
    "index",
    "gpu_name",
    "driver_version",
    "memory.total",
    "memory.free",
    "memory.used",
    "memory.reserved",
    "compute_cap",
    "utilization.gpu",
    "utilization.memory",
    "temperature.gpu",
    "temperature.memory",
    "power.draw",
    "power.limit"
]
gpu_stats = get_gpu_stats(gpu_stats_to_query)
print(gpu_stats)
You will get an output like this:
[
  {
    "timestamp": "2025/03/31 15:21:49.377",
    "index": 0,
    "name": "NVIDIA GeForce RTX 3080 Ti Laptop GPU",
    "driver_version": 561.19,
    "memory.total [MiB]": 16384,
    "memory.free [MiB]": 14804,
    "memory.used [MiB]": 1372,
    "memory.reserved [MiB]": 209,
    "compute_cap": 8.6,
    "utilization.gpu [%]": 4,
    "utilization.memory [%]": 12,
    "temperature.gpu": 49,
    "temperature.memory": "N/A",
    "power.draw [W]": 21.09,
    "power.limit [W]": "[N/A]",
    "cuda_version": 12.6
  }
]
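Note that nvidia-smi appends the unit to each CSV header (for example memory.used [MiB] and utilization.gpu [%]), so those exact strings become the dictionary keys in the parsed stats. As a minimal sketch, assuming the fields queried above, you could derive the percentage of VRAM in use like this:
# Minimal sketch: the unit suffixes are part of the key names
for gpu in gpu_stats:
    vram_pct = gpu["memory.used [MiB]"] / gpu["memory.total [MiB]"] * 100
    print(f"GPU {gpu['index']}: {vram_pct:.1f}% VRAM used, "
          f"{gpu['utilization.gpu [%]']}% GPU utilization")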

Getting System Stats

Other system information, such as CPU and memory utilization, can be obtained with the psutil library, a cross-platform library for retrieving information on running processes and system utilization. It is not included in the base Python installation, so you will need to install it separately. You can do this using pip:
pip install psutil
Once you have psutil installed, you can use it to get system stats like CPU and memory utilization. Here’s an example of how to do this:
import psutil

def get_system_stats():
    cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
    cpu_freq = psutil.cpu_freq()

    # Get memory stats
    virtual_mem = psutil.virtual_memory()
    swap_mem = psutil.swap_memory()

    return {
        "cpu_percent": cpu_percent,
        "cpu_freq": cpu_freq,
        "virtual_memory": {
            "total": virtual_mem.total,
            "available": virtual_mem.available,
            "used": virtual_mem.used,
            "free": virtual_mem.free,
            "percent": virtual_mem.percent
        },
        "swap_memory": {
            "total": swap_mem.total,
            "used": swap_mem.used,
            "free": swap_mem.free,
            "percent": swap_mem.percent
        }
    }

system_stats = get_system_stats()
print(system_stats)
You will get an output like this:
{
  "cpu_percent": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0],
  "cpu_freq": [2918.3989999999994, 0.0, 0.0],
  "virtual_memory": {
    "total": 25329766400,
    "available": 20256694272,
    "used": 4660776960,
    "free": 19216257024,
    "percent": 20.0
  },
  "swap_memory": {
    "total": 7516192768,
    "used": 0,
    "free": 7516192768,
    "percent": 0.0
  }
}
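If you want to serialize these stats to JSON (for example, to ship them to a logging system), note that psutil returns named tuples such as the cpu_freq value, which json.dumps renders as plain arrays, as in the cpu_freq list shown above. A minimal sketch that preserves the field names instead:
import json

stats = get_system_stats()
# cpu_freq is a named tuple (current, min, max); convert it to a dict so the
# field names survive JSON serialization
stats["cpu_freq"] = stats["cpu_freq"]._asdict()
print(json.dumps(stats, indent=2))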

Combined Example

import psutil
import subprocess
import csv


def get_cuda_version():
    try:
        # Run nvidia-smi -q command
        nvidia_output = subprocess.run(
            ["nvidia-smi", "-q"],
            capture_output=True,
            text=True
        ).stdout

        # Filter for lines containing "CUDA Version"
        cuda_lines = [line.strip() for line in nvidia_output.split('\n')
                      if "CUDA Version" in line]

        if cuda_lines:
            # Extract version from the first matching line
            # Typically in format "    CUDA Version    : 12.6"
            return cuda_lines[0].split(':')[1].strip()
        else:
            raise ValueError("CUDA version not found in nvidia-smi output")

    except Exception as e:
        return f"An error occurred: {str(e)}"


def get_gpu_stats(stats_to_query: list):
    """
    Run nvidia-smi with CSV output format and return the parsed stats.
    All options: `nvidia-smi --help-query-gpu`
    """

    # Run nvidia-smi with csv output format
    result = subprocess.run(
        ["nvidia-smi",
            f"--query-gpu={','.join(stats_to_query)}", "--format=csv,nounits"],
        capture_output=True,
        text=True,
        check=True
    )

    cuda_version = get_cuda_version()

    # Process the CSV output
    csv_data = result.stdout.strip().splitlines()
    reader = csv.DictReader(csv_data)
    stats = []
    for row in reader:
        data = {key.strip(): value.strip()
                for key, value in row.items()}
        data["cuda_version"] = cuda_version
        for key, value in data.items():
            # Convert numeric values to appropriate types
            if value.isdigit():
                data[key] = int(value)
            else:
                try:
                    data[key] = float(value)
                except ValueError:
                    pass
        stats.append(data)
    return stats


def get_system_stats():
    cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
    cpu_freq = psutil.cpu_freq()

    # Get memory stats
    virtual_mem = psutil.virtual_memory()
    swap_mem = psutil.swap_memory()

    return {
        "cpu_percent": cpu_percent,
        "cpu_freq": cpu_freq,
        "virtual_memory": {
            "total": virtual_mem.total,
            "available": virtual_mem.available,
            "used": virtual_mem.used,
            "free": virtual_mem.free,
            "percent": virtual_mem.percent
        },
        "swap_memory": {
            "total": swap_mem.total,
            "used": swap_mem.used,
            "free": swap_mem.free,
            "percent": swap_mem.percent
        }
    }


def get_all_stats():
    gpu_stats = get_gpu_stats([
        "timestamp",
        "index",
        "gpu_name",
        "driver_version",
        "memory.total",
        "memory.free",
        "memory.used",
        "memory.reserved",
        "compute_cap",
        "utilization.gpu",
        "utilization.memory",
        "temperature.gpu",
        "temperature.memory",
        "power.draw",
        "power.limit"
    ])
    system_stats = get_system_stats()
    system_stats["gpu"] = gpu_stats
    return system_stats


if __name__ == "__main__":
    stats = get_all_stats()
    print(stats)

You will get an output like this:
{
  "cpu_percent": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0],
  "cpu_freq": [2918.3989999999994, 0.0, 0.0],
  "virtual_memory": {
    "total": 25329766400,
    "available": 20256694272,
    "used": 4660776960,
    "free": 19216257024,
    "percent": 20.0
  },
  "swap_memory": {
    "total": 7516192768,
    "used": 0,
    "free": 7516192768,
    "percent": 0.0
  },
  "gpu": [
    {
      "timestamp": "2025/03/31 15:21:49.377",
      "index": 0,
      "name": "NVIDIA GeForce RTX 3080 Ti Laptop GPU",
      "driver_version": 561.19,
      "memory.total [MiB]": 16384,
      "memory.free [MiB]": 14804,
      "memory.used [MiB]": 1372,
      "memory.reserved [MiB]": 209,
      "compute_cap": 8.6,
      "utilization.gpu [%]": 4,
      "utilization.memory [%]": 12,
      "temperature.gpu": 49,
      "temperature.memory": "N/A",
      "power.draw [W]": 21.09,
      "power.limit [W]": "[N/A]",
      "cuda_version": 12.6
    }
  ]
}

Reallocating Under-Performing Nodes

Now that we know how to collect the stats, we can use this information to reallocate under-performing nodes: check the stats against your thresholds and, if any are exceeded, request a reallocation through the IMDS Reallocation Endpoint. You must provide a reason with the request; we use this data to continuously improve the quality of our network. Here’s an example of how to do this:
import requests
import os

max_gpu_temp = int(os.getenv("MAX_GPU_TEMP", "80"))



def reallocate_me(reason: str):
    url = "http://169.254.169.254/v1/reallocate"
    headers = {
        "Content-Type": "application/json",
        "Metadata": "true"
    }
    data = {
        "reason": reason
    }
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        print("Reallocation successfully requested.")
    else:
        print(f"Reallocation failed: {response.status_code} - {response.text}")

# Example usage
stats = get_all_stats()
gpu_temp = stats["gpu"][0]["temperature.gpu"]
if gpu_temp > max_gpu_temp:
    reallocate_me(f"GPU temperature exceeded threshold: {gpu_temp} / {max_gpu_temp}C")

Putting it All Together

You can put all of this together in a single script that runs continuously, checks the stats at regular intervals, and takes any required actions. We recommend reading all configuration values from environment variables so that you can adjust them easily without rebuilding your container.
monitor.py
import subprocess
import csv
import psutil
import requests
import os
import time

max_gpu_temp = int(os.getenv("MAX_GPU_TEMP", "80"))
max_vram_usage = float(os.getenv("MAX_VRAM_USAGE", "95"))
check_interval = int(os.getenv("CHECK_INTERVAL", "60"))  # in seconds


def get_cuda_version():
    try:
        # Run nvidia-smi -q command
        nvidia_output = subprocess.run(
            ["nvidia-smi", "-q"],
            capture_output=True,
            text=True
        ).stdout

        # Filter for lines containing "CUDA Version"
        cuda_lines = [line.strip() for line in nvidia_output.split('\n')
                      if "CUDA Version" in line]

        if cuda_lines:
            # Extract version from the first matching line
            # Typically in format "    CUDA Version    : 12.6"
            return cuda_lines[0].split(':')[1].strip()
        else:
            raise ValueError("CUDA version not found in nvidia-smi output")

    except Exception as e:
        return f"An error occurred: {str(e)}"


def get_gpu_stats(stats_to_query: list):
    """
    Run nvidia-smi with CSV output format and return the parsed stats.
    All options: `nvidia-smi --help-query-gpu`
    """

    # Run nvidia-smi with csv output format
    result = subprocess.run(
        ["nvidia-smi",
            f"--query-gpu={','.join(stats_to_query)}", "--format=csv,nounits"],
        capture_output=True,
        text=True,
        check=True
    )

    cuda_version = get_cuda_version()

    # Process the CSV output
    csv_data = result.stdout.strip().splitlines()
    reader = csv.DictReader(csv_data)
    stats = []
    for row in reader:
        data = {key.strip(): value.strip()
                for key, value in row.items()}
        data["cuda_version"] = cuda_version
        for key, value in data.items():
            # Convert numeric values to appropriate types
            if value.isdigit():
                data[key] = int(value)
            else:
                try:
                    data[key] = float(value)
                except ValueError:
                    pass
        stats.append(data)
    return stats


def get_system_stats():
    cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
    cpu_freq = psutil.cpu_freq()

    # Get memory stats
    virtual_mem = psutil.virtual_memory()
    swap_mem = psutil.swap_memory()

    return {
        "cpu_percent": cpu_percent,
        "cpu_freq": cpu_freq,
        "virtual_memory": {
            "total": virtual_mem.total,
            "available": virtual_mem.available,
            "used": virtual_mem.used,
            "free": virtual_mem.free,
            "percent": virtual_mem.percent
        },
        "swap_memory": {
            "total": swap_mem.total,
            "used": swap_mem.used,
            "free": swap_mem.free,
            "percent": swap_mem.percent
        }
    }


def get_all_stats():
    gpu_stats = get_gpu_stats([
        "timestamp",
        "index",
        "gpu_name",
        "driver_version",
        "memory.total",
        "memory.free",
        "memory.used",
        "memory.reserved",
        "compute_cap",
        "utilization.gpu",
        "utilization.memory",
        "temperature.gpu",
        "temperature.memory",
        "power.draw",
        "power.limit"
    ])
    system_stats = get_system_stats()
    system_stats["gpu"] = gpu_stats
    return system_stats


def reallocate_me(reason: str):
    url = "http://169.254.169.254/v1/reallocate"
    headers = {
        "Content-Type": "application/json",
        "Metadata": "true"
    }
    data = {
        "reason": reason
    }
    response = requests.post(url, headers=headers, json=data)
    if response.status_code == 200:
        print("Reallocation successfully requested.")
    else:
        print(f"Reallocation failed: {response.status_code} - {response.text}")


if __name__ == "__main__":
    while True:
        stats = get_all_stats()
        gpu_temp = stats["gpu"][0]["temperature.gpu"]
        # utilization.memory reports memory bandwidth utilization, so derive
        # VRAM usage from memory.used / memory.total instead
        vram_usage = stats["gpu"][0]["memory.used [MiB]"] / stats["gpu"][0]["memory.total [MiB]"] * 100

        if gpu_temp > max_gpu_temp:
            reallocate_me(f"GPU temperature exceeded threshold: {gpu_temp} / {max_gpu_temp}C")
            break
        elif vram_usage > max_vram_usage:
            reallocate_me(f"VRAM usage exceeded threshold: {vram_usage:.1f} / {max_vram_usage}%")
            break

        time.sleep(check_interval)
You can then run this script in the background, where it will continuously check the stats and take action if any of the thresholds are exceeded. A simple way to do this is to use & in your Dockerfile CMD to launch the monitor alongside your application:
CMD ["bash", "-c", "python monitor.py & python my_app.py"]

Conclusion

In this tutorial, we learned how to implement performance monitoring for your application using nvidia-smi and the psutil library, and how to reallocate under-performing nodes using the IMDS Reallocation Endpoint. By monitoring your application’s environment, you can keep it running smoothly and efficiently even as conditions change, which is an important step in building a robust and reliable application.