Using NVIDIA Nsight Systems in Containers and the Cloud

Gone are the days when it was expected that a programmer would “own” all the systems that they needed. Modern computational work frequently happens in shared systems, in the cloud, or otherwise on hardware not owned by the user or even their employer.

This is good for developers. It can save time and money by allowing for testing and development on multiple architectures or OSs without prohibitive expense. But it can present real challenges as well, including handling setup, negotiating security constraints, and navigating access requirements for resources.

Trying to use the right tools in those environments is sometimes complicated, so here is a handy guide to help you use the NVIDIA Nsight family of tools wherever you are doing your development.

Figure 1. Basic structure of containers and VMs, showing how apps in a Docker container differ from apps in VMs built on a hypervisor.

VMs and Containers and Clouds, Oh My!

First, here’s a breakdown of some options.

As shown in Figure 1, both virtual machines (VMs) and containers are used to create a self-contained environment that allows you to run your application in isolation on a shared system. The main difference is that when you run a container, the underlying platform provides an operating system for basic services with a limited degree of isolation using Linux namespaces. With VMs, the VM must provide the operating system itself, and VM support in the hardware is used to enforce isolation.

For this reason, VMs have higher overhead than containers, so containers are usually used when hardware is shared by many users. “Bare metal” instances in the cloud run on a thin hypervisor layer, which makes them basically VMs.

Containers are a widely adopted method of taming the complexity of deploying HPC and AI software. The entire software environment, from the deep learning framework or HPC application down to the math and communication libraries necessary for performance, is packaged into a single, easily deployed bundle. 

Because workloads inside a container always use the same environment, the performance is reproducible and portable across a wide variety of systems. It is possible to easily deploy the same containerized software environment on your local workstation, a DGX server in your datacenter, your site’s shared HPC cluster, and the cloud.

There are several container runtimes available for use, including Docker, Singularity, and Podman. The best container technology for you depends on the administrative requirements of the cluster that you are working on, particularly security and job scheduling requirements. In the examples below, we focus on Docker and Singularity as two of the most commonly used container technologies.

Likewise, there are dozens of cloud service providers (CSPs) to choose from. If your organization does not already have a designated CSP, consider aspects of service, pricing, and GPU availability when making your decision.

Container runtimes often limit the access of software running inside the container. This can be a challenge for profiling tools like NVIDIA Nsight Systems or NVIDIA Nsight Compute. For this post, we are focusing on considerations when using Nsight Systems. Look for a future post covering Nsight Compute.

Setting Up and Using Nsight Systems Inside Containers (Docker/Singularity)

There are different ways to work with Nsight Systems in a container environment. You can map it into an existing container or build a new container image that includes it. After it’s installed, you can run the tool inside the container or connect remotely from the tool into the container for analysis.

This post assumes that you have experience with Docker or Singularity containers. If you have general questions, see the Docker or Singularity documentation.

Enable Sampling in Containers

Nsight Systems samples CPU activity and gets backtraces using the Linux kernel’s perf subsystem. To collect thread scheduling data and instruction pointer (IP) samples, the perf paranoid level on the target system must be ≤2. Run the following command to check the level:

cat /proc/sys/kernel/perf_event_paranoid

If the output is >2, run the following command to adjust the paranoid level temporarily. The setting does not survive a reboot, so it must be reapplied after each one:

sudo sh -c 'echo 2 >/proc/sys/kernel/perf_event_paranoid'

To make the change permanent, run the following command:

sudo sh -c 'echo kernel.perf_event_paranoid=2 > /etc/sysctl.d/local.conf'
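The check above can be wrapped in a small script that reports whether the current level permits sampling. This is a minimal sketch, not an official NVIDIA tool; the function name paranoid_level_ok and the variable PARANOID_FILE are introduced here for illustration only.

```shell
#!/bin/sh
# Sketch: report whether the perf paranoid level permits Nsight Systems sampling.
# paranoid_level_ok and PARANOID_FILE are illustrative names, not part of nsys.

PARANOID_FILE=/proc/sys/kernel/perf_event_paranoid

# Succeeds (exit status 0) when the given level is 2 or lower.
paranoid_level_ok() {
    [ "$1" -le 2 ]
}

if [ -r "$PARANOID_FILE" ]; then
    level=$(cat "$PARANOID_FILE")
    if paranoid_level_ok "$level"; then
        echo "perf_event_paranoid=$level: sampling is allowed"
    else
        echo "perf_event_paranoid=$level: lower it to 2 before profiling"
    fi
else
    echo "$PARANOID_FILE is not readable on this system"
fi
```

The script only reads the setting, so it can run without root; changing the level still requires the sudo commands shown earlier.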

When performing an Nsight Systems collection with sampling in a Docker container, additional steps are required to enable the perf_event_open system call, which Linux perf depends on. These steps are not required if you are using Singularity.

There are two ways to enable the perf_event_open syscall. You can enable it by using the --cap-add=SYS_ADMIN switch. If your system meets the requirements, you can also enable it by setting the seccomp security profile.

Secure computing mode (seccomp) is a feature of the Linux kernel that can be used to restrict an application’s access. This feature is available only if the kernel is enabled with seccomp support. To check for seccomp support, use the following command:

$ grep CONFIG_SECCOMP= /boot/config-$(uname -r)

The result should contain the following line:

CONFIG_SECCOMP=y

Seccomp profiles require seccomp 2.2.1, which is not available on some older distributions (for example, Ubuntu 14.04, Debian Wheezy, or Debian Jessie). To use seccomp on older distributions, you must download the latest static Linux binaries rather than packages.
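The kernel check described above can also be scripted so that a setup script fails early on hosts without seccomp support. The following is a sketch under the assumption that the kernel config lives at /boot/config-$(uname -r), as in the command shown earlier; seccomp_enabled is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: test a kernel config file for seccomp support.
# seccomp_enabled is an illustrative helper, not a Docker or nsys command.

# Succeeds when the given kernel config file contains CONFIG_SECCOMP=y.
seccomp_enabled() {
    grep -q '^CONFIG_SECCOMP=y$' "$1" 2>/dev/null
}

config="/boot/config-$(uname -r)"
if seccomp_enabled "$config"; then
    echo "seccomp is enabled in $config"
else
    echo "CONFIG_SECCOMP=y not found in $config (the file may be missing or unreadable)"
fi
```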

Download the default seccomp profile file, default.json, relevant to your Docker version. If perf_event_open is already listed in the file as guarded by CAP_SYS_ADMIN, then remove the perf_event_open line. Add the following entry to the “syscalls” array, keeping it comma-separated from neighboring entries, and save the resulting file as default_with_perf.json:

{
   "name": "perf_event_open",
   "action": "SCMP_ACT_ALLOW",
   "args": []
}

To apply the new seccomp profile, use the following switch when starting the Docker container.

--security-opt seccomp=default_with_perf.json
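Either switch is passed to docker run when the container is started. The sketch below shows one way to select between the two methods described in this section. The helper perf_docker_flag and the image name my-image:latest are hypothetical, and the script prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: pick the docker run switch that enables perf_event_open.
# perf_docker_flag and the image name my-image:latest are hypothetical.

# Prints the switch for the chosen method: "cap" or "seccomp".
# An optional second argument overrides the seccomp profile path.
perf_docker_flag() {
    case "$1" in
        cap)     echo "--cap-add=SYS_ADMIN" ;;
        seccomp) echo "--security-opt seccomp=${2:-default_with_perf.json}" ;;
        *)       echo "usage: perf_docker_flag cap|seccomp [profile.json]" >&2
                 return 1 ;;
    esac
}

# Print (rather than run) the resulting command so it can be reviewed first.
echo "docker run --rm -it $(perf_docker_flag seccomp) my-image:latest"
```

The seccomp route grants only the one extra system call, while SYS_ADMIN grants a broad set of privileges, so the profile is usually the narrower choice when the system supports it.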

To make sure that your container is set up properly for Nsight Systems, run the status command inside the container to check your environment:

$ nsys status -e

This lets you know whether features are limited in the current environment.
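In a container entrypoint or job script, it can help to run this check only when the tool is actually present. The following is a small sketch; check_nsys_env is a hypothetical wrapper name, and the only nsys invocation used is the status command shown above.

```shell
#!/bin/sh
# Sketch: run the environment check only when nsys is on the PATH.
# check_nsys_env is an illustrative wrapper, not an nsys subcommand.

check_nsys_env() {
    if command -v nsys >/dev/null 2>&1; then
        nsys status -e
    else
        echo "nsys not found on PATH; install Nsight Systems in the container first"
        return 127
    fi
}

check_nsys_env || echo "environment check did not complete successfully"
```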

Installing Nsight Systems from NGC and the CUDA Toolkit

The simplest way to profile with Nsight Systems in a container is to download one of the containers from the NVIDIA GPU Cloud (NGC). Many of these containers, such as NGC 19.11, already include Nsight Systems and just work out of the box.

If an NGC container does not have Nsight Systems pre-installed, install it using one of the following commands, depending on the CUDA toolkit version used in the container. If you are not sure which version that is and nvcc is present, run nvcc --version inside the container. The CUDA version shown by nvidia-smi is the highest one the installed driver supports, which may be newer than the toolkit in the container.

CUDA 10.1:

$ apt-get update -y
$ apt-get install -y cuda-nsight-systems-10-1 nsight-systems-2019.3.7

CUDA 10.2:

$ apt-get update -y
$ apt-get install -y cuda-nsight-systems-10-2 nsight-systems-2019.5.2
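If the same setup script has to serve containers with different CUDA versions, the two cases above can be folded into a lookup. This is a sketch only: nsys_packages_for_cuda is a hypothetical helper, it covers just the two versions listed in this post, and it prints the install command instead of running it.

```shell
#!/bin/sh
# Sketch: map a CUDA toolkit version to the packages listed above.
# nsys_packages_for_cuda is an illustrative helper; only the two versions
# covered in this post are handled.

nsys_packages_for_cuda() {
    case "$1" in
        10.1) echo "cuda-nsight-systems-10-1 nsight-systems-2019.3.7" ;;
        10.2) echo "cuda-nsight-systems-10-2 nsight-systems-2019.5.2" ;;
        *)    echo "no package mapping known for CUDA $1" >&2
              return 1 ;;
    esac
}

# Print the install command for CUDA 10.2 instead of running it.
echo "apt-get install -y $(nsys_packages_for_cuda 10.2)"
```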

However, the latest Nsight Systems release frequently has more features and fixes than the version available with the CUDA toolkit. To use the most recent version, build or modify your own container image.

Adding Nsight Systems to your Existing Docker Container

Adding the latest Nsight Systems to your existing Docker container image is a simple, two-step process. First, download the latest Nsight Systems by choosing Download Now on the Nsight Systems page. Select either the RPM package for RHEL-based Linux distributions (for example, CentOS) or the DEB package for Debian-based Linux distributions (for example, Ubuntu).

Second, add the following Dockerfile code example to your existing Dockerfile. Use the appropriate code example depending on whether your container image is RHEL-based or Debian-based. If you are not sure which Linux distribution your container image is based on, check /etc/os-release in the container image. Most NGC images are based on Ubuntu.

ARG NSYS_PKG=NVIDIA_Nsight_Systems_Linux_2020.1.1.65.rpm
COPY $NSYS_PKG /var/tmp
RUN yum install -y /var/tmp/$NSYS_PKG && \
    rm -rf /var/cache/yum
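To decide between the RPM and DEB packages, the /etc/os-release file mentioned above can be inspected programmatically. The sketch below assumes the standard ID and ID_LIKE fields of os-release; nsys_pkg_format is a hypothetical helper, and the list of recognized distribution names is illustrative, not exhaustive.

```shell
#!/bin/sh
# Sketch: choose the installer format from an os-release file.
# nsys_pkg_format is an illustrative helper, not part of Nsight Systems.

# Prints "deb" or "rpm" based on the ID and ID_LIKE fields of the given file.
nsys_pkg_format() {
    # Source the file in a subshell so its variables do not leak out.
    (
        ID="" ID_LIKE=""
        . "$1" || exit 1
        case " $ID $ID_LIKE " in
            *" debian "*|*" ubuntu "*)            echo deb ;;
            *" rhel "*|*" centos "*|*" fedora "*) echo rpm ;;
            *)                                    exit 1 ;;
        esac
    )
}

if [ -r /etc/os-release ]; then
    nsys_pkg_format /etc/os-release || echo "unrecognized distribution"
fi
```

Run inside the container image, this prints the format to pick on the download page; distributions outside the two families fall through to the unrecognized case.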