Running CUDA 12 workloads on Ubuntu

vhsven

2024-08-15

Introduction

CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform that allows developers to harness the power of NVIDIA GPUs for general-purpose computing (GPGPU). CUDA provides a suite of tools and libraries that enable high-performance computing on GPUs, making it a go-to solution for a wide range of computational tasks, including deep learning.

CUDA competes with other GPU computing platforms, such as AMD's ROCm and Intel's oneAPI. Both ROCm and oneAPI are open-source platforms that offer similar capabilities to CUDA. However, CUDA remains dominant, especially in the AI and deep learning space, due to its mature ecosystem and widespread support.

CUDA is supported on Linux and Windows. It was once available on macOS as well, but NVIDIA dropped macOS support after CUDA 10.2, which was the last release for that platform.

In this blog post, we will dive into CUDA by exploring it across three layers:

System-wide setup: We will cover the installation and configuration of the graphics driver.
CUDA and cuDNN setup: We will discuss two different approaches: system-wide or isolated.
Using CUDA: How to leverage CUDA in your projects, including the installation of GPU-accelerated libraries and frameworks.

System overview

For this guide, we will walk through setting up CUDA on a Linux system. Specifically, we will be using Ubuntu 23.10. While it would have been ideal to use a long-term support (LTS) version like 24.04, the differences in setup will be minimal.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 23.10
Release:        23.10
Codename:       mantic

$ uname -m
x86_64

$ uname -r
6.5.0-44-generic

$ ldd --version
ldd (Ubuntu GLIBC 2.38-1ubuntu6.3) 2.38
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

Linux display driver setup

When setting up CUDA on Linux, one of the first steps is ensuring that your display driver is correctly installed. On Linux, this is always a system-wide process, and you have two main options: the proprietary NVIDIA driver or the open-source Nouveau driver.

Proprietary drivers: The official drivers provided by NVIDIA, offering the best performance and full support for CUDA. Common versions include 470, 525, 535, 545 and 550.

Nouveau drivers: The Nouveau driver is an open-source alternative to NVIDIA's proprietary driver. While it provides basic functionality and is a good choice for general use, it does not support CUDA.

In this guide, we will be using the proprietary NVIDIA drivers. These drivers also include the nvidia-smi tool, which is vital for managing and monitoring your GPU.

There are two main procedures for installing the drivers: automatic or manual.

Automatic Installation

The easiest way to install the appropriate NVIDIA driver is through the automatic installation process, which detects your GPU and recommends the best driver.

$ ubuntu-drivers devices
== /sys/devices/pci0000:40/0000:40:01.1/0000:41:00.0 ==
modalias : pci:v000010DEd00002204sv000010DEsd0000147Dbc03sc00i00
vendor   : NVIDIA Corporation
model    : GA102 [GeForce RTX 3090]
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-545 - distro non-free
driver   : nvidia-driver-545-open - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-535 - distro non-free recommended
driver   : nvidia-driver-535-open - distro non-free
driver   : nvidia-driver-535-server-open - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin

$ ubuntu-drivers list
$ sudo ubuntu-drivers install
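If you provision several machines, you may want to script this detection step. The following is a minimal Python sketch, assuming the `ubuntu-drivers devices` output format shown above; `recommended_driver` is a hypothetical helper, not part of the ubuntu-drivers tool.

```python
import re
import shutil
import subprocess
from typing import Optional

# Matches lines such as "driver   : nvidia-driver-535 - distro non-free recommended"
DRIVER_LINE = re.compile(r"^\s*driver\s*:\s*(\S+)\s+-\s+(.*)$")


def recommended_driver(devices_output: str) -> Optional[str]:
    """Return the driver package tagged 'recommended', or None if there is none."""
    for line in devices_output.splitlines():
        match = DRIVER_LINE.match(line)
        # The tags after " - " are space-separated, e.g. "distro non-free recommended".
        if match and "recommended" in match.group(2).split():
            return match.group(1)
    return None


if __name__ == "__main__" and shutil.which("ubuntu-drivers"):
    output = subprocess.run(
        ["ubuntu-drivers", "devices"], capture_output=True, text=True, check=False
    ).stdout
    print(recommended_driver(output))
```

The returned package name can then be passed to `sudo apt install` if you prefer to pin the exact driver rather than rely on `ubuntu-drivers install`.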

Manual Installation

For those who prefer more control over the installation process, or if you want the latest drivers not available in the default Ubuntu repositories, you can manually install the driver.

You can install the display drivers either from the default Ubuntu repositories or from the additional PPA (Personal Package Archive) provided by Ubuntu's graphics drivers team if you want the latest and greatest.

$ sudo add-apt-repository ppa:graphics-drivers/ppa && sudo apt update

$ sudo apt install nvidia-driver-535

This command installs the NVIDIA driver version 535, but you can replace "535" with your desired version number.

Once installed, you can verify that the driver is correctly set up using the nvidia-smi tool:

$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:41:00.0  On |                  N/A |
|  0%   45C    P8              32W / 350W |    541MiB / 24576MiB |      7%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off | 00000000:42:00.0 Off |                  N/A |
|  0%   40C    P8              19W / 350W |     10MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2635      G   /usr/lib/xorg/Xorg                          192MiB |
|    0   N/A  N/A      2944      G   /usr/bin/gnome-shell                         82MiB |
|    0   N/A  N/A      3607      G   ...irefox/3626/usr/lib/firefox/firefox      134MiB |
|    0   N/A  N/A      4701      G   ...erProcess --variations-seed-version      116MiB |
|    1   N/A  N/A      2635      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

Important Notes

Stability vs. latest features: When choosing your driver version, consider the trade-off between stability and access to the latest features. Older driver branches like 470 have been widely tested, while newer ones like 550 offer the latest updates and support for newer hardware. Keep in mind that CUDA 12 requires at least the 525 branch on Linux; the 470 branch only supports up to CUDA 11.4.
GPU compatibility: Ensure that the driver you select is compatible with your GPU model. The automatic detection method mentioned above typically handles this well.
Bundled CUDA driver library: The NVIDIA driver ships a minimal CUDA component, the CUDA driver library (libcuda.so), which is enough to run CUDA applications that bring their own runtime libraries. The "CUDA Version: 12.2" reported by nvidia-smi is the highest CUDA version this driver supports, not an installed toolkit. The driver does not include the full CUDA toolkit required for development, and the reported version will not change even if you separately install a CUDA runtime or toolkit.

To see where the driver's CUDA library is located on your system, you can run:

$ find /usr -name 'libcuda.so*'
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.535.171.04
/usr/lib/i386-linux-gnu/libcuda.so.1
/usr/lib/i386-linux-gnu/libcuda.so
/usr/lib/i386-linux-gnu/libcuda.so.535.171.04

This command displays the paths to the CUDA driver library (libcuda) for both 64-bit and 32-bit applications; every CUDA-enabled application needs it at run time.
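As a cross-check, the value nvidia-smi reports as "CUDA Version" can be read directly from this library through its cuDriverGetVersion function, which encodes the version as 1000 * major + 10 * minor. A minimal sketch using Python's ctypes, assuming a Linux system where libcuda.so.1 is on the default library search path; the helper names are hypothetical.

```python
import ctypes
from typing import Optional, Tuple


def decode_cuda_version(version: int) -> Tuple[int, int]:
    """Split the integer from cuDriverGetVersion (1000 * major + 10 * minor)."""
    return version // 1000, (version % 1000) // 10


def driver_cuda_version() -> Optional[Tuple[int, int]]:
    """Ask the driver's libcuda which CUDA version it supports, or None if unavailable."""
    try:
        # libcuda.so.1 is the versioned name installed with the driver itself.
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None  # no NVIDIA driver installed (or not running on Linux)
    version = ctypes.c_int(0)
    # CUresult cuDriverGetVersion(int *driverVersion); 0 means CUDA_SUCCESS.
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return decode_cuda_version(version.value)


print(decode_cuda_version(12020))  # (12, 2)
print(driver_cuda_version())       # e.g. (12, 2) on the system above, None without a driver
```

cuDriverGetVersion can be called without initializing the driver API first, which keeps the check cheap.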

Compute capability

Compute capability identifies the features supported by your GPU's hardware and, unlike the driver or toolkit, cannot be upgraded through software. Here's a table summarizing the compute capabilities of various NVIDIA GPU architectures:

Architecture	Compute Capability	GPU Models
Volta	7.0	V100
Turing	7.5	GeForce RTX 20xx, Quadro RTX 8000 and RTX 6000, Tesla T4
Ampere	8.x	A100 (8.0), GeForce RTX 30xx (8.6), RTX A6000 (8.6)
Ada Lovelace	8.9	GeForce RTX 40xx, RTX 6000 Ada
Hopper	9.0	H100, H200
Blackwell	10.0, 12.0	B100, B200 (10.0), GeForce RTX 5090 (12.0)

To check the compute capability of your GPU, you can use the following command:

$ nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
8.6
8.6

For more detailed information on compute capabilities and their implications, refer to the NVIDIA CUDA C++ Programming Guide.
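When scripting against several GPUs, the CSV output of the query above is easy to parse. A minimal sketch, assuming the header line is present (as with --format=csv); the architecture lookup is hypothetical helper data covering only the exact values from the table, not an NVIDIA API.

```python
from typing import Dict, List, Tuple

# Subset of the table above; any other value falls back to "unknown".
ARCHITECTURES: Dict[Tuple[int, int], str] = {
    (7, 0): "Volta",
    (7, 5): "Turing",
    (8, 0): "Ampere",
    (8, 6): "Ampere",
    (8, 9): "Ada Lovelace",
    (9, 0): "Hopper",
}


def parse_compute_caps(csv_output: str) -> List[Tuple[int, int]]:
    """Parse `nvidia-smi --query-gpu=compute_cap --format=csv` into one tuple per GPU."""
    rows = [line.strip() for line in csv_output.splitlines() if line.strip()]
    caps = []
    for row in rows[1:]:  # skip the "compute_cap" header line
        major, minor = row.split(".")
        caps.append((int(major), int(minor)))
    return caps


sample = "compute_cap\n8.6\n8.6\n"
for index, cap in enumerate(parse_compute_caps(sample)):
    arch = ARCHITECTURES.get(cap, "unknown")
    print(f"GPU {index}: compute capability {cap[0]}.{cap[1]} ({arch})")
```

Tuples also compare naturally, so a check such as `cap >= (8, 0)` is a simple way to gate features on a minimum compute capability. With --format=csv,noheader, drop the header skip.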

CUDA and cuDNN

When working with CUDA, it is important to distinguish between the CUDA runtime and the CUDA toolkit, similar to the difference between the Java Runtime Environment (JRE) and the Java Development Kit (JDK). The NVIDIA driver includes a minimal CUDA runtime that enables you to run basic CUDA-enabled applications. However, this runtime is limited and does not include all the components needed for more advanced CUDA tasks.

The CUDA toolkit, on the other hand, is a comprehensive package that provides all the development tools necessary for creating, compiling, and running CUDA applications. It also includes a more complete runtime, which provides additional libraries and features needed for more complex applications. Most deep learning tasks will require this toolkit to function correctly.

When installing the CUDA toolkit, the safest choice is a version less than or equal to the CUDA version reported by your display driver (12.2 in the nvidia-smi output above). A newer 12.x toolkit may still run thanks to CUDA's minor version compatibility, but features that require a newer driver will fail, so consider updating the driver first.
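The conservative version rule above boils down to comparing major.minor numbers. A minimal sketch with hypothetical helper names; it compares versions as integer tuples because plain string comparison gets cases like "12.10" versus "12.2" wrong.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '12.2' or '12.2.140' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def toolkit_within_driver(toolkit: str, driver_cuda: str) -> bool:
    """True if the toolkit's major.minor does not exceed the driver's CUDA version."""
    return parse_version(toolkit)[:2] <= parse_version(driver_cuda)[:2]


print(toolkit_within_driver("12.1", "12.2"))  # True
print(toolkit_within_driver("12.4", "12.2"))  # False: update the driver first
```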

In addition to the CUDA toolkit, some deep learning frameworks also require the CUDA Deep Neural Network (cuDNN) package.

System-wide installation

To set up CUDA and cuDNN system-wide, start by ensuring that the NVIDIA proprietary drivers are installed, as discussed earlier. Next, you will need to install a C++ compiler, which is required for compiling CUDA code. This can be done with sudo apt install gcc g++.

Once GCC is installed, you can proceed to install CUDA. You have two main options for this:

The easiest approach is to use the official Ubuntu repository by running sudo apt install nvidia-cuda-toolkit. Note that the version packaged by Ubuntu may lag behind NVIDIA's latest release.
Alternatively, for more control or to get the latest version, you can install CUDA directly from NVIDIA’s official source. This involves following the detailed instructions provided in the NVIDIA CUDA Installation Guide for Linux.

After installing CUDA, the next step is to set up cuDNN, which is essential for deep learning applications. You can do this by following the instructions in the NVIDIA cuDNN Installation Guide.

Isolated installation (recommended)

For situations where you need to manage multiple projects with different CUDA or cuDNN versions, setting up an isolated environment using Conda or Mamba is the recommended option. This approach keeps the system-wide components minimal and allows each environment to have its own specific setup.

Start by ensuring that the NVIDIA proprietary drivers are installed as discussed earlier, since they are the only system-wide component required. Next, install Mamba, which is a faster alternative to Conda, via Miniforge. You can verify that the installation was successful by running mamba info:

$ mamba info

          mamba version : 1.5.5
     active environment : None
            shell level : 0
       user config file : /home/user/.condarc
 populated config files : /home/user/miniforge3/.condarc
          conda version : 23.11.0
    conda-build version : not installed
         python version : 3.10.13.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=zen3
                          __conda=23.11.0=0
                          __cuda=12.2=0
                          __glibc=2.38=0
                          __linux=6.5.0=0
                          __unix=0=0
       base environment : /home/user/miniforge3  (writable)
      conda av data dir : /home/user/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/user/miniforge3/pkgs
                          /home/user/.conda/pkgs
       envs directories : /home/user/miniforge3/envs
                          /home/user/.conda/envs
               platform : linux-64
             user-agent : conda/23.11.0 requests/2.31.0 CPython/3.10.13 Linux/6.5.0-44-generic ubuntu/23.10 glibc/2.38 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.5
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False

Relevance of virtual packages in mamba environments

Mamba environments utilize virtual packages to dynamically detect and represent certain system features that are critical for package resolution and compatibility. These virtual packages include system-specific details such as architecture, the operating system, the version of the GNU C Library (glibc), and, importantly, the version of CUDA supported by your installed NVIDIA drivers.

Virtual packages are not installed in the traditional sense but are automatically detected by Mamba. They help the package manager resolve dependencies by ensuring that the packages you install are compatible with your system's underlying hardware and software.

Among these virtual packages, __cuda is particularly important when working with CUDA. It represents the maximum version of CUDA that your NVIDIA driver officially supports. This information is automatically detected and provided by Mamba, assisting in the selection of the appropriate CUDA toolkit and related packages for your environment.

The output above indicates that our system's NVIDIA drivers support CUDA up to version 12.2. While Mamba does not strictly enforce this version when installing the CUDA toolkit, it serves as a guideline. You can technically install lower or higher versions of the toolkit, but installing a version higher than what __cuda indicates is generally not recommended, as it could lead to instability or compatibility issues. If your project requires a higher than supported version of the toolkit, consider upgrading your graphics driver first.
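To use __cuda in a script (for example to choose a cuda-toolkit version automatically), you can extract the virtual packages from the mamba info text shown earlier. A minimal sketch, assuming the name=version=build layout in that output; virtual_packages is a hypothetical helper.

```python
import re
from typing import Dict

# Each virtual package appears as "__name=version=build", e.g. "__cuda=12.2=0".
VIRTUAL_PKG = re.compile(r"(__\w+)=(\S+?)=(\S+)")


def virtual_packages(info_output: str) -> Dict[str, str]:
    """Map each virtual package found in `mamba info` output to its version."""
    return {name: version for name, version, _build in VIRTUAL_PKG.findall(info_output)}


sample = """\
       virtual packages : __archspec=1=zen3
                          __conda=23.11.0=0
                          __cuda=12.2=0
                          __glibc=2.38=0
"""
print(virtual_packages(sample)["__cuda"])  # 12.2
```

If detection fails, for instance when creating an environment on a machine without a GPU, conda and mamba honour the CONDA_OVERRIDE_CUDA environment variable (such as CONDA_OVERRIDE_CUDA=12.2) to set __cuda manually.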

Creating a new mamba environment

Once Mamba is set up, we can proceed to install the CUDA toolkit within an isolated environment. We can do this by creating a new environment and specifying the CUDA version we need, along with any other packages such as Python or cuDNN. For example:

$ mamba create -n my-environment python=3.12 cuda-toolkit=12.2 cudnn
$ mamba activate my-environment

This command sets up a new environment named my-environment with Python 3.12, CUDA toolkit 12.2, and a recent, compatible version of cuDNN.

After activating your environment, you can verify that CUDA is correctly installed by checking the version of the NVIDIA CUDA compiler:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
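In CI or setup scripts it can help to assert that the activated environment really provides the intended toolkit. A minimal sketch that parses the release line of nvcc --version; it assumes nvcc is on PATH (inside an activated environment, that is the environment's own copy), and the helper names are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def nvcc_release(version_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def active_nvcc_release() -> Optional[Tuple[int, int]]:
    """Run the first nvcc found on PATH and return its release, or None if absent."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return nvcc_release(result.stdout)


print(nvcc_release("Cuda compilation tools, release 12.2, V12.2.140"))  # (12, 2)
print(active_nvcc_release())
```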

Package overview

When working with CUDA 12, it is important to be aware of the significant changes in package names and structures compared to earlier versions. In CUDA 11 and earlier versions, the installation was typically done using the cudatoolkit or cudatoolkit-dev package. However, starting with CUDA 12, there has been a restructuring of package names. This shift makes installation trickier, especially as much of the older documentation still references these outdated package names.

The following meta-packages can be used to set up a CUDA 12 environment:

cuda-toolkit: This package (notice the hyphen) is now the primary way of installing CUDA development tools. Running mamba install cuda-toolkit=12.2, for instance, will typically provide all the necessary components for CUDA 12.2 development, including the compiler, libraries, and headers. The downside is that it is fairly big, and it will likely include much more than what is strictly necessary for your use case.
cuda-runtime: If you only need the runtime components, install this package. Note that this refers to the complete runtime, not the minimal one bundled with the driver.
cuda: This is a meta-package that pulls in both the toolkit and the runtime. Since the runtime's packages are a subset of the toolkit's, it is functionally equivalent to the cuda-toolkit package.
Other meta-packages such as cuda-libraries, cuda-libraries-dev, cuda-compiler and cuda-tools can be used to specify more precisely what your project needs. Below is a simplified hierarchy of the most relevant packages. Check the appendix for a full list.

cuda
├── cuda-runtime
│   └── cuda-libraries
└── cuda-toolkit
    ├── cuda-compiler
    ├── cuda-libraries
    ├── cuda-libraries-dev
    └── cuda-tools
        └── cuda-command-line-tools

Finally, for the fullest possible control, refer to the actual CUDA packages instead of these meta-packages (which are simply groups of packages).

Note that none of cuda, cuda-runtime or cuda-toolkit includes cuDNN. It is a separate package specifically tailored for deep learning applications, which needs to be installed independently, as shown earlier.
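Because meta-packages are simply groups of packages, their relationships can be modelled as a small dependency map. The sketch below encodes the simplified tree above, with abbreviated leaf lists taken from the appendix; it is an illustration only (META_PACKAGES and expand are hypothetical), and mamba repoquery depends --recursive remains the authoritative source.

```python
from typing import Dict, List, Set

# Simplified hierarchy from the tree above; leaf lists are deliberately abbreviated.
META_PACKAGES: Dict[str, List[str]] = {
    "cuda": ["cuda-runtime", "cuda-toolkit"],
    "cuda-runtime": ["cuda-libraries"],
    "cuda-toolkit": ["cuda-compiler", "cuda-libraries", "cuda-libraries-dev", "cuda-tools"],
    "cuda-compiler": ["cuda-nvcc", "cuda-cuobjdump"],
    "cuda-libraries": ["cuda-cudart", "libcublas", "libcufft"],
    "cuda-libraries-dev": ["cuda-cudart-dev", "libcublas-dev"],
    "cuda-tools": ["cuda-command-line-tools"],
    "cuda-command-line-tools": ["cuda-gdb", "cuda-nvtx"],
}


def expand(package: str) -> Set[str]:
    """Return every concrete (non-meta) package that a (meta-)package pulls in."""
    children = META_PACKAGES.get(package)
    if children is None:
        return {package}  # not a meta-package, so it is a leaf
    leaves: Set[str] = set()
    for child in children:
        leaves |= expand(child)
    return leaves


print("cudnn" in expand("cuda"))                       # False: install cudnn separately
print(expand("cuda-runtime") <= expand("cuda-toolkit"))  # True: runtime is a subset
print(expand("cuda") == expand("cuda-toolkit"))          # True: hence the equivalence
```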

Channel selection

When setting up CUDA and related packages in a Mamba environment, the number of channels offering similar packages can be confusing. The two primary channels to consider are conda-forge (the default in Miniforge) and nvidia, both of which offer nearly identical CUDA-related packages, including the CUDA toolkit, cuDNN, and other NVIDIA libraries. We will ignore the defaults channel used by Anaconda, because it typically hosts somewhat older package versions. Many mamba commands accept a -c <channel> option to include an extra channel on top of the default channels configured in the .condarc file.

The conda-forge channel is a widely used, community-driven repository known for its extensive package coverage beyond CUDA. This makes conda-forge particularly suitable for projects that require a mix of CUDA and other libraries. Additionally, conda-forge is continuously updated and maintained, ensuring that you have access to recent versions of packages.

In contrast, the official nvidia channel, maintained directly by NVIDIA, is dedicated specifically to CUDA and other NVIDIA tools. To install the complete CUDA suite from this channel, you can use mamba install -c nvidia cuda. Although the CUDA-related (meta-)packages in the nvidia channel are almost identical to those found in conda-forge, the nvidia channel provides slightly earlier access to the latest versions and includes some less common versions that may not be available on conda-forge.

In most cases, if your environment requires a wide range of software, conda-forge is likely the better option due to its extensive package offerings. It also helps to avoid some minor hiccups that can result from multi-channel package resolution.

Note that some projects, such as PyTorch, also provide their own dedicated channel to install packages from.

Setting environment variables

For certain applications, you might need to manually set additional environment variables:

CUDA_HOME and CUDA_PATH
- These are interchangeable and typically point to the root of the CUDA toolkit folder, which contains the lib and bin directories.
- In the case of a conda environment, they should point to the environment's root.
LD_LIBRARY_PATH
- Add $CUDA_HOME/lib to this path to ensure your system can locate the necessary libraries.

$ echo $CONDA_PREFIX
/home/user/miniforge3/envs/my-environment

$ which nvcc
/home/user/miniforge3/envs/my-environment/bin/nvcc

$ export CUDA_HOME=$CONDA_PREFIX
$ export CUDA_PATH=$CUDA_HOME
$ export LD_LIBRARY_PATH=$CUDA_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
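The same variables can be computed in Python, for instance to pass them to a subprocess via subprocess.run(..., env=...). A minimal sketch, assuming the environment's libraries live in $CONDA_PREFIX/lib as described above; cuda_env is a hypothetical helper. Saved as, say, cuda_env.py and run inside an activated environment, it prints export lines that a shell can evaluate with eval "$(python cuda_env.py)".

```python
import os
import shlex
from pathlib import Path
from typing import Dict, Mapping


def cuda_env(prefix: str, base_env: Mapping[str, str]) -> Dict[str, str]:
    """Return CUDA_HOME, CUDA_PATH and LD_LIBRARY_PATH pointing at an environment prefix."""
    lib_dir = str(Path(prefix) / "lib")
    existing = base_env.get("LD_LIBRARY_PATH", "")
    return {
        "CUDA_HOME": prefix,
        "CUDA_PATH": prefix,
        # Prepend so the environment's libraries take precedence over system-wide copies;
        # avoid a trailing ':' when the variable was empty.
        "LD_LIBRARY_PATH": f"{lib_dir}:{existing}" if existing else lib_dir,
    }


if __name__ == "__main__":
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        for key, value in cuda_env(prefix, os.environ).items():
            print(f"export {key}={shlex.quote(value)}")
```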

Install frameworks and libraries

Once CUDA and cuDNN are set up, the next step is installing the Python packages that leverage GPU acceleration. Depending on your development environment, you can install these using either pip in a virtual environment or mamba.

Here are some popular libraries and frameworks:

PyCUDA: Python wrapper for CUDA.
CuPy: NumPy-compatible library that runs on CUDA.
cuNumeric: A drop-in replacement for NumPy, optimized for CUDA.
RAPIDS: A suite of libraries for data science and analytics on GPUs, including cuDF (a faster pandas) and cuML (a faster scikit-learn).
Deep learning frameworks: TensorFlow, PyTorch, ONNX Runtime

For example, to install PyTorch with CUDA support using mamba:

$ mamba create -n torch-env -c pytorch -c nvidia python=3.12 pytorch-cuda=12.1 torchvision torchaudio


Looking for: ['python=3', 'pytorch-cuda=12.1', 'torchvision', 'torchaudio']

...

  Package                          Version  Build                         Channel           Size
──────────────────────────────────────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────────────────────────────────────

  + libcublas                    12.1.0.26  0                             nvidia           345MB
  + libcufft                      11.0.2.4  0                             nvidia           108MB
  + libcusolver                  11.4.4.55  0                             nvidia           103MB
  + libcusparse                  12.0.2.55  0                             nvidia           171MB
  + libnpp                       12.0.2.50  0                             nvidia           147MB
  + cuda-cudart                   12.1.105  0                             nvidia           193kB
  + cuda-nvrtc                    12.1.105  0                             nvidia            21MB
  + libnvjitlink                  12.1.105  0                             nvidia            18MB
  + libnvjpeg                    12.1.1.14  0                             nvidia             3MB
  + cuda-cupti                    12.1.105  0                             nvidia            16MB
  + cuda-nvtx                     12.1.105  0                             nvidia            58kB
  ...
  + libcurand                    10.3.7.37  0                             nvidia            54MB
  + libcufile                    1.11.0.15  0                             nvidia             1MB
  + cuda-opencl                    12.6.37  0                             nvidia            27kB
  + cuda-libraries                  12.1.0  0                             nvidia             2kB
  + cuda-runtime                    12.1.0  0                             nvidia             1kB
  ...
  + pytorch                          2.4.0  py3.12_cuda12.1_cudnn9.1.0_0  pytorch            1GB
  + torchtriton                      3.0.0  py312                         pytorch          245MB
  + torchaudio                       2.4.0  py312_cu121                   pytorch            7MB
  + torchvision                     0.19.0  py312_cu121                   pytorch            9MB

  Summary:

  Install: 180 packages

  Total download: 3GB

───────────────────────────────────────────────────────────────────────────────────────────────────


Confirm changes: [Y/n]

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

To activate this environment, use

     $ mamba activate torch-env

To deactivate an active environment, use

     $ mamba deactivate

Here, the meta-package pytorch-cuda allows us to specify the required CUDA version. Since version 12.2 is not available, we settle for version 12.1. Without pytorch-cuda, we would have installed the CPU-only build of pytorch, without CUDA acceleration. We can double-check the build string of the pytorch package in the command output: py3.12_cuda12.1_cudnn9.1.0_0.
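The build string can also be checked programmatically, for instance after reading it from mamba list pytorch inside the activated environment. A minimal sketch, assuming the pyX.Y_cudaX.Y_cudnnX.Y.Z_N layout seen above; other channels may use different layouts, and parse_pytorch_build is a hypothetical helper.

```python
import re
from typing import Dict


def parse_pytorch_build(build: str) -> Dict[str, str]:
    """Split a build string such as 'py3.12_cuda12.1_cudnn9.1.0_0' into its parts."""
    info: Dict[str, str] = {}
    for key in ("py", "cuda", "cudnn"):
        # Each segment starts the string or follows an underscore, e.g. "_cuda12.1".
        match = re.search(rf"(?:^|_){key}([\d.]+)", build)
        if match:
            info[key] = match.group(1)
    # Assumption: CUDA builds carry a "cudaX.Y" segment, as in the example above.
    info["variant"] = "cuda" if "cuda" in info else "cpu"
    return info


print(parse_pytorch_build("py3.12_cuda12.1_cudnn9.1.0_0"))
# {'py': '3.12', 'cuda': '12.1', 'cudnn': '9.1.0', 'variant': 'cuda'}
```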

Note that we need two extra channels: pytorch and nvidia. A first attempt without the nvidia channel failed because pytorch-cuda=12.1 depends on a very specific version of cuBLAS that is unavailable on conda-forge.

Furthermore, note that we did not specify cudnn this time. The CUDA-enabled PyTorch package bundles its own copy of this library (cuDNN 9.1.0, according to the build string), so we don't need to install it separately.

When the environment is created and activated, we can test whether CUDA support is enabled.

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
2
>>> torch.cuda.current_device()
0
>>> torch.backends.cudnn.version()
90100
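The interactive checks above can be bundled into a single script that also launches a small computation on the GPU, which catches cases where CUDA is detected but kernels fail to run. A minimal sketch using standard torch APIs; it degrades gracefully when torch or a GPU is missing, cuda_summary is a hypothetical helper, and the 256x256 matrix size is arbitrary.

```python
from typing import Any, Dict


def cuda_summary() -> Dict[str, Any]:
    """Collect the checks shown above into one dict; works even without torch or a GPU."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_available": False}

    summary: Dict[str, Any] = {
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "cudnn": torch.backends.cudnn.version(),  # e.g. 90100 for cuDNN 9.1.0
        "devices": [],
    }
    if summary["cuda_available"]:
        for index in range(torch.cuda.device_count()):
            summary["devices"].append({
                "name": torch.cuda.get_device_name(index),
                "capability": torch.cuda.get_device_capability(index),
            })
        # Tiny matrix multiply on the current GPU to confirm kernels actually launch.
        x = torch.randn(256, 256, device="cuda")
        summary["matmul_ok"] = bool(torch.isfinite(x @ x).all().item())
    return summary


if __name__ == "__main__":
    for key, value in cuda_summary().items():
        print(f"{key}: {value}")
```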

Topics for another time

A more elaborate PyTorch example (or TensorFlow vs. tensorflow-gpu)
NVIDIA TensorRT

Appendix

Glossary

CUDART - CUDA Runtime library
cuBLAS - CUDA Basic Linear Algebra Subprograms (BLAS) library
cuFFT - CUDA Fast Fourier Transform library
CUDPP - CUDA Data Parallel Primitives library
cuDNN - CUDA Deep Neural Network library
cuRAND - CUDA Random Number Generation library
cuSOLVER - CUDA-based collection of dense and sparse direct solvers
cuSPARSE - CUDA Sparse Matrix library
NPP - NVIDIA Performance Primitives library
nvGRAPH - NVIDIA Graph Analytics library (removed in CUDA 11)
NVML - NVIDIA Management Library
NVRTC - NVIDIA Runtime Compilation library for CUDA C++
NVCC - NVIDIA CUDA Compiler
- its device-code optimizer (NVVM) is based on LLVM
- source file extension: .cu
NCCL - NVIDIA Collective Communications Library
- used for multi-GPU and multi-node communication
Thrust - open-source C++ library of parallel algorithms and data structures

Mamba: CUDA 12 package overview

Useful commands
- mamba search -c <channel> --override-channels [--info] <package-spec>
- mamba repoquery whoneeds --tree --recursive -c <channel> <package>
- mamba repoquery depends --tree --recursive -c <channel> <package>

Hierarchical meta package list

cuda
- cuda-runtime
  - cuda-libraries
    - (see below)
- cuda-toolkit
  - cuda-compiler
    - c-compiler
    - cuda-cuobjdump
    - cuda-cuxxfilt
    - cuda-nvcc
    - cuda-nvprune
    - cxx-compiler
  - cuda-libraries
    - cuda-cudart
    - cuda-nvrtc
    - cuda-opencl
    - libcublas
    - libcufft
    - libcufile
    - libcurand
    - libcusolver
    - libcusparse
    - libnpp
    - libnvfatbin
    - libnvjitlink
    - libnvjpeg
  - cuda-libraries-dev
    - cuda-cccl
    - cuda-cudart-dev
    - cuda-driver-dev
    - cuda-nvrtc-dev
    - cuda-opencl-dev
    - cuda-profiler-api
    - libcublas-dev
    - libcufft-dev
    - libcufile-dev
    - libcurand-dev
    - libcusolver-dev
    - libcusparse-dev
    - libnpp-dev
    - libnvfatbin-dev
    - libnvjitlink-dev
    - libnvjpeg-dev
  - cuda-nvml-dev
  - cuda-tools
    - cuda-command-line-tools
      - cuda-cupti-dev
      - cuda-gdb
      - cuda-nvdisasm
      - cuda-nvprof
      - cuda-nvtx
      - cuda-sanitizer-api
    - cuda-visual-tools
    - gds-tools
cuda-minimal-build
- cuda-cccl
- cuda-compiler
  - ...
- cuda-cudart-dev
- cuda-profiler-api
not part of any meta package
- cuda-compat
- cuda-crt
- cuda-nsight
- cuda-nvvm
- cuda-nvvp
- cuda-python
- cudnn
- cuquantum
- cutensor
- nccl

Flat package list

cuda-cccl
cuda-compat
cuda-crt
cuda-crt-dev_linux-64
cuda-crt-tools
cuda-cudart
cuda-cudart-dev
cuda-cuobjdump
cuda-cupti
cuda-cupti-dev
cuda-cupti-doc
cuda-cuxxfilt
cuda-driver-dev
cuda-gdb
cuda-gdb-src
cuda-nsight
cuda-nvcc
cuda-nvcc-dev_linux-64
cuda-nvcc-impl
cuda-nvcc-tools
cuda-nvdisasm
cuda-nvml-dev
cuda-nvprof
cuda-nvprune
cuda-nvrtc
cuda-nvrtc-dev
cuda-nvtx
cuda-nvtx-dev
cuda-nvvm
cuda-nvvm-dev_linux-64
cuda-nvvm-impl
cuda-nvvm-tools
cuda-nvvp
cuda-opencl
cuda-opencl-dev
cuda-profiler-api
cuda-python
cuda-sanitizer-api
cuda-visual-tools
cudnn
~~cupti~~
cuquantum
cutensor
libcublas
libcublas-dev
libcufft
libcufft-dev
~~libcuquantum~~
libcurand
libcurand-dev
libcusolver
libcusolver-dev
libcusparse
libcusparse-dev
~~libcutensor~~
nccl

Repoqueries

Output has been slightly edited for clarity.

$ mamba repoquery depends cuda=12.6 -c conda-forge --tree --recursive

cuda[12.6.0]
  ├─ cuda-runtime[12.6.0]
  │  └─ cuda-libraries[12.6.0]
  │     ├─ cuda-cudart[12.6.37]
  │     │  ├─ cuda-cudart_linux-64[12.6.37]
  │     │  │  └─ cuda-version[12.6]
  │     │  ├─ libgcc-ng[14.1.0]
  │     │  │  ├─ _libgcc_mutex[0.1]
  │     │  │  └─ _openmp_mutex[4.5]
  │     │  │     ├─ _libgcc_mutex already visited
  │     │  │     └─ llvm-openmp[18.1.8]
  │     │  │        ├─ libzlib[1.3.1]
  │     │  │        └─ zstd[1.5.6]
  │     │  │           ├─ libzlib already visited
  │     │  │           └─ libstdcxx-ng[14.1.0]
  │     │  └─ libstdcxx-ng already visited
  │     ├─ cuda-nvrtc[12.6.20]
  │     │  ├─ libgcc-ng already visited
  │     │  └─ libstdcxx-ng already visited
  │     ├─ cuda-opencl[12.6.37]
  │     │  ├─ libgcc-ng already visited
  │     │  ├─ libstdcxx-ng already visited
  │     │  └─ ocl-icd[2.3.2]
  │     │     └─ libgcc-ng already visited
  │     ├─ libcublas[12.6.0.22]
  │     │  ├─ libgcc-ng already visited
  │     │  ├─ libstdcxx-ng already visited
  │     │  └─ cuda-nvrtc[12.0.76]
  │     │     ├─ libgcc-ng already visited
  │     │     ├─ libstdcxx-ng already visited
  │     │     └─ cuda-version[12.0.0]
  │     ├─ libcufft[11.2.6.28]
  │     │  ├─ libgcc-ng already visited
  │     │  └─ libstdcxx-ng already visited
  │     ├─ libcufile[1.11.0.15]
  │     │  ├─ libgcc-ng already visited
  │     │  └─ libstdcxx-ng already visited
  │     ├─ libcurand[10.3.7.37]
  │     │  ├─ libgcc-ng already visited
  │     │  └─ libstdcxx-ng already visited
  │     ├─ libcusolver[11.6.4.38]
  │     │  ├─ libgcc-ng already visited
  │     │  ├─ libstdcxx-ng already visited
  │     │  ├─ libcublas already visited
  │     │  ├─ libcusparse[12.5.2.23]
  │     │  │  ├─ libgcc-ng already visited
  │     │  │  ├─ libstdcxx-ng already visited
  │     │  │  └─ libnvjitlink[12.6.20]
  │     │  │     ├─ libgcc-ng already visited
  │     │  │     └─ libstdcxx-ng already visited
  │     │  └─ libnvjitlink already visited
  │     ├─ libcusparse already visited
  │     ├─ libnvjitlink already visited
  │     ├─ libnpp[12.3.1.23]
  │     │  ├─ libgcc-ng already visited
  │     │  └─ libstdcxx-ng already visited
  │     ├─ libnvfatbin[12.6.20]
  │     │  ├─ libgcc-ng already visited
  │     │  └─ libstdcxx-ng already visited
  │     └─ libnvjpeg[12.3.3.23]
  │        ├─ libgcc-ng already visited
  │        └─ libstdcxx-ng already visited
  └─ cuda-toolkit[12.6.0]
     ├─ cuda-libraries already visited
     ├─ cuda-compiler[12.6.0]
     │  ├─ c-compiler[1.0.0]
     │  │  ├─ libgcc-ng already visited
     │  │  └─ gcc_linux-64[10.3.0]
     │  │     ├─ binutils_linux-64[2.36]
     │  │     │  ├─ binutils_impl_linux-64[2.36.1]
     │  │     │  │  ├─ ld_impl_linux-64[2.36.1]
     │  │     │  │  └─ sysroot_linux-64[2.12]
     │  │     │  │     └─ kernel-headers_linux-64[2.6.32]
     │  │     │  └─ sysroot_linux-64 already visited
     │  │     ├─ sysroot_linux-64 already visited
     │  │     └─ gcc_impl_linux-64[10.3.0]
     │  │        ├─ libgcc-ng already visited
     │  │        ├─ libstdcxx-ng already visited
     │  │        ├─ binutils_impl_linux-64 already visited
     │  │        ├─ sysroot_linux-64 already visited
     │  │        ├─ libgcc-devel_linux-64[10.3.0]
     │  │        ├─ libgomp[14.1.0]
     │  │        │  └─ _libgcc_mutex already visited
     │  │        └─ libsanitizer[10.3.0]
     │  │           └─ libgcc-ng already visited
     │  ├─ cuda-cuobjdump[12.6.20]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ cuda-nvdisasm[12.0.76]
     │  │     ├─ libgcc-ng already visited
     │  │     └─ libstdcxx-ng already visited
     │  ├─ cuda-cuxxfilt[12.6.20]
     │  │  ├─ libgcc-ng already visited
     │  │  └─ libstdcxx-ng already visited
     │  ├─ cuda-nvcc[12.6.20]
     │  │  ├─ gcc_linux-64 already visited
     │  │  ├─ cuda-nvcc_linux-64[12.6.20]
     │  │  │  ├─ cuda-cudart-dev_linux-64[12.6.37]
     │  │  │  │  ├─ cuda-cccl_linux-64[12.0.90]
     │  │  │  │  ├─ cuda-cudart-static_linux-64[12.0.107]
     │  │  │  │  └─ cuda-cudart_linux-64[12.0.107]
     │  │  │  │     ├─ libgcc-ng already visited
     │  │  │  │     └─ libstdcxx-ng already visited
     │  │  │  ├─ cuda-driver-dev_linux-64[12.6.37]
     │  │  │  ├─ cuda-nvcc-dev_linux-64[12.6.20]
     │  │  │  │  ├─ libgcc-ng already visited
     │  │  │  │  ├─ cuda-crt-dev_linux-64[12.6.20]
     │  │  │  │  └─ cuda-nvvm-dev_linux-64[12.6.20]
     │  │  │  ├─ cuda-nvcc-impl[12.6.20]
     │  │  │  │  ├─ cuda-cudart already visited
     │  │  │  │  ├─ cuda-nvcc-dev_linux-64 already visited
     │  │  │  │  ├─ cuda-cudart-dev[12.0.107]
     │  │  │  │  │  ├─ libgcc-ng already visited
     │  │  │  │  │  ├─ libstdcxx-ng already visited
     │  │  │  │  │  ├─ cuda-cudart[12.0.107]
     │  │  │  │  │  │  ├─ libgcc-ng already visited
     │  │  │  │  │  │  └─ libstdcxx-ng already visited
     │  │  │  │  │  ├─ cuda-cudart-dev_linux-64[12.0.107]
     │  │  │  │  │  │  ├─ cuda-cccl_linux-64 already visited
     │  │  │  │  │  │  └─ cuda-cudart-static_linux-64 already visited
     │  │  │  │  │  └─ cuda-cudart-static[12.0.107]
     │  │  │  │  │     ├─ libgcc-ng already visited
     │  │  │  │  │     ├─ libstdcxx-ng already visited
     │  │  │  │  │     └─ cuda-cudart-static_linux-64 already visited
     │  │  │  │  ├─ cuda-nvcc-tools[12.6.20]
     │  │  │  │  │  ├─ libgcc-ng already visited
     │  │  │  │  │  ├─ libstdcxx-ng already visited
     │  │  │  │  │  ├─ cuda-crt-tools[12.6.20]
     │  │  │  │  │  └─ cuda-nvvm-tools[12.6.20]
     │  │  │  │  │     ├─ libgcc-ng already visited
     │  │  │  │  │     └─ libstdcxx-ng already visited
     │  │  │  │  └─ cuda-nvvm-impl[12.6.20]
     │  │  │  │     ├─ libgcc-ng already visited
     │  │  │  │     └─ libstdcxx-ng already visited
     │  │  │  ├─ cuda-nvcc-tools already visited
     │  │  │  └─ sysroot_linux-64[2.28]
     │  │  │     ├─ _sysroot_linux-64_curr_repodata_hack[3]
     │  │  │     └─ kernel-headers_linux-64[4.18.0]
     │  │  │        └─ _sysroot_linux-64_curr_repodata_hack already visited
     │  │  └─ gxx_linux-64[10.3.0]
     │  │     ├─ gcc_linux-64 already visited
     │  │     ├─ binutils_linux-64 already visited
     │  │     ├─ sysroot_linux-64 already visited
     │  │     └─ gxx_impl_linux-64[10.3.0]
     │  │        ├─ sysroot_linux-64 already visited
     │  │        ├─ gcc_impl_linux-64 already visited
     │  │        └─ libstdcxx-devel_linux-64[10.3.0]
     │  ├─ cuda-nvprune[12.6.20]
     │  │  ├─ libgcc-ng already visited
     │  │  └─ libstdcxx-ng already visited
     │  └─ cxx-compiler[1.0.0]
     │     ├─ libgcc-ng already visited
     │     ├─ libstdcxx-ng already visited
     │     └─ gxx_linux-64 already visited
     ├─ cuda-libraries-dev[12.6.0]
     │  ├─ cuda-cccl[12.6.37]
     │  │  ├─ cccl[2.5.0]
     │  │  └─ cuda-cccl_linux-64[12.6.37]
     │  ├─ cuda-cudart-dev[12.6.37]
     │  │  ├─ cuda-cudart already visited
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  ├─ cuda-cudart-dev_linux-64 already visited
     │  │  └─ cuda-cudart-static[12.6.37]
     │  │     ├─ libgcc-ng already visited
     │  │     ├─ libstdcxx-ng already visited
     │  │     └─ cuda-cudart-static_linux-64[12.6.37]
     │  ├─ cuda-driver-dev[12.6.37]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ cuda-driver-dev_linux-64[12.0.107]
     │  ├─ cuda-nvrtc-dev[12.6.20]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ cuda-nvrtc already visited
     │  ├─ cuda-opencl-dev[12.6.37]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ cuda-opencl already visited
     │  ├─ cuda-profiler-api[12.6.37]
     │  │  └─ cuda-cudart-dev already visited
     │  ├─ libcublas-dev[12.6.0.22]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ libcublas already visited
     │  ├─ libcufft-dev[11.2.6.28]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ libcufft already visited
     │  ├─ libcufile-dev[1.11.0.15]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ libcufile already visited
     │  ├─ libcurand-dev[10.3.7.37]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ libcurand already visited
     │  ├─ libcusolver-dev[11.6.4.38]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ libcusolver already visited
     │  ├─ libcusparse-dev[12.5.2.23]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  ├─ libcusparse already visited
     │  │  └─ libnvjitlink already visited
     │  ├─ libnpp-dev[12.3.1.23]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ libnpp already visited
     │  ├─ libnvfatbin-dev[12.6.20]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ libnvfatbin already visited
     │  ├─ libnvjitlink-dev[12.6.20]
     │  │  ├─ libgcc-ng already visited
     │  │  ├─ libstdcxx-ng already visited
     │  │  └─ libnvjitlink already visited
     │  └─ libnvjpeg-dev[12.3.3.23]
     │     ├─ libnvjpeg already visited
     │     └─ cuda-cudart-dev already visited
     ├─ cuda-nvml-dev[12.6.37]
     │  ├─ libgcc-ng already visited
     │  └─ libstdcxx-ng already visited
     └─ cuda-tools[12.6.0]
        ├─ cuda-command-line-tools[12.6.0]
        │  ├─ cuda-cupti-dev[12.6.37]
        │  │  ├─ libgcc-ng already visited
        │  │  ├─ libstdcxx-ng already visited
        │  │  └─ cuda-cupti[12.6.37]
        │  │     ├─ libgcc-ng already visited
        │  │     └─ libstdcxx-ng already visited
        │  ├─ cuda-gdb[12.6.37]
        │  │  ├─ libgcc-ng already visited
        │  │  ├─ libstdcxx-ng already visited
        │  │  └─ gmp[6.3.0]
        │  │     ├─ libgcc-ng already visited
        │  │     └─ libstdcxx-ng already visited
        │  ├─ cuda-nvdisasm[12.6.20]
        │  │  ├─ libgcc-ng already visited
        │  │  └─ libstdcxx-ng already visited
        │  ├─ cuda-nvprof[12.6.37]
        │  │  ├─ libgcc-ng already visited
        │  │  ├─ libstdcxx-ng already visited
        │  │  └─ cuda-cupti[12.0.90]
        │  │     ├─ libgcc-ng already visited
        │  │     └─ libstdcxx-ng already visited
        │  ├─ cuda-nvtx[12.6.37]
        │  │  ├─ libgcc-ng already visited
        │  │  └─ libstdcxx-ng already visited
        │  └─ cuda-sanitizer-api[12.6.34]
        │     ├─ libgcc-ng already visited
        │     └─ libstdcxx-ng already visited
        ├─ cuda-visual-tools[12.6.0]
        │  ├─ cuda-libraries-dev already visited
        │  ├─ cuda-nvml-dev already visited
        │  ├─ cuda-nsight[12.6.20]
        │  ├─ cuda-nvvp[12.6.37]
        │  │  ├─ libgcc-ng already visited
        │  │  ├─ libstdcxx-ng already visited
        │  │  ├─ cuda-nvdisasm already visited
        │  │  └─ cuda-nvprof[12.0.90]
        │  │     ├─ libgcc-ng already visited
        │  │     ├─ libstdcxx-ng already visited
        │  │     └─ cuda-cupti already visited
        │  └─ nsight-compute[2024.3.0.15]
        │     └─ ...
        └─ gds-tools[1.11.0.15]
           ├─ libgcc-ng already visited
           ├─ libstdcxx-ng already visited
           └─ libcufile already visited

$ mamba repoquery depends cuda-minimal-build=12.6 -c conda-forge --tree --recursive

cuda-minimal-build[12.6.0]
  ├─ cuda-cccl[12.6.37]
  │  ├─ cccl[2.5.0]
  │  └─ cuda-cccl_linux-64[12.6.37]
  ├─ cuda-compiler[12.6.0]
  │  ├─ c-compiler[1.0.0]
  │  │  ├─ gcc_linux-64[10.3.0]
  │  │  │  ├─ binutils_linux-64[2.36]
  │  │  │  │  ├─ binutils_impl_linux-64[2.36.1]
  │  │  │  │  │  ├─ ld_impl_linux-64[2.36.1]
  │  │  │  │  │  └─ sysroot_linux-64[2.12]
  │  │  │  │  │     └─ kernel-headers_linux-64[2.6.32]
  │  │  │  │  └─ sysroot_linux-64 already visited
  │  │  │  ├─ sysroot_linux-64 already visited
  │  │  │  └─ gcc_impl_linux-64[10.3.0]
  │  │  │     ├─ binutils_impl_linux-64 already visited
  │  │  │     ├─ sysroot_linux-64 already visited
  │  │  │     ├─ libgcc-devel_linux-64[10.3.0]
  │  │  │     ├─ libgcc-ng[14.1.0]
  │  │  │     │  ├─ _libgcc_mutex[0.1]
  │  │  │     │  └─ _openmp_mutex[4.5]
  │  │  │     │     ├─ _libgcc_mutex already visited
  │  │  │     │     └─ llvm-openmp[18.1.8]
  │  │  │     │        ├─ libzlib[1.3.1]
  │  │  │     │        └─ zstd[1.5.6]
  │  │  │     │           ├─ libzlib already visited
  │  │  │     │           └─ libstdcxx-ng[14.1.0]
  │  │  │     ├─ libstdcxx-ng already visited
  │  │  │     ├─ libgomp[14.1.0]
  │  │  │     │  └─ _libgcc_mutex already visited
  │  │  │     └─ libsanitizer[10.3.0]
  │  │  │        └─ libgcc-ng already visited
  │  │  └─ libgcc-ng already visited
  │  ├─ cuda-cuobjdump[12.6.20]
  │  │  ├─ libgcc-ng already visited
  │  │  ├─ libstdcxx-ng already visited
  │  │  └─ cuda-nvdisasm[12.0.76]
  │  │     ├─ libgcc-ng already visited
  │  │     ├─ libstdcxx-ng already visited
  │  │     └─ cuda-version[12.0.0]
  │  ├─ cuda-cuxxfilt[12.6.20]
  │  │  ├─ libgcc-ng already visited
  │  │  └─ libstdcxx-ng already visited
  │  ├─ cuda-nvcc[12.6.20]
  │  │  ├─ gcc_linux-64 already visited
  │  │  ├─ cuda-nvcc_linux-64[12.6.20]
  │  │  │  ├─ cuda-cudart-dev_linux-64[12.6.37]
  │  │  │  │  ├─ cuda-cccl_linux-64[12.0.90]
  │  │  │  │  ├─ cuda-cudart-static_linux-64[12.0.107]
  │  │  │  │  └─ cuda-cudart_linux-64[12.0.107]
  │  │  │  │     ├─ libgcc-ng already visited
  │  │  │  │     └─ libstdcxx-ng already visited
  │  │  │  ├─ cuda-driver-dev_linux-64[12.6.37]
  │  │  │  ├─ cuda-nvcc-dev_linux-64[12.6.20]
  │  │  │  │  ├─ libgcc-ng already visited
  │  │  │  │  ├─ cuda-crt-dev_linux-64[12.6.20]
  │  │  │  │  └─ cuda-nvvm-dev_linux-64[12.6.20]
  │  │  │  ├─ cuda-nvcc-impl[12.6.20]
  │  │  │  │  ├─ cuda-nvcc-dev_linux-64 already visited
  │  │  │  │  ├─ cuda-cudart[12.6.37]
  │  │  │  │  │  ├─ libgcc-ng already visited
  │  │  │  │  │  ├─ libstdcxx-ng already visited
  │  │  │  │  │  └─ cuda-cudart_linux-64[12.6.37]
  │  │  │  │  ├─ cuda-cudart-dev[12.0.107]
  │  │  │  │  │  ├─ libgcc-ng already visited
  │  │  │  │  │  ├─ libstdcxx-ng already visited
  │  │  │  │  │  ├─ cuda-cudart[12.0.107]
  │  │  │  │  │  │  ├─ libgcc-ng already visited
  │  │  │  │  │  │  └─ libstdcxx-ng already visited
  │  │  │  │  │  ├─ cuda-cudart-dev_linux-64[12.0.107]
  │  │  │  │  │  │  ├─ cuda-cccl_linux-64 already visited
  │  │  │  │  │  │  └─ cuda-cudart-static_linux-64 already visited
  │  │  │  │  │  └─ cuda-cudart-static[12.0.107]
  │  │  │  │  │     ├─ libgcc-ng already visited
  │  │  │  │  │     ├─ libstdcxx-ng already visited
  │  │  │  │  │     └─ cuda-cudart-static_linux-64 already visited
  │  │  │  │  ├─ cuda-nvcc-tools[12.6.20]
  │  │  │  │  │  ├─ libgcc-ng already visited
  │  │  │  │  │  ├─ libstdcxx-ng already visited
  │  │  │  │  │  ├─ cuda-crt-tools[12.6.20]
  │  │  │  │  │  └─ cuda-nvvm-tools[12.6.20]
  │  │  │  │  │     ├─ libgcc-ng already visited
  │  │  │  │  │     └─ libstdcxx-ng already visited
  │  │  │  │  └─ cuda-nvvm-impl[12.6.20]
  │  │  │  │     ├─ libgcc-ng already visited
  │  │  │  │     └─ libstdcxx-ng already visited
  │  │  │  ├─ cuda-nvcc-tools already visited
  │  │  │  └─ sysroot_linux-64[2.28]
  │  │  │     ├─ _sysroot_linux-64_curr_repodata_hack[3]
  │  │  │     └─ kernel-headers_linux-64[4.18.0]
  │  │  │        └─ _sysroot_linux-64_curr_repodata_hack already visited
  │  │  └─ gxx_linux-64[10.3.0]
  │  │     ├─ gcc_linux-64 already visited
  │  │     ├─ binutils_linux-64 already visited
  │  │     ├─ sysroot_linux-64 already visited
  │  │     └─ gxx_impl_linux-64[10.3.0]
  │  │        ├─ sysroot_linux-64 already visited
  │  │        ├─ gcc_impl_linux-64 already visited
  │  │        └─ libstdcxx-devel_linux-64[10.3.0]
  │  ├─ cuda-nvprune[12.6.20]
  │  │  ├─ libgcc-ng already visited
  │  │  └─ libstdcxx-ng already visited
  │  └─ cxx-compiler[1.0.0]
  │     ├─ libgcc-ng already visited
  │     ├─ libstdcxx-ng already visited
  │     └─ gxx_linux-64 already visited
  ├─ cuda-cudart-dev[12.6.37]
  │  ├─ libgcc-ng already visited
  │  ├─ libstdcxx-ng already visited
  │  ├─ cuda-cudart-dev_linux-64 already visited
  │  ├─ cuda-cudart already visited
  │  └─ cuda-cudart-static[12.6.37]
  │     ├─ libgcc-ng already visited
  │     ├─ libstdcxx-ng already visited
  │     └─ cuda-cudart-static_linux-64[12.6.37]
  └─ cuda-profiler-api[12.6.37]
     └─ cuda-cudart-dev already visited
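Comparing these trees by eye is tedious, and they hold a surprise: the same package can appear with different versions in different branches. For example, `cuda-cudart` shows up as both 12.6.37 and 12.0.107, `cuda-nvdisasm` as both 12.6.20 and 12.0.76, and `cuda-cupti` as both 12.6.37 and 12.0.90. As far as we can tell, `repoquery depends` walks the channel metadata rather than solving one consistent environment. Treat these versions as indicative, not as what `mamba install` would actually pick. The helper below is a minimal sketch and is not part of mamba. It parses a saved `--tree` dump and reports every package that appears with more than one version. The function names and the `tree.txt` file name are hypothetical.

```python
import re
import sys
from collections import defaultdict

# Matches "name[version]" nodes in the tree; "... already visited" nodes
# carry no version in brackets, so they are simply skipped.
NODE = re.compile(r"([A-Za-z0-9_.+-]+)\[([^\]]+)\]")


def collect_versions(tree_text: str) -> dict[str, set[str]]:
    """Map each package name in a `repoquery depends --tree` dump to every version it appears with."""
    versions: dict[str, set[str]] = defaultdict(set)
    for line in tree_text.splitlines():
        for name, version in NODE.findall(line):
            versions[name].add(version)
    return dict(versions)


def conflicting_versions(versions: dict[str, set[str]]) -> dict[str, list[str]]:
    """Keep only packages that appear with more than one version (versions sorted as strings)."""
    return {name: sorted(vs) for name, vs in versions.items() if len(vs) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 tree_versions.py tree.txt  (script and file names are hypothetical)
    with open(sys.argv[1], encoding="utf-8") as fh:
        report = conflicting_versions(collect_versions(fh.read()))
    for name, vs in sorted(report.items()):
        print(f"{name}: {', '.join(vs)}")
```

To use it, redirect the output of `mamba repoquery depends cuda-minimal-build=12.6 -c conda-forge --tree --recursive` into `tree.txt` and pass that file as the script's only argument. Running it on both the `cuda` and `cuda-minimal-build` dumps makes it easy to spot which branches pull in 12.0.x components.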