Introduction
CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform that allows developers to harness the power of NVIDIA GPUs for general-purpose computing (GPGPU). CUDA provides a suite of tools and libraries that enable high-performance computing on GPUs, making it a go-to solution for a wide range of computational tasks, including deep learning.
CUDA competes with other GPU computing platforms, such as AMD's ROCm and Intel's oneAPI. Both ROCm and oneAPI are open-source platforms that offer similar capabilities to CUDA. However, CUDA remains dominant, especially in the AI and deep learning space, due to its mature ecosystem and widespread support.
CUDA can be deployed on Linux and Windows. It is worth noting that CUDA support for macOS was discontinued after toolkit version 10.2, as Apple had stopped shipping NVIDIA GPUs in its machines even before the transition to ARM-based Apple Silicon.
In this blog post, we will dive into CUDA by exploring it across three layers:
- System-wide setup: We will cover the installation and configuration of the graphics driver.
- CUDA and cuDNN setup: We will discuss two different approaches: system-wide and isolated.
- Using CUDA: How to leverage CUDA in your projects, including the installation of GPU-accelerated libraries and frameworks.
System overview
For this guide, we will walk through setting up CUDA on a Linux system. Specifically, we will be using Ubuntu 23.10. While a long-term support (LTS) release such as 24.04 would have been the more typical choice, the setup differs only minimally between versions.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 23.10
Release: 23.10
Codename: mantic
$ uname -r
6.5.0-44-generic
$ ldd --version
ldd (Ubuntu GLIBC 2.38-1ubuntu6.3) 2.38
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
Linux display driver setup
When setting up CUDA on Linux, one of the first steps is ensuring that your display driver is correctly installed. On Linux, this is always a system-wide process, and you have two main options: the proprietary NVIDIA driver or the open-source Nouveau driver.
- Proprietary drivers: The official drivers provided by NVIDIA, offering the best performance and full support for CUDA. Common versions include 470, 525, 535, 545 and 550.
- Nouveau drivers: The Nouveau driver is an open-source alternative to NVIDIA's proprietary driver. While it provides basic functionality and is a good choice for general use, it does not support CUDA.
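To check which of the two is currently driving your GPU before making changes, you can inspect the loaded kernel modules. The following is one possible check; the exact output varies per system:

$ lspci -k | grep -EA3 'VGA|3D'
$ lsmod | grep -E '^(nvidia|nouveau)'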
In this guide, we will be using the proprietary NVIDIA drivers. These drivers also include the `nvidia-smi` tool, which is vital for managing and monitoring your GPU.
There are two main procedures for installing the drivers: automatic or manual.
Automatic Installation
The easiest way to install the appropriate NVIDIA driver is through the automatic installation process, which detects your GPU and recommends the best driver.
$ ubuntu-drivers devices
== /sys/devices/pci0000:40/0000:40:01.1/0000:41:00.0 ==
modalias : pci:v000010DEd00002204sv000010DEsd0000147Dbc03sc00i00
vendor : NVIDIA Corporation
model : GA102 [GeForce RTX 3090]
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-545 - distro non-free
driver : nvidia-driver-545-open - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-535 - distro non-free recommended
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-535-server-open - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
To list all compatible drivers, and then install the recommended one:
$ ubuntu-drivers list
$ ubuntu-drivers install
Manual Installation
For those who prefer more control over the installation process, or if you want the latest drivers not available in the default Ubuntu repositories, you can manually install the driver.
You can install the display drivers either from the default Ubuntu repositories or from the additional PPA (Personal Package Archive) provided by Ubuntu's graphics drivers team if you want the latest and greatest.
$ sudo add-apt-repository ppa:graphics-drivers/ppa && sudo apt update
$ sudo apt install nvidia-driver-535
This command installs the NVIDIA driver version 535, but you can replace "535" with your desired version number.
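If you are unsure which driver versions your configured repositories offer, you can query apt first:

$ apt-cache search --names-only '^nvidia-driver-[0-9]+$'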
Once installed, you can verify that the driver is correctly set up using the `nvidia-smi` tool:
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04 Driver Version: 535.171.04 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:41:00.0 On | N/A |
| 0% 45C P8 32W / 350W | 541MiB / 24576MiB | 7% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:42:00.0 Off | N/A |
| 0% 40C P8 19W / 350W | 10MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2635 G /usr/lib/xorg/Xorg 192MiB |
| 0 N/A N/A 2944 G /usr/bin/gnome-shell 82MiB |
| 0 N/A N/A 3607 G ...irefox/3626/usr/lib/firefox/firefox 134MiB |
| 0 N/A N/A 4701 G ...erProcess --variations-seed-version 116MiB |
| 1 N/A N/A 2635 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
Important Notes
- Stability vs. latest features: When choosing your driver version, consider the trade-off between stability and access to the latest features. Older drivers like version 470 are more stable and widely tested, while newer versions like 550 offer the latest updates and support for newer hardware.
- GPU compatibility: Ensure that the driver you select is compatible with your GPU model. The automatic detection method mentioned above typically handles this well.
- Bundled CUDA runtime: The NVIDIA driver comes with a minimal CUDA runtime (version 12.2 in our case, as reported in the `nvidia-smi` header) that is sufficient for running basic CUDA applications. However, it does not include the full CUDA toolkit required for development purposes. The runtime version bundled with the driver will not change even if you separately install a full CUDA runtime or toolkit.
To see where the CUDA runtime is located on your system, you can run:
$ find /usr -name libcuda.so*
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.535.171.04
/usr/lib/i386-linux-gnu/libcuda.so.1
/usr/lib/i386-linux-gnu/libcuda.so
/usr/lib/i386-linux-gnu/libcuda.so.535.171.04
This command will display the paths to the installed CUDA runtime libraries, which are essential for running CUDA-enabled applications.
Compute capability
Compute capability identifies the feature set of your GPU's architecture; it is fixed in hardware and cannot be upgraded through software. Here's a table summarizing the compute capabilities of various NVIDIA GPU architectures:
| Architecture | Compute Capability | GPU Models |
| --- | --- | --- |
| Volta | 7.0 | V100 |
| Turing | 7.5 | GeForce RTX 20xx, Quadro RTX 8000 and RTX 6000, Tesla T4 |
| Ampere | 8.x | A100 (8.0), GeForce RTX 30xx (8.6), RTX A6000 (8.6) |
| Ada Lovelace | 8.9 | GeForce RTX 40xx, RTX 6000 Ada |
| Hopper | 9.0 | H100, H200 |
| Blackwell | 10.0 / 12.0 | B100 and B200 (10.0), GeForce RTX 50xx including the RTX 5090 (12.0) |
To check the compute capability of your GPU, you can use the following command:
$ nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
8.6
8.6
For more detailed information on compute capabilities and their implications, refer to the NVIDIA CUDA C Programming Guide.
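In practice, compute capability is what you target when compiling CUDA code. As an illustration (the file name kernel.cu is hypothetical, and nvcc is only installed in a later section), compiling for the RTX 3090's 8.6 capability looks like this:

$ nvcc -arch=sm_86 kernel.cu -o kernel
$ nvcc -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 kernel.cu -o kernel

The second form embeds code for several architectures in a single binary.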
CUDA and cuDNN
When working with CUDA, it is important to distinguish between the CUDA runtime and the CUDA toolkit, similar to the difference between the Java Runtime Environment (JRE) and the Java Development Kit (JDK). The NVIDIA driver includes a minimal CUDA runtime that enables you to run basic CUDA-enabled applications. However, this runtime is limited and does not include all the components needed for more advanced CUDA tasks.
The CUDA toolkit, on the other hand, is a comprehensive package that provides all the development tools necessary for creating, compiling, and running CUDA applications. It also includes a more complete runtime, which provides additional libraries and features needed for more complex applications. Most deep learning tasks will require this toolkit to function correctly.
When installing the CUDA toolkit, ensure that you only install versions that are less than or equal to the runtime version bundled with your display driver. Installing a higher version without updating the driver first can lead to instability.
In addition to the CUDA toolkit, some deep learning frameworks also require the CUDA Deep Neural Network (cuDNN) package.
System-wide installation
To set up CUDA and cuDNN system-wide, start by ensuring that the NVIDIA proprietary drivers are installed, as discussed earlier. Next, you will need to install a C++ compiler, which is required for compiling CUDA code. This can be done with `sudo apt install gcc g++`.
Once GCC is installed, you can proceed to install CUDA. You have two main options for this:
- The easiest approach is to use the official Ubuntu repository by running `sudo apt install nvidia-cuda-toolkit`.
- Alternatively, for more control or to get the latest version, you can install CUDA directly from NVIDIA's official source. This involves following the detailed instructions provided in the NVIDIA CUDA Installation Guide for Linux.
After installing CUDA, the next step is to set up cuDNN, which is essential for deep learning applications. You can do this by following the instructions in the NVIDIA cuDNN Installation Guide.
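Once both are installed, a quick sanity check might look like the following. The cuDNN header location can vary depending on the installation method, so treat the path as an example:

$ nvcc --version
$ grep -m1 CUDNN_MAJOR /usr/include/cudnn_version.h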
Isolated installation (recommended)
For situations where you need to manage multiple projects with different CUDA or cuDNN versions, setting up an isolated environment using Conda or Mamba is the recommended option. This approach keeps the system-wide components minimal and allows each environment to have its own specific setup.
Start by ensuring that the NVIDIA proprietary drivers are installed as discussed earlier, since they are the only system-wide component required. Next, install Mamba, which is a faster alternative to Conda, via Miniforge. You can verify that the installation was successful by running `mamba info`:
$ mamba info
mamba version : 1.5.5
active environment : None
shell level : 0
user config file : /home/user/.condarc
populated config files : /home/user/miniforge3/.condarc
conda version : 23.11.0
conda-build version : not installed
python version : 3.10.13.final.0
solver : libmamba (default)
virtual packages : __archspec=1=zen3
__conda=23.11.0=0
__cuda=12.2=0
__glibc=2.38=0
__linux=6.5.0=0
__unix=0=0
base environment : /home/user/miniforge3 (writable)
conda av data dir : /home/user/miniforge3/etc/conda
conda av metadata url : None
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
package cache : /home/user/miniforge3/pkgs
/home/user/.conda/pkgs
envs directories : /home/user/miniforge3/envs
/home/user/.conda/envs
platform : linux-64
user-agent : conda/23.11.0 requests/2.31.0 CPython/3.10.13 Linux/6.5.0-44-generic ubuntu/23.10 glibc/2.38 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.5
UID:GID : 1000:1000
netrc file : None
offline mode : False
Relevance of virtual packages in mamba environments
Mamba environments utilize virtual packages to dynamically detect and represent certain system features that are critical for package resolution and compatibility. These virtual packages include system-specific details such as architecture, the operating system, the version of the GNU C Library (glibc), and, importantly, the version of CUDA supported by your installed NVIDIA drivers.
Virtual packages are not installed in the traditional sense but are automatically detected by Mamba. They help the package manager resolve dependencies by ensuring that the packages you install are compatible with your system's underlying hardware and software.
Among these virtual packages, `__cuda` is particularly important when working with CUDA. It represents the maximum version of CUDA that your NVIDIA driver officially supports. This information is automatically detected by Mamba, assisting in the selection of the appropriate CUDA toolkit and related packages for your environment.
The output above indicates that our system's NVIDIA drivers support CUDA up to version 12.2. While Mamba does not strictly enforce this version when installing the CUDA toolkit, it serves as a guideline. You can technically install lower or higher versions of the toolkit, but installing a version higher than what `__cuda` indicates is generally not recommended, as it could lead to instability or compatibility issues. If your project requires a higher toolkit version than your driver supports, consider upgrading the graphics driver first.
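As an aside, conda and mamba support overriding detected virtual packages through CONDA_OVERRIDE_* environment variables. This can be handy, for example, when solving an environment on a build machine without a GPU (the version shown here is just an example):

$ CONDA_OVERRIDE_CUDA="12.2" mamba create -n build-env cuda-toolkit=12.2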
Creating a new mamba environment
Once Mamba is set up, we can install the CUDA toolkit within an isolated environment by creating a new environment and specifying the CUDA version we need, along with any other packages such as Python or cuDNN. For example:
$ mamba create -n my-environment python=3.12 cuda-toolkit=12.2 cudnn
$ mamba activate my-environment
This command sets up a new environment named `my-environment` with Python 3.12, CUDA toolkit 12.2, and a recent, compatible version of cuDNN.
After activating your environment, you can verify that CUDA is correctly installed by checking the version of the NVIDIA CUDA compiler:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
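For a fuller end-to-end test of the toolchain, you can compile and run a minimal kernel. The following hello.cu is a sketch of our own making, not part of the original setup:

// hello.cu - minimal end-to-end test of the CUDA toolchain
#include <cstdio>

__global__ void hello() {
    // each GPU thread prints its index within the block
    printf("Hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();        // launch 1 block of 4 threads
    cudaDeviceSynchronize();  // wait for the kernel (and its printf) to finish
    return 0;
}

$ nvcc hello.cu -o hello && ./hello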
Package overview
When working with CUDA 12, it is important to be aware of the significant changes in package names and structure compared to earlier versions. In CUDA 11 and earlier, installation was typically done using the `cudatoolkit` or `cudatoolkit-dev` packages. Starting with CUDA 12, the packages were restructured and renamed. This shift makes installation trickier, especially as much of the older documentation still references the outdated package names.
The following meta-packages can be used to set up a CUDA 12 environment:
- `cuda-toolkit`: This package (notice the hyphen) is now the primary way of installing CUDA development tools. Running `mamba install cuda-toolkit=12.2`, for instance, will typically provide all the necessary components for CUDA 12.2 development, including the compiler, libraries, and headers. The downside is that it is fairly big, and it will likely include much more than what is strictly necessary for your use case.
- `cuda-runtime`: If you only need the runtime components, install this package. Note that this refers to the complete runtime, not the minimal one bundled with the driver.
- `cuda`: This is a meta-package that pulls in both the toolkit and the runtime. Since the runtime contains a subset of the packages in the toolkit, this is in effect functionally equivalent to the `cuda-toolkit` package.
- Other meta-packages such as `cuda-libraries`, `cuda-libraries-dev`, `cuda-compiler` and `cuda-tools` can be used to specify more precisely what your project needs. Below is a simplified hierarchy of the most relevant packages; check the appendix for a full list.
cuda
├── cuda-runtime
│ └── cuda-libraries
└── cuda-toolkit
├── cuda-compiler
├── cuda-libraries
├── cuda-libraries-dev
└── cuda-tools
└── cuda-command-line-tools
Finally, for the fullest possible control, refer to the actual CUDA packages instead of these meta-packages (which are simply groups of packages).
Note that neither `cuda`, `cuda-runtime`, nor `cuda-toolkit` includes cuDNN. It is a separate package specifically tailored for deep learning applications, which needs to be installed independently as shown earlier.
Channel selection
When setting up CUDA and related packages in a Mamba environment, the various channels that offer similar packages can cause confusion. The two primary channels to consider are `conda-forge` (the default in Miniforge) and `nvidia`, both of which offer nearly identical CUDA-related packages, including the CUDA toolkit, cuDNN, and other NVIDIA libraries. We will ignore the `anaconda` channel (the default in Anaconda) because it typically hosts somewhat outdated package versions. Many mamba commands accept a `-c <channel>` option to include an extra channel on top of the default channels configured in the `.condarc` file.
The `conda-forge` channel is a widely used, community-driven repository known for its extensive package coverage beyond CUDA. This makes `conda-forge` particularly suitable for projects that require a mix of CUDA and other libraries. Additionally, `conda-forge` is continuously updated and maintained, ensuring that you have access to recent versions of packages.
In contrast, the official `nvidia` channel, maintained directly by NVIDIA, is dedicated specifically to CUDA and other NVIDIA tools. To install the complete CUDA suite from this channel, you can use `mamba install -c nvidia cuda`. Although the CUDA-related (meta-)packages in the `nvidia` channel are almost identical to those found in `conda-forge`, the `nvidia` channel provides slightly earlier access to the latest versions and includes some less common versions that may not be available on `conda-forge`.
In most cases, if your environment requires a wide range of software, `conda-forge` is likely the better option due to its extensive package offerings. It also helps to avoid some minor hiccups that can result from multi-channel package resolution.
Note that certain packages, such as `pytorch`, also provide their own dedicated channel to install packages from.
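If you want to make the channel setup explicit and reproducible, you can pin it in your `.condarc`. A minimal example along the lines of Miniforge's defaults (illustrative, adjust to taste):

channels:
  - conda-forge
channel_priority: strict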
Setting environment variables
For certain applications, you might need to manually set additional environment variables:
- `CUDA_HOME` and `CUDA_PATH`: These are interchangeable and typically point to the root of the CUDA toolkit folder, which contains the `lib` and `bin` directories. In the case of a conda environment, they should point to the environment's root.
- `LD_LIBRARY_PATH`: Add `$CUDA_HOME/lib` to this path to ensure your system can locate the necessary libraries.
$ echo $CONDA_PREFIX
/home/user/miniforge3/envs/my-environment
$ which nvcc
/home/user/miniforge3/envs/my-environment/bin/nvcc
$ export CUDA_HOME=$CONDA_PREFIX
$ export CUDA_PATH=$CUDA_HOME
$ export LD_LIBRARY_PATH=$CUDA_HOME/lib:$LD_LIBRARY_PATH
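Rather than exporting these variables by hand on every activation, you can persist them on the environment itself using conda's environment-variable mechanism, which Miniforge includes:

$ conda env config vars set CUDA_HOME=$CONDA_PREFIX CUDA_PATH=$CONDA_PREFIX
$ mamba activate my-environment

Note that `$CONDA_PREFIX` expands at the time you run the command, so the stored value is the environment's absolute path, and it is applied automatically on each activation.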
Install frameworks and libraries
Once CUDA and cuDNN are set up, the next step is installing the Python packages that leverage GPU acceleration. Depending on your development environment, you can install these using either `pip` in a virtual environment or `mamba`.
Here are some popular libraries and frameworks:
- PyCUDA: Python wrapper for CUDA.
- CuPy: NumPy-compatible library that runs on CUDA.
- cuNumeric: A drop-in replacement for NumPy, optimized for CUDA.
- RAPIDS: A suite of libraries for data science and analytics on GPUs, including cuDF (a faster pandas) and cuML (a faster scikit-learn).
- Deep learning frameworks: TensorFlow, PyTorch, ONNX
For example, to install PyTorch with CUDA support using `mamba`:
$ mamba create -n torch-env -c pytorch -c nvidia python=3.12 pytorch-cuda=12.1 torchvision torchaudio
Looking for: ['python=3', 'pytorch-cuda=12.1', 'torchvision', 'torchaudio']
...
Package Version Build Channel Size
──────────────────────────────────────────────────────────────────────────────────────────────────
Install:
──────────────────────────────────────────────────────────────────────────────────────────────────
+ libcublas 12.1.0.26 0 nvidia 345MB
+ libcufft 11.0.2.4 0 nvidia 108MB
+ libcusolver 11.4.4.55 0 nvidia 103MB
+ libcusparse 12.0.2.55 0 nvidia 171MB
+ libnpp 12.0.2.50 0 nvidia 147MB
+ cuda-cudart 12.1.105 0 nvidia 193kB
+ cuda-nvrtc 12.1.105 0 nvidia 21MB
+ libnvjitlink 12.1.105 0 nvidia 18MB
+ libnvjpeg 12.1.1.14 0 nvidia 3MB
+ cuda-cupti 12.1.105 0 nvidia 16MB
+ cuda-nvtx 12.1.105 0 nvidia 58kB
...
+ libcurand 10.3.7.37 0 nvidia 54MB
+ libcufile 1.11.0.15 0 nvidia 1MB
+ cuda-opencl 12.6.37 0 nvidia 27kB
+ cuda-libraries 12.1.0 0 nvidia 2kB
+ cuda-runtime 12.1.0 0 nvidia 1kB
...
+ pytorch 2.4.0 py3.12_cuda12.1_cudnn9.1.0_0 pytorch 1GB
+ torchtriton 3.0.0 py312 pytorch 245MB
+ torchaudio 2.4.0 py312_cu121 pytorch 7MB
+ torchvision 0.19.0 py312_cu121 pytorch 9MB
Summary:
Install: 180 packages
Total download: 3GB
───────────────────────────────────────────────────────────────────────────────────────────────────
Confirm changes: [Y/n]
Downloading and Extracting Packages:
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
To activate this environment, use
$ mamba activate torch-env
To deactivate an active environment, use
$ mamba deactivate
Here, the meta-package `pytorch-cuda` allows us to specify the required CUDA version. Since version 12.2 is not available, we settle for version 12.1. If we had installed the regular `pytorch` package on its own, we would have downloaded the CPU version without CUDA acceleration. We can double-check the build string of the pytorch package in the command output: `py3.12_cuda12.1_cudnn9.1.0_0`.
Notice how we need two extra channels: `pytorch` and `nvidia`. The first attempt without the `nvidia` channel failed because `pytorch-cuda=12.1` has a dependency on a very specific version of cuBLAS that is unavailable in `conda-forge`.
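When a solve fails like this, it can help to verify whether a channel actually carries the required build before retrying, using the search command also listed in the appendix (the version spec here is illustrative):

$ mamba search -c nvidia --override-channels 'libcublas=12.1.*'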
Furthermore, notice how we did not specify `cudnn` this time. PyTorch with CUDA support includes a statically linked version of this library, so we don't need to include it separately.
When the environment is created and activated, we can test from a Python interpreter whether CUDA support is enabled:
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
2
>>> torch.cuda.current_device()
0
>>> torch.backends.cudnn.version()
90100
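As a final smoke test, we can run a small computation on the GPU in the same Python session:

>>> x = torch.rand(3, 3, device="cuda")
>>> (x @ x).device
device(type='cuda', index=0)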
Topics for another time
- A more elaborate PyTorch example
- `tensorflow` vs `tensorflow-gpu`
- NVIDIA TensorRT
Appendix
Glossary
- cudaRT - CUDA runtime
- cuBLAS - CUDA BLAS
- cuFFT - CUDA Fast Fourier Transform
- cuDPP - CUDA Data Parallel Primitives
- cuDNN - CUDA Deep Neural Network
- cuRAND - CUDA Random Number Generation library
- cuSOLVER - CUDA-based collection of dense and sparse direct solvers
- cuSPARSE - CUDA Sparse Matrix library
- NPP - NVIDIA Performance Primitives library
- nvGRAPH - NVIDIA Graph Analytics library
- NVML - NVIDIA Management Library
- NVRTC - NVIDIA Runtime Compilation library for CUDA C++
- NVCC - NVIDIA CUDA Compiler
- based on LLVM
- source file extension: *.cu
- NCCL - NVIDIA Collective Communications Library
- Thrust - open-source C++ library of parallel algorithms and data structures
Mamba: CUDA 12 package overview
- useful commands
mamba search -c <channel> --override-channels [--info] <package-spec>
mamba repoquery whoneeds --tree --recursive -c <channel> <package>
mamba repoquery depends --tree --recursive -c <channel> <package>
- cuda
- cuda-runtime
- cuda-toolkit
- cuda-compiler
- c-compiler
- cuda-cuobjdump
- cuda-cuxxfilt
- cuda-nvcc
- cuda-nvprune
- cxx-compiler
- cuda-libraries
- cuda-cudart
- cuda-nvrtc
- cuda-opencl
- libcublas
- libcufft
- libcufile
- libcurand
- libcusolver
- libcusparse
- libnpp
- libnvfatbin
- libnvjitlink
- libnvjpeg
- cuda-libraries-dev
- cuda-cccl
- cuda-cudart-dev
- cuda-driver-dev
- cuda-nvrtc-dev
- cuda-opencl-dev
- cuda-profiler-api
- libcublas-dev
- libcufft-dev
- libcufile-dev
- libcurand-dev
- libcusolver-dev
- libcusparse-dev
- libnpp-dev
- libnvfatbin-dev
- libnvjitlink-dev
- libnvjpeg-dev
- cuda-nvml-dev
- cuda-tools
- cuda-command-line-tools
- cuda-cupti-dev
- cuda-gdb
- cuda-nvdisasm
- cuda-nvprof
- cuda-nvtx
- cuda-sanitizer-api
- cuda-visual-tools
- gds-tools
- cuda-minimal-build
- cuda-cccl
- cuda-compiler
- cuda-cudart-dev
- cuda-profiler-api
- not part of any meta-package
- cuda-compat
- cuda-crt
- cuda-nsight
- cuda-nvvm
- cuda-nvvp
- cuda-python
- cudnn
- cuquantum
- cutensor
- nccl
Flat package list
- cuda-cccl
- cuda-compat
- cuda-crt
- cuda-crt-dev_linux-64
- cuda-crt-tools
- cuda-cudart
- cuda-cudart-dev
- cuda-cuobjdump
- cuda-cupti
- cuda-cupti-dev
- cuda-cupti-doc
- cuda-cuxxfilt
- cuda-driver-dev
- cuda-gdb
- cuda-gdb-src
- cuda-nsight
- cuda-nvcc
- cuda-nvcc-dev_linux-64
- cuda-nvcc-impl
- cuda-nvcc-tools
- cuda-nvdisasm
- cuda-nvml-dev
- cuda-nvprof
- cuda-nvprune
- cuda-nvrtc
- cuda-nvrtc-dev
- cuda-nvtx
- cuda-nvtx-dev
- cuda-nvvm
- cuda-nvvm-dev_linux-64
- cuda-nvvm-impl
- cuda-nvvm-tools
- cuda-nvvp
- cuda-opencl
- cuda-opencl-dev
- cuda-profiler-api
- cuda-python
- cuda-sanitizer-api
- cuda-visual-tools
- cudnn
- cupti
- cuquantum
- cutensor
- libcublas
- libcublas-dev
- libcufft
- libcufft-dev
- libcuquantum
- libcurand
- libcurand-dev
- libcusolver
- libcusolver-dev
- libcusparse
- libcusparse-dev
- libcutensor
- nccl
Repoqueries
- output has been slightly edited for clarity
$ mamba repoquery depends cuda=12.6 -c conda-forge --tree --recursive
cuda[12.6.0]
├─ cuda-runtime[12.6.0]
│ └─ cuda-libraries[12.6.0]
│ ├─ cuda-cudart[12.6.37]
│ │ ├─ cuda-cudart_linux-64[12.6.37]
│ │ │ └─ cuda-version[12.6]
│ │ ├─ libgcc-ng[14.1.0]
│ │ │ ├─ _libgcc_mutex[0.1]
│ │ │ └─ _openmp_mutex[4.5]
│ │ │ ├─ _libgcc_mutex already visited
│ │ │ └─ llvm-openmp[18.1.8]
│ │ │ ├─ libzlib[1.3.1]
│ │ │ └─ zstd[1.5.6]
│ │ │ ├─ libzlib already visited
│ │ │ └─ libstdcxx-ng[14.1.0]
│ │ └─ libstdcxx-ng already visited
│ ├─ cuda-nvrtc[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ cuda-opencl[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ ocl-icd[2.3.2]
│ │ └─ libgcc-ng already visited
│ ├─ libcublas[12.6.0.22]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-nvrtc[12.0.76]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-version[12.0.0]
│ ├─ libcufft[11.2.6.28]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ libcufile[1.11.0.15]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ libcurand[10.3.7.37]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ libcusolver[11.6.4.38]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ ├─ libcublas already visited
│ │ ├─ libcusparse[12.5.2.23]
│ │ │ ├─ libgcc-ng already visited
│ │ │ ├─ libstdcxx-ng already visited
│ │ │ └─ libnvjitlink[12.6.20]
│ │ │ ├─ libgcc-ng already visited
│ │ │ └─ libstdcxx-ng already visited
│ │ └─ libnvjitlink already visited
│ ├─ libcusparse already visited
│ ├─ libnvjitlink already visited
│ ├─ libnpp[12.3.1.23]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ libnvfatbin[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ └─ libnvjpeg[12.3.3.23]
│ ├─ libgcc-ng already visited
│ └─ libstdcxx-ng already visited
└─ cuda-toolkit[12.6.0]
├─ cuda-libraries already visited
├─ cuda-compiler[12.6.0]
│ ├─ c-compiler[1.0.0]
│ │ ├─ libgcc-ng already visited
│ │ └─ gcc_linux-64[10.3.0]
│ │ ├─ binutils_linux-64[2.36]
│ │ │ ├─ binutils_impl_linux-64[2.36.1]
│ │ │ │ ├─ ld_impl_linux-64[2.36.1]
│ │ │ │ └─ sysroot_linux-64[2.12]
│ │ │ │ └─ kernel-headers_linux-64[2.6.32]
│ │ │ └─ sysroot_linux-64 already visited
│ │ ├─ sysroot_linux-64 already visited
│ │ └─ gcc_impl_linux-64[10.3.0]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ ├─ binutils_impl_linux-64 already visited
│ │ ├─ sysroot_linux-64 already visited
│ │ ├─ libgcc-devel_linux-64[10.3.0]
│ │ ├─ libgomp[14.1.0]
│ │ │ └─ _libgcc_mutex already visited
│ │ └─ libsanitizer[10.3.0]
│ │ └─ libgcc-ng already visited
│ ├─ cuda-cuobjdump[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-nvdisasm[12.0.76]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ cuda-cuxxfilt[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ cuda-nvcc[12.6.20]
│ │ ├─ gcc_linux-64 already visited
│ │ ├─ cuda-nvcc_linux-64[12.6.20]
│ │ │ ├─ cuda-cudart-dev_linux-64[12.6.37]
│ │ │ │ ├─ cuda-cccl_linux-64[12.0.90]
│ │ │ │ ├─ cuda-cudart-static_linux-64[12.0.107]
│ │ │ │ └─ cuda-cudart_linux-64[12.0.107]
│ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ └─ libstdcxx-ng already visited
│ │ │ ├─ cuda-driver-dev_linux-64[12.6.37]
│ │ │ ├─ cuda-nvcc-dev_linux-64[12.6.20]
│ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ ├─ cuda-crt-dev_linux-64[12.6.20]
│ │ │ │ └─ cuda-nvvm-dev_linux-64[12.6.20]
│ │ │ ├─ cuda-nvcc-impl[12.6.20]
│ │ │ │ ├─ cuda-cudart already visited
│ │ │ │ ├─ cuda-nvcc-dev_linux-64 already visited
│ │ │ │ ├─ cuda-cudart-dev[12.0.107]
│ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ ├─ libstdcxx-ng already visited
│ │ │ │ │ ├─ cuda-cudart[12.0.107]
│ │ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ │ └─ libstdcxx-ng already visited
│ │ │ │ │ ├─ cuda-cudart-dev_linux-64[12.0.107]
│ │ │ │ │ │ ├─ cuda-cccl_linux-64 already visited
│ │ │ │ │ │ └─ cuda-cudart-static_linux-64 already visited
│ │ │ │ │ └─ cuda-cudart-static[12.0.107]
│ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ ├─ libstdcxx-ng already visited
│ │ │ │ │ └─ cuda-cudart-static_linux-64 already visited
│ │ │ │ ├─ cuda-nvcc-tools[12.6.20]
│ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ ├─ libstdcxx-ng already visited
│ │ │ │ │ ├─ cuda-crt-tools[12.6.20]
│ │ │ │ │ └─ cuda-nvvm-tools[12.6.20]
│ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ └─ libstdcxx-ng already visited
│ │ │ │ └─ cuda-nvvm-impl[12.6.20]
│ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ └─ libstdcxx-ng already visited
│ │ │ ├─ cuda-nvcc-tools already visited
│ │ │ └─ sysroot_linux-64[2.28]
│ │ │ ├─ _sysroot_linux-64_curr_repodata_hack[3]
│ │ │ └─ kernel-headers_linux-64[4.18.0]
│ │ │ └─ _sysroot_linux-64_curr_repodata_hack already visited
│ │ └─ gxx_linux-64[10.3.0]
│ │ ├─ gcc_linux-64 already visited
│ │ ├─ binutils_linux-64 already visited
│ │ ├─ sysroot_linux-64 already visited
│ │ └─ gxx_impl_linux-64[10.3.0]
│ │ ├─ sysroot_linux-64 already visited
│ │ ├─ gcc_impl_linux-64 already visited
│ │ └─ libstdcxx-devel_linux-64[10.3.0]
│ ├─ cuda-nvprune[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ └─ cxx-compiler[1.0.0]
│ ├─ libgcc-ng already visited
│ ├─ libstdcxx-ng already visited
│ └─ gxx_linux-64 already visited
├─ cuda-libraries-dev[12.6.0]
│ ├─ cuda-cccl[12.6.37]
│ │ ├─ cccl[2.5.0]
│ │ └─ cuda-cccl_linux-64[12.6.37]
│ ├─ cuda-cudart-dev[12.6.37]
│ │ ├─ cuda-cudart already visited
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ ├─ cuda-cudart-dev_linux-64 already visited
│ │ └─ cuda-cudart-static[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-cudart-static_linux-64[12.6.37]
│ ├─ cuda-driver-dev[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-driver-dev_linux-64[12.0.107]
│ ├─ cuda-nvrtc-dev[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-nvrtc already visited
│ ├─ cuda-opencl-dev[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-opencl already visited
│ ├─ cuda-profiler-api[12.6.37]
│ │ └─ cuda-cudart-dev already visited
│ ├─ libcublas-dev[12.6.0.22]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ libcublas already visited
│ ├─ libcufft-dev[11.2.6.28]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ libcufft already visited
│ ├─ libcufile-dev[1.11.0.15]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ libcufile already visited
│ ├─ libcurand-dev[10.3.7.37]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ libcurand already visited
│ ├─ libcusolver-dev[11.6.4.38]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ libcusolver already visited
│ ├─ libcusparse-dev[12.5.2.23]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ ├─ libcusparse already visited
│ │ └─ libnvjitlink already visited
│ ├─ libnpp-dev[12.3.1.23]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ libnpp already visited
│ ├─ libnvfatbin-dev[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ libnvfatbin already visited
│ ├─ libnvjitlink-dev[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ libnvjitlink already visited
│ └─ libnvjpeg-dev[12.3.3.23]
│ ├─ libnvjpeg already visited
│ └─ cuda-cudart-dev already visited
├─ cuda-nvml-dev[12.6.37]
│ ├─ libgcc-ng already visited
│ └─ libstdcxx-ng already visited
└─ cuda-tools[12.6.0]
├─ cuda-command-line-tools[12.6.0]
│ ├─ cuda-cupti-dev[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-cupti[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ cuda-gdb[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ gmp[6.3.0]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ cuda-nvdisasm[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ cuda-nvprof[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-cupti[12.0.90]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ cuda-nvtx[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ └─ cuda-sanitizer-api[12.6.34]
│ ├─ libgcc-ng already visited
│ └─ libstdcxx-ng already visited
├─ cuda-visual-tools[12.6.0]
│ ├─ cuda-libraries-dev already visited
│ ├─ cuda-nvml-dev already visited
│ ├─ cuda-nsight[12.6.20]
│ ├─ cuda-nvvp[12.6.37]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ ├─ cuda-nvdisasm already visited
│ │ └─ cuda-nvprof[12.0.90]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-cupti already visited
│ └─ nsight-compute[2024.3.0.15]
│ └─ ...
└─ gds-tools[1.11.0.15]
├─ libgcc-ng already visited
├─ libstdcxx-ng already visited
└─ libcufile already visited
$ mamba repoquery depends cuda-minimal-build=12.6 -c conda-forge --tree --recursive
cuda-minimal-build[12.6.0]
├─ cuda-cccl[12.6.37]
│ ├─ cccl[2.5.0]
│ └─ cuda-cccl_linux-64[12.6.37]
├─ cuda-compiler[12.6.0]
│ ├─ c-compiler[1.0.0]
│ │ ├─ gcc_linux-64[10.3.0]
│ │ │ ├─ binutils_linux-64[2.36]
│ │ │ │ ├─ binutils_impl_linux-64[2.36.1]
│ │ │ │ │ ├─ ld_impl_linux-64[2.36.1]
│ │ │ │ │ └─ sysroot_linux-64[2.12]
│ │ │ │ │ └─ kernel-headers_linux-64[2.6.32]
│ │ │ │ └─ sysroot_linux-64 already visited
│ │ │ ├─ sysroot_linux-64 already visited
│ │ │ └─ gcc_impl_linux-64[10.3.0]
│ │ │ ├─ binutils_impl_linux-64 already visited
│ │ │ ├─ sysroot_linux-64 already visited
│ │ │ ├─ libgcc-devel_linux-64[10.3.0]
│ │ │ ├─ libgcc-ng[14.1.0]
│ │ │ │ ├─ _libgcc_mutex[0.1]
│ │ │ │ └─ _openmp_mutex[4.5]
│ │ │ │ ├─ _libgcc_mutex already visited
│ │ │ │ └─ llvm-openmp[18.1.8]
│ │ │ │ ├─ libzlib[1.3.1]
│ │ │ │ └─ zstd[1.5.6]
│ │ │ │ ├─ libzlib already visited
│ │ │ │ └─ libstdcxx-ng[14.1.0]
│ │ │ ├─ libstdcxx-ng already visited
│ │ │ ├─ libgomp[14.1.0]
│ │ │ │ └─ _libgcc_mutex already visited
│ │ │ └─ libsanitizer[10.3.0]
│ │ │ └─ libgcc-ng already visited
│ │ └─ libgcc-ng already visited
│ ├─ cuda-cuobjdump[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-nvdisasm[12.0.76]
│ │ ├─ libgcc-ng already visited
│ │ ├─ libstdcxx-ng already visited
│ │ └─ cuda-version[12.0.0]
│ ├─ cuda-cuxxfilt[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ ├─ cuda-nvcc[12.6.20]
│ │ ├─ gcc_linux-64 already visited
│ │ ├─ cuda-nvcc_linux-64[12.6.20]
│ │ │ ├─ cuda-cudart-dev_linux-64[12.6.37]
│ │ │ │ ├─ cuda-cccl_linux-64[12.0.90]
│ │ │ │ ├─ cuda-cudart-static_linux-64[12.0.107]
│ │ │ │ └─ cuda-cudart_linux-64[12.0.107]
│ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ └─ libstdcxx-ng already visited
│ │ │ ├─ cuda-driver-dev_linux-64[12.6.37]
│ │ │ ├─ cuda-nvcc-dev_linux-64[12.6.20]
│ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ ├─ cuda-crt-dev_linux-64[12.6.20]
│ │ │ │ └─ cuda-nvvm-dev_linux-64[12.6.20]
│ │ │ ├─ cuda-nvcc-impl[12.6.20]
│ │ │ │ ├─ cuda-nvcc-dev_linux-64 already visited
│ │ │ │ ├─ cuda-cudart[12.6.37]
│ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ ├─ libstdcxx-ng already visited
│ │ │ │ │ └─ cuda-cudart_linux-64[12.6.37]
│ │ │ │ ├─ cuda-cudart-dev[12.0.107]
│ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ ├─ libstdcxx-ng already visited
│ │ │ │ │ ├─ cuda-cudart[12.0.107]
│ │ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ │ └─ libstdcxx-ng already visited
│ │ │ │ │ ├─ cuda-cudart-dev_linux-64[12.0.107]
│ │ │ │ │ │ ├─ cuda-cccl_linux-64 already visited
│ │ │ │ │ │ └─ cuda-cudart-static_linux-64 already visited
│ │ │ │ │ └─ cuda-cudart-static[12.0.107]
│ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ ├─ libstdcxx-ng already visited
│ │ │ │ │ └─ cuda-cudart-static_linux-64 already visited
│ │ │ │ ├─ cuda-nvcc-tools[12.6.20]
│ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ ├─ libstdcxx-ng already visited
│ │ │ │ │ ├─ cuda-crt-tools[12.6.20]
│ │ │ │ │ └─ cuda-nvvm-tools[12.6.20]
│ │ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ │ └─ libstdcxx-ng already visited
│ │ │ │ └─ cuda-nvvm-impl[12.6.20]
│ │ │ │ ├─ libgcc-ng already visited
│ │ │ │ └─ libstdcxx-ng already visited
│ │ │ ├─ cuda-nvcc-tools already visited
│ │ │ └─ sysroot_linux-64[2.28]
│ │ │ ├─ _sysroot_linux-64_curr_repodata_hack[3]
│ │ │ └─ kernel-headers_linux-64[4.18.0]
│ │ │ └─ _sysroot_linux-64_curr_repodata_hack already visited
│ │ └─ gxx_linux-64[10.3.0]
│ │ ├─ gcc_linux-64 already visited
│ │ ├─ binutils_linux-64 already visited
│ │ ├─ sysroot_linux-64 already visited
│ │ └─ gxx_impl_linux-64[10.3.0]
│ │ ├─ sysroot_linux-64 already visited
│ │ ├─ gcc_impl_linux-64 already visited
│ │ └─ libstdcxx-devel_linux-64[10.3.0]
│ ├─ cuda-nvprune[12.6.20]
│ │ ├─ libgcc-ng already visited
│ │ └─ libstdcxx-ng already visited
│ └─ cxx-compiler[1.0.0]
│ ├─ libgcc-ng already visited
│ ├─ libstdcxx-ng already visited
│ └─ gxx_linux-64 already visited
├─ cuda-cudart-dev[12.6.37]
│ ├─ libgcc-ng already visited
│ ├─ libstdcxx-ng already visited
│ ├─ cuda-cudart-dev_linux-64 already visited
│ ├─ cuda-cudart already visited
│ └─ cuda-cudart-static[12.6.37]
│ ├─ libgcc-ng already visited
│ ├─ libstdcxx-ng already visited
│ └─ cuda-cudart-static_linux-64[12.6.37]
└─ cuda-profiler-api[12.6.37]
└─ cuda-cudart-dev already visited