Running Portal natively on Apple Silicon

Portal, a puzzle-platform game developed by Valve, was first released in 2007 as part of The Orange Box. It quickly became a cult classic, known for its unique mechanics and dark humor. Set in a mysterious research facility, the game revolves around the use of a "portal gun" to create linked portals that allow the player to navigate through the environment and solve puzzles. It’s widely regarded as one of the best games of the 2000s and remains a staple of gaming culture.

Portal was later released on multiple platforms, including macOS. However, Valve's macOS version of Portal remained stuck in the 32-bit era, making it incompatible with the latest macOS releases. With macOS Catalina (10.15), released in 2019, Apple dropped support for 32-bit applications entirely, forcing many older games and applications to either be updated or abandoned.

For those looking to play Portal on macOS today, there is a workaround: using the leaked Source Engine code to build a 64-bit engine yourself. The source code was made public years ago, and while it has some issues out of the box, with a bit of tinkering it can be compiled to run on modern macOS versions.

In this post, I'll walk you through the steps to build Portal on macOS using the leaked Source Engine, specifically for users who are dealing with Apple’s 64-bit-only policy. The process involves downloading the leaked source code, building the engine, downloading the necessary game assets, and combining everything to make Portal run on your system.

Step 1: Download the Leaked Source Code

The leaked source code for the Source Engine is available on GitHub. However, you'll want to use a specific fork for it to build successfully on macOS.

Step 2: Build the Source Code

Once you've downloaded the source, follow these instructions to build it.

Prerequisites

Install the required dependencies:

xcode-select --install
brew install sdl2 freetype2 fontconfig pkg-config opus libpng libedit jpeg jpeg-turbo python3

Next, set up your workspace:

mkdir -p ~/workspace && cd ~/workspace
git clone --recursive https://github.com/er2off/source-engine.git
cd source-engine
git checkout clang19

Build the Engine

Now you can configure and build the source:

python3 waf configure -T release --prefix='' --build-games=portal
python3 waf build
python3 waf install --destdir="$HOME/Documents/Gaming/Portal"

Step 3: Download Game Assets from Steam

Portal for macOS is still available on Steam today, but only as a 32-bit version. This means you can download it, but you cannot run it. Our goal is to combine the assets from this download with our own 64-bit engine build. Unfortunately, recent updates to the game have made it incompatible with the leaked source engine. The last version of Portal from 2024 that still works with the leaked engine can be found on SteamDB.

Luckily, the current beta branch "SteamPipe Beta" points to this older version, so it is easy to download from Steam: opt into that beta under the game's properties (Betas tab) and let Steam fetch the older build.
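If you prefer the command line, the same download can in principle be done with Valve's steamcmd tool. This is only a sketch: Portal's app ID is 400, but the exact key of the beta branch (shown here as the placeholder <branch_key>) has to be looked up on SteamDB, and you need to log in with an account that owns the game.

steamcmd +login <your_steam_account> +app_update 400 -beta <branch_key> validate +quit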

Step 4: Combine the Engine and Assets

Now that you’ve built the engine and downloaded the necessary game files, it’s time to combine them. First we back up Steam's Portal folder, then delete the 32-bit binaries, and finally replace them with our own 64-bit versions.

cd ~/Library/Application\ Support/Steam/steamapps/common/Portal
cp -r . ~/Portal_backup
rm -rf ./bin ./portal/bin ./hl2_osx
cp -r ~/Documents/Gaming/Portal/bin ./bin
cp -r ~/Documents/Gaming/Portal/portal/bin ./portal/bin
cp ~/Documents/Gaming/Portal/hl2_launcher ./hl2_osx

Note how hl2_launcher gets renamed to hl2_osx.
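To double check that the swap worked, you can inspect the launcher with the file command. The original Steam-provided binary reports a 32-bit (i386) Mach-O executable, while our freshly built one should report a 64-bit architecture (x86_64 or arm64, depending on what you built for):

file ./hl2_osx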

Step 5: Run the Game

Finally, you're ready to run the game!

./hl2_osx -game Portal

This should launch the game using the custom-built engine. In my experiments it runs flawlessly.

The launch button in Steam should now work as well.

What about Portal 2?

While this approach works for the original Portal (and Half-Life 2), it does not work for Portal 2. Portal 2 requires a more recent version of the Source Engine, so it is not compatible with the leaked code we used. If you're looking to play Portal 2, or want a more straightforward way to play Portal on newer versions of macOS, there are several alternatives based on compatibility layers. I have not tried these myself, but googling for terms such as Wine, Whisky, CrossOver and Proton should get you started.

Happy gaming!

PS: In retrospect, it might be safer to combine the engine and the assets in a folder outside of Steam, so that they do not accidentally get overwritten by incoming updates.
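For example, a minimal sketch of that approach (the destination path is just an illustration): copy the game assets out of the Steam folder once, install the engine build on top of the copy, and launch from there. Incoming Steam updates will then never touch your working copy.

cp -r ~/Library/Application\ Support/Steam/steamapps/common/Portal ~/Documents/Gaming/PortalStandalone
cd ~/Documents/Gaming/PortalStandalone
rm -rf ./bin ./portal/bin ./hl2_osx
cp -r ~/Documents/Gaming/Portal/bin ./bin
cp -r ~/Documents/Gaming/Portal/portal/bin ./portal/bin
cp ~/Documents/Gaming/Portal/hl2_launcher ./hl2_osx
./hl2_osx -game Portal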

Running CUDA 12 workloads on Ubuntu

Introduction

CUDA (Compute Unified Device Architecture) is NVIDIA's parallel computing platform that allows developers to harness the power of NVIDIA GPUs for general-purpose computing (GPGPU). CUDA provides a suite of tools and libraries that enable high-performance computing on GPUs, making it a go-to solution for a wide range of computational tasks, including deep learning.

CUDA competes with other GPU computing platforms, such as AMD's ROCm and Intel's oneAPI. Both ROCm and oneAPI are open-source platforms that offer similar capabilities to CUDA. However, CUDA remains dominant, especially in the AI and deep learning space, due to its mature ecosystem and widespread support.

CUDA can be deployed on various operating systems, including Linux and Windows. It is worth noting that CUDA support for macOS was discontinued after version 10.2, as Apple stopped supporting NVIDIA GPUs and later transitioned to its own ARM-based Apple Silicon.

In this blog post, we will dive into CUDA by exploring it across three layers:

  1. System-wide setup: We will cover the installation and configuration of the graphics driver.
  2. CUDA and cuDNN setup: We will discuss two different approaches: system-wide or isolated.
  3. Using CUDA: How to leverage CUDA in your projects, including the installation of GPU-accelerated libraries and frameworks.

System overview

For this guide, we will walk through setting up CUDA on a Linux system. Specifically, we will be using Ubuntu 23.10. While it would have been ideal to use a long-term support (LTS) version like 24.04, the differences in setup will be minimal.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 23.10
Release:        23.10
Codename:       mantic
$ uname -m
x86_64
$ uname -r
6.5.0-44-generic
$ ldd --version
ldd (Ubuntu GLIBC 2.38-1ubuntu6.3) 2.38
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.

Linux display driver setup

When setting up CUDA on Linux, one of the first steps is ensuring that your display driver is correctly installed. On Linux, this is always a system-wide process, and you have two main options: the proprietary NVIDIA driver or the open-source Nouveau driver.

  • Proprietary drivers: The official drivers provided by NVIDIA, offering the best performance and full support for CUDA. Common versions include 470, 525, 535, 545 and 550.
  • Nouveau drivers: The Nouveau driver is an open-source alternative to NVIDIA's proprietary driver. While it provides basic functionality and is a good choice for general use, it does not support CUDA.

In this guide, we will be using the proprietary NVIDIA drivers. These drivers also include the nvidia-smi tool, which is vital for managing and monitoring your GPU.

There are two main procedures for installing the drivers: automatic or manual.

Automatic Installation

The easiest way to install the appropriate NVIDIA driver is through the automatic installation process, which detects your GPU and recommends the best driver.

$ ubuntu-drivers devices
== /sys/devices/pci0000:40/0000:40:01.1/0000:41:00.0 ==
modalias : pci:v000010DEd00002204sv000010DEsd0000147Dbc03sc00i00
vendor   : NVIDIA Corporation
model    : GA102 [GeForce RTX 3090]
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-545 - distro non-free
driver   : nvidia-driver-545-open - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-535 - distro non-free recommended
driver   : nvidia-driver-535-open - distro non-free
driver   : nvidia-driver-535-server-open - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
$ ubuntu-drivers list
$ ubuntu-drivers install

Manual Installation

For those who prefer more control over the installation process, or if you want the latest drivers not available in the default Ubuntu repositories, you can manually install the driver.

You can install the display drivers either from the default Ubuntu repositories or from the additional PPA (Personal Package Archive) provided by Ubuntu's graphics drivers team if you want the latest and greatest.

$ sudo add-apt-repository ppa:graphics-drivers/ppa && sudo apt update
$ sudo apt install nvidia-driver-535

These commands add the PPA and install NVIDIA driver version 535; you can replace "535" with your desired version number.

Once installed, you can verify that the driver is correctly set up using the nvidia-smi tool:

$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:41:00.0  On |                  N/A |
|  0%   45C    P8              32W / 350W |    541MiB / 24576MiB |      7%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off | 00000000:42:00.0 Off |                  N/A |
|  0%   40C    P8              19W / 350W |     10MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2635      G   /usr/lib/xorg/Xorg                          192MiB |
|    0   N/A  N/A      2944      G   /usr/bin/gnome-shell                         82MiB |
|    0   N/A  N/A      3607      G   ...irefox/3626/usr/lib/firefox/firefox      134MiB |
|    0   N/A  N/A      4701      G   ...erProcess --variations-seed-version      116MiB |
|    1   N/A  N/A      2635      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

Important Notes
  • Stability vs. latest features: When choosing your driver version, consider the trade-off between stability and access to the latest features. Older drivers like version 470 are more stable and widely tested, while newer versions like 550 offer the latest updates and support for newer hardware.
  • GPU compatibility: Ensure that the driver you select is compatible with your GPU model. The automatic detection method mentioned above typically handles this well.
  • Bundled CUDA runtime: The NVIDIA driver comes with a minimal CUDA runtime (version 12.2 for the 535 driver used here) necessary for running basic CUDA applications. However, it does not include the full CUDA toolkit required for development purposes. The runtime version bundled with the driver will not change even if you separately install a proper CUDA runtime or toolkit.

To see where the CUDA runtime is located on your system, you can run:

$ find /usr -name "libcuda.so*"
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.535.171.04
/usr/lib/i386-linux-gnu/libcuda.so.1
/usr/lib/i386-linux-gnu/libcuda.so
/usr/lib/i386-linux-gnu/libcuda.so.535.171.04

This command will display the paths to the installed CUDA runtime libraries, which are essential for running CUDA-enabled applications.

Compute capability

Compute capability identifies the feature set of a GPU architecture; it is fixed in hardware and cannot be upgraded through software. Here's a table summarizing the compute capabilities of various NVIDIA GPU architectures:

Architecture    Compute Capability    GPU Models
Volta           7.0                   V100
Turing          7.5                   GeForce RTX 20xx, Quadro RTX 8000 and RTX 6000, Tesla T4
Ampere          8.x                   A100 (8.0), GeForce RTX 30xx (8.6), RTX A6000 (8.6)
Ada Lovelace    8.9                   GeForce RTX 40xx, RTX 6000 Ada
Hopper          9.0                   H100, H200
Blackwell       10.x                  B100, B200, GeForce RTX 5090

To check the compute capability of your GPU, you can use the following command:

$ nvidia-smi --query-gpu=compute_cap --format=csv
compute_cap
8.6
8.6

For more detailed information on compute capabilities and their implications, refer to the NVIDIA CUDA C Programming Guide.
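The compute capability also comes into play when compiling CUDA code, as nvcc accepts it as an architecture flag. A small illustration (vector_add.cu is a hypothetical source file), targeting the RTX 3090's compute capability of 8.6:

$ nvcc -arch=sm_86 -o vector_add vector_add.cu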

CUDA and cuDNN

When working with CUDA, it is important to distinguish between the CUDA runtime and the CUDA toolkit, similar to the difference between the Java Runtime Environment (JRE) and the Java Development Kit (JDK). The NVIDIA driver includes a minimal CUDA runtime that enables you to run basic CUDA-enabled applications. However, this runtime is limited and does not include all the components needed for more advanced CUDA tasks.

The CUDA toolkit, on the other hand, is a comprehensive package that provides all the development tools necessary for creating, compiling, and running CUDA applications. It also includes a more complete runtime, which provides additional libraries and features needed for more complex applications. Most deep learning tasks will require this toolkit to function correctly.

When installing the CUDA toolkit, ensure that you only install versions that are less than or equal to the runtime version bundled with your display driver. Installing a higher version without updating the driver first can lead to instability.
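A quick way to check which CUDA version your driver supports before picking a toolkit version is to read it from the nvidia-smi banner:

$ nvidia-smi | grep "CUDA Version"

On the system used in this guide this reports 12.2, so any toolkit version up to and including 12.2 is safe to install.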

In addition to the CUDA toolkit, some deep learning frameworks also require the CUDA Deep Neural Network (cuDNN) package.

System-wide installation

To set up CUDA and cuDNN system-wide, start by ensuring that the NVIDIA proprietary drivers are installed, as discussed earlier. Next, you will need to install a C++ compiler, which is required for compiling CUDA code. This can be done with sudo apt install gcc g++.

Once GCC is installed, you can proceed to install CUDA. You have two main options for this:

  • The easiest approach is to use the official Ubuntu repository by running sudo apt install nvidia-cuda-toolkit
  • Alternatively, for more control or to get the latest version, you can install CUDA directly from NVIDIA’s official source. This involves following the detailed instructions provided in the NVIDIA CUDA Installation Guide for Linux.

After installing CUDA, the next step is to set up cuDNN, which is essential for deep learning applications. You can do this by following the instructions in the NVIDIA cuDNN Installation Guide.
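Putting the system-wide route together, a minimal sketch using the Ubuntu-packaged toolkit (package names differ if you install from NVIDIA's own repositories) could look like this:

$ sudo apt install gcc g++
$ sudo apt install nvidia-cuda-toolkit
$ nvcc --version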

Isolated installation

For situations where you need to manage multiple projects with different CUDA or cuDNN versions, setting up an isolated environment using Conda or Mamba is the recommended option. This approach keeps the system-wide components minimal and allows each environment to have its own specific setup.

Start by ensuring that the NVIDIA proprietary drivers are installed as discussed earlier, since they are the only system-wide component required. Next, install Mamba, which is a faster alternative to Conda, via Miniforge; a minimal install sketch follows below. You can then verify that the installation was successful by running mamba info.
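Assuming the official Miniforge installer script and an interactive install in the default location, the installation could look like this:

$ curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
$ bash Miniforge3-$(uname)-$(uname -m).sh

After restarting your shell so that mamba is on your PATH, mamba info should report something like the output below.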

$ mamba info

          mamba version : 1.5.5
     active environment : None
            shell level : 0
       user config file : /home/user/.condarc
 populated config files : /home/user/miniforge3/.condarc
          conda version : 23.11.0
    conda-build version : not installed
         python version : 3.10.13.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=zen3
                          __conda=23.11.0=0
                          __cuda=12.2=0
                          __glibc=2.38=0
                          __linux=6.5.0=0
                          __unix=0=0
       base environment : /home/user/miniforge3  (writable)
      conda av data dir : /home/user/miniforge3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
          package cache : /home/user/miniforge3/pkgs
                          /home/user/.conda/pkgs
       envs directories : /home/user/miniforge3/envs
                          /home/user/.conda/envs
               platform : linux-64
             user-agent : conda/23.11.0 requests/2.31.0 CPython/3.10.13 Linux/6.5.0-44-generic ubuntu/23.10 glibc/2.38 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.5
                UID:GID : 1000:1000
             netrc file : None
           offline mode : False

Relevance of virtual packages in mamba environments

Mamba environments utilize virtual packages to dynamically detect and represent certain system features that are critical for package resolution and compatibility. These virtual packages include system-specific details such as architecture, the operating system, the version of the GNU C Library (glibc), and, importantly, the version of CUDA supported by your installed NVIDIA drivers.

Virtual packages are not installed in the traditional sense but are automatically detected by Mamba. They help the package manager resolve dependencies by ensuring that the packages you install are compatible with your system's underlying hardware and software.

Among these virtual packages, __cuda is particularly important when working with CUDA. It represents the maximum version of CUDA that your NVIDIA driver officially supports. This information is automatically detected and provided by Mamba, assisting in the selection of the appropriate CUDA toolkit and related packages for your environment.

The output above indicates that our system's NVIDIA drivers support CUDA up to version 12.2. While Mamba does not strictly enforce this version when installing the CUDA toolkit, it serves as a guideline. You can technically install lower or higher versions of the toolkit, but installing a version higher than what __cuda indicates is generally not recommended, as it could lead to instability or compatibility issues. If your project requires a toolkit version higher than what your driver supports, consider upgrading the graphics driver first.
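For completeness: you can also override the detected value through the CONDA_OVERRIDE_CUDA environment variable, which is mainly useful on machines without a GPU (such as CI runners) where there is no driver to detect. A hedged example:

$ CONDA_OVERRIDE_CUDA=12.2 mamba create -n build-env cuda-toolkit=12.2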

Creating a new mamba environment

Once Mamba is set up, we can proceed to install the CUDA toolkit within an isolated environment. We can do this by creating a new environment and specifying the CUDA version we need, along with any other packages such as Python or cuDNN. For example:

$ mamba create -n my-environment python=3.12 cuda-toolkit=12.2 cudnn
$ mamba activate my-environment

These commands create and activate a new environment named my-environment with Python 3.12, the CUDA 12.2 toolkit, and a recent, compatible version of cuDNN.

After activating your environment, you can verify that CUDA is correctly installed by checking the version of the NVIDIA CUDA compiler:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

Package overview

When working with CUDA 12, it is important to be aware of the significant changes in package names and structures compared to earlier versions. In CUDA 11 and earlier versions, the installation was typically done using the cudatoolkit or cudatoolkit-dev package. However, starting with CUDA 12, there has been a restructuring of package names. This shift makes installation trickier, especially as much of the older documentation still references these outdated package names.

The following meta-packages can be used to set up a CUDA 12 environment:

  • cuda-toolkit: This package (notice the hyphen) is now the primary way of installing CUDA development tools. Running mamba install cuda-toolkit=12.2, for instance, will typically provide all the necessary components for CUDA 12.2 development, including the compiler, libraries, and headers. The downside is that it is fairly big, and it will likely include much more than what is strictly necessary for your use case.
  • cuda-runtime: If you only need the runtime components, install this package. Note that this refers to the complete runtime, not the minimal one bundled with the driver.
  • cuda: This is a meta-package that pulls in both the toolkit and runtime. Seeing as the runtime contains a subset of the packages in the toolkit, this is in effect functionally equivalent to the cuda-toolkit package.
  • Other meta packages such as cuda-libraries, cuda-libraries-dev, cuda-compiler and cuda-tools can be used to specify more precisely what your project needs. Below is a simplified hierarchy of the most relevant packages. Check the appendix for a full list.
cuda
├── cuda-runtime
│   └── cuda-libraries
└── cuda-toolkit
    ├── cuda-compiler
    ├── cuda-libraries
    ├── cuda-libraries-dev
    └── cuda-tools
        └── cuda-command-line-tools

Finally, for the fullest possible control, refer to the actual CUDA packages instead of these meta-packages (which are simply groups of packages).

Note that neither cuda, cuda-runtime, nor cuda-toolkit include cuDNN. It is a separate package specifically tailored for deep learning applications, which needs to be installed independently as shown earlier.
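As an illustration of that finer-grained control, the following sketch creates a leaner development environment that only pulls in the compiler, the development libraries, and the command-line tools instead of the full cuda-toolkit (assuming version 12.2 is available on your configured channels):

$ mamba create -n cuda-dev python=3.12 cuda-compiler=12.2 cuda-libraries-dev=12.2 cuda-command-line-tools=12.2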

Channel selection

When setting up CUDA and related packages in a Mamba environment, the various channels that offer similar packages might cause confusion. The two primary channels to consider are conda-forge (default in Miniforge) and nvidia, both of which offer nearly identical CUDA-related packages, including the CUDA toolkit, cuDNN, and other NVIDIA libraries. We will ignore the anaconda channel (default in Anaconda) because it typically hosts somewhat outdated package versions. Many mamba commands accept a -c <channel> option to include an extra channel on top of the default channels configured in the .condarc file.

The conda-forge channel is a widely used, community-driven repository known for its extensive package coverage beyond CUDA. This makes conda-forge particularly suitable for projects that require a mix of CUDA and other libraries. Additionally, conda-forge is continuously updated and maintained, ensuring that you have access to recent versions of packages.

In contrast, the official nvidia channel, maintained directly by NVIDIA, is dedicated specifically to CUDA and other NVIDIA tools. To install the complete CUDA suite from this channel, you can use mamba install -c nvidia cuda. Although the CUDA-related (meta-)packages in the nvidia channel are almost identical to those found in conda-forge, the nvidia channel provides slightly earlier access to the latest versions and includes some less common versions that may not be available on conda-forge.

In most cases, if your environment requires a wide range of software, conda-forge is likely the better option due to its extensive package offerings. It also helps to avoid some minor hiccups that can result from multi-channel package resolution.
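If you want to make this preference explicit, you can pin conda-forge in your user-level .condarc. Miniforge already configures conda-forge as its default channel (see the "populated config files" line in the mamba info output above); a minimal ~/.condarc along these lines is a reasonable sketch:

$ cat ~/.condarc
channels:
  - conda-forge
channel_priority: strict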

Note that certain packages, such as pytorch, also provide their own dedicated channel to install from.

Setting environment variables

For certain applications, you might need to manually set additional environment variables:

  • CUDA_HOME and CUDA_PATH
    • These are interchangeable and typically point to the root of the CUDA toolkit folder, which contains the lib and bin directories.
    • In the case of a conda environment, they should point to the environment's root.
  • LD_LIBRARY_PATH
    • Add $CUDA_HOME/lib to this path to ensure your system can locate the necessary libraries.

$ echo $CONDA_PREFIX
/home/user/miniforge3/envs/my-environment

$ which nvcc
/home/user/miniforge3/envs/my-environment/bin/nvcc

$ export CUDA_HOME=$CONDA_PREFIX
$ export CUDA_PATH=$CUDA_HOME
$ export LD_LIBRARY_PATH=$CUDA_HOME/lib:$LD_LIBRARY_PATH

Install frameworks and libraries

Once CUDA and cuDNN are set up, the next step is installing the Python packages that leverage GPU acceleration. Depending on your development environment, you can install these using either pip in a virtual environment or mamba.

Here are some popular libraries and frameworks:

  • PyCUDA: Python wrapper for CUDA.
  • CuPy: NumPy-compatible library that runs on CUDA.
  • cuNumeric: A drop-in replacement for NumPy, optimized for CUDA.
  • RAPIDS: A suite of libraries for data science and analytics on GPUs, including cuDF (a faster pandas) and cuML (a faster scikit-learn).
  • Deep learning frameworks: TensorFlow, PyTorch, ONNX

For example, to install PyTorch with CUDA support using mamba:

$ mamba create -n torch-env -c pytorch -c nvidia python=3.12 pytorch-cuda=12.1 torchvision torchaudio


Looking for: ['python=3', 'pytorch-cuda=12.1', 'torchvision', 'torchaudio']

...

  Package                          Version  Build                         Channel           Size
──────────────────────────────────────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────────────────────────────────────

  + libcublas                    12.1.0.26  0                             nvidia           345MB
  + libcufft                      11.0.2.4  0                             nvidia           108MB
  + libcusolver                  11.4.4.55  0                             nvidia           103MB
  + libcusparse                  12.0.2.55  0                             nvidia           171MB
  + libnpp                       12.0.2.50  0                             nvidia           147MB
  + cuda-cudart                   12.1.105  0                             nvidia           193kB
  + cuda-nvrtc                    12.1.105  0                             nvidia            21MB
  + libnvjitlink                  12.1.105  0                             nvidia            18MB
  + libnvjpeg                    12.1.1.14  0                             nvidia             3MB
  + cuda-cupti                    12.1.105  0                             nvidia            16MB
  + cuda-nvtx                     12.1.105  0                             nvidia            58kB
  ...
  + libcurand                    10.3.7.37  0                             nvidia            54MB
  + libcufile                    1.11.0.15  0                             nvidia             1MB
  + cuda-opencl                    12.6.37  0                             nvidia            27kB
  + cuda-libraries                  12.1.0  0                             nvidia             2kB
  + cuda-runtime                    12.1.0  0                             nvidia             1kB
  ...
  + pytorch                          2.4.0  py3.12_cuda12.1_cudnn9.1.0_0  pytorch            1GB
  + torchtriton                      3.0.0  py312                         pytorch          245MB
  + torchaudio                       2.4.0  py312_cu121                   pytorch            7MB
  + torchvision                     0.19.0  py312_cu121                   pytorch            9MB

  Summary:

  Install: 180 packages

  Total download: 3GB

───────────────────────────────────────────────────────────────────────────────────────────────────


Confirm changes: [Y/n]

Downloading and Extracting Packages:

Preparing transaction: done
Verifying transaction: done
Executing transaction: done

To activate this environment, use

     $ mamba activate torch-env

To deactivate an active environment, use

     $ mamba deactivate

Here, the meta-package pytorch-cuda allows us to specify the required CUDA version. Since version 12.2 is not available, we settle for version 12.1. If we had installed the regular pytorch package, we would have downloaded the CPU version without CUDA acceleration. We can double check the build string of the pytorch package in the command output: py3.12_cuda12.1_cudnn9.1.0_0.

Notice how we need two extra channels: pytorch and nvidia. The first attempt without the nvidia channel failed because pytorch-cuda=12.1 has a dependency on a very specific version of cuBLAS that is unavailable in conda-forge.

Furthermore, notice how we did not specify cudnn this time. PyTorch with CUDA support includes a statically linked version of this library, so we don't need to include it separately.

When the environment is created and activated, we can test whether CUDA support is enabled.

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
2
>>> torch.cuda.current_device()
0
>>> torch.backends.cudnn.version()
90100

Topics for another time

  • more elaborate pytorch example
    • or tensorflow vs tensorflow-gpu
  • NVIDIA TensorRT

Appendix

Glossary

  • cudaRT - CUDA runtime
  • cuBLAS - CUDA BLAS
  • cuFFT - CUDA Fast Fourier Transform
  • cuDPP - CUDA Data Parallel Primitives
  • cuDNN - CUDA Deep Neural Network
  • cuRAND - CUDA Random Number Generation library
  • cuSOLVER - CUDA based collection of dense and sparse direct solvers
  • cuSPARSE - CUDA Sparse Matrix library
  • NPP - NVIDIA Performance Primitives library
  • nvGRAPH - NVIDIA Graph Analytics library
  • NVML - NVIDIA Management Library
  • NVRTC - NVIDIA Runtime Compilation library for CUDA C++
  • NVCC - Nvidia CUDA Compiler
    • based on LLVM
    • source file extension: *.cu
  • NCCL - NVIDIA Collective Communications Library
    • multi-GPU setup
  • Thrust: open source C++ library of parallel algorithms and data structures

Mamba: CUDA 12 package overview

  • useful commands
    • mamba search -c <channel> --override-channels [--info] <package-spec>
    • mamba repoquery whoneeds --tree --recursive -c <channel> <package>
    • mamba repoquery depends --tree --recursive -c <channel> <package>

Hierarchical meta package list
  • cuda
    • cuda-runtime
      • cuda-libraries
        • (see below)
    • cuda-toolkit
      • cuda-compiler
        • c-compiler
        • cuda-cuobjdump
        • cuda-cuxxfilt
        • cuda-nvcc
        • cuda-nvprune
        • cxx-compiler
      • cuda-libraries
        • cuda-cudart
        • cuda-nvrtc
        • cuda-opencl
        • libcublas
        • libcufft
        • libcufile
        • libcurand
        • libcusolver
        • libcusparse
        • libnpp
        • libnvfatbin
        • libnvjitlink
        • libnvjpeg
      • cuda-libraries-dev
        • cuda-cccl
        • cuda-cudart-dev
        • cuda-driver-dev
        • cuda-nvrtc-dev
        • cuda-opencl-dev
        • cuda-profiler-api
        • libcublas-dev
        • libcufft-dev
        • libcufile-dev
        • libcurand-dev
        • libcusolver-dev
        • libcusparse-dev
        • libnpp-dev
        • libnvfatbin-dev
        • libnvjitlink-dev
        • libnvjpeg-dev
      • cuda-nvml-dev
      • cuda-tools
        • cuda-command-line-tools
          • cuda-cupti-dev
          • cuda-gdb
          • cuda-nvdisasm
          • cuda-nvprof
          • cuda-nvtx
          • cuda-sanitizer-api
        • cuda-visual-tools
        • gds-tools
  • cuda-minimal-build
    • cuda-cccl
    • cuda-compiler
      • ...
    • cuda-cudart-dev
    • cuda-profiler-api
  • not part of any meta package
    • cuda-compat
    • cuda-crt
    • cuda-nsight
    • cuda-nvvm
    • cuda-nvvp
    • cuda-python
    • cudnn
    • cuquantum
    • cutensor
    • nccl

Flat package list
  • cuda-cccl
  • cuda-compat
  • cuda-crt
  • cuda-crt-dev_linux-64
  • cuda-crt-tools
  • cuda-cudart
  • cuda-cudart-dev
  • cuda-cuobjdump
  • cuda-cupti
  • cuda-cupti-dev
  • cuda-cupti-doc
  • cuda-cuxxfilt
  • cuda-driver-dev
  • cuda-gdb
  • cuda-gdb-src
  • cuda-nsight
  • cuda-nvcc
  • cuda-nvcc-dev_linux-64
  • cuda-nvcc-impl
  • cuda-nvcc-tools
  • cuda-nvdisasm
  • cuda-nvml-dev
  • cuda-nvprof
  • cuda-nvprune
  • cuda-nvrtc
  • cuda-nvrtc-dev
  • cuda-nvtx
  • cuda-nvtx-dev
  • cuda-nvvm
  • cuda-nvvm-dev_linux-64
  • cuda-nvvm-impl
  • cuda-nvvm-tools
  • cuda-nvvp
  • cuda-opencl
  • cuda-opencl-dev
  • cuda-profiler-api
  • cuda-python
  • cuda-sanitizer-api
  • cuda-visual-tools
  • cudnn
  • cupti
  • cuquantum
  • cutensor
  • libcublas
  • libcublas-dev
  • libcufft
  • libcufft-dev
  • libcuquantum
  • libcurand
  • libcurand-dev
  • libcusolver
  • libcusolver-dev
  • libcusparse
  • libcusparse-dev
  • libcutensor
  • nccl

Repoqueries
  • output has been slightly edited for clarity
$ mamba repoquery depends cuda=12.6 -c conda-forge --tree --recursive

cuda[12.6.0]
  ├─ cuda-runtime[12.6.0]
    └─ cuda-libraries[12.6.0]
       ├─ cuda-cudart[12.6.37]
         ├─ cuda-cudart_linux-64[12.6.37]
           └─ cuda-version[12.6]
         ├─ libgcc-ng[14.1.0]
           ├─ _libgcc_mutex[0.1]
           └─ _openmp_mutex[4.5]
              ├─ _libgcc_mutex already visited
              └─ llvm-openmp[18.1.8]
                 ├─ libzlib[1.3.1]
                 └─ zstd[1.5.6]
                    ├─ libzlib already visited
                    └─ libstdcxx-ng[14.1.0]
         └─ libstdcxx-ng already visited
       ├─ cuda-nvrtc[12.6.20]
         ├─ libgcc-ng already visited
         └─ libstdcxx-ng already visited
       ├─ cuda-opencl[12.6.37]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ ocl-icd[2.3.2]
            └─ libgcc-ng already visited
       ├─ libcublas[12.6.0.22]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ cuda-nvrtc[12.0.76]
            ├─ libgcc-ng already visited
            ├─ libstdcxx-ng already visited
            └─ cuda-version[12.0.0]
       ├─ libcufft[11.2.6.28]
         ├─ libgcc-ng already visited
         └─ libstdcxx-ng already visited
       ├─ libcufile[1.11.0.15]
         ├─ libgcc-ng already visited
         └─ libstdcxx-ng already visited
       ├─ libcurand[10.3.7.37]
         ├─ libgcc-ng already visited
         └─ libstdcxx-ng already visited
       ├─ libcusolver[11.6.4.38]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         ├─ libcublas already visited
         ├─ libcusparse[12.5.2.23]
           ├─ libgcc-ng already visited
           ├─ libstdcxx-ng already visited
           └─ libnvjitlink[12.6.20]
              ├─ libgcc-ng already visited
              └─ libstdcxx-ng already visited
         └─ libnvjitlink already visited
       ├─ libcusparse already visited
       ├─ libnvjitlink already visited
       ├─ libnpp[12.3.1.23]
         ├─ libgcc-ng already visited
         └─ libstdcxx-ng already visited
       ├─ libnvfatbin[12.6.20]
         ├─ libgcc-ng already visited
         └─ libstdcxx-ng already visited
       └─ libnvjpeg[12.3.3.23]
          ├─ libgcc-ng already visited
          └─ libstdcxx-ng already visited
  └─ cuda-toolkit[12.6.0]
     ├─ cuda-libraries already visited
     ├─ cuda-compiler[12.6.0]
       ├─ c-compiler[1.0.0]
         ├─ libgcc-ng already visited
         └─ gcc_linux-64[10.3.0]
            ├─ binutils_linux-64[2.36]
              ├─ binutils_impl_linux-64[2.36.1]
                ├─ ld_impl_linux-64[2.36.1]
                └─ sysroot_linux-64[2.12]
                   └─ kernel-headers_linux-64[2.6.32]
              └─ sysroot_linux-64 already visited
            ├─ sysroot_linux-64 already visited
            └─ gcc_impl_linux-64[10.3.0]
               ├─ libgcc-ng already visited
               ├─ libstdcxx-ng already visited
               ├─ binutils_impl_linux-64 already visited
               ├─ sysroot_linux-64 already visited
               ├─ libgcc-devel_linux-64[10.3.0]
               ├─ libgomp[14.1.0]
                 └─ _libgcc_mutex already visited
               └─ libsanitizer[10.3.0]
                  └─ libgcc-ng already visited
       ├─ cuda-cuobjdump[12.6.20]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ cuda-nvdisasm[12.0.76]
            ├─ libgcc-ng already visited
            └─ libstdcxx-ng already visited
       ├─ cuda-cuxxfilt[12.6.20]
         ├─ libgcc-ng already visited
         └─ libstdcxx-ng already visited
       ├─ cuda-nvcc[12.6.20]
         ├─ gcc_linux-64 already visited
         ├─ cuda-nvcc_linux-64[12.6.20]
           ├─ cuda-cudart-dev_linux-64[12.6.37]
             ├─ cuda-cccl_linux-64[12.0.90]
             ├─ cuda-cudart-static_linux-64[12.0.107]
             └─ cuda-cudart_linux-64[12.0.107]
                ├─ libgcc-ng already visited
                └─ libstdcxx-ng already visited
           ├─ cuda-driver-dev_linux-64[12.6.37]
           ├─ cuda-nvcc-dev_linux-64[12.6.20]
             ├─ libgcc-ng already visited
             ├─ cuda-crt-dev_linux-64[12.6.20]
             └─ cuda-nvvm-dev_linux-64[12.6.20]
           ├─ cuda-nvcc-impl[12.6.20]
             ├─ cuda-cudart already visited
             ├─ cuda-nvcc-dev_linux-64 already visited
             ├─ cuda-cudart-dev[12.0.107]
               ├─ libgcc-ng already visited
               ├─ libstdcxx-ng already visited
               ├─ cuda-cudart[12.0.107]
                 ├─ libgcc-ng already visited
                 └─ libstdcxx-ng already visited
               ├─ cuda-cudart-dev_linux-64[12.0.107]
                 ├─ cuda-cccl_linux-64 already visited
                 └─ cuda-cudart-static_linux-64 already visited
               └─ cuda-cudart-static[12.0.107]
                  ├─ libgcc-ng already visited
                  ├─ libstdcxx-ng already visited
                  └─ cuda-cudart-static_linux-64 already visited
             ├─ cuda-nvcc-tools[12.6.20]
               ├─ libgcc-ng already visited
               ├─ libstdcxx-ng already visited
               ├─ cuda-crt-tools[12.6.20]
               └─ cuda-nvvm-tools[12.6.20]
                  ├─ libgcc-ng already visited
                  └─ libstdcxx-ng already visited
             └─ cuda-nvvm-impl[12.6.20]
                ├─ libgcc-ng already visited
                └─ libstdcxx-ng already visited
           ├─ cuda-nvcc-tools already visited
           └─ sysroot_linux-64[2.28]
              ├─ _sysroot_linux-64_curr_repodata_hack[3]
              └─ kernel-headers_linux-64[4.18.0]
                 └─ _sysroot_linux-64_curr_repodata_hack already visited
         └─ gxx_linux-64[10.3.0]
            ├─ gcc_linux-64 already visited
            ├─ binutils_linux-64 already visited
            ├─ sysroot_linux-64 already visited
            └─ gxx_impl_linux-64[10.3.0]
               ├─ sysroot_linux-64 already visited
               ├─ gcc_impl_linux-64 already visited
               └─ libstdcxx-devel_linux-64[10.3.0]
       ├─ cuda-nvprune[12.6.20]
         ├─ libgcc-ng already visited
         └─ libstdcxx-ng already visited
       └─ cxx-compiler[1.0.0]
          ├─ libgcc-ng already visited
          ├─ libstdcxx-ng already visited
          └─ gxx_linux-64 already visited
     ├─ cuda-libraries-dev[12.6.0]
       ├─ cuda-cccl[12.6.37]
         ├─ cccl[2.5.0]
         └─ cuda-cccl_linux-64[12.6.37]
       ├─ cuda-cudart-dev[12.6.37]
         ├─ cuda-cudart already visited
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         ├─ cuda-cudart-dev_linux-64 already visited
         └─ cuda-cudart-static[12.6.37]
            ├─ libgcc-ng already visited
            ├─ libstdcxx-ng already visited
            └─ cuda-cudart-static_linux-64[12.6.37]
       ├─ cuda-driver-dev[12.6.37]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ cuda-driver-dev_linux-64[12.0.107]
       ├─ cuda-nvrtc-dev[12.6.20]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ cuda-nvrtc already visited
       ├─ cuda-opencl-dev[12.6.37]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ cuda-opencl already visited
       ├─ cuda-profiler-api[12.6.37]
         └─ cuda-cudart-dev already visited
       ├─ libcublas-dev[12.6.0.22]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ libcublas already visited
       ├─ libcufft-dev[11.2.6.28]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ libcufft already visited
       ├─ libcufile-dev[1.11.0.15]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ libcufile already visited
       ├─ libcurand-dev[10.3.7.37]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ libcurand already visited
       ├─ libcusolver-dev[11.6.4.38]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ libcusolver already visited
       ├─ libcusparse-dev[12.5.2.23]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         ├─ libcusparse already visited
         └─ libnvjitlink already visited
       ├─ libnpp-dev[12.3.1.23]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ libnpp already visited
       ├─ libnvfatbin-dev[12.6.20]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ libnvfatbin already visited
       ├─ libnvjitlink-dev[12.6.20]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ libnvjitlink already visited
       └─ libnvjpeg-dev[12.3.3.23]
          ├─ libnvjpeg already visited
          └─ cuda-cudart-dev already visited
     ├─ cuda-nvml-dev[12.6.37]
       ├─ libgcc-ng already visited
       └─ libstdcxx-ng already visited
     └─ cuda-tools[12.6.0]
        ├─ cuda-command-line-tools[12.6.0]
          ├─ cuda-cupti-dev[12.6.37]
            ├─ libgcc-ng already visited
            ├─ libstdcxx-ng already visited
            └─ cuda-cupti[12.6.37]
               ├─ libgcc-ng already visited
               └─ libstdcxx-ng already visited
          ├─ cuda-gdb[12.6.37]
            ├─ libgcc-ng already visited
            ├─ libstdcxx-ng already visited
            └─ gmp[6.3.0]
               ├─ libgcc-ng already visited
               └─ libstdcxx-ng already visited
          ├─ cuda-nvdisasm[12.6.20]
            ├─ libgcc-ng already visited
            └─ libstdcxx-ng already visited
          ├─ cuda-nvprof[12.6.37]
            ├─ libgcc-ng already visited
            ├─ libstdcxx-ng already visited
            └─ cuda-cupti[12.0.90]
               ├─ libgcc-ng already visited
               └─ libstdcxx-ng already visited
          ├─ cuda-nvtx[12.6.37]
            ├─ libgcc-ng already visited
            └─ libstdcxx-ng already visited
          └─ cuda-sanitizer-api[12.6.34]
             ├─ libgcc-ng already visited
             └─ libstdcxx-ng already visited
        ├─ cuda-visual-tools[12.6.0]
          ├─ cuda-libraries-dev already visited
          ├─ cuda-nvml-dev already visited
          ├─ cuda-nsight[12.6.20]
          ├─ cuda-nvvp[12.6.37]
            ├─ libgcc-ng already visited
            ├─ libstdcxx-ng already visited
            ├─ cuda-nvdisasm already visited
            └─ cuda-nvprof[12.0.90]
               ├─ libgcc-ng already visited
               ├─ libstdcxx-ng already visited
               └─ cuda-cupti already visited
          └─ nsight-compute[2024.3.0.15]
             └─ ...
        └─ gds-tools[1.11.0.15]
           ├─ libgcc-ng already visited
           ├─ libstdcxx-ng already visited
           └─ libcufile already visited

$ mamba repoquery depends cuda-minimal-build=12.6 -c conda-forge --tree --recursive

cuda-minimal-build[12.6.0]
  ├─ cuda-cccl[12.6.37]
    ├─ cccl[2.5.0]
    └─ cuda-cccl_linux-64[12.6.37]
  ├─ cuda-compiler[12.6.0]
    ├─ c-compiler[1.0.0]
      ├─ gcc_linux-64[10.3.0]
        ├─ binutils_linux-64[2.36]
          ├─ binutils_impl_linux-64[2.36.1]
            ├─ ld_impl_linux-64[2.36.1]
            └─ sysroot_linux-64[2.12]
               └─ kernel-headers_linux-64[2.6.32]
          └─ sysroot_linux-64 already visited
        ├─ sysroot_linux-64 already visited
        └─ gcc_impl_linux-64[10.3.0]
           ├─ binutils_impl_linux-64 already visited
           ├─ sysroot_linux-64 already visited
           ├─ libgcc-devel_linux-64[10.3.0]
           ├─ libgcc-ng[14.1.0]
             ├─ _libgcc_mutex[0.1]
             └─ _openmp_mutex[4.5]
                ├─ _libgcc_mutex already visited
                └─ llvm-openmp[18.1.8]
                   ├─ libzlib[1.3.1]
                   └─ zstd[1.5.6]
                      ├─ libzlib already visited
                      └─ libstdcxx-ng[14.1.0]
           ├─ libstdcxx-ng already visited
           ├─ libgomp[14.1.0]
             └─ _libgcc_mutex already visited
           └─ libsanitizer[10.3.0]
              └─ libgcc-ng already visited
      └─ libgcc-ng already visited
    ├─ cuda-cuobjdump[12.6.20]
      ├─ libgcc-ng already visited
      ├─ libstdcxx-ng already visited
      └─ cuda-nvdisasm[12.0.76]
         ├─ libgcc-ng already visited
         ├─ libstdcxx-ng already visited
         └─ cuda-version[12.0.0]
    ├─ cuda-cuxxfilt[12.6.20]
      ├─ libgcc-ng already visited
      └─ libstdcxx-ng already visited
    ├─ cuda-nvcc[12.6.20]
      ├─ gcc_linux-64 already visited
      ├─ cuda-nvcc_linux-64[12.6.20]
        ├─ cuda-cudart-dev_linux-64[12.6.37]
          ├─ cuda-cccl_linux-64[12.0.90]
          ├─ cuda-cudart-static_linux-64[12.0.107]
          └─ cuda-cudart_linux-64[12.0.107]
             ├─ libgcc-ng already visited
             └─ libstdcxx-ng already visited
        ├─ cuda-driver-dev_linux-64[12.6.37]
        ├─ cuda-nvcc-dev_linux-64[12.6.20]
          ├─ libgcc-ng already visited
          ├─ cuda-crt-dev_linux-64[12.6.20]
          └─ cuda-nvvm-dev_linux-64[12.6.20]
        ├─ cuda-nvcc-impl[12.6.20]
          ├─ cuda-nvcc-dev_linux-64 already visited
          ├─ cuda-cudart[12.6.37]
            ├─ libgcc-ng already visited
            ├─ libstdcxx-ng already visited
            └─ cuda-cudart_linux-64[12.6.37]
          ├─ cuda-cudart-dev[12.0.107]
            ├─ libgcc-ng already visited
            ├─ libstdcxx-ng already visited
            ├─ cuda-cudart[12.0.107]
              ├─ libgcc-ng already visited
              └─ libstdcxx-ng already visited
            ├─ cuda-cudart-dev_linux-64[12.0.107]
              ├─ cuda-cccl_linux-64 already visited
              └─ cuda-cudart-static_linux-64 already visited
            └─ cuda-cudart-static[12.0.107]
               ├─ libgcc-ng already visited
               ├─ libstdcxx-ng already visited
               └─ cuda-cudart-static_linux-64 already visited
          ├─ cuda-nvcc-tools[12.6.20]
            ├─ libgcc-ng already visited
            ├─ libstdcxx-ng already visited
            ├─ cuda-crt-tools[12.6.20]
            └─ cuda-nvvm-tools[12.6.20]
               ├─ libgcc-ng already visited
               └─ libstdcxx-ng already visited
          └─ cuda-nvvm-impl[12.6.20]
             ├─ libgcc-ng already visited
             └─ libstdcxx-ng already visited
        ├─ cuda-nvcc-tools already visited
        └─ sysroot_linux-64[2.28]
           ├─ _sysroot_linux-64_curr_repodata_hack[3]
           └─ kernel-headers_linux-64[4.18.0]
              └─ _sysroot_linux-64_curr_repodata_hack already visited
      └─ gxx_linux-64[10.3.0]
         ├─ gcc_linux-64 already visited
         ├─ binutils_linux-64 already visited
         ├─ sysroot_linux-64 already visited
         └─ gxx_impl_linux-64[10.3.0]
            ├─ sysroot_linux-64 already visited
            ├─ gcc_impl_linux-64 already visited
            └─ libstdcxx-devel_linux-64[10.3.0]
    ├─ cuda-nvprune[12.6.20]
      ├─ libgcc-ng already visited
      └─ libstdcxx-ng already visited
    └─ cxx-compiler[1.0.0]
       ├─ libgcc-ng already visited
       ├─ libstdcxx-ng already visited
       └─ gxx_linux-64 already visited
  ├─ cuda-cudart-dev[12.6.37]
    ├─ libgcc-ng already visited
    ├─ libstdcxx-ng already visited
    ├─ cuda-cudart-dev_linux-64 already visited
    ├─ cuda-cudart already visited
    └─ cuda-cudart-static[12.6.37]
       ├─ libgcc-ng already visited
       ├─ libstdcxx-ng already visited
       └─ cuda-cudart-static_linux-64[12.6.37]
  └─ cuda-profiler-api[12.6.37]
     └─ cuda-cudart-dev already visited

Building an ML workstation

Intro

I have been keeping a close eye on the evolutions in AI/ML for a while now. Whenever I come across an interesting demo, I of course like to try it out. Because my main computer at home only has a weak iGPU, I often resort to running workloads in the cloud (mostly Google Colab or AWS). While that works reasonably well, there are some downsides:

  • risk of going over budget when an instance accidentally keeps running after use
  • general mild inconvenience of working with remote systems
  • cloud defeats the purpose of running a private/local LLM
  • more expensive in the long run

That is why I decided to build my own ML system last month. I am not sure if I will actually end up saving money this way, but it is going to be an educational experience regardless. It is still early days, but this post contains my lessons learned so far.

Component selection

GPU

The core of any ML workstation is the GPU. Due to the ubiquity of CUDA requirements in deep learning, there is only a single viable brand: Nvidia. Their offerings fall into three categories:

Architecture     Desktop             Workstation     Datacenter
Pascal (2016)    GeForce GTX 10xx    Quadro P        Tesla P4 / Tesla P100
Volta (2017)     N/A                 Quadro GV100    Tesla V100
Turing (2018)    GeForce RTX 20xx    Quadro RTX      Tesla T4
Ampere (2020)    GeForce RTX 30xx    RTX A series    A100
Ada (2022)       GeForce RTX 40xx    RTX 6000 Ada    N/A?
Hopper (2022)    N/A                 N/A             H100
Blackwell

If you have tens of thousands of dollars to burn, you will want to look at Nvidia's enterprise offerings and more specifically at the A100 or newer H100 GPUs. These options come with abundant VRAM (40 to 80GB), which we can put to good use in a deep learning context. Additionally, they are very power efficient with a lower TDP compared to consumer-grade GeForce cards. This translates to a smaller physical footprint, so that multiple cards can fit in a single server case. Be careful when installing these GPUs in a regular desktop case though: they only have passive (i.e., fanless) cooling so they require very intensive external ventilation as is standard in a typical, temperature-controlled data center.

Notice how I focus on VRAM memory above all else. The reasoning behind this is simple: if you do not have enough memory, your model simply will not run. The other specs will only determine how patient you will have to be to see the result.

One step down in the price range (5 000EUR - 10 000EUR) we find their workstation offerings. Here, the RTX A6000 and RTX 6000 Ada with 48GB VRAM both look appealing. These come in a "blower style" form factor instead of the more traditional "open air" form factor, meaning they exhaust hot air via the back instead of spreading it back into the case. This again makes it possible to install many cards in a limited physical space without having to worry too much about heat dissipation. Unfortunately, these cards make a lot of noise, and this type of cooling is not suitable for more power-hungry consumer-grade cards (250W+).

The price range of up to 2000EUR makes those consumer-grade cards a lot more viable for most people. Conveniently, the current and last generation flagships - RTX 4090 and RTX 3090 (Ti) respectively - both have 24GB VRAM. In conclusion, buying a (lightly) used RTX 3090 (700EUR - 800EUR) might be the most budget-friendly option out there. Downsides of these consumer grade cards are their high power usage and their unwieldy form factor. (You will have a hard time fitting an RTX 4090 in a 4U server case.)

While Nvidia produces all of its own cards in the datacenter and workstation segments, there is a lot more competition in the consumer space. Nvidia releases Founders Edition (FE) cards for some of its own GPU chips, but many other companies (Asus, MSI, Gigabyte, ...) build their own alternative cards around those same chips. They all have their own peculiarities:

  • factory overclocking
    • not relevant for us; if anything, we might end up power-limiting our card to keep power usage and temperatures under control over long runs
  • cooling method (1-3 fans, optional watercooling)
  • physical size
    • typically 3+ expansion slots
    • watercooled cards can be slimmed down to 1 slot height
  • power usage
  • power connectors
    • most cards need 2 to 3 PCIe 6+2pin connectors
  • looks
    • especially RGB lights, if you are into that
  • ...

For my build, I am going to start out with a single RTX 3090 FE, but I want to select the other components carefully so that I can expand to 2x3090 or even 3x3090 in the future. Specifications:

  • Ampere architecture
  • 24GB GDDR6X VRAM (24 chips x 1GB/chip, mounted on both sides of the PCB)
    • the memory has a tendency to run hot (>100 degrees Celsius)
    • might need to replace the thermal pads
    • using a GPU brace is also reported to fix some heat issues
    • or consider power-limiting the card with sudo nvidia-smi -i <GPU_index> -pl <power_limit> (see the sketch after this list)
    • or look into custom water cooling blocks, if that is your thing
  • memory bus: 384 bit (12 x 32-bit memory controllers)
  • dimensions: 313 mm x 138 mm x 3 expansion slots
  • PCIe connector: PCIe Gen 4 x16
    • note: PCIe Gen 5 is the latest standard, but there are no Gen 5 GPUs yet
  • power
    • 350W
    • connector
      • placed on long edge of card (instead of short edge in higher segments)
      • (variant of) new 12Vhpwr connector found on new ATX 3.0 PSUs
      • including conversion cable to 2xPCIe 6+2pin connectors
  • last consumer card to support NVLink
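As referenced in the list above, a hedged sketch of capping the power limit with nvidia-smi (the 280W value is only illustrative; check your card's supported range with nvidia-smi -q -d POWER first):

sudo nvidia-smi -i 0 -pm 1
sudo nvidia-smi -i 0 -pl 280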

For a much more in-depth analysis of GPUs for deep learning, check out https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/.

Multi-GPU considerations

To run multiple GPUs, we need a motherboard and CPU combination that offers enough PCIe Gen 4 x16 slots, enough PCIe lanes to feed them, and enough physical spacing in between. The main considerations are:

  • space
  • heat
  • power
  • PCIe lanes
  • SLI / NVLink
    • bridge sold separately
    • requires fixed amount of space between cards
      • not compatible with "creative" (i.e., vertical) GPU placement options
  • use same make and model for all cards
    • else computation will often wait for the slowest card to finish
  • note: this kind of setup only makes sense for deep learning, not for gaming

CPU

For a CPU we have to choose between Intel and AMD. While Intel used to be a no-brainer in the not-too-distant past, the tables have turned in recent years. I knew this was true in the consumer space, but as it turns out it is also valid in high-end segments such as HEDT, workstation and server CPUs.

For an ML workstation, the CPU is not nearly as important as the GPU. When possible, extra budget should go to the GPU instead. However, the CPU has an important role in making sure the GPUs can reach their full potential: it has to be able to supply them with enough data so that they are not sitting idle. This data-transfer capacity, largely determined by the number of available PCIe lanes, will play a crucial role in our choice of CPU segment.

Some background on PCIe slots: each slot has a physical size (i.e., width) and a number of communication lanes, both expressed with indicators such as "x1", "x2", "x4", "x8" or "x16". A GPU typically occupies a physical x16 slot because a lot of data has to be transferred back and forth. Note that smaller expansion cards fit in larger slots, but not the other way around. For example, an x4 card can plug into an x16 PCIe slot, leaving the remaining 12 lanes unused. Typically, a slot that is x4 wide and contains an x4 expansion card will use all 4 lanes, and a slot that is x16 wide will use all 16 lanes. In practice, however, the two can diverge: when either the CPU or the motherboard cannot handle the combined number of lanes over all slots, one or more slots may be run at half the number of lanes. So an x16 GPU slot can end up running with only x8 lanes.

Furthermore, each PCIe generation is roughly twice as fast as the previous one. So PCIe Gen 5 x8 can be as fast as PCIe Gen 4 x16. You might now be thinking: it evens out if we put a Gen 4 x16 GPU in a Gen 5 x8 slot. Unfortunately, that is not the case. The Gen 5 slot is fully backwards compatible with Gen 4 expansion cards, but it will also be limited to Gen 4 speeds in that case. Effectively, it will still be running at Gen 4 x8 if only 8 lanes are available. To be clear: running at half the number of lanes does not halve the effective speed of the GPU. The number of lanes is not the main bottleneck in most systems, so the performance penalty will be much lower.
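
A quick back-of-the-envelope sketch to make the generation and lane numbers concrete (the GB/s-per-lane figures are rounded approximations per direction, ignoring protocol overhead):

# approximate per-direction throughput in GB/s per PCIe lane
GBPS_PER_LANE = {3: 1.0, 4: 2.0, 5: 4.0}

def slot_bandwidth(gen, lanes):
    return GBPS_PER_LANE[gen] * lanes

print(slot_bandwidth(4, 16))  # ~32 GB/s: Gen 4 card in a full x16 slot
print(slot_bandwidth(4, 8))   # ~16 GB/s: same Gen 4 card when only 8 lanes are available
print(slot_bandwidth(5, 8))   # ~32 GB/s: only reachable if the card itself were Gen 5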

Consumer-grade CPUs such as Intel Core i3-i9 and AMD Ryzen 3-9 have a very limited number of available PCIe lanes. For example, the top end AMD Ryzen 9 7900X can only manage 28 PCIe lanes (of which 4 are reserved to communicate with the motherboard chipset). That leaves 24 lanes (e.g., x16 + x8) for our GPUs. For almost all consumers - who are only ever interested in having a single GPU - this is plenty. However, we have to ask ourselves if running our second GPU with only x8 instead of x16 lanes is worth it. For many people the answer will be "yes" and they should stick to this segment. The alternative is looking at HEDT, workstation or server segment CPUs, as we will do below.

HEDT or high-end desktop started with Intel Extreme Edition CPUs, and later Intel Core X CPUs. These days Intel's HEDT segment has been folded into their Xeon lineup of workstation and server processors, specifically the Xeon W-2400 and W-3400 series. The category sits somewhere between consumer-grade hardware and workstation hardware, offering more multithreading performance and more PCIe lanes. AMD is going back and forth with regard to their HEDT support: Threadripper CPUs are in the HEDT segment, while Threadripper PRO CPUs are in the workstation segment. AMD had not released a non-PRO Threadripper in a while, but at CES 2024 they announced a new lineup (e.g., the AMD Ryzen Threadripper 7970X with 32 cores and 92 PCIe 5.0 lanes). In summary, HEDT is a good match for our build, but the segment is being squeezed by high-end consumer hardware on one side and lower-end workstation hardware on the other.

Specifically, the AMD Ryzen Threadripper PRO 5000WX series (based on the older Zen 3 architecture) is very competitively priced these days. It offers workstation CPUs with up to 64 cores, 2TB of DDR4 RAM and 128 PCIe lanes. As we will see in the motherboard section, these builds come with typical enterprise features that are redundant for our target audience, to the point where a HEDT build would be a better match if properly priced. An additional benefit of using somewhat older (i.e., 2022) hardware is that the DDR4 memory and PCIe 4.0 SSDs that come with it are cheaper than the recent DDR5/PCIe 5.0 counterparts.

I also briefly looked at the server segment (Intel Xeon and AMD EPYC) but found no better offerings there. In the end I settled for a Threadripper PRO 5955WX, a 16-core CPU with a TDP of 280W and 128 PCIe lanes that I could get a decent deal on. The Intel counterparts have fewer cores, lower clock rates and fewer PCIe lanes for the same or more money.

Another fun fact about CPUs: you can buy them boxed (default) or as "tray". Tray refers to the tray of multiple CPUs that is typically bought by OEMs for use in their prebuilt systems. As such, these don't come with any extras (no box, no manual, no stock cooler, ...). OEMs are not supposed to resell these to consumers, but it sometimes happens regardless. Manufacturers like Intel and AMD will typically not provide factory warranty for such products, so you will have to talk to the intermediary (i.e., the OEM) instead in case of problems. The main advantage of tray CPUs is their lower price. If the discount is significant enough, it is worth considering. However, I learned the hard way that Threadripper CPUs are supposed to come with a torque wrench to fasten the CPU mount precisely as tight as prescribed. The tray versions of these CPUs obviously do not include this tool.

CPU cooler

For workstation-grade builds, I prefer aircooling over watercooling. It requires barely any maintenance and it can run without failure for years on end. We do however need a cooler with a sizable heatsink to be able to dissipate the TDP of our CPU. When choosing one, make sure it does not occlude any RAM or PCIe slots that you intend to use.

Specifically for Threadripper PRO builds, pay attention to the orientation of the cooler. In desktops and workstations, coolers are supposed to blow air from front to back in the case. However, our CPU is from the server segment - contrary to the non-PRO Threadrippers in the HEDT segment - where a horizontal socket orientation is more common. In that case a regular cooler will blow air from bottom to top. The Noctua NH-U14S and NH-U12S both suffer from this. Eventually, I discovered the Arctic Freezer 4U-M, which has the correct orientation and also matches all other requirements (i.e., socket and TDP). The "4U" terminology refers to server height in a rack.

Motherboard

Our choice of CPU (or more precisely its sWRX80 socket) limits our choice of motherboards quite significantly. Again, we select with our multi-GPU setup in mind. Specifically, we are looking for plenty of PCIe Gen 4 x16 slots that can run at full speed and are spaced far enough apart. Additionally, we would like a few M.2 slots with heatsinks that have a direct connection to the CPU and that are placed far enough away from hot GPUs. Finally, make sure to check the connectivity options (USB, USB-C, WiFi, Bluetooth). Fortunately, most motherboards on the short list fit the bill. I eventually went with the ASUS Pro WS WRX80E-SAGE SE WIFI because I saw it being used in Lambda Labs builds and I could get a good deal on one. It was only afterwards that I realized the awkward dimensions of this board (see the Case section). It is worth looking for smaller alternatives, but make sure to study the block diagram showing all interconnections before making a decision. The ASRock WRX80 Creator comes to mind, although it seems hard to come by and does not support x16 lanes in all PCIe slots.

Some random notes on the ASUS Pro WS WRX80E-SAGE SE WIFI board:

  • requires lots of power cables
    • 1 x 24-pin ATX connector
    • 2 x 8-pin CPU/EPS connector
    • 2 x 6-pin PCIe connector
    • 1 x 6+2-pin PCIe connector
  • main feature: 7 PCIe 4.0 x16 slots
  • built-in power and reset button
    • works without connecting front panel headers of case
  • built-in VGA output
    • useful when you don't have discrete GPU yet
      • note: AMD Threadripper PRO does not have an iGPU
    • makes Linux crash on boot
      • first attempt: add acpi=off to grub bootloader options list
        • makes it possible to boot into live environment
        • also disables all but two USB ports
        • also crashes KVM via BMC
        • also disables all NVMe SSD drives
      • proper fix 1: add pci=nommconf to grub bootloader options list
        • in bootloader: E, add option, F10
          • linux /boot/vmlinuz-x.y.z-... ro quiet splash pci=nommconf
        • make permanent when booted
          • vi /etc/default/grub
            • add pci=nommconf to GRUB_CMDLINE_LINUX variable
          • sudo update-grub
          • reboot
      • proper fix 2: disable VGA header via physical switch
    • still works with BMC disabled
  • Q-code display output does not seem to match with table in manual
  • BMC / IPMI
    • typical server-level feature
    • access
      • can only be accessed over ethernet (not WiFi) via one of the two ports
      • make sure to use HTTPS
      • check IP address in BIOS
      • user: admin
      • password: admin
    • makes system take minutes to boot after complete power down (e.g., after unplugging)
      • much faster after regular shutdown and start
        • but still slow compared to regular desktop
      • fixed when BMC is disabled
    • LEDs
      • stay on when system is off
      • green (blinking): BMC is up and running
      • orange: on iff new warning in system event log
        • possibly about fans with RPM below threshold
    • control fan curves via web portal
      • or via BIOS (after firmware update)
      • (non-PWM?) fans will run at max speed when BMC is disabled
    • built-in KVM
  • contains two small fans
  • bottom pins are oriented south instead of up
    • pro: allows large GPU to hang off bottom edge of motherboard
    • con: many cases have limited space near bottom to connect everything
  • WiFi
    • 6, not 6E
    • shark-shaped WiFi antenna is very impractical
      • alternative: aftermarket antennas that attach directly to connectors
  • no Thunderbolt header (as is typical in AMD builds)
  • sound when running Ubuntu

Other

RAM
  • check QVL list of motherboard for compatibility
    • mine insisted specifically on DDR4-3200 RAM
  • amount
    • more is better (up to a point)
    • at least 20% more than total VRAM (see the quick calculation after this list)
  • type: DDR4 (cheaper) or DDR5
  • form factor
    • DIMM (desktop)
    • make sure they fit under the CPU cooler
  • speed, timings, latency: not important
  • overclocking profiles (Intel XMP/AMD EXPO): not important
  • heatsink: not important
  • mostly works best with two modules in dual channel mode
  • ECC
    • nice to have
    • more expensive
    • more difficult to find
  • warranty: lifetime
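
A quick calculation for the 20% rule above, using a hypothetical end goal of three RTX 3090s:

# planned worst case: 3x RTX 3090 with 24 GB VRAM each
total_vram_gb = 3 * 24
min_ram_gb = 1.2 * total_vram_gb  # at least 20% more RAM than total VRAM
print(min_ram_gb)  # 86.4 -> round up to the next common kit size, e.g. 96 or 128 GB
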
Storage
  • main SSD
    • type: NVMe M.2 SSD
    • size: 1TB+
      • models take up a lot of room
    • PCIe
      • typically uses x4 lanes
      • ideally directly connected to CPU instead of via motherboard chipset
      • both Gen 4 and Gen 5 options are available
    • no need for a heatsink if your motherboard already has one
    • warranty (5+ years)
  • optional 5400RPM HDD(s) for cheap extra storage
PSU
  • must haves
    • power rating
      • rule of thumb:
    • right type and amount of connectors
      • ATX24 for motherboard
      • EPS for CPU
      • PCIe depending on motherboard, GPU and other components
      • warning: never power a GPU through daisy-chained PCIe cables; use a separate cable per connector
  • nice to haves
    • 80+ efficiency rating (gold < platinum < titanium)
      • note: small percentual differences become relevant when using lots of power
    • 12Vhpwr connector
      • supports up to 600W
      • plug these in properly, or you risk melting the plug
    • modular design
    • silent
    • warranty (10+ years)
Case
  • volume
    • for aircooled multi-GPU setup, disregard any case with a volume below 60 liters
  • constraints
    • supports motherboard form factor
    • CPU cooler height
    • GPU length
    • PSU length
  • nice to haves
    • dust filters
    • cable management options
    • easy to open
    • built-in GPU brace(s)

I realized fairly late in the process that my motherboard has an unusual form factor: EEB (12.2" x 13") instead of the far more common ATX (12" x 9.6"). This severely limited the number of compatible cases I could choose from. Even cases that officially claimed to support the EEB form factor had some caveats. For example, because of the downward-facing connectors on the bottom edge of the motherboard, I had to make sure I had spare room in that area to be able to connect all cables. Note that this extra space is useful in any case if you plan on installing a large GPU in the bottom PCIe slot. Furthermore, the standard cable management holes in the backplate of many cases get covered by the much wider motherboard, which results in some unconventional cable management practices. If I were to do this build over again, I would put a much stronger emphasis on selecting a standard ATX motherboard.

Some feasible options:

  • Corsair 7000D Airflow (very tight, not recommended)
  • Fractal Design Define 7 XL
  • Fractal Design Meshify 2 XL
  • Lian Li O11 Dynamic XL
  • Phanteks Enthoo Pro 2
Cooling fans

Ventilation is very important in a high-powered system, especially if the goal is to sustain long-duration workloads. Do not cheap out on fans after building a $2000+ computer.

  • case should have positive pressure (slightly more intake than exhaust airflow)
  • size
    • use 120mm or 140mm fans
    • not 200mm, they fail first due to high torque
  • RPM trade-off
    • high RPM = more airflow
    • low RPM = less noise
  • purpose
    • static pressure: for radiators, meshes, filters
    • airflow: elsewhere
    • hybrid
  • connector
    • 3 pin (voltage regulated)
    • 4 pin (PWM regulated, better)
  • bearings
    • fluid
      • cheap
      • mineral oil for lubrication
      • dust sensitive
      • go bad after a while
      • sensitive to orientation (avoid horizontal)
    • ball
      • expensive
      • long lasting
      • more noisy
      • ideal for servers
      • any orientation
    • sleeve
      • hybrid between ball and fluid bearing
      • closed system
      • prefers horizontal orientation
    • rifle
      • like sleeve
      • with Archimedes screw
        • prefers horizontal orientation
      • used in be quiet! fans
    • magnetic / maglev
      • lowest noise
      • expensive
      • any orientation
  • fan orientation
    • vertical: fails first
    • horizontal
  • recommendations

Further reading

PTI + DragGAN

I came across a tool called DragGAN this weekend. Although GANs are somewhat outdated, the fun example videos prompted me to play with the technique for a bit. Running the provided demos is very easy in Google Colab. The only hiccup I experienced was that I had to manually upload the StyleGAN-Human model to Colab to add it to the GUI list. It is not included in the original download script.

The DragGAN tutorial suggests using the PTI technique to work with your own custom images. There are however no detailed instructions on how to combine the two techniques and pass the correct information between them. This notebook shows how it can be done. It can run in Google Colab on a T4 GPU.

Note that the base model we use here is stylegan2_ada_ffhq, which has been trained on Flickr-Faces-HQ (FFHQ). As such, it will only work on pictures of faces.

In [1]:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

class Downloader(object):
    def __init__(self, use_pydrive):
        self.use_pydrive = use_pydrive

        if self.use_pydrive:
            self.authenticate()

    def authenticate(self):
        auth.authenticate_user()
        gauth = GoogleAuth()
        gauth.credentials = GoogleCredentials.get_application_default()
        self.drive = GoogleDrive(gauth)

    def download_file(self, file_id, file_dst):
        if self.use_pydrive:
            downloaded = self.drive.CreateFile({'id':file_id})
            downloaded.FetchMetadata(fetch_all=True)
            downloaded.GetContentFile(file_dst)
        else:
            !gdown --id $file_id -O $file_dst

downloader = Downloader(True)

Step 1 - Install Packages required by PTI

In [ ]:
!pip install lpips wandb

# used for faster inference of StyleGAN by enabling C++ code compilation
!wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
!sudo unzip ninja-linux.zip -d /usr/local/bin/
!sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force

Step 2 - Download Pretrained models

In [ ]:
!git clone https://github.com/XingangPan/DragGAN.git
In [ ]:
!git clone https://github.com/danielroich/PTI.git
%cd /content/PTI/
!git checkout da94d59d15d94822e95840ab5a0aa9ba1a19c851
In [10]:
import os
image_dir_name = 'image'
os.makedirs(f'./{image_dir_name}_original', exist_ok=True)
os.makedirs(f'./{image_dir_name}_processed', exist_ok=True)
save_path = "pretrained_models"
os.makedirs(save_path, exist_ok=True)
In [11]:
downloader.download_file("125OG7SMkXI-Kf2aqiwLLHyCvSW-gZk3M", os.path.join(save_path, 'ffhq.pkl'))
In [12]:
downloader.download_file("1xPmn19T6Bdd-_RfCVlgNBbfYoh1muYxR", os.path.join(save_path, 'align.dat'))

Step 3 - Configuration Setup

In [19]:
import sys
import pickle
import torch
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from IPython.display import display

from configs import paths_config, hyperparameters, global_config
from utils.align_data import pre_process_images
from scripts.run_pti import run_PTI

image_name = 'personal_image'
global_config.device = 'cuda'
paths_config.e4e = '/content/PTI/pretrained_models/e4e_ffhq_encode.pt'
paths_config.input_data_id = image_dir_name
paths_config.input_data_path = f'/content/PTI/{image_dir_name}_processed'
paths_config.stylegan2_ada_ffhq = '/content/PTI/pretrained_models/ffhq.pkl'
paths_config.checkpoints_dir = '/content/PTI/'
paths_config.style_clip_pretrained_mappers = '/content/PTI/pretrained_models'
hyperparameters.use_locality_regularization = False

Step 4 - Preprocess Data

TODO: upload a picture to /content/PTI/image_original/personal_image.jpg
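
One way to do this from within Colab is sketched below (files.upload() opens a file picker and writes the chosen file to the current working directory; dragging the file into the Colab file browser works just as well):

from google.colab import files
import shutil

uploaded = files.upload()  # select a .jpg containing a face
src = list(uploaded.keys())[0]
shutil.move(src, f'./{image_dir_name}_original/{image_name}.jpg')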

In [ ]:
original_image = Image.open(f'./{image_dir_name}_original/{image_name}.jpg')
In [ ]:
pre_process_images(f'/content/PTI/{image_dir_name}_original')

Step 5 - Invert images using PTI

In order to run PTI and use StyleGAN2-ada, the cwd should be the parent of 'torch_utils' and 'dnnlib'

In [ ]:
model_id = run_PTI(use_wandb=False, use_multi_id_training=False)

Visualize results

In [26]:
def load_generators(model_id, image_name):
  with open(paths_config.stylegan2_ada_ffhq, 'rb') as f:
    d = pickle.load(f)
    old_G = d['G_ema'].cuda()
    old_D = d['D'].cuda()

  with open(f'{paths_config.checkpoints_dir}/model_{model_id}_{image_name}.pt', 'rb') as f_new:
    new_G = torch.load(f_new).cuda()

  return old_G, old_D, new_G
In [27]:
old_G, old_D, new_G = load_generators(model_id, image_name)
In [28]:
# def plot_syn_images(syn_images):
#   for img in syn_images:
#       img = (img.permute(0, 2, 3, 1) * 127.5 + 128).clamp(0, 255).to(torch.uint8).detach().cpu().numpy()[0]
#       plt.axis('off')
#       resized_image = Image.fromarray(img,mode='RGB').resize((256,256))
#       display(resized_image)
#       del img
#       del resized_image
#       torch.cuda.empty_cache()
In [29]:
w_pivot_path = f'{paths_config.embedding_base_dir}/{paths_config.input_data_id}/{paths_config.pti_results_keyword}/{image_name}/0.pt'
# w_pivot = torch.load(w_pivot_path)

# old_image = old_G.synthesis(w_pivot, noise_mode='const', force_fp32 = True)
# new_image = new_G.synthesis(w_pivot, noise_mode='const', force_fp32 = True)

# print('Upper image is the inversion before Pivotal Tuning and the lower image is the product of pivotal tuning')
# plot_syn_images([old_image, new_image])

Export

In [31]:
def export_updated_pickle(old_G, old_D, new_G, output_path):
  tmp = {}
  tmp['G'] = old_G.eval().requires_grad_(False).cpu()
  tmp['G_ema'] = new_G.eval().requires_grad_(False).cpu()
  tmp['D'] = old_D.eval().requires_grad_(False).cpu()
  tmp['training_set_kwargs'] = None
  tmp['augment_pipe'] = None

  with open(output_path, 'wb') as f:
      pickle.dump(tmp, f)

output_path = f'{paths_config.checkpoints_dir}/stylegan2_{image_name}.pkl'
export_updated_pickle(old_G, old_D, new_G, output_path)
In [32]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

!mkdir -p /content/DragGAN/checkpoints
!cp $output_path /content/DragGAN/checkpoints
!cp $w_pivot_path /content/DragGAN/checkpoints

DragGAN

In [ ]:
%cd /content/DragGAN
!git checkout c5e88b3eaf64c33a9e82782d75b4329d16711c3a
In [ ]:
!pip install -r requirements.txt
In [35]:
# !python scripts/download_model.py

Fix some errors in the Python scripts:

  • use our custom w_pivot from PTI
  • set the default model in the GUI to our own
  • bypass the watermark due to a font issue
In [36]:
!sed -i 's#None.*w_load#torch.load("/content/DragGAN/checkpoints/0.pt"),#' /content/DragGAN/visualizer_drag_gradio.py
!sed -i 's/stylegan2_lions_512_pytorch/stylegan2_personal_image/' /content/DragGAN/visualizer_drag_gradio.py
!sed -i 's/d = ImageDraw/return input_image_array  # d = ImageDraw/' /content/DragGAN/viz/renderer.py
In [ ]:
!python /content/DragGAN/visualizer_drag_gradio.py

StarCoder (WIP)

Intro

How to set up StarCoder on AWS

  • create S3 bucket
  • create policy that allows read/write to that bucket
  • create EC2 role containing that policy
  • start a new EC2 instance
    • TODO select right instance type
    • t2.micro for now to set up S3 properly
    • use newly created IAM role
  • sudo yum install git
  • Amazon Linux 2023 does not support git-lfs out of the box, workaround:
    • curl -LO https://github.com/git-lfs/git-lfs/releases/download/v3.3.0/git-lfs-linux-amd64-v3.3.0.tar.gz
    • tar xvfz git-lfs-linux-amd64-v3.3.0.tar.gz
    • sudo ./install.sh instead of git lfs install
    • git lfs version
  • git clone https://huggingface.co/bigcode/starcoder
    • takes a while, needs to download 65GB
  • cd starcoder
  • TODO save to S3
  • don't forget to stop the instance when you're done

Local, out of the box usage

  • conda create -n starcoder python=3.11
  • conda activate starcoder
  • git clone https://github.com/bigcode-project/starcoder.git
  • cd starcoder
  • pip install -r requirements.txt
  • set the HUGGING_FACE_HUB_TOKEN environment variable (needed because the weights on the Hub are gated)

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

checkpoint = "bigcode/starcoder"

# downloads the model weights from the Hugging Face Hub on first run (large download)
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# device=0 runs generation on the first GPU
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("def hello():"))
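
The snippet above loads the weights in full precision, which will not fit in the 24GB of a single consumer GPU. A more memory-friendly variant is sketched below (assuming the accelerate package is also installed, so that device_map="auto" can spread the layers over the available GPU(s) and CPU RAM):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    torch_dtype=torch.float16,  # half precision: roughly 30GB of weights instead of 60GB
    device_map="auto",          # requires accelerate; places layers on GPU(s) and CPU as needed
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("def hello():", max_new_tokens=32))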

What's new in Generative AI land

Regularly updated sources

Big tech news

Adobe

AI21 labs

Amazon

Andrej Karpathy / Eureka Labs

Anthropic

Apple

Black Forest Labs

Chip Huyen

Cohere

Databricks

Google

Huggingface

Inflection

Lightning AI

Meta

Microsoft

Midjourney

Mistral

NVIDIA

Ollama

OpenAI

Runway ML

Stability AI

xAI

Other news

Tools

Recent models

Catching up with the Deep Learning revolution

Timeline

Information sources

Important people

  • Geoffrey Hinton (1947): Google Brain, 1/3 godfathers of AI, backpropagation
  • Yann LeCun (1960): FB, 1/3 godfathers of AI, CNN
  • Yoshua Bengio (1964): Deep Learning book, 1/3 godfathers of AI
  • Andrew Ng (1976): Google Brain, Baidu, Coursera, deeplearning.ai
  • Ian Goodfellow (1986): Deep Learning book, Google Brain, OpenAI, Apple, GANs, supervised by Ng + Bengio
  • François Chollet: Google, Keras
  • Aaron Courville
  • Pieter Abbeel, prof EE/robotics/AI @ UC Berkeley
    • ESAT at KUL
    • PhD at Stanford under Andrew Ng
    • podcast: The Robot Brains
  • Andrej Karpathy: Stanford, Tesla, OpenAI, Eureka Labs
  • Chip Huyen: Stanford, Claypot AI, Voltron Data
  • Ilya Sutskever: AlexNet, Google, OpenAI
  • Tim Dettmers: QLoRA, bitsandbytes, GPU comparison

Modalities

  • input
    • text
      • code
    • audio
      • speech / voice
    • visual
      • image
      • video
  • output
    • text
      • code
    • audio
      • speech / voice
      • music
    • actions
      • movement (robots)
      • tools/APIs (agents)

Glossary

  • AE: auto encoder
  • AI: artificial intelligence
  • ANN: artificial neural network
  • BERT: bidirectional encoder representations from transformers
  • BPE: byte pair encoding
  • CLIP: contrastive language-image pretraining
  • CNN: convolutional neural network
  • CoT: chain of thought
  • CPU: central processing unit
  • DBN: deep belief network
  • DL: deep learning
  • DNN: deep neural network
  • DRL: deep reinforcement learning
  • EM: expectation maximization
  • Flan: finetuned language net
  • FNN: feedforward neural network
  • GAN: generative adversarial network
  • GPT: generative pre-trained transformer
  • GPU: graphics processing unit
  • HF: HuggingFace
  • LiT: locked image tuning
  • LLM: large language model
  • LoRA: low-rank adaptation
  • LSTM: long short term memory
  • ML: machine learning
  • MLP: multilayer perceptron
  • MoE: mixture of experts
  • MP: max pooling
  • NLG: natural language generation
  • NLP: natural language processing
  • NLU: natural language understanding
  • PEFT: parameter-efficient fine-tuning
  • RAG: retrieval-augmented generation
  • RBM: restricted Boltzmann machine
  • ReLU: rectified linear unit
  • RL: reinforcement learning
  • RNN: recurrent neural network
  • SFT: supervised finetuning
  • SGD: stochastic gradient descent
  • SL: supervised learning
  • SOTA: state of the art
  • SSL: self-supervised learning
  • SVM: support vector machines
  • TPU: tensor processing unit
  • UL: unsupervised learning
  • VAE: variational auto encoder
  • ViT: vision transformer
  • VRAM: video RAM (i.e., the memory of the GPU)

Infrastructure

  • you will need one or more Nvidia GPUs
    • with CUDA, Tensor Cores and cuDNN support
    • overview of recent Nvidia GPU architectures:
Architecture Desktop Workstation Datacenter
Pascal (2016) GeForce GTX 10xx Quadro P Tesla P4 / Tesla P100
Volta (2017) N/A Quadro GV100 Tesla V100
Turing (2018) GeForce RTX 20xx Quadro RTX Tesla T4
Ampere (2020) GeForce RTX 30xx RTX A series A100
Ada (2022) GeForce RTX 40xx RTX 6000 Ada L4 / L40
Hopper (2022) N/A N/A H100, H200
Blackwell (2024) GeForce RTX 50xx ? B100, B200

Cloud environments

Accelerator Standard RAM High RAM*
None 12.7 GB 25.5 GB
Standard GPU 12.7 GB 25.5 GB
Premium GPU* 12.7 GB 25.5 GB
TPU 12.7 GB 35.2 GB

Machine learning libraries

Datasets

Model hubs

Model metrics and benchmarks

Vision models

  • outdated
    • MNIST error rate
    • ImageNet error rate
  • recent
    • ...

Language models

Misc

Recent Generative AI Models

This page was last updated on 2025-09-13.

New Table

Table 1 - Models

Model

Company

Date

Params

Paper

Source

Website

Weights

Remarks

Robotic Transformer 1

Google DeepMind

2022-12-13

link

link

Einstein GPT

Salesforce

2023-03-07

link

uses OpenAI API?

🧨 Stable UnCLIP 2.1

Stability AI

2023-03-24

link

link

link

model behind Reimagine

LLaVA

University of Wisconsin-Madison

2023-04-17

link

link

link

link

LLaVA = Large Language and Vision Assistant

WizardLM

Microsoft

2023-04-24

7B, 13B, 30B, 70B

link

link

link

based on llama

Eleven Multilingual v1

ElevenLabs

2023-04-27

link

link

English, French, German, Hindi, Italian, Polish, Portuguese, Spanish

PaLM 2

Google

2023-05-10

link

link

LIMA

Meta AI

2023-05-18

65B

link

based on llama

🔈Massive Multilingual Speech

Meta AI

2023-05-22

300M, 1B

link

link

link

link

Falcon

TII.AE

2023-05-26

1B, 7B, 40B

coming soon

link

AlphaDev

Google DeepMind

2023-06-07

link

link

🔈 StyleTTS 2

Columbia University

2023-07-13

link

link

link

link

WizardCoder

Microsoft

2023-06-14

15B

link

link

link

Llama 2

Meta AI

2023-07-18

7B, 13B, 70B

link

link

link link2

link

Meta-Transformer

2023-07-20

85M, 302M

link

link

link

link

12 modalities

Stable Beluga 2

Stability AI

2023-07-21

70B

link

link

based on llama 2

🧨 Stable Diffusion XL 1.0

Stability AI

2023-07-26

3.5B

link

link

link

base refiner

Robotic Transformer 2

Google DeepMind

2023-07-28

link

link

StableCode

Stability AI

2023-08-08

3B

link

base instruct

🔈 AudioSep - Separate Anything You Describe

Audio-AGI

2023-08-09

link

link

link

link

🔈 AudioLDM2

ByteDance

2023-08-10

link

link

link

link

🔈 Eleven Multilingual v2

ElevenLabs

2023-08-22

link

link

English, French, German, Hindi, Italian, Polish, Portuguese, Spanish, Chinese, Korean, Dutch, Turkish, Swedish, Indonesian, Filipino, Japanese, Ukrainian, Greek, Czech, Finnish, Romanian, Danish, Bulgarian, Malay, Slovak, Croatian, Classic Arabic, Tamil

SeamlessM4T

Meta AI

2023-08-22

1.2B, 2.3B

link

link

link

link

Code Llama

Meta AI

2023-08-24

7B, 13B, 34B

link

link

link

link

Nougat OCR

Meta AI

2023-08-25

link

link

link

link

Specialized in academic documents

Falcon 180B

TII

2023-09-06

180B

coming soon

link

link

see also: falcon-40b

Persimmon

Adept

2023-09-07

8B

link

link

link

🔈 StableAudio

Stability AI

2023-09-13

link

🧨 DALL-E 3

OpenAI

2023-09-21

link

📽️ LaVie

Shanghai Artificial Intelligence Laboratory

2023-09-26

link

link

link

link

Mistral-7B

Mistral AI

2023-09-27

7B

link

link

link

Qwen

Alibaba

2023-09-28

7B, 14B

link

link

link

LLaVA 1.5

University of Wisconsin-Madison

2023-10-05

link

link

link

link

jina-embeddings-v2

Jina AI

2023-10-25

link

link

link

Yi

01.ai

2023-11-02

6B, 34B

link

link

📽️ Emu Video

Meta AI

2023-11-16

link

link

📽️ Stable Video Diffusion

Stability AI

2023-11-21

link

link

link

link

Meditron

École Polytechnique Fédérale de Lausanne (EPFL)

2023-11-27

7B, 70B

link

link

link

🧨 SDXL Turbo

Stability AI

2023-11-28

link

link

link

link

📽️ Animate Anyone

Alibaba

2023-11-28

link

link

link

Seamless

Meta AI

2023-11-30

link

link

link

link

OpenVoice

MyShell.ai

2023-12-03

7B, 13B, 34B, 70B

link

link

link

Gemini

Google DeepMind

2023-12-06

link

link

nano / pro / ultra, pro will power Bard

AlphaCode 2

Google DeepMind

2023-12-06

link

link

Stable LM Zephyr 3B

Stability AI

2023-12-07

3B

link

link

Mixtral 8x7B

Mistral AI

2023-12-11

45B

link

link

link

🧨 Imagen 2

Google DeepMind

2023-12-13

link

Stable Code 3B

Stability AI

2024-01-16

3B

link

link

Stable LM 2

Stability AI

2024-01-19

1.6B

link

link

Eagle 7B

RWKV

2024-01-29

7B

link

link

RWKV-v5 architecture

Code Llama 70B

Meta AI

2024-01-29

7B, 13B, 34B, 70B

link

link

link

link

MGIE

Apple

2024-02-05

link

link

link

link

Sora

OpenAI

2024-02-15

link

link

Gemma

Google

2024-02-21

2B, 7B

link

link

link

link 1 link 2

🧨 Stable Diffusion 3

Stability AI

2024-02-22 (preview)

0.8B, ..., 8B

link

Old Table

Table 2 - Models (old)

Model

Company

Date

Base Model

Parameters

Training Data Size

Training Time

Context length

Paper

Source

Website

Training data

Code License

Weights License

Type

Model weights

Instruction Tuning

RLHF

Remarks

Deep Blue

IBM

1996-01-01

from scratch

N/A

https://www.sciencedirect.com/science/article/pii/S0004370201001291

N/A

https://www.ibm.com/ibm/history/ibm100/us/en/icons/deepblue/

games

Chess

Watson

IBM

2011-01-01

from scratch

N/A

https://doi.org/10.1609/aimag.v31i3.2303

N/A

games

Jeopardy

AlexNet

A. Krizhevsky, G. Hinton

2012-09-30

from scratch

60M

https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

https://github.com/dansuh17/alexnet-pytorch (clone)

vision

won ImageNet LSVRC 2012 challenge with 15.3%

word2vec

Google

2013-01-16

https://arxiv.org/abs/1301.3781

no

no

Inception v1

Google

2014-09-17

from scratch

https://arxiv.org/abs/1409.4842

https://github.com/google/deepdream

vision

won ImageNet LSVRC 2014 challenge with 6.7%

DQN

Google DeepMind

2015-02-25

from scratch

https://www.nature.com/articles/nature14236

https://github.com/deepmind/dqn

deep RL

char-rnn

Andrej Karpathy

2015-05-21

from scratch

https://karpathy.github.io/2015/05/21/rnn-effectiveness/

https://github.com/karpathy/char-rnn

language

Features on https://www.aiweirdness.com/

GloVe

Stanford

2015-09-01

https://nlp.stanford.edu/pubs/glove.pdf

https://github.com/stanfordnlp/GloVe

https://nlp.stanford.edu/projects/glove/

Apache 2.0

Apache 2.0

yes

no

no

fastText

Facebook

2015-11-09

https://arxiv.org/abs/1607.04606

https://github.com/facebookresearch/fastText

https://fasttext.cc/

MIT

MIT

yes

no

no

Inception v3

Google

2015-12-02

https://arxiv.org/abs/1512.00567

vision

https://huggingface.co/timm/inception_v3.tv_in1k

ResNet

Microsoft

2015-12-10

from scratch

https://arxiv.org/abs/1512.03385

vision

won ImageNet LSVRC 2015 challenge with 3.57%; "better than humans"

AlphaGo

Google DeepMind

2016-01-27

from scratch

https://www.nature.com/articles/nature16961

games

Inception v4

Google

2016-02-23

vision

https://huggingface.co/timm/inception_v4.tf_in1k

Tay

Microsoft

2016-03-23

N/A

N/A

https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/

chatbot

CycleGAN

UC Berkeley

2017-03-30

https://arxiv.org/abs/1703.10593

https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

GAN

yes

AlphaGo Zero

Google DeepMind

2017-10-19

https://www.nature.com/articles/nature24270

games

AlphaZero

Google DeepMind

2017-12-05

https://arxiv.org/abs/1712.01815

games

ELMo (Embeddings from Language Models)

Allen Institute for AI

2018-02-15

180M

https://arxiv.org/abs/1802.05365

language

yes

GPT (Generative Pre-trained Transformer)

OpenAI

2018-06-11

from scratch

117M

https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

https://github.com/openai/finetune-transformer-lm

transformer

yes

no

no

BERT (Bidirectional Encoder Representations from Transformers)

Google

2018-10-11

108M, 334M

https://arxiv.org/abs/1810.04805

https://github.com/google-research/bert

transformer

yes

StyleGAN

Nvidia

2018-12-12

https://arxiv.org/abs/1812.04948

https://github.com/NVlabs/stylegan

GAN

yes

https://thispersondoesnotexist.com

GPT2

OpenAI

2019-02-14

1.5B

https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

https://github.com/openai/gpt-2

transformer

yes

no

no

XLNet

CMU & Google

2019-06-19

117M, 360M

https://arxiv.org/abs/1906.08237

https://github.com/zihangdai/xlnet

Apache 2.0

yes

RoBERTa

Meta AI

2019-07-26

BERT

354M

https://arxiv.org/abs/1907.11692

transformer

yes

ALBERT (A Lite BERT)

Google

2019-09-26

BERT

12M, 18M, 60M, 235M

https://arxiv.org/abs/1909.11942

https://github.com/google-research/ALBERT

Apache 2.0

transformer

yes

DistilBERT

HuggingFace

2019-10-02

BERT

66M

https://arxiv.org/abs/1910.01108

https://github.com/huggingface/transformers

Apache 2.0

transformer

yes

Text-to-Text Transfer Transformer (T5)

Google

2019-10-23

from scratch

11B

1T tokens

https://arxiv.org/abs/1910.10683

https://github.com/google-research/text-to-text-transfer-transformer

https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html

Apache 2.0

Apache 2.0

transformer

yes

no

no

AlphaFold

Google DeepMind

2020-01-15

from scratch

https://www.nature.com/articles/s41586-019-1923-7

https://github.com/deepmind/deepmind-research/tree/master/alphafold_casp13

yes

Turing NLG

Microsoft

2020-02-13

17B

N/A

https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/

ELECTRA

Stanford & Google

2020-03-23

BERT?

14M, 110M, 335M

https://arxiv.org/abs/2003.10555

yes

DeBERTa

Microsoft

2020-06-05

BERT

https://arxiv.org/abs/2006.03654

https://github.com/microsoft/DeBERTa

MIT

transformer

yes

GPT3

OpenAI

2020-06-11

from scratch

175B

300B tokens

https://arxiv.org/abs/2005.14165

/

private

private

transformer

no

no

no

ImageGPT

OpenAI

2020-06-17

https://cdn.openai.com/papers/Generative_Pretraining_from_Pixels_V2.pdf

https://github.com/openai/image-gpt

private

private

transformer

no

mT5

Google

2020-10-22

from scratch

300M - 13B

1T tokens

https://arxiv.org/abs/2010.11934

https://github.com/google-research/multilingual-t5

mC4

Apache 2.0

Apache 2.0

transformer

https://huggingface.co/google/mt5-base

DALL-E

OpenAI

2021-01-05

GPT-3

12B

https://arxiv.org/abs/2102.12092

private

private

transformer

no

DeBERTa V2

Microsoft

2021-02-03

900M - 1.5B

N/A

transformer

yes

CLIP

OpenAI

2021-02-26

https://arxiv.org/abs/2103.00020

https://github.com/OpenAI/CLIP

https://openai.com/research/clip

MIT

yes

GLM

Tsinghua University

2021-03-18

110M - 10B

https://arxiv.org/abs/2103.10360

https://github.com/THUDM/GLM

transformer

yes

GPT-Neo

EleutherAI

2021-03-21

125M, 1.3B, 2.7B

N/A

https://github.com/EleutherAI/gpt-neo

https://www.eleuther.ai/artifacts/gpt-neo

MIT

transformer

https://huggingface.co/EleutherAI/gpt-neo-1.3B

LaMDA

Google

2021-05-18

from scratch

137B

2.8T tokens

58d

https://arxiv.org/abs/2201.08239

N/A

N/A

transformer

no

GPT-J

EleutherAI

2021-06-09

6B

https://github.com/kingoflolz/mesh-transformer-jax

yes

Apache 2.0

Apache 2.0

transformer

https://huggingface.co/EleutherAI/gpt-j-6b

no

no

CPM-2

Tsinghua University

2021-06-20

11B

https://arxiv.org/abs/2106.10715

https://github.com/TsinghuaAI/CPM

yes

Copilot

GitHub

2021-06-29

OpenAI Codex

N/A

N/A

code

no

ERNIE 3.0

Baidu

2021-07-05

10B

375B tokens

https://arxiv.org/abs/2107.02137

N/A

http://research.baidu.com/Blog/index-view?id=160

N/A

N/A

transformer

no

AlphaFold 2

Google DeepMind

2021-07-15

21B

https://www.nature.com/articles/s41586-021-03819-2

https://github.com/deepmind/alphafold

yes

Jurassic-1

AI21 Labs

2021-08-01

178B

300B tokens

N/A

N/A

https://www.ai21.com/blog/announcing-ai21-studio-and-jurassic-1

N/A

N/A

no

Codex

OpenAI

2021-08-10

GPT3

12B

100B tokens

https://arxiv.org/abs/2107.03374

N/A

https://openai.com/blog/openai-codex

private

private

code

no

T0

BigScience

2021-10-15

T5

11B

27h

https://arxiv.org/abs/2110.08207

https://github.com/bigscience-workshop/t-zero

Apache 2.0

Apache 2.0

transformer

https://huggingface.co/bigscience/T0

DeBERTa V3

Microsoft

2021-11-18

https://arxiv.org/abs/2111.09543

https://github.com/microsoft/DeBERTa

MIT

transformer

yes

Gopher

Google DeepMind

2021-12-08

from scratch

280B

300B tokens

38d

https://arxiv.org/abs/2112.11446

no

no

no

GLaM (Generalist Language Model)

Google

2021-12-13

from scratch

1.2T

280T tokens

24d

https://arxiv.org/abs/2112.06905

WebGPT

OpenAI

2021-12-17

GPT 3

175B

https://arxiv.org/abs/2112.09332

N/A

private

private

transformer

no

no

yes

ClipSeg

2021-12-18

https://arxiv.org/abs/2112.10003

https://github.com/timojl/clipseg

InstructGPT

OpenAI

2022-01-27

GPT3

175B

https://arxiv.org/abs/2203.02155

N/A

private

private

transformer

no

yes

yes

Megatron-Turing (MT) NLG

Microsoft

2022-01-28

530B

270B tokens

https://arxiv.org/abs/2201.11990

N/A

transformer

no

AlphaCode

Google DeepMind

2022-02-02

0.3B,1B,3B,9B,41B

967B tokens

https://arxiv.org/abs/2203.07814

https://www.deepmind.com/blog/competitive-programming-with-alphacode

N/A

code

no

GPT3.5

OpenAI

2022-03-15

355B

N/A

private

private

transformer

no

Imagen

Google

2022-03-23

https://arxiv.org/abs/2205.11487

https://imagen.research.google/

CodeGen-Multi

Salesforce

2022-03-25

350M - 16B

2048

https://arxiv.org/abs/2203.13474v1

code

https://huggingface.co/Salesforce/codegen-350M-multi

Chinchilla

Google DeepMind

2022-03-29

70B

1.4T tokens

https://arxiv.org/abs/2203.15556

N/A

https://www.deepmind.com/blog/an-empirical-analysis-of-compute-optimal-large-language-model-training

N/A

no

T5X

Google

2022-03-31

https://arxiv.org/abs/2203.17189

https://github.com/google-research/t5x

transformer

PaLM (Pathways Language Model)

Google

2022-04-04

8B, 62B, 540B

780B tokens

https://arxiv.org/abs/2204.02311

https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html

N/A

N/A

transformer

no

GPT-NeoX

EleutherAI

2022-04-14

20B

825GB

https://arxiv.org/abs/2204.06745

https://github.com/EleutherAI/gpt-neox

Apache 2.0

Apache 2.0

transformer

https://huggingface.co/EleutherAI/gpt-neox-20b

no

no

Tk-Instruct

Allen Institute for AI

2022-04-16

T5

3B, 11B

4h

https://arxiv.org/abs/2204.07705

https://github.com/yizhongw/Tk-Instruct

Apache 2.0

https://huggingface.co/allenai/tk-instruct-11b-def

yes

Flamingo

Google DeepMind

2022-04-29

https://arxiv.org/abs/2204.14198

https://www.deepmind.com/blog/tackling-multiple-tasks-with-a-single-visual-language-model

N/A

no

OPT

Meta AI

2022-05-03

from scratch

125M - 175B

180B tokens

https://arxiv.org/abs/2205.01068

MIT

NC research

transformer

https://huggingface.co/facebook/opt-30b

no

no

UL2

Google Brain

2022-05-10

20B

1T tokens

https://arxiv.org/abs/2205.05131

Apache 2.0

Apache 2.0

transformer

yes

no

no

LaMDA 2

Google

2022-05-11

N/A

transformer

no

YaLM

Yandex

2022-06-22

from scratch

100B

N/A

https://github.com/yandex/YaLM-100B

transformer

yes

BLOOM

BigScience

2022-07-06

from scratch

up to 176B

366B tokens

105d

https://arxiv.org/abs/2211.05100

bigscience-bloom-rail-1.0

bigscience-bloom-rail-1.0

transformer

https://huggingface.co/bigscience/bloom

no

no

NLLB-200 (No Language Left Behind)

Meta AI

2022-07-06

from scratch

55B

https://about.fb.com/news/2022/07/new-meta-ai-model-translates-200-languages-making-technology-more-accessible/

translator

translate between 200 languages

Midjourney

Midjourney Inc.

2022-07-12

from scratch

N/A

N/A

https://www.midjourney.com

N/A

diffuser

no

Exposed as Discord bot

DALL-E 2

OpenAI

2022-07-20

GPT-3

https://cdn.openai.com/papers/dall-e-2.pdf

private

private

diffuser

no

AlexaTM

Amazon

2022-08-02

from scratch

20B

1.3T tokens

120d

https://arxiv.org/abs/2208.01448

https://github.com/amazon-science/alexa-teacher-models

transformer

via SageMaker

no

no

Stable Diffusion

Stability AI

2022-08-10

from scratch

890M

https://arxiv.org/abs/2112.10752

https://github.com/CompVis/stable-diffusion

https://stability.ai/blog/stable-diffusion-announcement

diffuser

yes

See also https://stablediffusionweb.com/

DreamBooth

Google

2022-08-25

https://arxiv.org/abs/2208.12242

https://github.com/google/dreambooth

https://dreambooth.github.io/

N/A

no

CodeGeeX

Tsinghua University

2022-09-19

from scratch

13B

850B tokens

60d

https://arxiv.org/abs/2303.17568

https://github.com/THUDM/CodeGeeX

https://models.aminer.cn/codegeex/blog/

Apache 2.0

CodeGeeX License

code

on request

N/A

N/A

WeLM

WeChat

2022-09-21

from scratch

10B

300B tokens

24d

https://arxiv.org/abs/2209.10372

https://welm.weixin.qq.com/docs/api/

yes

no

no

Chinese language

Sparrow

Google DeepMind

2022-09-22

from scratch

70B

https://arxiv.org/abs/2209.14375

https://www.deepmind.com/blog/building-safer-dialogue-agents

N/A

no

no

yes

GLM-130B

Tsinghua University

2022-10-05

from scratch

130B

400B tokens

60d

https://arxiv.org/abs/2210.02414

https://github.com/THUDM/GLM-130B

transformer

yes

Flan-T5

Google

2022-10-20

T5

60M - 11B

https://arxiv.org/abs/2210.11416

https://github.com/google-research/t5x

Apache 2.0

Apache 2.0

transformer

yes

yes

no

Flan-PaLM

Google

2022-10-20

PaLM

540B

37h

https://arxiv.org/abs/2210.11416

N/A

N/A

N/A

N/A

transformer

no

yes

no

U-PaLM

Google

2022-10-20

PaLM

8B, 62B, 540B

5d

https://arxiv.org/abs/2210.11399

N/A

N/A

transformer

no

no

no

BLOOMZ

BigScience

2022-11-03

BLOOM

176B

https://arxiv.org/abs/2211.01786

https://github.com/bigscience-workshop/xmtf

bigscience-bloom-rail-1.0

bigscience-bloom-rail-1.0

transformer

yes

yes

no

BLOOM + Multitask prompted finetuning (MTF)

mT0

BigScience

2022-11-03

mT5

300M - 13B

https://arxiv.org/abs/2211.01786

https://github.com/bigscience-workshop/xmtf

Apache 2.0

Apache 2.0

https://huggingface.co/bigscience/mt0-large

Google mT5 + Multitask prompted finetuning (MTF)

OpenJourney

PromptHero

2022-11-08

Stable Diffusion

N/A

diffuser

https://huggingface.co/prompthero/openjourney

Stable Diffusion finetuned to resemble MidJourney

Galactica

Meta AI

2022-11-16

from scratch

125M - 120B

106B tokens

https://arxiv.org/abs/2211.09085

cc-by-nc-4.0

transformer

https://huggingface.co/facebook/galactica-120b

Focussed on Science

Stable Diffusion v2

Stability AI

2022-11-24

from scratch

N/A

https://github.com/Stability-AI/stablediffusion

https://stability.ai/blog/stable-diffusion-v2-release

diffuser

yes

GPT-JT

TogetherComputer

2022-11-29

GPT-J

6B

N/A

https://www.together.xyz/blog/releasing-v1-of-gpt-jt-powered-by-open-source-ai

Apache 2.0

Apache 2.0

transformer

https://huggingface.co/togethercomputer/GPT-JT-6B-v1

no

ChatGPT

OpenAI

2022-11-30

GPT 3.5

N/A

N/A

https://openai.com/blog/chatgpt

no

private

private

chatbot

no

yes

yes

OpenCLIP

various

2022-12-14

from scratch

https://arxiv.org/pdf/2212.07143.pdf

https://github.com/LAION-AI/scaling-laws-openclip

OPT-IML

Meta AI

2022-12-22

OPT

30B, 175B

https://arxiv.org/abs/2212.12017

MIT

NC research

transformer

yes

yes

no

Bard

Google

2023-02-06

LaMDA 2 or PaLM 2?

N/A

chatbot

no

LLaMA

Meta AI

2023-02-23

from scratch

7B, 13B, 30B, 65B

1.4T tokens

21d

https://arxiv.org/abs/2302.13971

https://github.com/facebookresearch/llama

https://ai.facebook.com/blog/large-language-model-llama-meta-ai/

GPL 3.0

NC research

transformer

https://huggingface.co/decapoda-research/llama-65b-hf

no

no

Flan-UL2

Google Brain

2023-02-28

UL2

20B

Flan collection

https://arxiv.org/abs/2205.05131v3

https://github.com/google-research/google-research/tree/master/ul2

Apache 2.0

Apache 2.0

https://huggingface.co/google/flan-ul2

yes

no

Open-Assistant SFT-1

OpenAssistant

2023-03-09

Pythia 12B

12B

N/A

https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training

https://open-assistant.io/

Apache 2.0

transformer

https://huggingface.co/OpenAssistant/oasst-sft-1-pythia-12b

Jurassic-2

AI21 Labs

2023-03-09

?

N/A

N/A

https://www.ai21.com/blog/introducing-j2

N/A

N/A

no

Alpaca-LoRA

Eric J. Wang

2023-03-13

LLaMA

N/A

https://github.com/tloen/alpaca-lora

transformer

yes

Alpaca

Stanford

2023-03-13

LLaMA

7B

N/A

https://github.com/tatsu-lab/stanford_alpaca

https://crfm.stanford.edu/2023/03/13/alpaca.html

transformer

yes

h2oGPT

H2O.ai

2023-03-13

Pythia 12B, GPT-NeoX 20B

12B, 20B

N/A

https://github.com/h2oai/h2ogpt

https://gpt.h2o.ai/

Apache 2.0

transformer

https://huggingface.co/h2oai

ChatGLM

Tsinghua University

2023-03-14

GLM / GLM-130B?

6B

https://github.com/THUDM/ChatGLM-6B

https://chatglm.cn/blog

chatbot

GPT4

OpenAI

2023-03-14

from scratch

8x220B

https://arxiv.org/abs/2303.08774

private

private

transformer

no

yes

yes

Zero-1-to-3

Columbia University

2023-03-20

https://arxiv.org/abs/2303.11328

https://github.com/cvlab-columbia/zero123

https://zero123.cs.columbia.edu/

diffuser

yes

Dolly v1

Databricks

2023-03-24

GPT-J

6B

N/A

https://github.com/databrickslabs/dolly

https://www.databricks.com/blog/2023/03/24/hello-dolly-democratizing-magic-chatgpt-open-models.html

cc-by-nc-4.0

chatbot

https://huggingface.co/databricks/dolly-v1-6b

GPT4All

Nomic AI

2023-03-28

LLaMA

7B

https://static.nomic.ai/gpt4all/2023_GPT4All_Technical_Report.pdf

https://github.com/nomic-ai/gpt4all

yes

GPL 3.0

chatbot

https://huggingface.co/nomic-ai/gpt4all-lora

Finetuned LLaMA 7B based on GPT3.5 chats

Cerebras-GPT

Cerebras Systems

2023-03-28

from scratch

111M - 13B

https://arxiv.org/abs/2304.03208

https://github.com/Cerebras/modelzoo

https://www.cerebras.net/blog/cerebras-gpt-a-family-of-open-compute-efficient-large-language-models/

Apache 2.0

Apache 2.0

transformer

https://huggingface.co/cerebras/Cerebras-GPT-13B

no

no

Reproduction of GPT 3 training process

LLaMA-Adapter

Shanghai AI Lab

2023-03-28

LLaMA

7B

https://arxiv.org/abs/2303.16199

https://github.com/ZrrSkywalker/LLaMA-Adapter

ColossalChat

Colossal AI

2023-03-29

LLaMA

https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat

https://chat.colossalai.org/

Apache 2.0

chatbot

Vicuna

LM-SYS

2023-03-30

LLaMA

7B, 13B

N/A

https://github.com/lm-sys/FastChat

https://vicuna.lmsys.org/

see LLaMA

transformer

yes

BloombergGPT

Bloomberg

2023-03-30

50B

https://arxiv.org/abs/2303.17564

https://www.bloomberg.com/company/press/bloomberggpt-50-billion-parameter-llm-tuned-finance/

transformer

RWKV-4 Raven

BlinkDL

2023-04-01

1.5B, 3B, 7B, 14B

https://arxiv.org/abs/2305.13048

https://github.com/BlinkDL/RWKV-LM

RNN

https://huggingface.co/BlinkDL/rwkv-4-raven

Pythia

EleutherAI

2023-04-03

70M - 12B

300B tokens

https://arxiv.org/abs/2304.01373

https://github.com/EleutherAI/pythia

Apache 2.0

Apache 2.0

transformer

https://huggingface.co/EleutherAI/pythia-12b

no

no

Koala

UC Berkeley

2023-04-03

LLaMA

7B, 13B

N/A

https://github.com/young-geng/EasyLM#koala

https://bair.berkeley.edu/blog/2023/04/03/koala

transformer

https://huggingface.co/young-geng/koala/tree/main

Baize

Baize Project

2023-04-03

LLaMA

7B, 13B, 30B

https://arxiv.org/abs/2304.01196

https://github.com/project-baize/baize-chatbot

transformer

https://huggingface.co/project-baize/baize-lora-7B

Finetuned LLaMA with LoRA

SAM

Meta AI

2023-04-05

https://arxiv.org/abs/2304.02643

https://github.com/facebookresearch/segment-anything

https://ai.facebook.com/blog/segment-anything-foundation-model-image-segmentation/

yes

vision

Bark

Suno

2023-04-09

80M

N/A

https://github.com/suno-ai/bark

cc-by-nc-4.0

voice

yes

Dolly v2

Databricks

2023-04-12

Pythia

3B, 7B, 12B

N/A

https://github.com/databrickslabs/dolly

Apache 2.0

MIT

chatbot

https://huggingface.co/databricks/dolly-v2-12b

yes

no

CodeWhisperer

Amazon

2023-04-13

N/A

N/A

N/A

https://aws.amazon.com/blogs/aws/amazon-codewhisperer-free-for-individual-use-is-now-generally-available/

N/A

code

no

Self-hosted Copilot clone

GPT4All-J

Nomic AI

2023-04-14

GPT-J

6.7B

https://static.nomic.ai/gpt4all/2023_GPT4All-J_Technical_Report_2.pdf

https://github.com/nomic-ai/gpt4all

yes

Apache 2.0

Apache 2.0

transformer

https://huggingface.co/nomic-ai/gpt4all-j

yes

no

DINOv2

Meta AI

2023-04-14

from scratch

21M - 1.1B

https://arxiv.org/abs/2304.07193

https://github.com/facebookresearch/dinov2

https://ai.facebook.com/blog/dino-v2-computer-vision-self-supervised-learning/

vision

yes

VideoLDM

Nvidia

2023-04-18

Stable Diffusion

https://arxiv.org/abs/2304.08818

N/A

https://research.nvidia.com/labs/toronto-ai/VideoLDM/

StableLM

Stability AI

2023-04-19

from scratch

3B, 7B, (15B, 65B, 175B)

N/A

https://github.com/stability-AI/stableLM/

https://stability.ai/blog/stability-ai-launches-the-first-of-its-stablelm-suite-of-language-models

cc-by-nc-4.0

transformer

yes

Open-Assistant SFT-6

OpenAssistant

2023-04-22

LLaMA

30B

https://arxiv.org/abs/2304.07327

see LLaMA

transformer

https://huggingface.co/OpenAssistant/oasst-sft-6-llama-30b-xor

WizardLM

Microsoft

2023-04-24

LLaMA

7B

https://arxiv.org/abs/2304.12244

https://github.com/nlpxucan/WizardLM

transformer

yes

DeepFloyd IF

Stability AI

2023-04-28

N/A

https://github.com/deep-floyd/IF

https://stability.ai/blog/deepfloyd-if-text-to-image-model

StableVicuna

Stability AI

2023-04-28

Vicuna 13B

13B

N/A

https://github.com/Stability-AI/StableLM

https://stability.ai/blog/stablevicuna-open-source-rlhf-chatbot

cc-by-nc-4.0

transformer

https://huggingface.co/CarperAI/stable-vicuna-13b-delta

Vicuna 13B + RLHF

FastChat-T5

LM-SYS

2023-04-28

Flan-T5-XL

3B

N/A

https://github.com/lm-sys/FastChat#fastchat-t5

Apache 2.0

transformer

https://huggingface.co/lmsys/fastchat-t5-3b-v1.0

LLaMA-Adapter V2

Shanghai AI Lab

2023-04-28

LLaMA

https://arxiv.org/abs/2304.15010

https://github.com/ZrrSkywalker/LLaMA-Adapter

transformer

Replit Code

Replit

2023-05-02

from scratch

2.7B

N/A

https://github.com/replit/ReplitLM

https://replit.com/site/ghostwriter

cc-by-sa-4.0

code

https://huggingface.co/replit/replit-code-v1-3b

OpenLLaMA

OpenLM Research

2023-05-02

from scratch

7B

https://github.com/openlm-research/open_llama

RedPajama

Apache 2.0

transformer

https://huggingface.co/openlm-research/open_llama_7b_preview_300bt

Apache 2.0 LLaMA clone based on RedPajama data

Shap-E

OpenAI

2023-05-03

from scratch

300M

https://arxiv.org/pdf/2305.02463.pdf

https://github.com/openai/shap-e

MIT

diffuser

https://github.com/openai/shap-e/blob/main/shap_e/models/download.py

3D image generation

StarCoder

BigCode

2023-05-04

15B

1T tokens + 35B python tokens

8k

https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view

https://github.com/bigcode-project/starcoder

https://huggingface.co/blog/starcoder

BigCode OpenRAIL-M v1

code

https://huggingface.co/bigcode/starcoder

RedPajama

TogetherComputer

2023-05-05

from scratch

3B, 7B

N/A

https://github.com/togethercomputer/RedPajama-Data

https://www.together.xyz/blog/redpajama-models-v1

Apache 2.0

https://huggingface.co/togethercomputer/RedPajama-INCITE-Base-7B-v0.1

Open reproduction of LLaMA

MPT-7B (MosaicML Pretrained Transformer)

MosaicML

2023-05-05

from scratch

7B

N/A

https://github.com/mosaicml/llm-foundry

https://www.mosaicml.com/blog/mpt-7b

Apache 2.0

transformer

https://huggingface.co/mosaicml/mpt-7b-instruct

Open reproduction of LLaMA

MPT-30B (MosaicML Pretrained Transformer)

MosaicML

2023-06-22

from scratch

30B

N/A

https://github.com/mosaicml/llm-foundry

https://www.mosaicml.com/blog/mpt-30b

Apache 2.0

transformer

https://huggingface.co/mosaicml/mpt-30b-instruct

Open reproduction of LLaMA

PanGu-sigma

Huawei

AnthropicLM

Anthropic AI

N/A

no

Lit-LLaMA

LLaMA

7B, 13B, 30B, 65B

Apache 2.0

NC research

optional with Alpaca

no

ImageBind

Meta AI

2023-05-09

from scratch

https://arxiv.org/abs/2305.05665

https://github.com/facebookresearch/ImageBind

https://ai.facebook.com/blog/imagebind-six-modalities-binding-ai/

cc-by-nc-4.0

cc-by-nc-4.0

transformer

https://dl.fbaipublicfiles.com/imagebind/imagebind_huge.pth

six different modalities: images, text, audio, depth, thermal, and IMU

Open-LLaMA V2

s-JoL

2023-05-11

from scratch

N/A

https://github.com/s-JoL/Open-Llama

MIT

MIT

transformer

https://huggingface.co/s-JoL/Open-Llama-V2

yes

yes

PaLM 2

Google

2023-05-10

from scratch

https://ai.google/static/documents/palm2techreport.pdf

N/A

https://ai.google/discover/palm2

transformer

no

AWS certification

Last month, I decided to get some IT certifications for the first time. I have always been sceptical of such certifications, but a few colleagues managed to convince me that some are worth it. The ones from AWS for example are reasonably priced and the exams require more than just rote memorization.

This blog post from A Cloud Guru helped me decide where to start. The following image in particular was very helpful:

ACG certification guide

Next, I followed their training on PluralSight, and one month later I am happy to report that I am now triple-certified!

AWS Certified Cloud Practitioner badge AWS Certified Solutions Architect badge AWS Certified Developer badge

Over the summer, I might give the Machine Learning Specialty a try as well.

Update 2024-03: it took a bit longer than anticipated, but I finally obtained the AWS MLS certificate. The example questions you can find online are quite hard and the answers not always as clear-cut as you would want them to be. This held me back from quickly doing the exam. However, in the end, the questions on the real exam were fairly straightforward. Anyone with a few years of machine learning experience should be able to handle them.

The confusing world of OpenAI pricing

As we all know by now, a free version of ChatGPT exists with unpredictable levels of availability. This free version is based on a model called GPT-3.5. If you want higher availability or if you want to be able to switch to the newer GPT-4 model, you need a ChatGPT Plus subscription. That will cost you $20 per month (excl. tax). So far so good.

Confusingly, this subscription will not help you when you want to use the OpenAI API to access GPT models. That requires a separate subscription with a different pricing model. Instead of a fixed price per month, you pay per 1000 tokens as shown in the table below.

model 1K prompt tokens 1K completion tokens context size
gpt-3.5-turbo $0.002 $0.002 4,096 tokens
gpt-4 $0.030 $0.060 8,192 tokens
gpt-4-32k $0.060 $0.120 32,768 tokens

On average, a token corresponds to roughly 4 characters of text. Check out this interactive tool to see exactly how text is parsed into tokens. To give you an idea of how to interpret the context size, the current version of the ChatGPT Wikipedia page up to and including the "See also" section contains a little under 8000 tokens. That is about 12.5 pages. That means the 32000 tokens of gpt-4-32k correspond to about 50 pages.
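
If you want to count tokens locally, OpenAI's tiktoken package can do so (a small sketch; install it with pip install tiktoken first). It also makes it easy to estimate what a prompt will cost at the gpt-4 rates from the table above:

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
prompt = "ELI5: quantum computing"
n_tokens = len(enc.encode(prompt))
print(n_tokens, "prompt tokens")

# prompt tokens cost $0.03 per 1K for gpt-4; completion tokens are billed separately at $0.06 per 1K
print(f"prompt cost: ${n_tokens / 1000 * 0.03:.5f}")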

When I wanted to try the API, I installed the openai Python package in a conda environment and created the following code snippet.

conda create -n openai python=3 openai
conda activate openai

import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# print([model["id"] for model in openai.Model.list()["data"]])
# model = "gpt-3.5-turbo"
model = "gpt-4"
# model = "gpt-4-32k"
response = openai.ChatCompletion.create(model=model, messages=[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "ELI5: quantum computing"}
])
# print(response)
print(response["choices"][0]["message"]["content"])

To my surprise, the requested model could not be found. Turns out there is a GPT-4 API waiting list that you have to sign up for, even if you are already a ChatGPT Plus subscriber and as such have access to GPT-4 via the chat interface.

In conclusion: this code snippet will unfortunately only work once you have subscribed to the API and received an invite after signing up for the waiting list. You could switch to gpt-3.5-turbo while waiting for the invitation. For light to medium usage, that might be a cheaper and more reliable way to access a GPT assistant than spending $20 per month on ChatGPT Plus.

Update 2023-07-06: the GPT-4 API is now generally available to all paying customers without having to join a waitlist.

References