Catching up with the Deep Learning revolution

Timeline

Information sources

Important people

  • Geoffrey Hinton (1947): Google Brain, one of the 3 "godfathers of AI", backpropagation
  • Yann LeCun (1960): Meta (Facebook), one of the 3 "godfathers of AI", CNNs
  • Yoshua Bengio (1964): Deep Learning book, one of the 3 "godfathers of AI"
  • Andrew Ng (1976): Google Brain, Baidu, Coursera, deeplearning.ai
  • Ian Goodfellow (1986): Deep Learning book, Google Brain, OpenAI, Apple, GANs, supervised by Ng and Bengio
  • François Chollet: Google, Keras
  • Aaron Courville: Deep Learning book, Université de Montréal
  • Pieter Abbeel, prof EE/robotics/AI @ UC Berkeley
    • ESAT at KUL
    • PhD at Stanford under Andrew Ng
    • podcast: The Robot Brains
  • Andrej Karpathy: Stanford, Tesla, OpenAI, Eureka Labs
  • Chip Huyen: Stanford, Claypot AI, Voltron Data
  • Ilya Sutskever: AlexNet, Google, OpenAI
  • Tim Dettmers: QLoRA, bitsandbytes, GPU comparison

Modalities

  • input
    • text
      • code
    • audio
      • speech / voice
    • visual
      • image
      • video
  • output
    • text
      • code
    • audio
      • speech / voice
      • music
    • actions
      • movement (robots)
      • tools/APIs (agents)

Glossary

  • AE: autoencoder
  • AI: artificial intelligence
  • ANN: artificial neural network
  • BERT: bidirectional encoder representations from transformers
  • BPE: byte pair encoding
  • CLIP: contrastive language-image pretraining
  • CNN: convolutional neural network
  • CoT: chain of thought
  • CPU: central processing unit
  • DBN: deep belief network
  • DL: deep learning
  • DNN: deep neural network
  • DRL: deep reinforcement learning
  • EM: expectation maximization
  • Flan: finetuned language net
  • FNN: feedforward neural network
  • GAN: generative adversarial network
  • GPT: generative pre-trained transformer
  • GPU: graphics processing unit
  • HF: HuggingFace
  • LiT: locked image tuning
  • LLM: large language model
  • LoRA: low-rank adaptation
  • LSTM: long short-term memory
  • ML: machine learning
  • MLP: multilayer perceptron (see the training sketch after this glossary)
  • MoE: mixture of experts
  • MP: max pooling
  • NLG: natural language generation
  • NLP: natural language processing
  • NLU: natural language understanding
  • PEFT: parameter-efficient fine-tuning
  • RAG: retrieval-augmented generation
  • RBM: restricted Boltzmann machine
  • ReLU: rectified linear unit
  • RL: reinforcement learning
  • RNN: recurrent neural network
  • SFT: supervised finetuning
  • SGD: stochastic gradient descent
  • SL: supervised learning
  • SOTA: state of the art
  • SSL: self-supervised learning
  • SVM: support vector machines
  • TPU: tensor processing unit
  • UL: unsupervised learning
  • VAE: variational autoencoder
  • ViT: vision transformer
  • VRAM: video RAM (i.e., the memory of the GPU)
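
To make a few of these acronyms concrete, here is a minimal sketch (assuming PyTorch is installed; the architecture, data and hyperparameters are purely illustrative) that trains an MLP, i.e. a small feedforward neural network with a ReLU activation, by SGD on synthetic data in a supervised learning (SL) setup:

    # Minimal supervised-learning sketch tying together a few glossary terms:
    # MLP / FNN (the model), ReLU (the activation), SGD (the optimizer), SL (the setup).
    # All sizes and hyperparameters are illustrative.
    import torch
    import torch.nn as nn

    # a small multilayer perceptron (MLP): one hidden layer + ReLU
    model = nn.Sequential(
        nn.Linear(20, 64),
        nn.ReLU(),
        nn.Linear(64, 2),
    )

    # synthetic labelled data: 256 samples, 20 features, 2 classes
    X = torch.randn(256, 20)
    y = torch.randint(0, 2, (256,))

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(10):
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()   # backpropagation computes the gradients
        optimizer.step()  # SGD updates the weights
        print(f"epoch {epoch}: loss = {loss.item():.4f}")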

Infrastructure

  • you will need one or more Nvidia GPUs
    • with CUDA, Tensor Cores and cuDNN support (see the quick check after the table below)
    • overview of recent Nvidia GPU architectures:
Architecture      Desktop           Workstation    Datacenter
Pascal (2016)     GeForce GTX 10xx  Quadro P       Tesla P4 / Tesla P100
Volta (2017)      N/A               Quadro GV100   Tesla V100
Turing (2018)     GeForce RTX 20xx  Quadro RTX     Tesla T4
Ampere (2020)     GeForce RTX 30xx  RTX A series   A100
Ada (2022)        GeForce RTX 40xx  RTX 6000 Ada   L4 / L40
Hopper (2022)     N/A               N/A            H100, H200
Blackwell (2024)  GeForce RTX 50xx  ?              B100, B200
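
A quick way to verify that the GPU, CUDA, cuDNN and Tensor Cores are actually visible to your framework is sketched below (it assumes a PyTorch build with CUDA support; Tensor Cores are available from compute capability 7.0, i.e. Volta, onwards):

    # sanity check of the local Nvidia GPU setup (sketch, assumes PyTorch with CUDA)
    import torch

    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("cuDNN version:", torch.backends.cudnn.version())
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            cc = (props.major, props.minor)
            print(f"GPU {i}: {props.name}, "
                  f"{props.total_memory / 1024**3:.1f} GB VRAM, "
                  f"compute capability {cc[0]}.{cc[1]}, "
                  f"Tensor Cores: {'yes' if cc >= (7, 0) else 'no'}")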

Cloud environments

Accelerator   Standard RAM  High RAM*
None          12.7 GB       25.5 GB
Standard GPU  12.7 GB       25.5 GB
Premium GPU*  12.7 GB       25.5 GB
TPU           12.7 GB       35.2 GB
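
To see which tier you actually got in a hosted notebook, you can inspect system RAM and the attached accelerator. A minimal sketch, assuming a Colab-style runtime where psutil and torch are preinstalled:

    # report system RAM and attached accelerator in a hosted notebook runtime
    # (sketch; assumes psutil and torch are available, as on typical Colab images)
    import psutil
    import torch

    ram_gb = psutil.virtual_memory().total / 1024**3
    print(f"system RAM: {ram_gb:.1f} GB")

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print(f"GPU: {props.name}, {props.total_memory / 1024**3:.1f} GB VRAM")
    else:
        print("no CUDA GPU attached (runtime may be CPU-only or a TPU)")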

Machine learning libraries

Datasets

Model hubs

Model metrics and benchmarks

Vision models

  • outdated
    • MNIST error rate
    • ImageNet error rate
  • recent
    • ...

Language models

Misc