Catching up with the Deep Learning revolution
Timeline
- (dates vary by source)
- two main AI research directions
- symbolic AI
- good match with classic CS
- 1948 - 1966: summer 1
- logic
- logic programming
- theorem provers
- search
- 1966: ELIZA chatbot
- 1967 - 1977: winter 1
- 1978 - 1987: summer 2
- expert systems
- 1988 - 1993: winter 2
- 1993 - 2012
- Hidden Markov Models (HMM)
- Bayesian reasoning
- decision trees
- 2012 - present
- deep learning (non-symbolic)
- neural networks
- uses techniques typically found in electrical engineering
- 1943 - 1969: wave 1
- 1943: McCulloch-Pitts neuron, neural networks
- neural networks are Turing complete
- 1950: Turing: imitation game (Turing test)
- 1950s: stochastic gradient descent
- 1956: Dartmouth workshop
- 1957: Rosenblatt perceptron
- 1969: Minsky & Papert: Perceptrons book
- pessimism caused decline in neural net funding
- shift towards symbolic AI
- 1969 - 1987: winter
- 1987 - 2012: wave 2
- 1987: Rumelhart: connectionism
- shift from symbolic AI back to neural networks
- hidden layers
- sigmoid activation function
- 1989: LeCun: handwritten digit recognition with CNNs and backpropagation
- 1992: Vapnik: SVM kernel trick
- 1997: LSTM
- 1997: bidirectional RNNs
- 1998: LeCun & Bengio: handwritten character recognition with CNNs
- 2003: Bengio: neural probabilistic language model
- 2006: Hinton: deep belief networks for handwritten digit recognition
- 2012 - present: wave 3 - more compute and storage
- 2012: AlexNet CNN wins ImageNet ILSVRC 2012
- 2014: Goodfellow: GANs
- 2016: Google DeepMind: AlphaGo
- 2017: Google: transformers
Information sources
- recent LLM surveys (early 2023)
- books
- ✨ https://www.deeplearningbook.org
- https://aima.cs.berkeley.edu/global-index.html
- http://neuralnetworksanddeeplearning.com
- https://fleuret.org/public/lbdl.pdf
- ✨ https://www.oreilly.com/library/view/designing-machine-learning/9781098107956/
- https://github.com/udlbook/udlbook
- https://github.com/stas00/ml-engineering
- https://github.com/rasbt/LLMs-from-scratch
- newsletters
- tutorials
- blogs
- GAFA
- other
- https://paperswithcode.com
- https://huggingface.co/blog
- https://stability.ai/blog
- https://karpathy.github.io
- https://colah.github.io
- https://jack-clark.net
- https://huyenchip.com/blog
- https://distill.pub
- https://blog.acolyer.org
- https://neptune.ai/blog
- https://www.seldon.io/resources/blog
- https://thegradient.pub
- https://pauliusztin.substack.com/
- https://transformer-circuits.pub/
- courses
- classic ML
- https://www.coursera.org/learn/machine-learning by Andrew Ng, Stanford
- https://www.coursera.org/learn/ai-for-everyone by Andrew Ng, for non-technical people (2019)
- https://app.ai-cursus.nl - Dutch national AI course
- https://www.elementsofai.com - Finnish Elements of AI course (2019)
- https://developers.google.com/machine-learning/crash-course
- https://cloud.google.com/blog/topics/developers-practitioners/ai-all-humans-course-delight-and-inspire
- internal Google course by Cassie Kozyrkov
- https://aws.amazon.com/training/learning-paths/machine-learning
- https://aws.amazon.com/machine-learning/mlu
- https://www.gptandchill.ai/codingproblems
- https://www.deep-ml.com/
- deep learning
- ✨ https://www.3blue1brown.com/topics/neural-networks
- https://course.fast.ai
- https://www.deeplearning.ai
- ✨ https://karpathy.ai/zero-to-hero.html
- https://github.com/karpathy/LLM101n
- https://huggingface.co/learn/nlp-course
- https://www.cloudskillsboost.google/paths/118
- https://microsoft.github.io/generative-ai-for-beginners
- https://d2l.ai
- https://atcold.github.io/NYU-DLSP20
- https://web.stanford.edu/~jurafsky/slp3
- https://fleuret.org/dlc/
- Stanford CS25 - Transformers United
- Jeff Dean: Exciting trends in ML
- Ilya Sutskever's AI reading list for John Carmack
- https://andrewkchan.dev/posts/diffusion.html
- https://arena-ch1-transformers.streamlit.app/[1.1]_Transformer_from_Scratch
- https://github.com/mlabonne/llm-course
- https://github.com/iusztinpaul/hands-on-llms
- podcasts
- https://www.deepmind.com/the-podcast
- with Hannah Fry
- https://twimlai.com/podcast/twimlai
- https://lexfridman.com/podcast
- https://www.therobotbrains.ai
- with Pieter Abbeel
- conferences
- general
- computer vision
- CVPR
- ICCV
- ECCV
- visualizations / misc
- implementations
Important people
- Geoffrey Hinton (1947): Google Brain, 1/3 godfathers of AI, backpropagation
- Yann LeCun (1960): FB, 1/3 godfathers of AI, CNN
- Yoshua Bengio (1964): Deep Learning book, 1/3 godfathers of AI
- Andrew Ng (1976): Google Brain, Baidu, Coursera, deeplearning.ai
- Ian Goodfellow (1986): Deep Learning book, Google Brain, OpenAI, Apple, GANs, supervised by Ng + Bengio
- François Chollet: Google, Keras
- Aaron Courville
- Pieter Abbeel, prof EE/robotics/AI @ UC Berkeley
- ESAT at KUL
- PhD at Stanford under Andrew Ng
- podcast: The Robot Brains
- Andrej Karpathy: Stanford, Tesla, OpenAI, Eureka Labs
- Chip Huyen: Stanford, Claypot AI, Voltron Data
- Ilya Sutskever: AlexNet, Google, OpenAI
- Tim Dettmers: QLoRA, bitsandbytes, GPU comparison
Modalities
- input
- text
- code
- audio
- speech / voice
- visual
- image
- video
- output
- text
- code
- audio
- speech / voice
- music
- actions
- movement (robots)
- tools/APIs (agents)
Glossary
- AE: auto encoder
- AI: artificial intelligence
- ANN: artificial neural network
- BERT: bidirectional encoder representations from transformers
- BPE: byte pair encoding
- CLIP: contrastive language-image pretraining
- CNN: convolutional neural network
- CoT: chain of thought
- CPU: central processing unit
- DBN: deep belief network
- DL: deep learning
- DNN: deep neural network
- DRL: deep reinforcement learning
- EM: expectation maximization
- Flan: finetuned language net
- FNN: feedforward neural network
- GAN: generative adversarial network
- GPT: generative pre-trained transformer
- GPU: graphics processing unit
- HF: HuggingFace
- LiT: locked image tuning
- LLM: large language model
- LoRA: low-rank adaptation
- LSTM: long short term memory
- ML: machine learning
- MLP: multilayer perceptron
- MoE: mixture of experts
- MP: max pooling
- NLG: natural language generation
- NLP: natural language processing
- NLU: natural language understanding
- PEFT: parameter-efficient fine-tuning
- RAG: retrieval-augmented generation
- RBM: restricted Boltzmann machine
- ReLU: rectified linear unit
- RL: reinforcement learning
- RNN: recurrent neural network
- SFT: supervised finetuning
- SGD: stochastic gradient descent
- SL: supervised learning
- SOTA: state of the art
- SSL: self-supervised learning
- SVM: support vector machines
- TPU: tensor processing unit
- UL: unsupervised learning
- VAE: variational auto encoder
- ViT: vision transformer
- VRAM: video RAM (i.e., the memory of the GPU)
Infrastructure
- you will need one or more Nvidia GPUs
- with CUDA, Tensor Cores and cuDNN support (see the check sketch at the end of this section)
- overview of recent Nvidia GPU architectures:
Architecture | Desktop | Workstation | Datacenter |
---|---|---|---|
Pascal (2016) | GeForce GTX 10xx | Quadro P | Tesla P4 / Tesla P100 |
Volta (2017) | N/A | Quadro GV100 | Tesla V100 |
Turing (2018) | GeForce RTX 20xx | Quadro RTX | Tesla T4 |
Ampere (2020) | GeForce RTX 30xx | RTX A series | A100 |
Ada (2022) | GeForce RTX 40xx | RTX 6000 Ada | L4, L40 |
Hopper (2022) | N/A | N/A | H100, H200 |
Blackwell (2024) | GeForce RTX 50xx | ? | B100, B200 |
- https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/overview.html
- ✨ https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning
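A minimal sanity-check sketch, assuming a CUDA-enabled PyTorch build, to confirm the GPU, CUDA and cuDNN are actually visible before doing anything expensive:

```python
# Minimal sanity check that PyTorch can see the GPU, CUDA and cuDNN.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:          ", torch.cuda.get_device_name(0))
    print("CUDA version: ", torch.version.cuda)
    print("cuDNN version:", torch.backends.cudnn.version())
    # fp16 matmuls exercise the Tensor Cores on recent architectures
    x = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    print("fp16 matmul:  ", (x @ x).shape)
```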
Cloud environments
- https://cloud-gpus.com/
- ✨ https://colab.research.google.com
- tiers
- free tier
- access to Standard GPUs
- e.g., Tesla T4 16GB
- Colab Pro (~$10/month)
- access to Premium GPUs
- e.g., V100 16GB or A100 40GB (subject to availability)
- access to High-RAM environments (see the allocation check sketch at the end of this section)
Accelerator | Standard RAM | High RAM* |
---|---|---|
None | 12.7 GB | 25.5 GB |
Standard GPU | 12.7 GB | 25.5 GB |
Premium GPU* | 12.7 GB | 25.5 GB |
TPU | 12.7 GB | 35.2 GB |
(* Premium GPUs and High-RAM runtimes are available on paid tiers such as Colab Pro)
- https://www.paperspace.com
- https://lambdalabs.com
- https://vast.ai
- https://jarvislabs.ai
- https://modal.com
- https://replicate.com
- https://www.latitude.sh/accelerate
- https://huggingface.co
- https://www.anyscale.com
- https://www.beam.cloud
- traditional cloud
- Amazon Web Services (AWS)
- Microsoft Azure
- Google Cloud Platform (GCP)
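Before starting a long job it is worth checking what the runtime actually allocated; a small notebook sketch, assuming an Nvidia runtime with nvidia-smi and psutil available (both true on Colab):

```python
# Check which accelerator and how much system RAM this (e.g., Colab) runtime was given.
import subprocess
import psutil

try:
    gpu = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    ).stdout.strip()
except FileNotFoundError:
    gpu = "no Nvidia GPU / driver visible"
print("GPU:", gpu or "none allocated")
print(f"System RAM: {psutil.virtual_memory().total / 1e9:.1f} GB")
```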
Machine learning libraries
- classic ML
- deep learning
- https://github.com/explosion/spaCy
- https://github.com/facebookresearch/fairscale
- https://github.com/facebookresearch/xformers
- https://github.com/google/jax
- https://github.com/huggingface/accelerate
- A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
- https://github.com/huggingface/diffusers
- https://github.com/huggingface/text-generation-inference
- https://github.com/huggingface/transformers (minimal usage sketch after this list)
- https://github.com/huggingface/peft
- https://keras.io
- https://pytorch.org
- https://www.tensorflow.org
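A minimal sketch of what the Hugging Face transformers API looks like in practice; gpt2 is used purely as an example checkpoint:

```python
# Minimal Hugging Face transformers usage; gpt2 is only an example checkpoint.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Deep learning is", max_new_tokens=20)
print(result[0]["generated_text"])
```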
- tokenizers
- LLM inference runners
- LLM frontends
- embeddings
- 2013-01-16: Google word2vec
- 2015-09-01: Stanford GloVe
- 2015-11-09: Facebook fastText
- 2019-08-27: Sentence-BERT
- 2022-12-15: OpenAI text-embedding-ada-002
- MTEB leaderboard
- approximate nearest neighbors (ANN) using vector databases
- Locality Sensitive Hashing (LSH)
- Facebook AI Similarity Search (FAISS; 2017)
- Hierarchical Navigable Small Worlds (HNSW)
- ScaNN (Google, 2019)
- https://arxiv.org/abs/1908.10396
- each addition requires rebuilding the index
- https://qdrant.tech
- https://www.trychroma.com
- https://www.pinecone.io
- https://weaviate.io
- https://lancedb.com
- https://vespa.ai
- https://github.com/pgvector/pgvector
- https://api.python.langchain.com/en/latest/api_reference.html#module-langchain.vectorstores
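To make the embedding / vector-search idea concrete, a small sketch assuming sentence-transformers, with brute-force cosine similarity standing in for a real vector database or ANN index:

```python
# Embed a few documents and retrieve the best match for a query by cosine similarity.
# sentence-transformers and the all-MiniLM-L6-v2 checkpoint are example choices;
# a vector database / FAISS / HNSW index replaces the brute-force dot product at scale.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "The cat sat on the mat.",
    "GPUs accelerate deep learning training.",
    "Transformers are built around self-attention.",
]
doc_emb = model.encode(docs, normalize_embeddings=True)        # (3, 384) unit vectors
query_emb = model.encode(["What hardware speeds up neural nets?"],
                         normalize_embeddings=True)            # (1, 384)

scores = doc_emb @ query_emb.T                                 # cosine similarity per document
print(docs[int(np.argmax(scores))])
```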
- finetuning
- 2019-02-02: adapters
- 2021-01-01: Stanford prefix tuning
- 2021-04-18: Google soft prompt tuning
- 2021-06-17: https://github.com/microsoft/LoRA
- 2023-05-23: https://github.com/artidoro/qlora
- https://github.com/huggingface/peft
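A minimal sketch of attaching LoRA adapters with the peft library; gpt2 and its c_attn target module are only illustrative choices, as target modules differ per architecture:

```python
# Attach LoRA adapters to a small causal LM with peft; gpt2 and its c_attn
# projection are only illustrative -- target modules differ per architecture.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor applied to the update
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2 attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
# ...train as usual (e.g., transformers.Trainer), then save just the adapter weights
```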
Datasets
- catalogs
- datasets
- text
- CommonCrawl
- Stanford Question Answering Dataset (SQuAD) (2016)
- GLUE (2018)
- OpenWebText (2019)
- Colossal Clean Crawled Corpus (C4) (2019)
- The Pile (2020)
- The Flan Collection (2023)
- MATH (2021)
- wikipedia
- arxiv
- gutenberg
- stackoverflow
- stackexchange
- github
- IMDb
- images
- MNIST (Modified National Institute of Standards and Technology) (1994) - see the loading sketch at the end of this section
- 28x28 handwritten digits
- train: 60 000
- test: 10 000
- CIFAR (2009)
- collected by Alex Krizhevsky (who later created AlexNet)
- CIFAR-10
- 10 classes of 6000 32x32 colour images each
- CIFAR-100
- 100 classes of 600 32x32 colour images each
- ImageNet (2009)
- Common Objects in Context (COCO; 2014)
- CelebFaces Attributes (CelebA; 2015)
- Flickr Faces (FFHQ; 2019)
- Met Faces (2020)
- Large-scale AI Open Network (LAION)
- LAION-400M (2021)
- LAION-5B (2022)
- 5B image-text pairs
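A quick sketch, assuming torchvision as the loader, that downloads MNIST and confirms the split sizes and image shape noted above:

```python
# Download MNIST with torchvision and confirm the 60k/10k split of 28x28 digits.
from torchvision import datasets, transforms

train = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
test = datasets.MNIST(root="data", train=False, download=True,
                      transform=transforms.ToTensor())

print(len(train), len(test))   # 60000 10000
image, label = train[0]
print(image.shape, label)      # torch.Size([1, 28, 28]) 5
```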
Model hubs
- https://huggingface.co/models
- https://www.kaggle.com/models
- tfhub.dev
- https://ai.azure.com/explore/models
Model metrics and benchmarks
Vision models
- outdated
- MNIST error rate
- ImageNet error rate
- recent
- ...
Language models
- note: evaluation results often depend not only on the metric itself, but also on the specific implementation of that metric
- perplexity (worked example at the end of this section)
- "a measurement of how well a probability model predicts a sample"
- lower is better
- best suited for classic/causal/autoregressive models
- not masked models like BERT
- Bilingual Evaluation Understudy (BLEU; 2002)
- precision-oriented
- popular for machine translation
- cost-effective
- Recall-Oriented Understudy for Gisting Evaluation (ROUGE; 2004)
- recall-oriented
- popular for summarization
- BERTScore (2019)
- cosine similarity based on embedding
- accounts for synonyms, paraphrasing
- MoverScore (2019)
- Measuring Massive Multitask Language Understanding (MMLU; 2020)
- test to measure a text model’s multitask accuracy
- covers 57 tasks
- elementary mathematics
- US history
- computer science
- law
- Multilingual Grade School Math (MGSM; 2022)
- Holistic Evaluation of Language Models (HELM; 2022)
- 7 metrics (accuracy, calibration, robustness, fairness, bias, toxicity, and efficiency)
- 16 core scenarios
- G-Eval (2023)
- use another LLM as evaluator
- AI2 Reasoning Challenge (ARC)
- set of grade-school science questions
- HellaSwag
- test of commonsense inference, which is easy for humans (~95%) but challenging for SOTA models
- TruthfulQA
- test to measure a model’s propensity to reproduce falsehoods commonly found online
- EleutherAI Eval
- AlpacaEval
- for instruction-following LLMs
- HumanEval
- Chatbot Arena
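A worked sketch of the perplexity metric above: it is the exponential of the average negative log-likelihood, so it can be computed directly from per-token probabilities (the probabilities below are invented purely for illustration):

```python
# Perplexity = exp(mean negative log-likelihood of the predicted tokens).
# The per-token probabilities below are invented purely to show the arithmetic.
import math

token_probs = [0.20, 0.05, 0.50, 0.10]          # p(token_i | preceding tokens)
nll = [-math.log(p) for p in token_probs]       # negative log-likelihood per token
perplexity = math.exp(sum(nll) / len(nll))
print(f"perplexity = {perplexity:.2f}")         # ~6.69; lower is better

# With a causal LM (e.g., via transformers), the same number is exp(cross-entropy loss):
#   out = model(input_ids, labels=input_ids); perplexity = math.exp(out.loss.item())
```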