StarCoder (WIP)

Intro

How to set up StarCoder on AWS

  • create S3 bucket
  • create policy that allows read/write to that bucket
  • create EC2 role containing that policy
  • start a new EC2 instance
    • TODO select right instance type
    • t2.micro for now to set up S3 properly
    • use newly created IAM role
  • sudo yum install git
  • Amazon Linux 2023 does not support git-lfs out of the box, workaround:
    • curl -LO https://github.com/git-lfs/git-lfs/releases/download/v3.3.0/git-lfs-linux-amd64-v3.3.0.tar.gz
    • tar xvfz git-lfs-linux-amd64-v3.3.0.tar.gz
    • the tarball extracts into git-lfs-3.3.0/; from there, run sudo ./install.sh instead of git lfs install
    • git lfs version to verify the installation
  • git clone https://huggingface.co/bigcode/starcoder
    • takes a while; the weights are roughly 65 GB
    • the model is gated, so the clone needs a Hugging Face account that has accepted the license
  • cd starcoder
  • TODO save to S3
  • don't forget to stop the instance when you're done
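The bucket/policy steps above can be sketched with the AWS CLI. This is only a sketch: the bucket name is a placeholder, the actual create-policy/create-role/sync calls are shown commented out because they need configured credentials, and ec2-trust.json (the EC2 trust policy for the role) is assumed to exist.

```shell
# Hypothetical bucket name; replace with your own.
BUCKET=my-starcoder-bucket

# IAM policy allowing read/write to that bucket, written to a local file.
cat > starcoder-s3-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetObject", "s3:PutObject"],
      "Resource": [
        "arn:aws:s3:::${BUCKET}",
        "arn:aws:s3:::${BUCKET}/*"
      ]
    }
  ]
}
EOF

# With credentials configured, the bucket/policy/role steps would be roughly:
#   aws s3 mb s3://${BUCKET}
#   aws iam create-policy --policy-name starcoder-s3 \
#       --policy-document file://starcoder-s3-policy.json
#   aws iam create-role --role-name starcoder-ec2 \
#       --assume-role-policy-document file://ec2-trust.json
#   aws iam attach-role-policy --role-name starcoder-ec2 --policy-arn <policy arn>
# And the "save to S3" TODO, from inside the cloned starcoder directory:
#   aws s3 sync . s3://${BUCKET}/starcoder

cat starcoder-s3-policy.json
```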

Local, out-of-the-box usage

  • conda create -n starcoder python=3.11
  • conda activate starcoder
  • git clone https://github.com/bigcode-project/starcoder.git
  • cd starcoder
  • pip install -r requirements.txt
  • set the HUGGING_FACE_HUB_TOKEN environment variable (export HUGGING_FACE_HUB_TOKEN=<your token>)
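Since a missing token only surfaces as a download error much later, it can help to check for it up front. A minimal sketch using only the standard library; require_token is an illustrative helper, not part of the starcoder repo:

```python
import os

def require_token(env=os.environ):
    """Return the Hugging Face token, failing loudly if it is not set."""
    token = env.get("HUGGING_FACE_HUB_TOKEN")
    if not token:
        raise RuntimeError(
            "Set HUGGING_FACE_HUB_TOKEN first, e.g. "
            "export HUGGING_FACE_HUB_TOKEN=<your token>"
        )
    return token
```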
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

checkpoint = "bigcode/starcoder"

# Downloads the weights on first use (roughly 65 GB); make sure there is enough disk.
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# device=0 puts the model on the first GPU; use device=-1 to run on CPU.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("def hello():"))