Intro
How to set up StarCoder on AWS
- create S3 bucket
- create an IAM policy that allows read/write access to that bucket (see the sketch after this list)
- create an EC2 role (instance profile) that contains that policy
- start a new EC2 instance
- TODO: pick the right instance type
- a t2.micro is enough for now while getting the S3 workflow in place
- attach the newly created IAM role to the instance
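- a minimal sketch of the bucket and policy steps with the AWS CLI (bucket and policy names are placeholders; creating the role and instance profile follows the usual aws iam create-role / create-instance-profile / add-role-to-instance-profile flow):
aws s3 mb s3://YOUR_BUCKET
cat > starcoder-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    { "Effect": "Allow", "Action": ["s3:ListBucket"], "Resource": "arn:aws:s3:::YOUR_BUCKET" },
    { "Effect": "Allow", "Action": ["s3:GetObject", "s3:PutObject"], "Resource": "arn:aws:s3:::YOUR_BUCKET/*" }
  ]
}
EOF
aws iam create-policy --policy-name starcoder-s3-rw --policy-document file://starcoder-s3-policy.json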
sudo yum install git
- Amazon Linux 2023 does not provide git-lfs out of the box; as a workaround, install it from the GitHub release tarball:
curl -LO https://github.com/git-lfs/git-lfs/releases/download/v3.3.0/git-lfs-linux-amd64-v3.3.0.tar.gz
tar xvfz git-lfs-linux-amd64-v3.3.0.tar.gz
cd git-lfs-3.3.0
sudo ./install.sh
- install.sh copies the git-lfs binary into place and runs git lfs install for you, so there is no need to run that separately
git lfs version
- verifies the install
git clone https://huggingface.co/bigcode/starcoder
- this takes a while; the LFS-tracked weights are about 65 GB
cd starcoder
- TODO: save the model to S3 (see the sync sketch below)
- don't forget to stop the instance when you're done
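- a sketch of the S3 upload for the TODO above, assuming the clone lives in ~/starcoder and with the bucket name as a placeholder:
aws s3 sync ~/starcoder s3://YOUR_BUCKET/starcoder/
- the same command with source and destination swapped restores the model on a later instance without re-downloading from Hugging Face:
aws s3 sync s3://YOUR_BUCKET/starcoder/ ~/starcoder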
Local, out-of-the-box usage
conda create -n starcoder python=3.11
conda activate starcoder
git clone https://github.com/bigcode-project/starcoder.git
cd starcoder
pip install -r requirements.txt
- set the HUGGING_FACE_HUB_TOKEN environment variable (an access token from your Hugging Face account, needed to download the checkpoint, which sits behind a license agreement)
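- for example (the token value is a placeholder; create one in your Hugging Face account settings):
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxx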
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

checkpoint = "bigcode/starcoder"
# downloads the checkpoint on first use; the weights are ~65 GB, so this needs a lot of disk and RAM/VRAM
model = AutoModelForCausalLM.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# device=0 puts the model on the first GPU; drop the argument to run on CPU
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)
print(pipe("def hello():"))