Reading time: 3 minutes
Hello! Today we are going to see how to download and install Llama 2, the AI model from Meta that competes with ChatGPT 3.5. To do this, I have created a Docker Compose setup that will help us set up the environment.
The first thing we need to do is go to the Llama 2 page and request access on Meta's website: https://ai.meta.com/llama/
It will ask for an email address; I recommend using a university or research account to get approval as quickly as possible:
Next, you will receive an email with the installation instructions.
To start the installation, we first need to download the Llama 2 model environment. Go to the project's GitHub page and clone it:
https://github.com/facebookresearch/llama
Once downloaded, you can either install it directly on your machine or use Docker Compose. I will create a Docker Compose setup that you can use to install this AI:
- Create a folder named “app” in the root of the project.
- Create another folder named “Dockerfile”, where we will add our custom environment configuration file.
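If you prefer the terminal, a minimal sketch that creates the same structure (run from the project root):

mkdir -p app Dockerfile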
Create a file named “container_python” inside the Dockerfile folder (Dockerfile/container_python) and add the following content:
# syntax=docker/dockerfile:1
#FROM python:3.11.3
FROM python:3.8
#WORKDIR /app
# -y auto-answers "yes" so apt does not stop to ask
RUN apt-get update -y
RUN apt install python3-pip -y
RUN pip install --upgrade pip
# Execute a command to keep the container running
CMD tail -f /dev/null
Now create a file in the root called docker-compose.yml
version: "3.1" services: llama_ia_python: build: context: ./Dockerfile dockerfile: container_python restart: unless-stopped container_name: llama_ia_python working_dir: /app volumes: - ./app:/app
With this, we have our environment set up. One important detail: the compose file mounts ./app as /app inside the container, so move the llama repository you cloned earlier into the app folder so that it is visible from inside the container.
Now we need to launch the container using the command:
docker compose up -d
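You can check that the container is actually up before continuing:

docker compose ps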
The up -d command leaves the container running in the background so we can execute our commands inside it. I recommend accessing the container's console directly from Docker:
docker exec -it llama_ia_python bash
This will open a shell inside the container.
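As a quick sanity check, you can confirm you are inside the container's Python environment (the version should match the image we built from, 3.8):

python --version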
The first thing we need to do is navigate to the llama repository (the folder is named llama-main if you downloaded the GitHub ZIP, or simply llama if you cloned it with git):
cd llama-main
And run download.sh to download the necessary models.
./download.sh
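If you get a “Permission denied” error here, a common fix is to make the script executable first (or run it with bash download.sh):

chmod +x download.sh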
Now it will ask for the download URL you received in the email (“When asked for your unique custom URL, please insert the following:”); copy it and paste it.
Next, it will ask which models we want to download; if we want all of them, press Enter.
It will start downloading the models:
Once the models are downloaded, I recommend saving a copy of them somewhere else, since the link only allows downloads for 24 hours and up to 5 times. Each model ends up in its own folder inside the repository (for example llama-2-7b and llama-2-7b-chat).
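A minimal sketch of that backup, assuming you want to keep everything under the mounted app folder (the folder names are the 7B ones used later in this post; adjust them to the models you downloaded):

mkdir -p /app/models_backup
cp -r llama-2-7b llama-2-7b-chat tokenizer.model /app/models_backup/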
With the models in place, we can install llama (from the root of the repository):
pip install -e .
Now, to verify that everything works, run the text completion example (note that the --nproc_per_node value must match the model's model-parallel size, which is 1 for the 7B models):
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
And with this one, you can test the chat model:
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
*Note: If you are using Docker, make sure the container can access the GPU; otherwise this step will not work. If you can't get it working with Docker Compose, you can try running it locally instead.
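One common way to give the container GPU access, sketched here assuming an NVIDIA GPU with the NVIDIA Container Toolkit installed on the host, is to add a device reservation to the llama_ia_python service in docker-compose.yml:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]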
Here you can download more models compatible with llama: https://huggingface.co/meta-llama/