Install Llama 2, Meta’s Open-Source, Commercially Usable AI That Competes with ChatGPT, for Use with Python and Docker (with GPU Access)

Reading time: 3 minutes

Hello, today we are going to see how to download and install Llama 2, the AI from Meta that competes with ChatGPT 3.5. To do this, I have created a Docker Compose setup that will help us build the environment.

The first thing we need to do is go to the Llama 2 page and request access on Meta’s own website: https://ai.meta.com/llama/

It will ask for an email address; I recommend using a university or research account to get approval as quickly as possible.

Next, you will receive an email with the installation instructions.

To start the installation, we first need to download the Llama 2 environment. Go to the project’s GitHub page and clone it:

https://github.com/facebookresearch/llama
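
If you use the command line, the clone looks like this (a sketch; I clone into a folder named llama-main to match the folder name used later, which is also what you get if you download the ZIP from GitHub):

git clone https://github.com/facebookresearch/llama.git llama-main

Keep in mind that, once we create the Docker Compose setup below, this repository needs to live inside the “app” folder so the container can see it.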

Once downloaded, you can either install it directly on your machine or use Docker Compose.

I will create a Docker Compose setup that you can use to install this AI:

  • Create a folder named “app” in the root of the project.
  • Create another folder named “Dockerfile”, which will hold our custom environment configuration file.

Create a file named “container_python” inside the Dockerfile folder (Dockerfile/container_python) and add the following content:

# syntax=docker/dockerfile:1
FROM python:3.8

# -y answers "yes" automatically so apt-get does not stall on prompts
RUN apt-get update -y && apt-get install -y python3-pip
RUN pip install --upgrade pip

# Keep the container running so we can open a shell inside it
CMD ["tail", "-f", "/dev/null"]

Now create a file in the project root called docker-compose.yml with the following content:

version: "3.1"

services:
  llama_ia_python:
    build:
      context: ./Dockerfile
      dockerfile: container_python
    restart: unless-stopped
    container_name: llama_ia_python
    working_dir: /app
    volumes:
      - ./app:/app

With this, we have our environment set up.
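
At this point the project layout should look like this (a sketch of the structure described above, with the cloned repository already inside “app”):

project/
├── app/
│   └── llama-main/
├── Dockerfile/
│   └── container_python
└── docker-compose.yml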

Now we need to launch the container using the command:

docker compose up -d

This will leave the container running in the background so we can execute our commands. I recommend opening a shell directly inside the container:

docker exec -it llama_ia_python bash

This will open the console.

The first thing we need to do is navigate to the llama repository (remember, it must be inside the “app” folder so it is visible at /app in the container):

cd llama-main

And run download.sh to download the necessary models:

./download.sh
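
If you get a “Permission denied” error, make the script executable first (a standard shell fix, not specific to this repository):

chmod +x download.sh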

It will ask for the download URL, which you received in the email (“When asked for your unique custom URL, please insert the following:“); copy it and paste it in.

Next, it will ask which models we want to download; to download all of them, just press Enter.

It will start downloading the models.

Once the models are downloaded, I recommend backing them up somewhere separate; they are stored in folders named after each model (for example, llama-2-7b and llama-2-7b-chat). This matters because the download link is only valid for 24 hours and up to 5 downloads.
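
For example, to copy them to a backup location (the destination path here is just a placeholder):

cp -r llama-2-7b llama-2-7b-chat tokenizer.model /path/to/backup/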

With the models in place, we can install llama (run this from the root of llama-main):

pip install -e .

Now, to verify that it works, run the text-completion example:

torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir llama-2-7b/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 128 --max_batch_size 4

And with this command, you can test the chat model:

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir llama-2-7b-chat/ \
    --tokenizer_path tokenizer.model \
    --max_seq_len 512 --max_batch_size 4
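
If you want to try your own prompts, you can adapt the example scripts. Here is a minimal sketch based on example_text_completion.py (the file name my_completion.py and the prompt are my own; it must still be launched with torchrun, as shown below):

# my_completion.py: a minimal sketch adapted from example_text_completion.py
from llama import Llama

def main():
    # Build the generator from the downloaded checkpoint and tokenizer
    generator = Llama.build(
        ckpt_dir="llama-2-7b/",
        tokenizer_path="tokenizer.model",
        max_seq_len=128,
        max_batch_size=4,
    )

    # One prompt per batch entry (up to max_batch_size)
    prompts = ["The capital of France is"]
    results = generator.text_completion(prompts, max_gen_len=64)

    for prompt, result in zip(prompts, results):
        print(prompt + result["generation"])

if __name__ == "__main__":
    main()

Launch it the same way as the examples:

torchrun --nproc_per_node 1 my_completion.py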

*Note: If you are using Docker, make sure it can access the GPU; otherwise, this service will not work. If it doesn’t work with Docker Compose, you can try running it locally.
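
For reference, one way to request GPU access in docker-compose.yml is a device reservation like this (a sketch; it assumes the NVIDIA Container Toolkit is installed on the host):

services:
  llama_ia_python:
    # ...same build, volumes, etc. as above...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]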

Here you can download more models compatible with llama: https://huggingface.co/meta-llama/
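
For instance, with the Hugging Face transformers library you can load them directly (a sketch; it assumes you have been granted access to the gated meta-llama models and are logged in with a Hugging Face token):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated model on the Hugging Face Hub: requires approved access
model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))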
