Using Llama-2 with Python in an ARM64 or AMD64 Docker Compose Container

Reading time: 2 minutes

Hello, today we are going to learn how to deploy Llama-2 on a server with an ARM64 environment, such as the Ampere A1 instances offered by Oracle Cloud. It’s worth noting that everything here also works on AMD64.

The program in question is called llama.cpp, and bindings for it are available in multiple languages.

In our case, we are going to use the Python bindings, llama-cpp-python: https://github.com/abetlen/llama-cpp-python

First, we are going to set up a Docker Compose project with the following configuration:

Folders:

  • app
  • config
    • models
  • Dockerfile (note: this is a folder, not a file)

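If you prefer to create these folders from a script rather than by hand, a minimal Python sketch could look like this (the folder names simply mirror the list above):

import pathlib

# Create the project folders; "Dockerfile" is a folder here, not a file
for folder in ("app", "config/models", "Dockerfile"):
    pathlib.Path(folder).mkdir(parents=True, exist_ok=True)
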
With this folder structure created, let’s add a file called docker-compose.yml in the root with the following content:

version: "3.1"

services:
  llama2:
    build:
      context: ./Dockerfile
      dockerfile: container_python
    restart: unless-stopped
    container_name: llama2
    working_dir: /app
    volumes:
      - ./app:/app/code
      - ./config/models:/app/models

Once created, we will add the file named `container_python` to the `Dockerfile` folder with the following content:

# syntax=docker/dockerfile:1
FROM python:3.11.3
RUN apt-get update -y
RUN apt-get install -y python3-pip
RUN pip install --upgrade pip
RUN pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
# Run a command to keep the container running
CMD tail -f /dev/null

We install a Python 3.11.3 environment and the `llama-cpp-python` library, using the `--force-reinstall` and `--no-cache-dir` options so the library is built from source for the current architecture when the image is built.
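
To double-check that the library compiled correctly, a quick sanity check from inside the container could look like this (a minimal sketch; it assumes the package exposes `__version__`, which current releases of llama-cpp-python do):

import llama_cpp

# Print the installed version; an ImportError here would mean the build failed
print(llama_cpp.__version__)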

Now, let’s create the Python script in the `app` directory and name it `code.py`:

from llama_cpp import Llama
# Load the GGUF model mounted at /app/models (the script runs from /app/code)
llm = Llama(model_path="./../models/7b.Q4_K_S.gguf")
# Ask a question, capping the reply at 32 tokens and echoing the prompt back
output = llm("Q: What is a dog? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

In my case, I am specifying the model `7b.Q4_K_S.gguf`, which we will download in the next step, and asking it, “What is a dog?”

You can also modify it like this:

output = llm("What is a dog?")
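
In both cases, the call returns an OpenAI-style completion dictionary rather than a plain string. As a rough sketch (field names taken from the llama-cpp-python completion format), you can pull out just the generated text like this:

# The completion is a dict with a "choices" list; the generated text is under "text"
answer = output["choices"][0]["text"]
print(answer)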

Now, let’s download this model. Note that with the latest update of llama.cpp (and therefore of `llama-cpp-python`), only models in `.gguf` format are supported. You can download compatible models from Hugging Face: https://huggingface.co

Here’s the model I used: https://huggingface.co/TheBloke/Llama-2-Coder-7B-GGUF/tree/main

Specifically, the model is `llama-2-coder-7b.Q4_K_S.gguf`. Download it and add it to the `config/models` folder. Make sure to rename it to `7b.Q4_K_S.gguf` so that it loads correctly.
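
If you would rather download the model from a script than from the browser, here is a minimal sketch using the huggingface_hub package (an extra dependency that is not part of this setup; you would need to `pip install huggingface_hub` first):

from huggingface_hub import hf_hub_download

# Download the GGUF file straight into config/models
hf_hub_download(
    repo_id="TheBloke/Llama-2-Coder-7B-GGUF",
    filename="llama-2-coder-7b.Q4_K_S.gguf",
    local_dir="config/models",
)

You would still need to rename the file to `7b.Q4_K_S.gguf` (or adjust `model_path` in `code.py`) afterwards.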

Finally, the folder structure looks like this:

app
  code.py
config
  models
    7b.Q4_K_S.gguf
Dockerfile
  container_python
docker-compose.yml

Now, you can run your container:

docker-compose up -d

Once it’s up, the container won’t do anything on its own (the `tail -f /dev/null` command just keeps it alive). You have to enter it and run the code manually.

To do that, you can use the following commands:

docker exec -it llama2 /bin/bash
cd code
python code.py

It takes some time. In my case, the model simply responded with a dog emoji.

If you don’t like this model, you can search for other compatible models and try them.

The llama.cpp repository lists all compatible models: https://github.com/ggerganov/llama.cpp
