
Using Llama-2 with Python in a Docker Compose Container on ARM64 or AMD64

Reading time: 2 minutes

Hello, today we are going to learn how to deploy Llama-2 on a server with an ARM64 environment, such as an Ampere A1 instance offered by Oracle Cloud. It’s worth noting that the same setup is also compatible with AMD64.

The program in question is llama.cpp, and bindings for it are available for multiple environments:

In our case, we are going to use the Python one: https://github.com/abetlen/llama-cpp-python

First, we set up a Docker Compose project with the following configuration:

Folders:

  • app
  • config
    • models
  • Dockerfile (a folder that will hold the container build file)

With this folder structure created, let’s add the file in the root called docker-compose.yml with this content:

version: "3.1"

services:

  llama2:
      build:
        context: ./Dockerfile
        dockerfile: container_python
      restart: unless-stopped
      container_name: llama2
      working_dir: /app
      volumes:
        - ./app:/app/code
        - ./config/models:/app/models
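
Because `./app` is mounted at `/app/code` and `./config/models` at `/app/models`, the Python script later refers to the model through a relative path. If you want to double-check the mounts once the container is up, here is a minimal sketch you could run inside it (the file names shown are assumptions based on the structure above):

# mount_check.py - run inside the container to confirm the volume mounts line up
import os

print(os.listdir("/app/code"))    # should contain code.py (./app on the host)
print(os.listdir("/app/models"))  # should contain 7b.Q4_K_S.gguf (./config/models on the host)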

Once created, we will add the file named `container_python` to the `Dockerfile` folder with the following content:

# syntax=docker/dockerfile:1
FROM python:3.11.3
# Refresh the package index and pip tooling
RUN apt-get update -y
RUN apt-get install -y python3-pip
RUN pip install --upgrade pip
# Build and install llama-cpp-python for the container's architecture (ARM64 or AMD64)
RUN pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir
# Keep the container running so we can exec into it and run the script manually
CMD tail -f /dev/null

We install a Python 3.11.3 environment and the `llama-cpp-python` library; the `--force-reinstall` and `--no-cache-dir` options ensure the library is rebuilt for the image’s architecture when the image is built, rather than reusing a cached copy.
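
To confirm the installation, you can run a quick import check inside the container (a minimal sketch, not part of the original setup):

# check_install.py - verify llama-cpp-python was installed correctly
import llama_cpp

print("llama_cpp loaded from:", llama_cpp.__file__)
print("version:", getattr(llama_cpp, "__version__", "unknown"))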

Now, let’s create the Python script in the `app` directory and name it `code.py`:

from llama_cpp import Llama

# The script runs from /app/code and the models folder is mounted at /app/models,
# so the relative path below resolves to /app/models/7b.Q4_K_S.gguf
llm = Llama(model_path="./../models/7b.Q4_K_S.gguf")
output = llm("Q: What is a dog? A: ", max_tokens=32, stop=["Q:", "\n"], echo=True)
print(output)

In my case, I am specifying the model `7b.Q4_K_S.gguf`, which we will download in the next step, and asking it “What is a dog?”.

You can also modify it like this:

output = llm("What is a dog?")
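
If you want more control over generation, the binding exposes optional parameters on both the `Llama` constructor and the call itself. A minimal sketch, where the specific values (`n_ctx`, `n_threads`, `max_tokens`) are illustrative assumptions rather than values from this tutorial:

from llama_cpp import Llama

llm = Llama(
    model_path="./../models/7b.Q4_K_S.gguf",
    n_ctx=2048,    # context window size
    n_threads=4,   # CPU threads used for inference
)

output = llm(
    "What is a dog?",
    max_tokens=64,
    echo=False,    # return only the completion, not the prompt
)

# The result is an OpenAI-style completion dict; the generated text is under "choices"
print(output["choices"][0]["text"])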

Now, let’s download the model. Note that, since a recent update, llama.cpp only supports `.gguf` models. You can download compatible models from Hugging Face: https://huggingface.co

Here’s the model I used: https://huggingface.co/TheBloke/Llama-2-Coder-7B-GGUF/tree/main

Specifically, the file is `llama-2-coder-7b.Q4_K_S.gguf`. Download it and add it to the `config/models` folder, making sure to rename it to `7b.Q4_K_S.gguf` so that the script loads it correctly.
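
If you prefer to script the download instead of using the browser, the `huggingface_hub` library can fetch the file directly. This is a sketch under the assumption that you install `huggingface_hub` on the host; the repository and file name come from the link above:

# download_model.py - fetch the GGUF file and rename it to what code.py expects
import os
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/Llama-2-Coder-7B-GGUF",
    filename="llama-2-coder-7b.Q4_K_S.gguf",
    local_dir="config/models",
)
os.rename(path, "config/models/7b.Q4_K_S.gguf")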

Finally, the folder structure looks like this:

  • app
    • code.py
  • config
    • models
      • 7b.Q4_K_S.gguf
  • Dockerfile
    • container_python
  • docker-compose.yml

Now, you can run your container:

docker-compose up -d

Once it’s up, the container won’t do anything on its own (the `tail -f /dev/null` command simply keeps it alive). You have to access it and run the script manually.

To do that, you can use the following commands:

docker exec -it llama2 /bin/bash
cd code
python code.py

It takes some time. In my case, I received the following result:

It responded with a dog emoji.

If you don’t like this model, you can search for other compatible models and try them.

You’ll find a list of compatible models in the llama.cpp repository: https://github.com/ggerganov/llama.cpp
