How to enable GPU access from a Docker container, for example, to run an LLM model.

Reading time: 2 minutes

Enabling GPU access is essential if we need to run an LLM model and use the GPU's VRAM.


Prerequisites

Please ensure you have an NVIDIA GPU installed on your machine with the drivers installed and updated.

Check using:

nvidia-smi 

You should see your GPU and driver version.

Additionally, Docker must be installed.

The NVIDIA Container Toolkit also needs to be installed.

This allows Docker to detect and use the GPU.

On Ubuntu or Debian:

distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

Check that it works:

docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

You should see your GPU listed inside the container.

Running a container with GPU

Using Docker CLI:


docker run --gpus all -it --rm vllm/vllm-openai:latest --model meta-llama/Meta-Llama-3-8B-Instruct

--gpus all grants access to all available GPUs.

vLLM will detect the GPU automatically and use it.

Using Docker Compose

A minimal docker-compose.yml example:

version: "3.9" services: vllm: image: vllm/vllm-openai:latest container_name: vllm_gpu ports: - "8000:8000" volumes: - ./models:/model deploy: resources: reservations: devices: - driver: nvidia count: all capabilities: [gpu] command: > --model /model --host 0.0.0.0 --port 8000 

Run:

docker compose up -d 
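Once the stack is up, a quick way to confirm the server is reachable (assuming the port mapping from the compose file above) is to query its OpenAI-compatible models endpoint:

curl http://localhost:8000/v1/models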

Check GPU inside the container

docker exec -it vllm_gpu bash
python -c "import torch; print(torch.cuda.is_available())"

If it returns True, the GPU is active.
You can also use:

nvidia-smi 
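You can also send a test request to the OpenAI-compatible API. This is only a sketch: the "model" value must match whatever the server actually loaded (here the /model path from the compose file):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/model", "messages": [{"role": "user", "content": "Hello"}]}'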

Production Tips

To use multiple GPUs:

--gpus '"device=0,1"' 

Adjust how much GPU memory vLLM reserves with its --gpu-memory-utilization option:

--gpu-memory-utilization 0.9
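For example, appended to the docker run command from earlier (a sketch; tune the value depending on what else shares the GPU):

docker run --gpus all -it --rm vllm/vllm-openai:latest \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --gpu-memory-utilization 0.9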

Mounting local models at /model avoids downloading them each time:

volumes:
  - ./models:/model
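The docker run equivalent uses -v with an absolute path (a sketch, assuming the model files live in ./models on the host):

docker run --gpus all -p 8000:8000 -v "$(pwd)/models:/model" -it --rm vllm/vllm-openai:latest \
  --model /model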
