Enabling GPU access is essential if you want to run an LLM and make use of the GPU's VRAM.

Prerequisites
Make sure your machine has an NVIDIA GPU with its drivers installed and up to date.
Check using:
nvidia-smi
You should see your GPU and driver version.
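If you prefer a compact, scriptable check, nvidia-smi can also print just the fields of interest (a quick sketch using its --query-gpu option; adjust the field list as needed):
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv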
Additionally, Docker must be installed.
The NVIDIA Container Toolkit must also be installed; it is what allows Docker to detect and use the GPU.
On Ubuntu or Debian:
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker
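On newer installations, the toolkit also ships the nvidia-ctk helper, which writes the Docker runtime configuration for you (a sketch, assuming the nvidia-container-toolkit package is already installed):
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker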
Check that it works:
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
You should see your GPU listed inside the container.
Running a container with GPU
Using Docker CLI:
docker run --gpus all -it --rm \
  vllm/vllm-openai:latest \
  --model meta-llama/Meta-Llama-3-8B-Instruct
--gpus all grants access to all available GPUs.
vLLM will detect the GPU automatically and use it.
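Once the server is up, you can confirm it by querying the OpenAI-compatible API that vLLM exposes. A minimal sketch, assuming you also add -p 8000:8000 to the docker run command above so the API is reachable from the host:
# List the models currently being served
curl http://localhost:8000/v1/models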
Using Docker Compose
Minimal docker-compose.yml example:
version: "3.9"
services:
  vllm:
    image: vllm/vllm-openai:latest
    container_name: vllm_gpu
    ports:
      - "8000:8000"
    volumes:
      - ./models:/model
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    command: >
      --model /model
      --host 0.0.0.0
      --port 8000
Run:
docker compose up -d
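To confirm the server came up and loaded the model onto the GPU, you can follow the service logs and send a test request to the OpenAI-compatible API (a sketch; the model name /model matches the --model flag in the compose file above):
docker compose logs -f vllm

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/model", "prompt": "Hello", "max_tokens": 16}'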
Check GPU inside the container
docker exec -it vllm_gpu bash
python -c "import torch; print(torch.cuda.is_available())"
If it returns True, the GPU is active.
You can also use:
nvidia-smi
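For more detail than a True/False, a quick sketch that prints the number of visible GPUs and the name of the first device (assuming python is on the container's PATH, as it is in the vLLM image):
docker exec -it vllm_gpu python -c "import torch; print(torch.cuda.device_count(), torch.cuda.get_device_name(0))"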
Production Tips
To use multiple GPUs:
--gpus '"device=0,1"'
Adjust GPU memory utilization in vLLM:
--gpu-memory-utilization 0.9
Mounting local models in /model avoids downloading each time:
volumes:
  - ./models:/model
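Putting these tips together, a sketch of a single CLI launch that pins two GPUs, caps GPU memory utilization, and serves a model mounted from a local ./models directory (--tensor-parallel-size and --gpu-memory-utilization are vLLM server options; the local path is an assumption for illustration):
docker run --gpus '"device=0,1"' -it --rm \
  -p 8000:8000 \
  -v "$(pwd)/models:/model" \
  vllm/vllm-openai:latest \
  --model /model \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.9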
