GPU Support
Nixi can detect, configure, and monitor both NVIDIA and AMD GPUs on NixOS. GPU support is important for local LLM acceleration – Ollama and other inference engines use GPU compute to dramatically speed up model inference.
NVIDIA
The nvidia tool manages NVIDIA GPU drivers, container GPU passthrough, and monitoring.
What enable configures
| NixOS option | Purpose |
|---|---|
| `nixpkgs.config.allowUnfree = true` | Required for the proprietary NVIDIA drivers |
| `hardware.graphics.enable = true` | Enable the hardware graphics stack |
| `services.xserver.videoDrivers = [ "nvidia" ]` | Load the NVIDIA video driver |
| `hardware.nvidia.modesetting.enable = true` | Kernel modesetting for Wayland and smooth display |
| `hardware.nvidia.open = false` | Use the proprietary driver (more stable than the open kernel module) |
| `hardware.nvidia-container-toolkit.enable = true` | GPU passthrough to Podman/Docker containers |
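For reference, the table corresponds roughly to the NixOS module below. This is a sketch assuming Nixi writes the options exactly as listed; where and how Nixi actually stores them is not covered here. The option names match recent NixOS releases (24.11 and later, where `hardware.graphics` replaced `hardware.opengl`).

```nix
# Sketch: the NVIDIA options from the table above as a standalone NixOS module.
{ ... }:
{
  # The proprietary driver is unfree
  nixpkgs.config.allowUnfree = true;

  hardware.graphics.enable = true;
  services.xserver.videoDrivers = [ "nvidia" ];

  hardware.nvidia = {
    modesetting.enable = true;
    open = false; # proprietary kernel module
  };

  # Exposes the GPU to Podman/Docker containers
  hardware.nvidia-container-toolkit.enable = true;
}
```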
LLM acceleration
NVIDIA GPUs work with Ollama out of the box once the drivers are installed. The proprietary driver provides the CUDA driver libraries, which Ollama detects automatically. No additional configuration is needed.
The container toolkit also enables GPU passthrough, so Ollama running inside a Podman container can access the GPU.
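As an illustration of container passthrough, an Ollama container managed by NixOS can request the GPU through the CDI device that the container toolkit generates. This is a hypothetical sketch: the container name, image tag, port, and volume name are assumptions, and Nixi may run Ollama differently (for example, as a native service).

```nix
# Hypothetical: Ollama in Podman with all NVIDIA GPUs passed through via CDI.
# Assumes hardware.nvidia-container-toolkit.enable = true is already set.
{ ... }:
{
  virtualisation.podman.enable = true;
  virtualisation.oci-containers.backend = "podman";

  virtualisation.oci-containers.containers.ollama = {
    image = "docker.io/ollama/ollama:latest";
    ports = [ "11434:11434" ];
    # Named volume so downloaded models persist across container restarts
    volumes = [ "ollama:/root/.ollama" ];
    # CDI device name exposed by the NVIDIA container toolkit
    extraOptions = [ "--device=nvidia.com/gpu=all" ];
  };
}
```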
Configuration
The configure action lets you tune NVIDIA driver settings after enabling:
- `power_management` – Enable or disable suspend/resume support. Important for desktops and laptops that sleep.
- `open_driver` – Switch between the proprietary and open-source NVIDIA kernel modules. The open module is available for Turing (RTX 20 series) and newer GPUs. The proprietary module is more stable and recommended for most users.
Example: “enable NVIDIA power management” or “switch to the open NVIDIA driver”
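The two settings presumably map onto standard NixOS options. The mapping below is an assumption based on the setting names, not taken from Nixi's source.

```nix
# Assumed NixOS equivalents of the two NVIDIA configure settings.
{ ... }:
{
  # power_management: suspend/resume support via the NVIDIA systemd sleep hooks
  hardware.nvidia.powerManagement.enable = true;

  # open_driver: true selects the open kernel module (Turing / RTX 20 series and newer)
  hardware.nvidia.open = true;
}
```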
After enabling
A reboot is required after the initial enable for the NVIDIA kernel module to load. After reboot, verify with:
- “show GPU status” – should show driver version, CUDA version, memory, utilization, and temperature
- `nvidia-smi` – direct driver verification from the command line
AMD
The amdgpu tool manages AMD GPU drivers, ROCm compute stack, and monitoring.
What enable configures
| NixOS option | Purpose |
|---|---|
| `boot.initrd.kernelModules = [ "amdgpu" ]` | Load the amdgpu driver early (in the initrd) for KMS |
| `hardware.graphics.enable = true` | Enable the hardware graphics stack |
| `services.xserver.videoDrivers = [ "amdgpu" ]` | Load the amdgpu video driver |
| `hardware.graphics.extraPackages = [ rocmPackages.clr.icd ]` | ROCm OpenCL ICD for GPU compute |
| `systemd.tmpfiles.rules` entry creating the `/opt/rocm/hip` symlink | HIP runtime path (required by most ROCm software) |
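A NixOS module equivalent to the table might look like the following. It is a sketch assuming Nixi emits these options as listed; the tmpfiles line follows the common NixOS pattern of pointing `/opt/rocm/hip` at the ROCm runtime in the Nix store.

```nix
# Sketch: the AMD options from the table above as a standalone NixOS module.
{ pkgs, ... }:
{
  # Load amdgpu in the initrd so kernel modesetting is active early in boot
  boot.initrd.kernelModules = [ "amdgpu" ];

  hardware.graphics.enable = true;
  services.xserver.videoDrivers = [ "amdgpu" ];

  # OpenCL ICD from the ROCm common language runtime (clr)
  hardware.graphics.extraPackages = [ pkgs.rocmPackages.clr.icd ];

  # Create /opt/rocm/hip as a symlink to the HIP runtime in the Nix store
  systemd.tmpfiles.rules = [
    "L+ /opt/rocm/hip - - - - ${pkgs.rocmPackages.clr}"
  ];
}
```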
LLM acceleration
AMD GPUs use ROCm (specifically HIP) for GPU compute. Ollama supports AMD GPUs through the ROCm stack. The enable action sets up everything needed:
- ROCm OpenCL ICD – provides the OpenCL runtime
- HIP runtime symlink (`/opt/rocm/hip`) – most HIP-enabled software (including llama.cpp, which Ollama uses) expects libraries at `/opt/rocm/hip`. On NixOS, ROCm lives in the Nix store rather than under `/opt`, so the tmpfiles rule creates this symlink automatically.
AMD uses the open-source amdgpu kernel driver, so no unfree packages are needed.
Configuration
The configure action lets you tune AMD GPU settings after enabling:
- `gtt_size` – Set the GTT shared memory size in MB. Controls how much system RAM the GPU can access beyond the BIOS VRAM carveout. Set via the `amdgpu.gttsize` kernel parameter. Requires a reboot.
- `rocm_override` – Set `HSA_OVERRIDE_GFX_VERSION` system-wide. Required for most APUs and some older discrete GPUs to use ROCm/HIP.
Example: “set GTT size to 8192” or “set ROCm override to 11.0.0”
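These two settings plausibly correspond to the NixOS options below. The mapping is an assumption, not confirmed from Nixi's source. Note that a system-wide variable alone does not reach the Ollama systemd service, which is why the APU section later sets the override on the service as well.

```nix
# Assumed NixOS equivalents of the two AMD configure settings.
{ ... }:
{
  # gtt_size: let the GPU map up to 8192 MB of system RAM (applies after reboot)
  boot.kernelParams = [ "amdgpu.gttsize=8192" ];

  # rocm_override: have ROCm treat the GPU as gfx1100 (RDNA 3)
  environment.variables.HSA_OVERRIDE_GFX_VERSION = "11.0.0";
}
```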
GPU compatibility
ROCm officially supports GCN 3rd gen and newer architectures:
| Architecture | Series | ROCm support |
|---|---|---|
| GCN 4.0 (Polaris) | RX 400/500 | ROCm 5.6 and older |
| GCN 5.0 (Vega) | RX Vega | ROCm 5.6 and older |
| RDNA 1.0 (Navi 1x) | RX 5000 | Yes |
| RDNA 2.0 (Navi 2x) | RX 6000 | Yes |
| RDNA 3.0 (Navi 3x) | RX 7000 | Yes |
| RDNA 4.0 (Navi 4x) | RX 9000 | Yes |
For GPUs that ROCm does not officially list, you may need to set `HSA_OVERRIDE_GFX_VERSION` to a supported target from the same architecture family (e.g., `HSA_OVERRIDE_GFX_VERSION=10.3.0` for RDNA 2 cards).
After enabling
Unlike NVIDIA, the amdgpu kernel driver is open-source and often already loaded by the kernel. The enable action ensures it loads early (via initrd) and sets up the full ROCm compute stack. Verify with:
- “show GPU status” – should show VRAM, GTT, utilization, temperature, power draw, and Ollama GPU config
APUs (integrated AMD GPUs)
AMD APUs (like those in Ryzen processors) have an integrated GPU that shares system RAM instead of having dedicated GDDR memory. Nixi detects APUs automatically and shows relevant details in the status output.
How APU memory works
APU “VRAM” actually comes from two pools of system RAM:
- VRAM (UMA framebuffer) – A chunk of system RAM carved out exclusively for the GPU at boot. The OS cannot use this memory. Set in BIOS/UEFI under “UMA Frame Buffer Size” or “VRAM Size”. Typically 256 MB to 2 GB by default; some boards allow up to 8-16 GB.
- GTT (Graphics Translation Table) – Additional system RAM the GPU can access dynamically through the kernel driver. Shared with the OS. Set via the kernel parameter `amdgpu.gttsize=XXXX` (in MB), or use Nixi’s `configure gtt_size` action.
Both pools are backed by system DDR memory. On a 64 GB system, you could allocate a large BIOS carveout (if the board supports it) and set a large GTT size to give the GPU access to significant memory. However, the total cannot exceed your physical RAM, and anything allocated to VRAM is unavailable to the OS.
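A worked example with hypothetical numbers: a 64 GB system with a 4 GB BIOS carveout and `amdgpu.gttsize=32768` (32 GB).

```latex
\begin{aligned}
\text{RAM visible to the OS} &= 64\,\text{GB} - 4\,\text{GB} = 60\,\text{GB} \\
\text{GPU-addressable memory} &= \underbrace{4\,\text{GB}}_{\text{VRAM carveout}} + \underbrace{32\,\text{GB}}_{\text{GTT}} = 36\,\text{GB}
\end{aligned}
```

The GTT portion is not reserved at boot: it is drawn from the OS's 60 GB only while the GPU is actually using it.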
Performance
With ROCm enabled, APU inference is roughly 1.5-3x faster than CPU-only for token generation. The APU’s compute units handle matrix math in parallel, which beats pure CPU SIMD, but the bottleneck is memory bandwidth:
| Memory type | Typical bandwidth |
|---|---|
| DDR4-3200 (dual channel) | ~50 GB/s |
| DDR5-5600 (dual channel) | ~80 GB/s |
| GDDR6 (discrete RX 7600) | ~288 GB/s |
| GDDR6X (discrete RX 7900 XTX) | ~960 GB/s |
LLM inference is almost entirely memory-bandwidth-bound (loading model weights from memory is the bottleneck). An APU with DDR5 will be roughly 4-10x slower than a comparable discrete GPU, even with plenty of memory available.
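A rough ceiling makes the bandwidth argument concrete. If generating one token requires reading every weight once, throughput cannot exceed memory bandwidth $B$ divided by model size $S$. This ignores caches and compute limits, and mixture-of-experts models (which read only the active experts per token) do better, so treat it as an upper bound. For a hypothetical 8 GB dense model and the bandwidths in the table:

```latex
\begin{aligned}
\text{tokens/s} &\lesssim \frac{B}{S} \\
\text{DDR5-5600:}\quad \frac{80\ \text{GB/s}}{8\ \text{GB}} &= 10 \\
\text{GDDR6 (RX 7600):}\quad \frac{288\ \text{GB/s}}{8\ \text{GB}} &= 36 \\
\text{GDDR6X (RX 7900 XTX):}\quad \frac{960\ \text{GB/s}}{8\ \text{GB}} &= 120
\end{aligned}
```

The resulting gap of 3.6x to 12x between the APU and discrete ceilings brackets the 4-10x range quoted above.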
The APU advantage is capacity over speed: with 64 GB of system RAM you can load models (like qwen3:30b-a3b at ~18 GB) that wouldn’t fit in most discrete GPUs’ VRAM, while still getting meaningful GPU acceleration over CPU-only inference.
ROCm on APUs
ROCm does not officially support APUs, but many work with the HSA_OVERRIDE_GFX_VERSION environment variable. The install script auto-detects your APU and sets this automatically. If it can’t detect the correct value, it will prompt you to choose one.
| APU architecture | Example processors | Override value |
|---|---|---|
| Vega (GCN 5.0) | Ryzen 4000/5000 series | HSA_OVERRIDE_GFX_VERSION=9.0.0 |
| RDNA 2 | Ryzen 6000 series | HSA_OVERRIDE_GFX_VERSION=10.3.0 |
| RDNA 3 | Ryzen 7000/8000 series (780M, etc.) | HSA_OVERRIDE_GFX_VERSION=11.0.0 |
To set or change this manually on NixOS, you need both the system-wide variable and the Ollama service config (since the systemd service has its own environment):
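```nix
{ ... }:
{
  # Visible to login shells and user processes
  environment.variables.HSA_OVERRIDE_GFX_VERSION = "11.0.0";

  # The ollama systemd service does not inherit environment.variables,
  # so the override must also be set on the service itself
  services.ollama.rocmOverrideGfx = "11.0.0";
  services.ollama.environmentVariables = {
    GPU_MAX_ALLOC_PERCENT = "100"; # Allow ROCm to use GTT shared memory
    HSA_ENABLE_SDMA = "0";         # Avoid SDMA bugs on RDNA 3 iGPUs
  };
}
```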
Or use Nixi’s built-in tool: “set ROCm override to 11.0.0”
Model selection on APUs
Because APUs share system RAM, model selection should be based on total system RAM (minus what the OS needs), not GPU VRAM. The install script accounts for this automatically. For example, a system with 64 GB of DDR5 can comfortably run qwen3:30b-a3b (~18 GB), which would not fit on most discrete GPUs with 8-16 GB VRAM.
Ollama loads model weights into the GPU-accessible memory pool (GTT + VRAM carveout). Because the GTT size is a kernel parameter rather than fixed hardware, the practical limit on model size is available system RAM and your GTT setting, not a fixed VRAM amount.
Monitoring
Both tools provide a status action that shows runtime metrics without requiring any additional software:
NVIDIA reads from nvidia-smi:
- Driver and CUDA version
- Memory usage (used/total)
- GPU utilization percentage
- Temperature
AMD reads directly from sysfs (/sys/class/drm/):
- VRAM usage (used/total)
- GTT shared RAM usage (used/total)
- GPU utilization percentage
- Temperature
- Power draw (watts)
- APU detection with shared memory notes
Install script
The install script auto-detects your GPU during setup:
- NVIDIA: Offers to install proprietary drivers and container toolkit
- AMD (discrete): Offers to configure amdgpu and ROCm HIP/OpenCL
- AMD APU: Detects shared memory (GTT), configures ROCm, auto-sets `HSA_OVERRIDE_GFX_VERSION`, configures the Ollama service with `rocmOverrideGfx` and GTT environment variables (`GPU_MAX_ALLOC_PERCENT`, `HSA_ENABLE_SDMA`), and recommends models based on total system RAM instead of VRAM
- No GPU: Warns that LLM inference will run on CPU only