# GPU Deployments
Run GPU-accelerated workloads like ML training, inference, rendering, and video encoding on Kova.
## Overview
Kova providers can offer GPU resources alongside CPU and memory. You request GPUs in your SDL manifest, and the marketplace matches you with providers that have the hardware you need. Pricing is set by providers and varies based on GPU model and availability.
## Specifying GPU Resources
Add a `gpu` section under `resources` in your compute profile:
```yaml
version: "2.0"

services:
  ml:
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    expose:
      - port: 8888
        as: 8888
        to:
          - global: true

profiles:
  compute:
    ml:
      resources:
        cpu:
          units: 4
        memory:
          size: 16Gi
        storage:
          - size: 100Gi
        gpu:
          units: 1
          attributes:
            vendor:
              nvidia:

deployment:
  ml:
    anywhere:
      profile: ml
      count: 1
```
## GPU Attributes
You can optionally list one or more models to target specific hardware:
```yaml
gpu:
  units: 1
  attributes:
    vendor:
      nvidia:
        - model: a100
        - model: rtx4090
```
When multiple models are listed, providers with any of the listed models can bid on your deployment.
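Conversely, omitting the model list entirely (as in the full example above) appears to match any NVIDIA GPU. A minimal sketch:

```yaml
gpu:
  units: 1
  attributes:
    vendor:
      nvidia:   # no model list, so any NVIDIA GPU qualifies
```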
## Available GPU Models
GPU availability depends on what providers bring online. Common models include:
| Category | Models |
|---|---|
| Data Center | NVIDIA A100, A10, L4, L40S, H100 |
| Consumer | NVIDIA RTX 4090, RTX 3090, RTX 3080 |
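Names in the table map to lowercase identifiers in the SDL, as the attributes example above shows for a100 and rtx4090. A sketch assuming the other models follow the same naming pattern:

```yaml
gpu:
  units: 1
  attributes:
    vendor:
      nvidia:
        - model: h100   # assumed to follow the same lowercase pattern as a100/rtx4090
        - model: l40s
```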
GPU availability varies throughout the day. Submitting your deployment during off-peak hours (UTC evenings and weekends) typically gets you better pricing and faster matching.
## Filtering Providers by GPU
When your deployment requests GPU resources, only providers with matching hardware will bid. In the dashboard:
- Submit your GPU-enabled SDL
- The bids tab will show only providers with matching GPUs
- Each bid shows the GPU model and price
- Accept the bid with the best price or specs for your workload
## Common GPU Workloads
### ML Training with PyTorch
```yaml
services:
  train:
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    env:
      - NVIDIA_VISIBLE_DEVICES=all
    params:
      storage:
        data:
          mount: /workspace
          source: uploads
```
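This fragment shows only the service; a complete manifest also needs compute and deployment sections, as in the full example above. A sketch sized from the small-training row of the recommendations table below (the rtx4090 pin and the train names are illustrative):

```yaml
profiles:
  compute:
    train:
      resources:
        cpu:
          units: 8
        memory:
          size: 16Gi
        storage:
          - size: 100Gi
        gpu:
          units: 1
          attributes:
            vendor:
              nvidia:
                - model: rtx4090   # illustrative pin; omit to accept any NVIDIA GPU

deployment:
  train:
    anywhere:
      profile: train
      count: 1
```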
### Inference with Ollama
```yaml
services:
  ollama:
    image: ollama/ollama:latest
    expose:
      - port: 11434
        as: 11434
        to:
          - global: true
    env:
      - NVIDIA_VISIBLE_DEVICES=all
```
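Ollama pulls model weights at runtime, so persisting its model directory avoids re-downloading after a restart. A sketch reusing the params.storage pattern from the training example; /root/.ollama is Ollama's default model directory inside the container, and the source name is illustrative:

```yaml
services:
  ollama:
    # ...image, expose, and env as above...
    params:
      storage:
        models:
          mount: /root/.ollama   # Ollama's default model directory (assumes image defaults unchanged)
          source: uploads        # illustrative source name, as in the training example
```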
### Jupyter Notebook
```yaml
services:
  jupyter:
    image: jupyter/tensorflow-notebook:latest
    expose:
      - port: 8888
        as: 8888
        to:
          - global: true
    env:
      - JUPYTER_TOKEN=your-token-here
```
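Since port 8888 is exposed globally, replace your-token-here with a long random value before deploying. Notebooks are also lost on restart unless you mount storage; a sketch using the same storage pattern, assuming the image's default work directory of /home/jovyan/work:

```yaml
services:
  jupyter:
    # ...image, expose, and env as above...
    params:
      storage:
        notebooks:
          mount: /home/jovyan/work   # default notebook directory in jupyter/* images (assumption)
          source: uploads            # illustrative source name
```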
## CUDA and Driver Compatibility
GPU containers run with NVIDIA Container Runtime on the provider. Compatibility depends on the provider's driver version:
| CUDA Version | Minimum Driver |
|---|---|
| CUDA 12.x | 525.60+ |
| CUDA 11.8 | 520.61+ |
| CUDA 11.7 | 515.43+ |
Use container images with CUDA bundled (e.g. `pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime`) rather than relying on the host CUDA installation. This ensures compatibility across different providers.
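If a container starts but can't see the GPU, a mismatch between the image's CUDA version and the provider's driver is a likely cause; pinning an older CUDA via the image tag lowers the driver requirement. A sketch tying image tags to the table above (verify exact tags on Docker Hub):

```yaml
services:
  ml:
    # CUDA 12.1 bundled in the image: provider driver must be 525.60+ per the table above
    image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
    # For providers on older drivers, a CUDA 11.x build needs only 520.61+,
    # e.g. a 2.x-cuda11.8-cudnn8-runtime tag (verify the exact tag on Docker Hub)
```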
## Resource Recommendations
| Workload | GPU | CPU (units) | Memory | Storage |
|---|---|---|---|---|
| Small model inference | 1x RTX 3080 | 4 | 8Gi | 50Gi |
| Large model inference | 1x A100 | 8 | 32Gi | 200Gi |
| Training (small) | 1x RTX 4090 | 8 | 16Gi | 100Gi |
| Training (large) | 1x A100 / H100 | 16 | 64Gi | 500Gi |
| Rendering | 1x RTX 4090 | 4 | 16Gi | 50Gi |
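As a worked example, the large-model inference row translates into a compute profile like this (the a100 pin is illustrative, and the llm profile name is a placeholder):

```yaml
profiles:
  compute:
    llm:
      resources:
        cpu:
          units: 8
        memory:
          size: 32Gi
        storage:
          - size: 200Gi
        gpu:
          units: 1
          attributes:
            vendor:
              nvidia:
                - model: a100
```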