Infrastructure · For developers

The hardware behind AI.

The computer, the graphics card, model precision, and the data stack. These are the cornerstones when you start running AI on your own hardware: for on-premise solutions, local model training, or serious image generation. In plain English. Not in marketing speak.

For the developer track · If you're plugging into AI for the first time, start in the Glossary instead.

Part 1

The computer itself

Three things determine whether a computer can run AI locally: the processor, the graphics card, and the graphics card's memory. The rest (RAM, SSD, cooling) matters, but these three determine which models you can load and how quickly they respond.

CPU (Central Processing Unit)

The computer's main processor. It handles the operating system, applications, and everything that isn't specialized. For AI, the CPU is secondary, because inference itself happens on the GPU, but a weak CPU can bottleneck data preprocessing, data loading, and the Python code surrounding the model. The AMD Ryzen 9 9950X3D and Intel Core Ultra are among the top CPU choices of 2026 for AI workstations.

Practical

Daniel runs an AMD Ryzen 9 9950X3D in his master node. 3D V-Cache helps with compiling and data distilling — but for pure AI inference, a cheaper CPU would yield the same results. Put the budget toward the GPU, not the CPU.

GPU (Graphics Processing Unit)

The graphics card. Designed to run thousands of calculations in parallel, which happens to be exactly what AI models need. A modern GPU is an AI accelerator first, and a graphics card second. NVIDIA dominates the AI market thanks to the CUDA ecosystem; AMD and Intel are in pursuit but still lag in software support.

Practical

For local AI in 2026: NVIDIA RTX 5090 (consumer flagship), NVIDIA RTX 6000 Ada (pro), or an H100/H200 if the budget is unlimited. Never buy an AMD GPU for AI unless you know exactly what you're doing — you'll be fighting with ROCm the whole time.

VRAM (Video RAM)

The built-in memory on a graphics card. The single most important specification for AI. A model must fit into VRAM to run at full speed; if it doesn't fit, it either fails to load or has to spill into much slower system memory. Llama 3.1 70B in FP16 requires about 140 GB of VRAM for the weights alone. Quantized to 4-bit, it fits in about 40 GB. The RTX 5090 has 32 GB, the RTX 4090 has 24 GB.

Practical

Rule of thumb: count ~2 GB of VRAM per billion parameters in FP16, and roughly 0.5-0.6 GB per billion parameters in 4-bit. A 7B model fits on 8 GB. A 32B model needs ~20 GB. A 70B model needs ~40 GB. If you need to run larger models, you must either sacrifice precision (quantization) or buy multiple GPUs.
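The arithmetic behind the 7B, 32B and 70B examples above can be sketched in a few lines. This is a rough estimate, not a measurement: weights take bits/8 bytes per parameter, and the 20% headroom for context cache and runtime buffers is an assumption chosen for illustration. The function names are hypothetical.

```python
def estimate_vram_gb(params_billions: float, bits_per_param: float,
                     overhead: float = 0.2) -> float:
    """Rough VRAM estimate in GB (1 GB = 1e9 bytes).

    overhead is an assumed fraction for KV cache and runtime buffers;
    real usage depends on context length and the inference engine.
    """
    weights_gb = params_billions * bits_per_param / 8  # bytes per parameter = bits / 8
    return weights_gb * (1 + overhead)


def fits(params_billions: float, bits_per_param: float, vram_gb: float,
         overhead: float = 0.2) -> bool:
    """True if the estimated footprint fits in the given amount of VRAM."""
    return estimate_vram_gb(params_billions, bits_per_param, overhead) <= vram_gb


if __name__ == "__main__":
    for name, params in [("7B", 7), ("32B", 32), ("70B", 70)]:
        need = estimate_vram_gb(params, bits_per_param=4)
        print(f"{name} at 4-bit: ~{need:.1f} GB, fits on 32 GB: {fits(params, 4, 32)}")
```

With these assumptions the 4-bit estimates land at roughly 4 GB, 19 GB and 42 GB, in line with the figures above; setting overhead to 0 and bits_per_param to 16 gives the 140 GB quoted for a 70B model.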

Part 2

AI workstations

When a gaming PC is no longer enough. Specially designed computers for AI development — with massive amounts of VRAM and unified memory that allow models far beyond what an RTX 5090 can handle. This is where production builds, training, and on-premise delivery live.

NVIDIA RTX 5090

The world's fastest consumer graphics card in 2026. 32 GB GDDR7 VRAM, Blackwell architecture. The sweet spot for developers who want to run 32B models locally without moving to professional-grade hardware. ~25,000-30,000 SEK depending on the card variant. Sufficient for 32B-class LLMs at 4-bit, Flux Dev image generation, Whisper Large-v3 transcription, and Wan video generation. Llama 3.1 70B at 4-bit (~40 GB) exceeds the 32 GB of VRAM and only runs with part of the model offloaded to system memory.

VRAM: 32 GB GDDR7
Architecture: Blackwell
TDP: 575 W
Price (2026): ~25,000-30,000 SEK

NVIDIA DGX Spark

NVIDIA's AI workstation for developers: a compact "AI computer" built on the GB10 (Grace Blackwell) architecture with 128 GB of unified memory. Designed to run models up to ~200B parameters locally without a data center budget. The middle ground between a high-end PC and an RTX 6000 rig. Priced at around 30,000-40,000 SEK.

Memory: 128 GB unified
Architecture: GB10 (Grace Blackwell)
Clusterable: Yes (ConnectX-7)
Price (2026): ~30,000-40,000 SEK

Practical

For Hrafninn 2026: DGX Spark replaces a previous slave cluster and becomes a dedicated LLM node. The Master PC with RTX 5090 remains for image and video generation. Two separate nodes are often better than one monster rig — you avoid fighting over VRAM between jobs.

GB10 / Grace Blackwell

NVIDIA's new system architecture powering DGX Spark and similar workstations. Grace is the CPU part (ARM-based, designed for AI workloads), Blackwell is the GPU part. Together they are called GB10. The key feature is unified memory: the CPU and GPU share the same memory instead of copying data back and forth. As a result, a large model can use the whole shared pool instead of being limited to a separate block of VRAM.

Practical

This is the architecture that allows a 128 GB workstation to run models that previously required 4-8 separate graphics cards. For teams who don't want to build a data center but need to run 70B+ models, this is the shortcut.

AMD Ryzen 9 9950X3D

AMD's flagship consumer CPU for 2026. 16 cores, 32 threads, 3D V-Cache. For AI: never a bottleneck, but an excellent all-around CPU for a development machine. A better choice than Intel if you also want to game or compile heavy projects in parallel.

Part 3

Model precision

Model precision is the number of bits used to store each of a model's parameters. Lower precision = smaller file size = faster execution = lower VRAM requirements, but also (potentially) lower quality. This is the trick that lets you run a 70B model on a gaming PC instead of a supercomputer.

FP16 (16-bit Floating Point)

The standard format for AI models. Each parameter is stored as a 16-bit floating-point number (2 bytes). Good balance between precision and size: 16-bit formats (FP16 and the closely related BF16) are what most models are trained and distributed in. A 7B model in FP16 takes about 14 GB. For full precision, there is FP32 (32-bit), but it is almost never used in inference anymore, since it is twice as large without a noticeable gain in quality.
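To make the size and precision trade-off concrete, here is a minimal sketch using only Python's standard library. The struct format character "e" is IEEE 754 half precision; the helper names are hypothetical and the chosen sample values are just illustrations.

```python
import struct


def to_fp16_and_back(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (16 bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def weights_gb(params_billions: float, bytes_per_param: int) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes), ignoring runtime overhead."""
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # FP16 uses 2 bytes per value, FP32 uses 4.
    print("bytes per value:", struct.calcsize("<e"), "vs", struct.calcsize("<f"))
    # 0.5 is exactly representable; 0.1 is rounded to the nearest FP16 value.
    for value in (0.5, 0.1):
        print(value, "->", to_fp16_and_back(value))
    print("7B in FP16:", weights_gb(7, 2), "GB; in FP32:", weights_gb(7, 4), "GB")
```

The rounding error on a single value is small, which is why halving the storage relative to FP32 rarely costs visible quality.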

NVFP4 (NVIDIA 4-bit Floating Point)

NVIDIA's new 4-bit format specifically designed for the Blackwell architecture (RTX 50 series and DGX Spark). Compresses the model size to one-fourth of FP16 with almost no loss in quality — and Blackwell hardware runs it natively, providing a dramatic speed increase compared to standard 4-bit quantization.

Practical

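For Llama 3.1 70B on RTX 5090: FP16 requires 140 GB (impossible). 4-bit GGUF requires ~40 GB (barely possible, with part of the model spilling into system memory). NVFP4 needs ~35 GB, still slightly more than the card's 32 GB, so some offloading remains necessary, but it runs 2-3x faster than standard 4-bit. It is the format that brings a frontier-class model within reach of a gaming PC.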

Quantization (4-bit, 8-bit)

The general method for compressing models from FP16 to 8-bit, 4-bit, or lower. The GGUF format (llama.cpp) is the de facto standard for distribution. Different variants — Q4_K_M, Q5_K_S, Q8_0 — represent different trade-offs between size, speed, and quality. See also Quantization in the Roadmap.
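The core idea behind quantization can be shown with a deliberately simplified sketch: one block of weights is mapped to 4-bit integers plus a single scale factor. This is not the actual Q4_K_M or NVFP4 layout, which use more elaborate block structures; the function names and sample weights are made up for illustration.

```python
def quantize_block(values, bits=4):
    """Symmetric quantization of one block: integers in [-8, 7] plus one scale."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize_block(q, scale):
    """Recover approximate weights from the integers and the scale."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.12, -0.53, 0.07, 0.91, -0.88, 0.30, -0.02, 0.44]
    q, scale = quantize_block(weights)
    restored = dequantize_block(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("integers:", q)
    print("scale:", scale, "worst error:", worst)
```

Each weight now costs 4 bits instead of 16, plus a shared scale per block; the worst-case error is half a quantization step, which is the quality cost that the different GGUF variants trade against size.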

Part 4

Data Handling on GPU

AI isn't just models — it's also the data processing around the models. When working with millions of rows, billions of transactions, or large document collections, Pandas on a CPU is not enough. NVIDIA's RAPIDS suite moves the entire data pipeline to the GPU.

Apache Spark

The industry standard for large-scale data processing (Big Data). Open source, runs distributed across a cluster. Handles datasets that don't fit on a single machine by distributing the work across many. Used in financial institutions, telecommunications, and e-commerce: wherever data is measured in terabytes or petabytes.

Practical

Spark traditionally runs on CPU clusters. With NVIDIA RAPIDS Accelerator for Spark, the same jobs can run 5-50x faster on GPUs — without changing the code. Large Swedish companies like Klarna and Spotify run Spark in production.
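Because the accelerator is enabled through configuration rather than code, a job submission might look roughly like the sketch below. The configuration keys are the plugin's and Spark's standard settings; the specific values, the jar path, and the script name are assumptions for illustration and need to be adapted to your cluster.

```python
# Settings that switch an existing Spark job to the RAPIDS Accelerator.
# Values are illustrative assumptions, not tuned recommendations.
RAPIDS_SPARK_CONF = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",   # loads the accelerator plugin
    "spark.rapids.sql.enabled": "true",              # run supported SQL ops on the GPU
    "spark.executor.resource.gpu.amount": "1",       # one GPU per executor
    "spark.task.resource.gpu.amount": "0.25",        # four tasks share one GPU
}


def spark_submit_args(app_path, jar_path, conf=RAPIDS_SPARK_CONF):
    """Build a spark-submit argument list; the job script itself stays unchanged."""
    args = ["spark-submit", "--jars", jar_path]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app_path)
    return args


if __name__ == "__main__":
    # Both paths are hypothetical placeholders.
    print(" ".join(spark_submit_args("etl_job.py", "/opt/jars/rapids-4-spark.jar")))
```

Operations the plugin does not support fall back to the CPU, so the realized speedup depends on how much of the job consists of supported SQL and DataFrame operations.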

NVIDIA RAPIDS

An open source software suite from NVIDIA that moves classic data processing from CPU to GPU. Designed as a drop-in replacement for well-known Python libraries — you replace import pandas with import cudf and often get 10-100x speedup. Consists of several modules: cuDF, cuML, cuGraph, cuSpatial.

cuDF (GPU Pandas)

The RAPIDS module that replaces Pandas. Same API — DataFrames, filter, groupby, joins — but runs on the GPU. For data volumes over ~1 GB, the difference is huge. For small datasets (under 100 MB), Pandas is actually faster because the GPU overhead doesn't pay off.
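The size thresholds above can be turned into a small decision helper. This is a sketch under stated assumptions: the 100 MB and 1 GB cut-offs come from the text, the "benchmark" middle zone and the function names are hypothetical, and the check only tests whether the cudf package is importable, not whether a usable GPU is present.

```python
import importlib.util

SMALL_BYTES = 100 * 10**6   # under ~100 MB, GPU overhead usually doesn't pay off
LARGE_BYTES = 10**9         # over ~1 GB, the GPU advantage is large


def cudf_available() -> bool:
    """True if the cudf package can be imported in this environment."""
    return importlib.util.find_spec("cudf") is not None


def choose_backend(dataset_bytes: int, gpu_available: bool) -> str:
    """Pick 'pandas', 'cudf', or 'benchmark' for sizes in between."""
    if not gpu_available or dataset_bytes < SMALL_BYTES:
        return "pandas"
    if dataset_bytes >= LARGE_BYTES:
        return "cudf"
    return "benchmark"  # grey zone: measure both on your own data


if __name__ == "__main__":
    for size in (50 * 10**6, 500 * 10**6, 5 * 10**9):
        print(size, "->", choose_backend(size, cudf_available()))
```

In application code the result would decide whether to run `import cudf as xdf` or `import pandas as xdf`, after which the DataFrame calls can stay the same for the operations both libraries support.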

cuML (GPU Scikit-Learn)

The RAPIDS module that replaces Scikit-Learn for classic machine learning (random forests, k-means, PCA, regression). Not deep learning — that is the domain of PyTorch and TensorFlow. cuML is for "traditional" ML that you do before or after the LLM step in a pipeline.

cuGraph (GPU graph analysis)

The RAPIDS module for network and graph analysis on GPU. Finds patterns in transaction flows, social networks, supply chains. Common use cases: fraud detection (which accounts are linked?), recommendation systems, knowledge graphs.
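The question "which accounts are linked?" is a connected-components problem. The sketch below shows the idea in plain Python on the CPU using union-find; it does not use the cuGraph API, and the account names and transfers are invented for illustration. A GPU library becomes worthwhile when the edge list runs into the millions.

```python
def linked_groups(transfers):
    """Group accounts that are connected, directly or indirectly, by transfers."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups fast
            x = parent[x]
        return x

    for sender, receiver in transfers:
        root_a, root_b = find(sender), find(receiver)
        if root_a != root_b:
            parent[root_b] = root_a        # merge the two groups

    groups = {}
    for account in list(parent):
        groups.setdefault(find(account), set()).add(account)
    # Sort for a stable, readable result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


if __name__ == "__main__":
    transfers = [("A", "B"), ("B", "C"), ("D", "E")]
    print(linked_groups(transfers))
```

Accounts A, B and C end up in one group even though A and C never transacted directly, which is exactly the kind of indirect link fraud analysis looks for.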

Pandas & Scikit-Learn

The classic Python libraries for data processing and traditional machine learning. Pandas (data manipulation) and Scikit-Learn (ML algorithms) are the foundation of the Python data ecosystem: they existed long before AI became a consumer product and will still be there long after. Both run on the CPU. RAPIDS provides NVIDIA's GPU-accelerated counterparts, designed as drop-in replacements for large datasets.

Part 5

Networks in AI clusters

When a single workstation is no longer enough — and you start connecting multiple machines into a cluster. Standard Ethernet won't cut it then. AI training and large inference jobs require extreme bandwidth and low latency between nodes, otherwise the network becomes the bottleneck.

ConnectX-7 (QSFP112)

NVIDIA's high-speed network card, designed for AI clusters. Delivers up to 400 Gbps over QSFP112 ports, roughly 40 times faster than a 10 Gbps home fiber connection. Used to connect DGX nodes so they can share training or inference across multiple machines without the network becoming the bottleneck.

Practical

For 99% of developers, ConnectX-7 is overkill. It becomes relevant when you cluster two or more DGX Spark or H100 nodes — then the interconnect must be fast enough that it doesn't slow down the matrix multiplication being distributed. Think: an AI cluster is only as fast as its slowest link.
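A back-of-the-envelope calculation shows why link speed matters. The sketch below computes the ideal time to move a payload across a link; it ignores protocol overhead and latency, so real transfers are slower, and the function name is hypothetical.

```python
def transfer_seconds(gigabytes: float, link_gbps: float) -> float:
    """Ideal transfer time: gigabytes * 8 bits per byte / gigabits per second."""
    if link_gbps <= 0:
        raise ValueError("link_gbps must be positive")
    return gigabytes * 8 / link_gbps


if __name__ == "__main__":
    payload_gb = 140  # e.g. the FP16 weights of a 70B model
    for name, gbps in [("1 Gbps Ethernet", 1), ("10 Gbps", 10), ("400 Gbps", 400)]:
        print(f"{name}: {transfer_seconds(payload_gb, gbps):.1f} s")
```

Moving 140 GB takes under three seconds at 400 Gbps but close to twenty minutes at 1 Gbps, and during distributed training such exchanges happen continuously rather than once.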
