Infrastructure · Data Protection · May 2026

Local AI models: When you no longer want to send your data to the USA

Ollama, open models, and local hardware make it possible to run powerful AI directly on your server — without a single word leaving the office. It is no longer a solution just for researchers.

The game-changer

During 2023, something happened that most business owners missed entirely. Meta, Mistral AI, and a dozen research groups released their model weights openly, meaning the actual AI brain and not just an API to call. Suddenly, anyone could download a model performing close to GPT-3.5 and run it on their own computer.

It sounded academic. It wasn't. It was the starting point for a movement that now, more than two years later, allows medium-sized Swedish companies to systematically build AI infrastructure without sending a single word to OpenAI, Google, or Anthropic.

The most important piece of that puzzle is called Ollama.


What is Ollama — and why does it matter?

Ollama is an open-source tool that lets you download and run large language models locally with a single command. You install it on your server or workstation, type ollama run llama3, and have a functioning AI system that starts answering within seconds on suitable hardware: offline, without subscription fees, and without data leaving your machine.

It is not a product you pay for. It is an infrastructure tool, much like Docker — you install it, configure it, and then it's just there. It runs as a local API server that you then connect to your own applications.

Once configured, local AI is invisible infrastructure: like electricity in the wall, but for intelligence.

The models you can run via Ollama include Llama 3 (Meta), Mistral and Mixtral (Mistral AI), Qwen (Alibaba), Phi-3 (Microsoft) and a growing list of specialized variants. There are models optimized for code, for medical text, for legal documents, for Swedish—and combinations thereof.


Why do Swedish companies choose local AI?

There are three reasons that come up time and again when I talk to Swedish business leaders about this. They aren't technical. They are business-oriented.

1. GDPR and data minimization

Every time you send data to ChatGPT, Claude, or Gemini, you are sending it to a US provider, and in many cases to servers in the USA. That isn't illegal, but it requires you to be careful about what data you send, to have a DPA (Data Processing Agreement) with the provider, and to be able to explain to your customers what you do with their information.

With local AI, the answer is trivially simple: data does not leave the office. There is nothing to explain. There is no third party. There is no provider changing terms in their acceptable use policy without warning. Especially for companies handling sensitive information— health data, legal documents, financial information, trade secrets— this is an argument that is hard to ignore.

2. Cost control at high volume

API prices for cloud-based AI look reasonable per request. They stop looking reasonable when you calculate 100,000 requests per month. Claude Sonnet costs about 3 dollars per million input tokens. A local Llama 3 instance on a used workstation with an RTX 4090 costs electricity—about 50 öre per hour. For continuous workflows, the math isn't complicated.

It is not always the right choice. If you run AI rarely and need top-tier performance every time, the cloud is still best. But for high-volume, lower-stakes tasks such as classification, summarization, data normalization, and internal search, local AI is often cheaper per request once the hardware has paid for itself, typically after 3–6 months.
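A rough way to check this math against your own volume is sketched below. The 3 dollars per million input tokens and the 0.50 SEK per hour of electricity come from the figures above; the exchange rate, the tokens per request, and the hardware price are illustrative assumptions, and output tokens are ignored to keep it simple.

```python
# Rough break-even sketch: cloud API cost vs. a local workstation.
# All default inputs are assumptions for illustration; replace with your own.

def monthly_cloud_cost_sek(requests_per_month, tokens_per_request,
                           usd_per_million_tokens=3.0, sek_per_usd=10.0):
    """Input-token cost only; output tokens are ignored for simplicity."""
    tokens = requests_per_month * tokens_per_request
    return tokens / 1_000_000 * usd_per_million_tokens * sek_per_usd


def monthly_local_cost_sek(hours_per_day=24, sek_per_hour=0.50, days=30):
    """Electricity only, at the per-hour figure quoted in the text."""
    return hours_per_day * sek_per_hour * days


def break_even_months(hardware_sek, cloud_sek_per_month, local_sek_per_month):
    """Months until the hardware is paid off; None if the cloud is cheaper."""
    saving = cloud_sek_per_month - local_sek_per_month
    if saving <= 0:
        return None
    return hardware_sek / saving


# Assumed scenario: 100,000 requests per month at 4,000 input tokens each,
# and a workstation bought for 40,000 SEK.
cloud = monthly_cloud_cost_sek(100_000, 4_000)
local = monthly_local_cost_sek()
months = break_even_months(40_000, cloud, local)
print(f"cloud: {cloud:.0f} SEK/month, local: {local:.0f} SEK/month")
print(f"break-even after about {months:.1f} months")
```

With these assumed inputs the hardware pays for itself in roughly three and a half months. Halve the volume and the payback time roughly doubles, which is why volume is the deciding factor.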

3. Control and customization

With a local model, you can fine-tune on your own data. You can add domain-specific terminology, train the model to follow your specific formats and rules, and build something that actually understands your industry, not just general text. With the large cloud models this is far more limited: fine-tuning is offered only for selected models, it requires uploading your training data to the provider, and deeper customization is typically reserved for large enterprise agreements.


What do you need to run it?

The short answer: it depends on which model you choose and how critical response time is. The long answer is in the overview below.

Small model (up to about 8B parameters)
Examples: Llama 3.1 8B, Mistral 7B, Phi-3 Mini
Requirement: 8 GB RAM (CPU inference) or 8 GB VRAM (GPU inference)
Performance: ~15–30 tokens/second on CPU, 60–120 on GPU
Suitable for: summarization, classification, simple Q&A
Not for: complex reasoning, code generation, legal analysis

Medium-sized model (30–70B parameters)
Examples: Llama 3.1 70B, Qwen2.5 32B, Mixtral 8x7B
Requirement: 48–96 GB RAM (CPU) or 24–48 GB VRAM (GPU/multi-GPU)
Performance: ~5–15 tokens/second on CPU, 30–80 on GPU
Suitable for: complex analysis, code generation, longer documents
Hardware: A good workstation with an RTX 4090 (24 GB) is sufficient for 32B with 4-bit quantization
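The memory figures above follow from a simple rule of thumb: parameter count times bytes per weight, plus some headroom. The sketch below assumes 4-bit quantization, which is common for local inference, and a 20% overhead factor for context cache and runtime buffers. That overhead is my own assumption, not a measured value, and real usage grows with context length.

```python
def estimate_memory_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Approximate memory needed to load a model, in GB.

    Weights take params * bits / 8 bytes; the overhead factor is an
    assumed allowance for the context cache and runtime buffers.
    """
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead


for name, size in [("7B", 7), ("32B", 32), ("70B", 70)]:
    q4 = estimate_memory_gb(size, bits_per_weight=4)
    fp16 = estimate_memory_gb(size, bits_per_weight=16)
    print(f"{name}: ~{q4:.0f} GB at 4-bit, ~{fp16:.0f} GB at 16-bit")
```

Under these assumptions a 32B model lands just under 20 GB at 4-bit, which is why it fits on a single 24 GB card, while a 70B model needs more than one such card or a fallback to system RAM.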

A practical starting scenario for a small to medium-sized business: a used HP Z8 workstation with dual Xeon CPUs, 256 GB RAM, and an NVIDIA RTX 4090 costs approximately 30,000–50,000 SEK. It handles Qwen2.5 32B excellently, runs 24/7, and pays for itself compared to cloud prices if you process more than about 500 documents per day.


Which models perform best for Swedish texts?

It is a common question — and the answer has changed rapidly over the last year. Early open models were incredibly poor at Swedish. That has changed significantly.

Recommended models for Swedish (May 2026)

Best performance relative to size

Qwen2.5:32b — excellent Swedish, strong reasoning, requires GPU with 24 GB VRAM

Mistral-Nemo:12b — good Swedish, fast, works with 16 GB VRAM or 32 GB RAM

Llama3.1:8b — acceptable Swedish, low requirements, good starting point

Specialized alternatives

Codestral (code): code generation with comments and explanations in Swedish

Nomic-embed-text (embeddings): searching in Swedish documents, RAG pipeline

The most important thing for Swedish: avoid models under the 7B class if quality in Swedish is a requirement. Smaller models tend to drift toward English or mix in anglicisms in a way that looks unprofessional in a business context.


Four concrete use cases for Swedish SMEs

1. Internal document search (RAG)

You have 10 years of internal routine descriptions, quotes, minutes, and project documents. No one can find anything. With a local RAG pipeline (Retrieval-Augmented Generation) — an embedding model + your local LLM + a vector database — anyone can ask questions in natural Swedish and get correct answers from your actual documents. Everything stays with you.
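The retrieval half of such a pipeline can be sketched in a few lines. This is a minimal illustration, not a production design: it assumes Ollama's /api/embeddings endpoint with the nomic-embed-text model (check the API documentation for your installed version), and it embeds every document on each query, whereas a real setup would store the vectors in a vector database. The function names are my own.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama port


def embed_with_ollama(text, model="nomic-embed-text"):
    """Fetch an embedding from a local Ollama server (must be running)."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embedding"]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(question, documents, embed, k=2):
    """Return the k documents most similar to the question."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def build_prompt(question, context_docs):
    """Combine the retrieved documents and the question into one prompt."""
    context = "\n---\n".join(context_docs)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

In use, you would call retrieve(question, documents, embed_with_ollama) and send the result of build_prompt to your local LLM. Passing the embedding function as a parameter makes it easy to swap models or test without a running server.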

2. Automatic categorization of incoming email

A small locally running model can read subject line + sender + the first 200 words of every email and categorize it as "order", "complaint", "supplier invoice" or "other" — with 90–95% accuracy. It doesn't require GPT-4. It requires Mistral 7B and a couple of hours of configuration.
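Most of those hours go into the prompt and into handling replies that do not match a category exactly. A minimal sketch of both parts follows. The function names are hypothetical, and ask_model stands for whatever call you use to send a prompt to the local model and get text back.

```python
CATEGORIES = ("order", "complaint", "supplier invoice", "other")


def build_email_prompt(subject, sender, body, max_words=200):
    """Build a classification prompt from subject, sender and the first words."""
    excerpt = " ".join(body.split()[:max_words])
    return (
        "Classify the email into exactly one category: "
        + ", ".join(CATEGORIES)
        + ". Reply with the category name only.\n\n"
        f"Subject: {subject}\nFrom: {sender}\nBody: {excerpt}"
    )


def parse_category(model_reply):
    """Map a free-text model reply to a known category, else 'other'."""
    reply = model_reply.strip().lower()
    # Check longer names first so a short name never shadows a longer one.
    for category in sorted(CATEGORIES, key=len, reverse=True):
        if category in reply:
            return category
    return "other"


def classify_email(subject, sender, body, ask_model):
    """ask_model: any callable that takes a prompt and returns the reply text."""
    prompt = build_email_prompt(subject, sender, body)
    return parse_category(ask_model(prompt))
```

Falling back to "other" when the reply is unrecognized keeps the pipeline from crashing on an unexpected answer, and those cases can be routed to a person for review.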

3. Contract analysis without outside eyes

No law firm wants you copy-pasting their drafts into ChatGPT. With a local model, you can analyze contracts, identify unusual clauses, compare against templates, and flag deviations without a single word leaving the company. It does not replace the lawyer, but it works as a first filter that saves billable hours.

4. Customer service bot with your product catalog

Fine-tune a small model on your product descriptions, common questions, and return policy. Answer 80% of incoming chat queries automatically: in Swedish, with low latency, without cost per interaction, and without sharing your customer conversations with a third party.


What local AI is not suitable for

It is important to be honest about the limitations. Local AI is not always the right choice, and real infrastructure planning requires you to understand the trade-offs.

Tasks requiring top performance: The most difficult reasoning tasks — complex legal arguments, advanced medical diagnostics, mathematical proofs — still require the large cloud models. A local 32B model is impressive but not in the same league as GPT-4o or Claude 3.5 Sonnet for truly complex multi-step problems.

Multimodal input: If you need to analyze images, interpret diagrams, or understand complex tables from scanned PDFs, the vision capabilities of cloud models are still superior to most local alternatives.

Low volume + high requirements: If you run AI ten times a week and need the best possible answer every time — pay for the cloud. It is cheaper and simpler.

Local AI is not for everyone. It is for those who have the volume, data protection requirements, or the need for control that justifies the investment.


How to start — in three steps

Step 1: Install Ollama and test locally

Terminal (Linux/Windows WSL; on macOS, download the app from ollama.com)

curl -fsSL https://ollama.com/install.sh | sh
ollama run qwen2.5:7b
# The model downloads automatically on first run (~5 GB); then type your first question

It takes about ten minutes. You now have a working local AI system. Try asking questions about your own work and see how it performs compared to ChatGPT on your typical tasks.

Step 2: Identify a high-volume task

Choose a task you do often and repetitively: categorize customer inquiries, summarize meeting minutes, extract data from PDFs. It doesn't have to be the most complex task — on the contrary, simple repetitive tasks are those that provide the best ROI with local AI.

Step 3: Build an API call, not an integration

Ollama exposes a REST API on port 11434. Your existing web app, your Python script, or your Node backend can call it much like it calls OpenAI, often by changing little more than a URL variable and the model name. This is intentional: Ollama provides an OpenAI-compatible endpoint under /v1.

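Python: just change the base_url

from openai import OpenAI

# Point the standard OpenAI client at the local Ollama server
client = OpenAI(
  base_url="http://localhost:11434/v1",
  api_key="ollama",  # the client requires a value; Ollama ignores it
)

response = client.chat.completions.create(
  model="qwen2.5:7b",
  messages=[{"role": "user", "content": "Summarize: ..."}],
)

# Print the model's reply
print(response.choices[0].message.content)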
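If you would rather not depend on the openai package, the same call can be made with the standard library against Ollama's own /api/chat endpoint. This is a sketch under the assumption that the server runs on the default port; the helper names build_chat_request and ask are mine, not part of Ollama.

```python
import json
import urllib.request


def build_chat_request(prompt, model="qwen2.5:7b",
                       host="http://localhost:11434"):
    """Prepare a non-streaming request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        return json.load(resp)["message"]["content"]
```

Calling ask("Summarize: ...") returns the reply as a plain string. Separating request building from sending keeps the network call in one place, which makes the rest of your code easy to test without a model running.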


Local AI is no longer a solution just for nerds with servers in the basement. It is a serious business strategy for Swedish companies that want control, predictable costs, and a setup where GDPR compliance is far simpler because the data never leaves the building. Get started with Ollama, test a real use case, and see if the math works for you.

If you want help building a local AI infrastructure tailored to your company — contact me directly.
