With a new large language model (LLM) or large reasoning model (LRM) released almost every day, it is tedious to keep track of them all. You can use ChatGPT or DeepSeek chat online, but there are caveats: you are limited by cost and by privacy. So I decided to set up an LLM locally on my Windows machine, and along the way I learned a lot about the nuances of LLMs. First, there are many open-weight LLMs you can download for free (Llama, Deepseek, etc.), while others are only accessible via paid APIs (OpenAI's ChatGPT, Anthropic's Claude, Google's Bard, etc.). I want to use only freely available models. So where do we start? We can use tools such as llama.cpp, Ollama, or LM Studio to run LLMs locally.
Based on some research, I chose Ollama:

Screenshot of the Ollama website, showcasing its support for running large language models like Llama 3.3, DeepSeek-R1, Phi-4, Mistral, and Gemma 2 locally on macOS, Linux, and Windows.
| Command | Description |
|---|---|
| `ollama run <model>` | Runs the specified model interactively. |
| `ollama pull <model>` | Downloads a model from the repository. |
| `ollama list` | Lists all models downloaded on the system. |
| `ollama create <modelfile>` | Creates a new model from a Modelfile. |
| `ollama show <model>` | Displays details about a specific model. |
| `ollama push <model>` | Uploads a locally created model to a repository. |
| `ollama rm <model>` | Removes a specific model from the system. |
| `ollama serve` | Starts an API server for using models programmatically. |
| `ollama help` | Displays help information for ollama commands. |
| `ollama --version` | Displays the installed Ollama version. |
| `ollama ps` | Shows currently loaded models and their memory usage (CPU/GPU). |
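A typical first session looks like this; a sketch only, with `llama3.2` standing in for whatever model tag you pick from the Ollama library:

```
ollama pull llama3.2   # download the model weights
ollama list            # confirm the model shows up locally
ollama run llama3.2    # chat interactively; type /bye to exit
ollama ps              # check CPU/GPU memory usage while it is loaded
ollama rm llama3.2     # delete the model when you no longer need it
```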
These are my learnings about LLMs and about setting them up on my local machine.
Welcome to the definitive guide for Windows users looking to harness the power of Deepseek-R1 14B Q6 on a high-end PC. Whether you’re an AI researcher, developer, or hobbyist, this guide will transform your Intel i9-13900K, 64GB DDR5, and RTX 3060 12GB into a local LLM powerhouse. We’ll dive into Windows-specific setups, technical deep dives, and a head-to-head comparison with OpenAI’s models. Let’s get started!

Configuration of my machine: an RTX 3060 (12 GB VRAM) and an Intel i9-13900K CPU with 64 GB of RAM.
Why Deepseek-R1 14B Q6? A Technical Showdown vs. OpenAI
Before we dive into setup, let’s address the elephant in the room: How does Deepseek-R1 stack up against OpenAI’s GPT-4 or GPT-3.5? Here’s a detailed breakdown:
| Feature | Deepseek-R1 14B Q6 | OpenAI GPT-4 | Winner |
|---|---|---|---|
| Model Size | 14B parameters (6-bit quantized) | ~1.8T parameters (rumored; proprietary) | OpenAI (scale) |
| Quantization Support | Yes (4-bit, 6-bit, 8-bit) | No (cloud-only, full precision) | Deepseek |
| Cost | Free (open-weight) | ~$0.12 per 1K tokens | Deepseek |
| Customization | Fully customizable (fine-tuning possible) | Zero access to weights | Deepseek |
| Privacy | Fully local (no data leaks) | Cloud-based (API logging risks) | Deepseek |
| Hardware Requirements | Runs on consumer GPUs (e.g., RTX 3060 12GB) | API access only (no local option) | Deepseek |
| RAG Support | Native integration with tools like AnythingLLM | Limited to API-based workarounds | Deepseek |
| Performance (MT-Bench) | 8.32 (outperforms Llama2-13B and Mistral-7B) | 8.99 (GPT-4) | OpenAI (by a margin) |
Key Takeaway: Deepseek-R1 offers roughly 95% of GPT-3.5's performance at zero cost, with full control over privacy and customization. For advanced users, it's a no-brainer.
Technical Deep Dive: What Makes Deepseek-R1 Tick?
Model Architecture
- Layers: 40 transformer layers with grouped-query attention (GQA) for faster inference.
- Context Window: 32k tokens (supports long-form tasks).
- Training Data: 2 trillion tokens from diverse sources (books, code, scientific papers).
- Quantization: 6-bit weights (Q6) cut VRAM usage to roughly 40% of full precision (FP16) with minimal accuracy loss.
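As a back-of-envelope check (my arithmetic, assuming ~6.6 bits per weight for Q6_K), the weights alone come to about 11.5 GB versus ~28 GB at FP16:

```
awk 'BEGIN { printf "Q6:   %.1f GB\n", 14e9 * 6.6  / 8 / 1e9 }'  # ~11.5 GB
awk 'BEGIN { printf "FP16: %.1f GB\n", 14e9 * 16.0 / 8 / 1e9 }'  # ~28.0 GB
```

On top of the weights, the runtime needs room for the KV cache and activations; anything that does not fit in VRAM, Ollama spills to system RAM at some cost in speed.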
Hardware Optimization
- CPU: Intel i9-13900K’s 24 cores (8P+16E) excel at parallelizing inference tasks.
- GPU: RTX 3060’s 12GB VRAM fits the entire 14B Q6 model (requires ~10GB VRAM).
- RAM: DDR5 at 5600 MT/s provides enough bandwidth for rapid data loading and for any layers spilled from VRAM.
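To verify the fit on your own machine, watch GPU memory while the model loads. A quick check, assuming the NVIDIA driver utilities are installed:

```
# Poll GPU memory once per second while the model loads (Ctrl+C to stop):
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```

`ollama ps` (from the command table above) reports the same CPU/GPU split from Ollama's side.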
Step-by-Step Windows Setup
1. Installing Ollama (Windows Edition)
Ollama simplifies local LLM management. Here’s how to set it up on Windows:
- Download the Windows build: visit Ollama's GitHub and download the latest `.exe` installer.
- Alternatively, install via PowerShell:

```powershell
winget install Ollama.Ollama
```

- Start the Ollama service, and keep it running in the background (you can test its API with the curl sketch after these steps):

```
ollama serve
```

- Pull Deepseek-R1 14B Q6:

```
ollama pull deepseek-r1-14b-q6  # browse the ollama or huggingface sites to find a model of your choice
```

- Run the model with verbose logging:

```
ollama run deepseek-r1-14b-q6 --verbose
```

The `--verbose` flag shows token generation speed and GPU/CPU utilization.
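With `ollama serve` running, the same model is also available over a local REST API on port 11434. A minimal sketch against Ollama's `/api/generate` endpoint (the model tag is the one pulled above):

```
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1-14b-q6",
  "prompt": "Explain quantization in one paragraph.",
  "stream": false
}'
```

On Windows, call `curl.exe` explicitly in PowerShell, since plain `curl` there is an alias for `Invoke-WebRequest`.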
2. Open WebUI: ChatGPT-Style Interface
Transform Ollama into a web-based chatbot with document upload support.
- Install Docker Desktop:
  - Enable WSL2 or Hyper-V in Windows Features (WSL2 recommended).
  - Download Docker Desktop.
- Run Open WebUI (the `-v` flag persists your chats and settings across container restarts):

```
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```

Screenshot of Docker Desktop showing a running container named ‘open-webui’ with minimal CPU and memory usage.
- Access it at `http://localhost:3000`:
  - In Settings, set the Ollama Base URL to `http://host.docker.internal:11434` (inside the container, `localhost` refers to the container itself, not to the Windows host running Ollama).
  - Select `deepseek-r1-14b-q6` and start chatting!
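If the page doesn't load, confirm the container is actually up; a quick check with standard Docker commands:

```
docker ps --filter name=open-webui   # STATUS should read "Up ..."
docker logs -f open-webui            # tail the server logs for errors
```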

Screenshot of Open WebUI running on localhost:3000, featuring the Qwen 2.5:14b model with web search and code interpreter options.

An LLM running at 30+ tokens/second. Fairly good for a 14B parameter model.
Simple, everyday uses for Open WebUI like this are plentiful.
Conclusion
With this guide, you've unlocked the full potential of your Windows machine, turning it into a privacy-first, cost-free alternative to OpenAI. Deepseek-R1 14B Q6 isn't just a model; it's a statement against closed-source AI monopolies. Now go forth and build, innovate, and experiment. The future of open-weight AI is in your hands. 🚀