With a new large language model (LLM) or large reasoning model (LRM) released almost every day, it is tedious to keep track of them all. You can use ChatGPT or DeepSeek chat online, but there are caveats: you are limited by cost and by privacy. So I decided to set up an LLM locally on my Windows machine, and along the way I learned a lot about the nuances of LLMs. First, there are many open-weight LLMs you can download for free (Llama, Deepseek, etc.), while others are only accessible via paid APIs (OpenAI's ChatGPT, Anthropic's Claude, Google's Bard, etc.). I want to use only freely available models. So where do we start? We can use tools such as llama.cpp, Ollama, or LM Studio to run LLMs locally.
Based on some research, I chose Ollama:

Screenshot of the Ollama website, showcasing its support for running large language models like Llama 3.3, DeepSeek-R1, Phi-4, Mistral, and Gemma 2 locally on macOS, Linux, and Windows.
| Command | Description |
|---|---|
| `ollama run <model>` | Runs the specified model interactively. |
| `ollama pull <model>` | Downloads a model from the repository. |
| `ollama list` | Lists all models downloaded on the system. |
| `ollama create <modelfile>` | Creates a new model from a Modelfile. |
| `ollama show <model>` | Displays details about a specific model. |
| `ollama push <model>` | Uploads a locally created model to a repository. |
| `ollama rm <model>` | Removes a specific model from the system. |
| `ollama serve` | Starts an API server for using models programmatically. |
| `ollama help` | Displays help information for ollama commands. |
| `ollama --version` | Displays the installed Ollama version. |
| `ollama ps` | Shows currently loaded models and their memory usage (CPU/GPU). |
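A typical first session looks like this; a sketch only, with `llama3.2` standing in for whatever model tag you pick from the Ollama library:

```
ollama pull llama3.2   # download the model weights
ollama list            # confirm the model shows up locally
ollama run llama3.2    # chat interactively; type /bye to exit
ollama ps              # check CPU/GPU memory usage while it is loaded
ollama rm llama3.2     # delete the model when you no longer need it
```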
These are my learnings about LLMs and about setting them up on my local machine.
Welcome to the definitive guide for Windows users looking to harness the power of Deepseek-R1 14B Q6 on a high-end PC. Whether you’re an AI researcher, developer, or hobbyist, this guide will transform your Intel i9-13900K, 64GB DDR5, and RTX 3060 12GB into a local LLM powerhouse. We’ll dive into Windows-specific setups, technical deep dives, and a head-to-head comparison with OpenAI’s models. Let’s get started!

Configuration of my machine: an RTX 3060 (12 GB VRAM) and an Intel i9-13900K CPU with 64 GB of RAM.
Why Deepseek-R1 14B Q6? A Technical Showdown vs. OpenAI
Before we dive into setup, let’s address the elephant in the room: How does Deepseek-R1 stack up against OpenAI’s GPT-4 or GPT-3.5? Here’s a detailed breakdown:
| Feature | Deepseek-R1 14B Q6 | OpenAI GPT-4 | Winner |
|---|---|---|---|
| Model Size | 14B parameters (6-bit quantized) | ~1.8T parameters (rumored; proprietary) | OpenAI (scale) |
| Quantization Support | Yes (4-bit, 6-bit, 8-bit) | No (cloud-only, full precision) | Deepseek |
| Cost | Free (open-weight) | ~$0.12 per 1K tokens | Deepseek |
| Customization | Fully customizable (fine-tuning possible) | Zero access to weights | Deepseek |
| Privacy | Fully local (no data leaks) | Cloud-based (API logging risks) | Deepseek |
| Hardware Requirements | Runs on consumer GPUs (e.g., RTX 3060 12GB) | API access only (no local option) | Deepseek |
| RAG Support | Native integration with tools like AnythingLLM | Limited to API-based workarounds | Deepseek |
| Performance (MT-Bench) | 8.32 (outperforms Llama2-13B and Mistral-7B) | 8.99 (GPT-4) | OpenAI (by a margin) |
Key Takeaway: Deepseek-R1 offers roughly 95% of GPT-3.5's performance at zero cost, with full control over privacy and customization. For advanced users, it's a no-brainer.
Technical Deep Dive: What Makes Deepseek-R1 Tick?
Model Architecture
- Layers: 40 transformer layers with grouped-query attention (GQA) for faster inference.
- Context Window: 32k tokens (supports long-form tasks).
- Training Data: 2 trillion tokens from diverse sources (books, code, scientific papers).
- Quantization: 6-bit weights (Q6) cut VRAM usage to roughly 40% of full precision (FP16) with minimal accuracy loss.
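As a back-of-envelope check (my arithmetic, assuming ~6.6 bits per weight for Q6_K), the weights alone come to about 11.5 GB versus ~28 GB at FP16:

```
awk 'BEGIN { printf "Q6:   %.1f GB\n", 14e9 * 6.6  / 8 / 1e9 }'  # ~11.5 GB
awk 'BEGIN { printf "FP16: %.1f GB\n", 14e9 * 16.0 / 8 / 1e9 }'  # ~28.0 GB
```

On top of the weights, the runtime needs room for the KV cache and activations; anything that does not fit in VRAM, Ollama spills to system RAM at some cost in speed.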
Hardware Optimization
- CPU: Intel i9-13900K’s 24 cores (8P+16E) excel at parallelizing inference tasks.
- GPU: RTX 3060’s 12GB VRAM fits the entire 14B Q6 model (requires ~10GB VRAM).
- RAM: DDR5 at 5600 MT/s provides enough bandwidth for rapid data loading and for any layers spilled from VRAM.
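To verify the fit on your own machine, watch GPU memory while the model loads. A quick check, assuming the NVIDIA driver utilities are installed:

```
# Poll GPU memory once per second while the model loads (Ctrl+C to stop):
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1
```

`ollama ps` (from the command table above) reports the same CPU/GPU split from Ollama's side.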
Step-by-Step Windows Setup
1. Installing Ollama (Windows Edition)
Ollama simplifies local LLM management. Here’s how to set it up on Windows:
- Download the Windows build: visit Ollama's GitHub and download the latest `.exe` installer.
- Alternatively, install via PowerShell:

```powershell
winget install Ollama.Ollama
```

- Start the Ollama service, and keep it running in the background (you can test its API with the curl sketch after these steps):

```
ollama serve
```

- Pull Deepseek-R1 14B Q6:

```
ollama pull deepseek-r1-14b-q6  # browse the ollama or huggingface sites to find a model of your choice
```

- Run the model with verbose logging:

```
ollama run deepseek-r1-14b-q6 --verbose
```

The `--verbose` flag shows token generation speed and GPU/CPU utilization.
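With `ollama serve` running, the same model is also available over a local REST API on port 11434. A minimal sketch against Ollama's `/api/generate` endpoint (the model tag is the one pulled above):

```
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1-14b-q6",
  "prompt": "Explain quantization in one paragraph.",
  "stream": false
}'
```

On Windows, call `curl.exe` explicitly in PowerShell, since plain `curl` there is an alias for `Invoke-WebRequest`.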
2. Open WebUI: ChatGPT-Style Interface
Transform Ollama into a web-based chatbot with document upload support.
- Install Docker Desktop:
  - Enable WSL2 or Hyper-V in Windows Features (WSL2 recommended).
  - Download Docker Desktop.
- Run Open WebUI (the `-v` flag persists your chats and settings across container restarts):

```
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```

Screenshot of Docker Desktop showing a running container named ‘open-webui’ with minimal CPU and memory usage.
- Access it at `http://localhost:3000`:
  - In Settings, set the Ollama Base URL to `http://host.docker.internal:11434` (inside the container, `localhost` refers to the container itself, not to the Windows host running Ollama).
  - Select `deepseek-r1-14b-q6` and start chatting!
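If the page doesn't load, confirm the container is actually up; a quick check with standard Docker commands:

```
docker ps --filter name=open-webui   # STATUS should read "Up ..."
docker logs -f open-webui            # tail the server logs for errors
```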

Screenshot of Open WebUI running on localhost:3000, featuring the Qwen 2.5:14b model with web search and code interpreter options.

An LLM running at 30+ tokens/second. Fairly good for a 14B parameter model.
Simple, everyday uses for Open WebUI like this are plentiful.
Conclusion
With this guide, you've unlocked the full potential of your Windows machine, turning it into a privacy-first, cost-free alternative to OpenAI. Deepseek-R1 14B Q6 isn't just a model; it's a statement against closed-source AI monopolies. Now go forth and build, innovate, and experiment. The future of open-weight AI is in your hands. 🚀