DeepSeek R1 Enterprise Local Deployment Complete Manual


I. Introduction

DeepSeek R1 is a high-performance general-purpose large language model that supports complex reasoning, multimodal processing, and technical document generation. This manual provides a complete local deployment guide for technical teams, covering hardware configurations, domestic (Chinese) chip adaptations, quantization schemes, heterogeneous solutions, cloud alternatives, and deployment methods for the complete 671B MoE model.

II. Core configuration requirements for local deployment

1. Table of model parameters and hardware correspondence

| Model parameters (B) | Windows configuration requirements | Mac configuration requirements | Applicable scenarios |
| --- | --- | --- | --- |
| 1.5B | RAM: 4GB; GPU: integrated graphics / modern CPU; storage: 5GB | Memory: 8GB (M1/M2/M3); storage: 5GB | Simple text generation, basic code completion |
| 7B | RAM: 8-10GB; GPU: GTX 1680 (4-bit quantized); storage: 8GB | Memory: 16GB (M2 Pro/M3); storage: 8GB | Medium-complexity Q&A, code debugging |
| 8B | RAM: 16GB; GPU: RTX 4080 (16GB VRAM); storage: 10GB | Memory: 32GB (M3 Max); storage: 10GB | Medium-complexity reasoning, document generation |
| 14B | RAM: 24GB; GPU: RTX 3090 (24GB VRAM) | Memory: 32GB (M3 Max); storage: 20GB | Complex reasoning, technical documentation generation |
| 32B | Enterprise deployment (requires multiple GPUs in parallel) | Not supported at this time | Scientific computing, large-scale data processing |
| 70B | Enterprise deployment (requires multiple GPUs in parallel) | Not supported at this time | Large-scale reasoning, ultra-complex tasks |
| 671B | Enterprise deployment (requires multiple GPUs in parallel) | Not supported at this time | Ultra-large-scale research computing, high-performance computing |

2. Analysis of computing power requirements

| Model version | Parameters (B) | Computation precision | Model size | VRAM requirement (GB) | Reference GPU configuration |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | 671B | FP8 | ~1,342GB | ≥1,342GB | Multi-GPU configuration (e.g., NVIDIA A100 80GB × 16) |
| DeepSeek-R1-Distill-Llama-70B | 70B | BF16 | 43GB | ~32.7GB | Multi-GPU configuration (e.g., NVIDIA A100 80GB × 2) |
| DeepSeek-R1-Distill-Qwen-32B | 32B | BF16 | 20GB | ~14.9GB | Multi-GPU configuration (e.g., NVIDIA RTX 4090 × 4) |
| DeepSeek-R1-Distill-Qwen-14B | 14B | BF16 | 9GB | ~6.5GB | NVIDIA RTX 3080 10GB or higher |
| DeepSeek-R1-Distill-Llama-8B | 8B | BF16 | 4.9GB | ~3.7GB | NVIDIA RTX 3070 8GB or higher |
| DeepSeek-R1-Distill-Qwen-7B | 7B | BF16 | 4.7GB | ~3.3GB | NVIDIA RTX 3070 8GB or higher |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | BF16 | 1.1GB | ~0.7GB | NVIDIA RTX 3060 12GB or higher |

Additional Notes:

  1. VRAM requirements: the figures in the table are minimums; it is recommended to reserve an additional 20%-30% of VRAM in actual deployments to absorb peaks during model loading and operation.
  2. Multi-GPU configuration: for large-scale models (e.g., 32B+), use multiple GPUs in parallel to improve computational efficiency and stability.
  3. Computation precision: FP8 and BF16 are the current mainstream high-efficiency precisions, preserving model performance while reducing VRAM usage.
  4. Applicable scenarios: models of different parameter scales suit tasks of different complexity; choose the model version that matches your actual needs.
  5. Enterprise deployment: for ultra-large-scale models such as 671B, it is recommended to deploy a professional-grade GPU cluster (e.g., NVIDIA A100) to meet high-performance computing requirements.
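As a rough cross-check of the figures above, VRAM for the weights alone can be approximated as parameter count × bytes per parameter, plus the 20%-30% headroom recommended in note 1. The sketch below is an illustrative rule of thumb only (weights only, ignoring KV cache and runtime overhead; quantized or partially offloaded deployments need far less VRAM), not an official sizing tool:

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float,
                     headroom: float = 0.25) -> float:
    """Rule-of-thumb VRAM estimate: model weights plus safety headroom.

    bytes_per_param: 2.0 for BF16, 1.0 for FP8, ~0.5 for 4-bit quantization.
    """
    weights_gb = params_billion * bytes_per_param  # 1B params at 1 byte/param ~ 1 GB
    return weights_gb * (1.0 + headroom)

# 70B at BF16 with 25% headroom (weights only)
print(round(estimate_vram_gb(70, 2.0), 1))   # 175.0
# 7B at 4-bit with 20% headroom fits a single consumer GPU
print(round(estimate_vram_gb(7, 0.5, 0.2), 1))   # 4.2
```

Actual requirements also depend on context length and framework overhead, which is why the tables in this manual should take precedence over this back-of-the-envelope estimate.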

III. Domestic Chip and Hardware Adaptation Solutions

1. Domestic ecosystem partnership updates

| Company | Adaptation | Performance benchmark (vs NVIDIA) |
| --- | --- | --- |
| Huawei Ascend | Ascend 910B natively supports the full R1 family and provides end-to-end inference optimization | — |
| MetaX (Muxi) GPU | MXN series supports 70B-model BF16 inference; VRAM utilization improved by 30% | Comparable to RTX 3090 |
| Hygon DCU | Adapted to V3/R1 models; performance benchmarked against NVIDIA A100 | Comparable to A100 (BF16) |

2. Recommended domestic hardware configurations

| Model parameters | Recommended solution | Applicable scenarios |
| --- | --- | --- |
| 1.5B | Taichu T100 accelerator card | Individual developer prototype validation |
| 14B | Kunlunxin K200 cluster | Enterprise-level complex task reasoning |
| 32B | Biren computing platform + Ascend 910B cluster | Scientific computing and multimodal processing |

IV. Cloud deployment alternatives

1. Recommended domestic cloud service providers

| Platform | Core advantages | Applicable scenarios |
| --- | --- | --- |
| SiliconFlow | Officially recommended API, low latency, multimodal model support | Enterprise-grade high-concurrency inference |
| Tencent Cloud | One-click deployment + limited-time free trial, with VPC privatization support | Quick launch of small and medium-scale models |
| PPIO Cloud | 1/20 the price of OpenAI; 50 million free tokens on registration | Low-cost trial and testing |

2. International access channels (requires a VPN or overseas corporate network access)

  • NVIDIA NIM: Enterprise GPU Cluster Deployment (link)
  • Groq: ultra-low latency reasoning (link)

V. Ollama+Unsloth deployment

1. Quantization scheme and model selection

| Quantized version | File size | Minimum RAM + VRAM requirement | Applicable scenarios |
| --- | --- | --- | --- |
| DeepSeek-R1-UD-IQ1_M | 158GB | ≥200GB | Consumer-grade hardware (e.g., Mac Studio) |
| DeepSeek-R1-Q4_K_M | 404GB | ≥500GB | High-performance servers / cloud GPUs |
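A quick way to sanity-check a quantized file size is to convert it to effective bits per weight (file size × 8 / parameter count). The helper below is an illustrative calculation of my own (treating GB loosely, so results are approximate), not part of any official tooling:

```python
def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    # GB and billions cancel out: (GB * 8 bits) / (billions of params) = bits/weight
    return file_size_gb * 8.0 / params_billion

# 158 GB for 671B parameters -> ~1.9 bits/weight (a dynamic ~1-bit-class quant)
print(round(bits_per_weight(158, 671), 2))   # 1.88
# 404 GB -> ~4.8 bits/weight, consistent with a 4-bit K-quant
print(round(bits_per_weight(404, 671), 2))   # 4.82
```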

Download Address:

  • HuggingFace Model Library
  • Unsloth AI Official Description

2. Hardware configuration recommendations

| Hardware type | Recommended configuration | Performance (short text generation) |
| --- | --- | --- |
| Consumer-grade device | Mac Studio (192GB unified memory) | 10+ tokens/second |
| High-performance server | 4× RTX 4090 (96GB VRAM + 384GB RAM) | 7-8 tokens/second (mixed inference) |

3. Deployment steps (Linux example)

1. Install dependency tools:

# Install llama.cpp (used to merge sharded model files)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install llama.cpp

2. Download and merge the model shards:

llama-gguf-split --merge DeepSeek-R1-UD-IQ1_M-00001-of-00004.gguf DeepSeek-R1-UD-IQ1_M.gguf

3. Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

4. Create the Modelfile:

FROM /path/to/DeepSeek-R1-UD-IQ1_M.gguf
PARAMETER num_gpu 28  # load 7 layers per RTX 4090 (4 cards total)
PARAMETER num_ctx 2048
PARAMETER temperature 0.6
TEMPLATE "<|end▁of▁thinking|>{{ .Prompt }}<|end▁of▁thinking|>"

5. Create and run the model:

ollama create DeepSeek-R1-UD-IQ1_M -f DeepSeekQ1_Modelfile
ollama run DeepSeek-R1-UD-IQ1_M
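Once the model runs interactively, the same model can also be queried over Ollama's local REST API (port 11434 by default). Below is a minimal standard-library sketch; the model name is whatever you passed to `ollama create`, and the server must already be running:

```python
import json
import urllib.request

def build_generate_payload(prompt: str, model: str = "DeepSeek-R1-UD-IQ1_M") -> dict:
    # stream=False makes Ollama return one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, host: str = "http://localhost:11434") -> str:
    payload = json.dumps(build_generate_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server):
# print(ollama_generate("Explain MoE routing in two sentences."))
```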

4. Performance tuning and testing

  • Low GPU utilization: upgrade to high-bandwidth memory (e.g., DDR5-5600 or faster).
  • Extended swap space:
sudo fallocate -l 100G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

5. Full 671B model deployment commands

  • vLLM:
vllm serve deepseek-ai/deepseek-r1-671b --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
  • SGLang:
python3 -m sglang.launch_server --model deepseek-ai/deepseek-r1-671b --trust-remote-code --tp 2
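Both vLLM and SGLang expose an OpenAI-compatible HTTP API (vLLM defaults to port 8000, SGLang to 30000), so any OpenAI-style client can talk to the served model. A minimal standard-library sketch; the base URL and model name below are assumptions to adjust for your own deployment:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "deepseek-ai/deepseek-r1-671b") -> dict:
    # OpenAI-compatible /v1/chat/completions request body
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.6,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]

# Example (requires a running vLLM or SGLang server):
# print(chat("Summarize the deployment requirements for DeepSeek R1."))
```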

VI. Notes and Risks

1. Cost alerts:

  • 70B model: requires three or more large-VRAM GPUs (e.g., RTX A6000 48GB); not feasible for single-card users.
  • 671B model: requires an 8× H100 cluster; only suitable for supercomputing-center deployment.

2. Alternatives:

  • Individual users are advised to use cloud APIs (e.g., SiliconFlow), which are maintenance-free and compliant.

3. Domestic hardware compatibility:

  • Requires customized framework versions (e.g., Ascend CANN, MetaX MXMLLM).

VII. Appendix: Technical support and resources

  • Huawei Ascend: Ascend cloud services
  • MetaX GPU: free API trial
  • Li Xihan's blog: full deployment tutorial

VIII. Heterogeneous GPUStack solutions

GPUStack Open Source Project

https://github.com/gpustack/gpustack/

Model Resource Measurement Tool

  • GGUF Parser (https://github.com/gpustack/gguf-parser-go) can be used to manually estimate VRAM requirements.
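The 32K-context VRAM figures in the table below exceed the weight sizes alone because the KV cache grows linearly with context length. A generic estimate is 2 (keys and values) × layers × context length × KV heads × head dimension × bytes per element; the formula and the sample configuration below are illustrative assumptions, and the real values must be read from each model's config:

```python
def kv_cache_gib(num_layers: int, ctx_len: int, num_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    # Keys + values: 2 tensors per layer, each ctx_len x num_kv_heads x head_dim
    total_bytes = 2 * num_layers * ctx_len * num_kv_heads * head_dim * bytes_per_elem
    return total_bytes / 2**30

# Hypothetical GQA config: 32 layers, 8 KV heads of dim 128, FP16 cache, 32K context
print(kv_cache_gib(32, 32768, 8, 128))  # 4.0
```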


DeepSeek Full Platform Private Deployment

| Model | Context Size | VRAM Requirement | Recommended GPUs |
| --- | --- | --- | --- |
| R1-Distill-Qwen-1.5B (Q4_K_M) | 32K | 2.86 GiB | RTX 4060 8GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-1.5B (Q8_0) | 32K | 3.47 GiB | RTX 4060 8GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-1.5B (FP16) | 32K | 4.82 GiB | RTX 4060 8GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-7B (Q4_K_M) | 32K | 7.90 GiB | RTX 4070 12GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-7B (Q8_0) | 32K | 10.83 GiB | RTX 4080 16GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-7B (FP16) | 32K | 17.01 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Llama-8B (Q4_K_M) | 32K | 10.64 GiB | RTX 4080 16GB, MacBook Pro M4 Max 36G |
| R1-Distill-Llama-8B (Q8_0) | 32K | 13.77 GiB | RTX 4080 16GB, MacBook Pro M4 Max 36G |
| R1-Distill-Llama-8B (FP16) | 32K | 20.32 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-14B (Q4_K_M) | 32K | 16.80 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-14B (Q8_0) | 32K | 22.69 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-14B (FP16) | 32K | 34.91 GiB | RTX 4090 24GB x2, MacBook Pro M4 Max 48G |
| R1-Distill-Qwen-32B (Q4_K_M) | 32K | 28.92 GiB | RTX 4080 16GB x2, MacBook Pro M4 Max 48G |
| R1-Distill-Qwen-32B (Q8_0) | 32K | 42.50 GiB | RTX 4090 24GB x3, MacBook Pro M4 Max 64G |
| R1-Distill-Qwen-32B (FP16) | 32K | 70.43 GiB | RTX 4090 24GB x4, MacBook Pro M4 Max 128G |
| R1-Distill-Llama-70B (Q4_K_M) | 32K | 53.41 GiB | RTX 4090 24GB x5, A100 80GB x1, MacBook Pro M4 Max 128G |
| R1-Distill-Llama-70B (Q8_0) | 32K | 83.15 GiB | RTX 4090 24GB x5, MacBook Pro M4 Max 128G |
| R1-Distill-Llama-70B (FP16) | 32K | 143.83 GiB | A100 80GB x2, Mac Studio M2 Ultra 192G |
| R1-671B (UD-IQ1_S) | 32K | 225.27 GiB | A100 80GB x4, Mac Studio M2 Ultra 192G |
| R1-671B (UD-IQ1_M) | 32K | 251.99 GiB | A100 80GB x4, Mac Studio M2 Ultra 192G x2 |
| R1-671B (UD-IQ2_XXS) | 32K | 277.36 GiB | A100 80GB x5, Mac Studio M2 Ultra 192G x2 |
| R1-671B (UD-Q2_K_XL) | 32K | 305.71 GiB | A100 80GB x5, Mac Studio M2 Ultra 192G x2 |
| R1-671B (Q2_K_XS) | 32K | 300.73 GiB | A100 80GB x5, Mac Studio M2 Ultra 192G x2 |
| R1-671B (Q2_K/Q2_K_L) | 32K | 322.14 GiB | A100 80GB x6, Mac Studio M2 Ultra 192G x2 |
| R1-671B (Q3_K_M) | 32K | 392.06 GiB | A100 80GB x7 |
| R1-671B (Q4_K_M) | 32K | 471.33 GiB | A100 80GB x8 |
| R1-671B (Q5_K_M) | 32K | 537.31 GiB | A100 80GB x9 |
| R1-671B (Q6_K) | 32K | 607.42 GiB | A100 80GB x11 |
| R1-671B (Q8_0) | 32K | 758.54 GiB | A100 80GB x13 |
| R1-671B (FP8) | 32K | 805.2 GiB | H200 141GB x8 |

Concluding remarks

Local deployment of DeepSeek R1 demands very high hardware investment and technical expertise: individual users should proceed with caution, and enterprise users should fully assess needs and costs. Localized adaptation and cloud services can significantly reduce risk and improve efficiency. Technology has no limits; rational planning cuts costs and increases efficiency!

Global enterprise and personal channel schedule

  1. Metaso AI Search (Secret Tower)
  2. 360 Nano AI Search
  3. SiliconFlow
  4. ByteDance Volcano Engine
  5. Baidu Cloud Qianfan
  6. NVIDIA NIM
  7. Groq
  8. Fireworks
  9. Chutes
  10. GitHub
  11. POE
  12. Cursor
  13. Monica
  14. Lambda
  15. Cerebras
  16. Perplexity
  17. Alibaba Cloud Bailian

Some of these channels require a VPN or overseas corporate network access.

Chip Business Support Schedule

Table 1: Cloud Vendors Supporting DeepSeek-R1

| Date | Name / website | Announcement |
| --- | --- | --- |
| January 28 | Infinigence AI (Wuwen Xinqiong) | A great combination of heterogeneous clouds |
| January 28 | PPIO Cloud | DeepSeek-R1 goes live on PPIO computing cloud! |
| February 1 | SiliconFlow x Huawei | First release! SiliconFlow x Huawei Cloud jointly launch the DeepSeek R1 & V3 inference service based on Ascend Cloud! |
| February 2 | ZStack (Cloud Axis Technology) | ZStack supports DeepSeek V3/R1/Janus Pro, with multiple domestic CPUs/GPUs for private deployment! |
| February 3 | Baidu Intelligent Cloud Qianfan | Baidu Intelligent Cloud Qianfan fully supports DeepSeek-R1/V3 calls at ultra-low prices |
| February 3 | Supercomputing Internet | Supercomputing Internet goes live with the DeepSeek model series, providing superintelligent fusion computing support |
| February 4 | Huawei (Ascend Community) | New DeepSeek models officially launched on the Ascend Community |
| February 4 | Luchen Technology x Huawei Ascend | Luchen x Huawei Ascend jointly launch DeepSeek R1 series inference APIs and cloud mirror services based on domestic computing power |
| February 4 | QingCloud Technology | Free for a limited time, one-click deployment! Keystone Smart Computing officially launches the DeepSeek-R1 model series |
| February 4 | Iluvatar CoreX (Tianshu Zhixin) | One-day adaptation! DeepSeek R1 model service together with Gitee AI |
| February 4 | Moore Threads | Tribute to DeepSeek: igniting a spark for China's AI ecosystem with domestic GPUs |
| February 4 | Hygon Information | DeepSeek V3 and R1 complete Hygon DCU adaptation and go live |
| February 5 | MetaX (Muxi) | The full DeepSeek-V3 goes live in a domestic MetaX GPU premiere experience |
| February 5 | Hygon Information | Hygon DCU successfully adapts the DeepSeek-Janus-Pro multimodal large model |
| February 5 | JD Cloud | One-click deployment! JD Cloud goes fully live with DeepSeek-R1/V3 |
| February 5 | Biren Technology | DeepSeek R1 released on the Biren domestic AI computing platform; the full model range empowers developers one-stop |
| February 5 | Unicom Cloud (China Unicom) | "Nezha stirring the sea"! Unicom Cloud shelves the DeepSeek-R1 model series! |
| February 5 | Mobile Cloud (China Mobile) | Full version, all sizes, full functionality! Mobile Cloud goes fully live with DeepSeek |
| February 5 | UXTECH | UXTECH adapts DeepSeek's full model range based on domestic chips |
| February 5 | Taichu | Based on the Taichu T100 accelerator card, DeepSeek-R1 series models adapted in 2 hours, with one-click experience and free API service |
| February 5 | Intellifusion (Yuntian Lifei) | DeepEdge10 has completed DeepSeek-R1 series model adaptation |
| February 6 | Tianyi Cloud (China Telecom) | New breakthrough in the domestic AI ecosystem! "Xirang" + DeepSeek, a blockbuster! |
| February 6 | Suwon Technology | Realizes deployment of full DeepSeek inference services in smart computing centers across the country |
| February 6 | Kunlunxin | Domestic AI cards fully adapted for DeepSeek training and inference; excellent performance, one-click deployment awaits! |
| February 7 | Inspur Cloud | Inspur Cloud first to release a 671B DeepSeek large-model all-in-one solution |
| February 7 | Beijing Supercomputing | Beijing Supercomputing x DeepSeek: dual engines ignite, driving a storm of hundred-billion-scale AI innovation |
| February 8 | China E-Cloud | China E-Cloud goes live with the full DeepSeek-R1/V3 model range, opening a new chapter of private deployment |
| February 8 | Kingsoft Cloud | Kingsoft Cloud supports DeepSeek-R1/V3 |
| February 8 | SenseTime SenseCore | SenseTime's SenseCore shelves the DeepSeek model series with limited-time experience and upgraded services! |

Table 2: Enterprises Supporting DeepSeek-R1

| Date | Name / website | Announcement |
| --- | --- | --- |
| January 30 | 360 Nano AI Search | Nano AI Search launches the full-power "DeepSeek-R1" large model |
| February 3 | Metaso AI Search | Metaso AI integrates the full-power DeepSeek R1 inference model |
| February 5 | Xiaoyi Assistant (Huawei) | Huawei's Xiaoyi Assistant integrates DeepSeek, after Huawei Cloud announced the DeepSeek R1/V3 inference service based on Ascend Cloud |
| February 5 | Writer's Assistant (Yuewen Group) | Industry first! Yuewen deploys DeepSeek; "Writer's Assistant" upgrades three assisted-creation functions |
| February 5 | Wondershare (Wanxing Technology) | Wondershare: completed DeepSeek-R1 large-model adaptation and landed it in multiple products |
| February 6 | NetEase Youdao | Embracing DeepSeek-style reasoning models, NetEase Youdao accelerates the landing of AI education |
| February 6 | Yunxuetang | Yunxuetang integrates DeepSeek; product AI capabilities comprehensively upgraded |
| February 7 | DingTalk | DingTalk AI Assistant integrates DeepSeek, with support for deep thinking |
| February 7 | SMZDM (What's Worth Buying) | SMZDM products integrate the DeepSeek model |
| February 7 | Tonghuashun (Royal Flush) | Wencai 2.0 upgrade: injecting "slow thinking" wisdom to create a more rational investment decision assistant |
| February 8 | Tiangong AI (Kunlun Tech) | Kunlun Tech's Tiangong AI officially launches DeepSeek R1 + connected search |
| February 8 | Xingji Meizu | Flyme AIOS has completed DeepSeek-R1 large-model integration! |
| February 8 | Honor | Honor integrates DeepSeek |

Table 3: Summary of enterprises supporting DeepSeek-R1

| Name / website | Announcement |
| --- | --- |
| DeepSeek | DeepSeek-R1 released, performance benchmarked against the official OpenAI o1 |
| Infinigence AI (Wuwen Xinqiong) | Infini-AI heterogeneous cloud now offers DeepSeek-R1-Distill: a great combination of domestic models and heterogeneous clouds |
| PPIO Cloud | DeepSeek-R1 goes live on PPIO computing cloud! |
| SiliconFlow x Huawei | First release! SiliconFlow x Huawei Cloud jointly launch the DeepSeek R1 & V3 inference service based on Ascend Cloud! |
| ZStack (Cloud Axis Technology) | ZStack supports DeepSeek V3/R1/Janus Pro, with multiple domestic CPUs/GPUs for private deployment |
| Baidu Intelligent Cloud Qianfan | Baidu Intelligent Cloud Qianfan fully supports DeepSeek-R1/V3 calls at ultra-low prices |
| Supercomputing Internet | Supercomputing Internet goes live with the DeepSeek model series, providing superintelligent fusion computing support |
| Huawei (Ascend Community) | New DeepSeek models officially launched on the Ascend Community! |
| Luchen Technology x Huawei Ascend | Luchen x Huawei Ascend launch DeepSeek R1 series inference APIs and cloud distribution services based on domestic computing power |
| QingCloud Technology | Free for a limited time, one-click deployment! Keystone Smart Computing launches the DeepSeek-R1 model series |
| JD Cloud | One-click deployment! JD Cloud goes fully live with DeepSeek-R1/V3 |
| Unicom Cloud (China Unicom) | "Nezha stirring the sea"! Unicom Cloud shelves the DeepSeek-R1 model series! |
| Mobile Cloud (China Mobile) | Full version, all sizes, full functionality! Mobile Cloud goes fully live with DeepSeek |
| UXTECH | UXTECH adapts the full DeepSeek model range based on a domestic chip |
| Tianyi Cloud (China Telecom) | New breakthrough in the domestic AI ecosystem! "Xirang" + DeepSeek, a blockbuster! |
| Digital China | 3-minute deployment of the high-performance AI model DeepSeek; Digital China helps enterprises with intelligent transformation |
| Kaipu Cloud | Kaipu Cloud's Enlightened large-model application and edge all-in-one machine fully integrate DeepSeek |
| Kingdee Cloud Cangqiong | Kingdee fully integrates the DeepSeek large model to help enterprises accelerate AI applications! |
| Parallel Technology | Server busy? Parallel Technology helps you achieve DeepSeek freedom! |
| Capital Online | Capital Online cloud platform goes live with the DeepSeek-R1 model family |
| Inspur Cloud | Inspur Cloud first to release a 671B DeepSeek large-model all-in-one solution |
| Beijing Supercomputing | Beijing Supercomputing x DeepSeek: dual engines ignite, driving a storm of hundred-billion-scale AI innovation |
| Rhinoceros Enablement (Unisplendour) | Unisplendour: the Rhinoceros Enablement platform completes adaptation and launch of the DeepSeek V3/R1 models |
| China E-Cloud | China E-Cloud goes live with the full DeepSeek-R1/V3 model range, opening a new chapter of private deployment |
| Kingsoft Cloud | Kingsoft Cloud supports DeepSeek-R1/V3 |
| SenseTime SenseCore | SenseTime's SenseCore shelves the DeepSeek model series with limited-time experience and upgraded services! |
| 360 Nano AI Search | Nano AI Search launches the full-power "DeepSeek-R1" large model |
| Metaso AI Search | Metaso AI integrates the full-power DeepSeek R1 inference model |
| Xiaoyi Assistant (Huawei) | Huawei's Xiaoyi Assistant integrates DeepSeek, after Huawei Cloud announced the DeepSeek R1/V3 inference service based on Ascend Cloud |
| Writer's Assistant (Yuewen Group) | Industry first! Yuewen deploys DeepSeek; "Writer's Assistant" upgrades three assisted-creation functions |
| Wondershare (Wanxing Technology) | Wondershare: completed DeepSeek-R1 large-model adaptation and landed it in multiple products |
| NetEase Youdao | Embracing DeepSeek-style reasoning models, NetEase Youdao accelerates the landing of AI education |
| Yunxuetang | Yunxuetang integrates DeepSeek; product AI capabilities comprehensively upgraded |
| DingTalk | DingTalk AI Assistant integrates DeepSeek, with support for deep thinking |
| SMZDM (What's Worth Buying) | SMZDM products integrate the DeepSeek model |
| Feishu (Lark) | Summary of Feishu x DeepSeek AI capabilities (public version) |
| Tonghuashun (Royal Flush) | Wencai 2.0 upgrade: injecting "slow thinking" wisdom to create a more rational investment decision assistant |
| Tiangong AI (Kunlun Tech) | Kunlun Tech's Tiangong AI officially launches DeepSeek R1 + connected search |
| Xingji Meizu | Flyme AI OS has completed DeepSeek-R1 large-model integration! |
| Honor | Honor integrates DeepSeek |