DeepSeek R1 Enterprise Local Deployment Complete Manual


I. Introduction

DeepSeek R1 is a high-performance general-purpose large language model that supports complex reasoning, multimodal processing, and technical document generation. This manual provides a complete local deployment guide for technical teams, covering hardware configurations, domestic (Chinese) chip adaptations, quantization schemes, heterogeneous solutions, cloud alternatives, and deployment methods for the complete 671B MoE model.

II. Core configuration requirements for local deployment

1. Table of model parameters and hardware correspondence

| Model parameters (B) | Windows configuration requirements | Mac configuration requirements | Applicable scenarios |
| --- | --- | --- | --- |
| 1.5B | RAM: 4GB; GPU: integrated graphics / modern CPU; storage: 5GB | Memory: 8GB (M1/M2/M3); storage: 5GB | Simple text generation, basic code completion |
| 7B | RAM: 8-10GB; GPU: GTX 1680 (4-bit quantized); storage: 8GB | Memory: 16GB (M2 Pro/M3); storage: 8GB | Medium-complexity Q&A, code debugging |
| 8B | RAM: 16GB; GPU: RTX 4080 (16GB VRAM); storage: 10GB | Memory: 32GB (M3 Max); storage: 10GB | Medium-complexity reasoning, document generation |
| 14B | RAM: 24GB; GPU: RTX 3090 (24GB VRAM) | Memory: 32GB (M3 Max); storage: 20GB | Complex reasoning, technical documentation generation |
| 32B | Enterprise deployment (requires multiple GPUs in parallel) | Not supported at this time | Scientific computing, large-scale data processing |
| 70B | Enterprise deployment (requires multiple GPUs in parallel) | Not supported at this time | Large-scale reasoning, ultra-complex tasks |
| 671B | Enterprise deployment (requires multiple GPUs in parallel) | Not supported at this time | Ultra-large-scale research computing, high-performance computing |

2. Analysis of computing power requirements

| Model version | Parameters (B) | Computation precision | Model size | VRAM requirement (GB) | Reference GPU configuration |
| --- | --- | --- | --- | --- | --- |
| DeepSeek-R1 | 671B | FP8 | ~1,342GB | ≥1,342GB | Multi-GPU configuration (e.g., NVIDIA A100 80GB × 16) |
| DeepSeek-R1-Distill-Llama-70B | 70B | BF16 | 43GB | ~32.7GB | Multi-GPU configuration (e.g., NVIDIA A100 80GB × 2) |
| DeepSeek-R1-Distill-Qwen-32B | 32B | BF16 | 20GB | ~14.9GB | Multi-GPU configuration (e.g., NVIDIA RTX 4090 × 4) |
| DeepSeek-R1-Distill-Qwen-14B | 14B | BF16 | 9GB | ~6.5GB | NVIDIA RTX 3080 10GB or higher |
| DeepSeek-R1-Distill-Llama-8B | 8B | BF16 | 4.9GB | ~3.7GB | NVIDIA RTX 3070 8GB or higher |
| DeepSeek-R1-Distill-Qwen-7B | 7B | BF16 | 4.7GB | ~3.3GB | NVIDIA RTX 3070 8GB or higher |
| DeepSeek-R1-Distill-Qwen-1.5B | 1.5B | BF16 | 1.1GB | ~0.7GB | NVIDIA RTX 3060 12GB or higher |

Additional Notes:

  1. VRAM requirements: the figures in the table are minimums; it is recommended to reserve an additional 20%-30% of VRAM in actual deployments to absorb peaks during model loading and operation.
  2. Multi-GPU configuration: for large-scale models (e.g., 32B+), use multiple GPUs in parallel to improve computational efficiency and stability.
  3. Computation precision: FP8 and BF16 are the current mainstream high-efficiency precisions, preserving model performance while reducing VRAM usage.
  4. Applicable scenarios: models of different parameter scales suit tasks of different complexity; choose the model version that matches your actual needs.
  5. Enterprise deployment: for ultra-large-scale models such as 671B, it is recommended to deploy a professional-grade GPU cluster (e.g., NVIDIA A100) to meet high-performance computing requirements.
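As a rough cross-check of the figures above, VRAM for the weights alone can be approximated as parameter count × bytes per parameter, plus the 20%-30% headroom recommended in note 1. The sketch below is an illustrative rule of thumb only (weights only, ignoring KV cache and runtime overhead; quantized or partially offloaded deployments need far less VRAM), not an official sizing tool:

```python
def estimate_vram_gb(params_billion: float, bytes_per_param: float,
                     headroom: float = 0.25) -> float:
    """Rule-of-thumb VRAM estimate: model weights plus safety headroom.

    bytes_per_param: 2.0 for BF16, 1.0 for FP8, ~0.5 for 4-bit quantization.
    """
    weights_gb = params_billion * bytes_per_param  # 1B params at 1 byte/param ~ 1 GB
    return weights_gb * (1.0 + headroom)

# 70B at BF16 with 25% headroom (weights only)
print(round(estimate_vram_gb(70, 2.0), 1))   # 175.0
# 7B at 4-bit with 20% headroom fits a single consumer GPU
print(round(estimate_vram_gb(7, 0.5, 0.2), 1))   # 4.2
```

Actual requirements also depend on context length and framework overhead, which is why the tables in this manual should take precedence over this back-of-the-envelope estimate.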

III. Domestic Chip and Hardware Adaptation Solutions

1. Domestic ecosystem partnership updates

| Company | Adaptation | Performance benchmark (vs NVIDIA) |
| --- | --- | --- |
| Huawei Ascend | Ascend 910B natively supports the full R1 family and provides end-to-end inference optimization | — |
| MetaX (Muxi) GPU | MXN series supports 70B-model BF16 inference; VRAM utilization improved by 30% | Comparable to RTX 3090 |
| Hygon DCU | Adapted to V3/R1 models; performance benchmarked against NVIDIA A100 | Comparable to A100 (BF16) |

2. Recommended domestic hardware configurations

| Model parameters | Recommended solution | Applicable scenarios |
| --- | --- | --- |
| 1.5B | Taichu T100 accelerator card | Individual developer prototype validation |
| 14B | Kunlunxin K200 cluster | Enterprise-level complex task reasoning |
| 32B | Biren computing platform + Ascend 910B cluster | Scientific computing and multimodal processing |

IV. Cloud deployment alternatives

1. Recommended domestic cloud service providers

| Platform | Core advantages | Applicable scenarios |
| --- | --- | --- |
| SiliconFlow | Officially recommended API, low latency, multimodal model support | Enterprise-grade high-concurrency inference |
| Tencent Cloud | One-click deployment + limited-time free trial, with VPC privatization support | Quick launch of small and medium-scale models |
| PPIO Cloud | 1/20 the price of OpenAI; 50 million free tokens on registration | Low-cost trial and testing |

2. International access channels (requires a VPN or overseas corporate network access)

  • NVIDIA NIM: Enterprise GPU Cluster Deployment (link)
  • Groq: ultra-low latency reasoning (link)

V. Ollama+Unsloth deployment

1. Quantization scheme and model selection

| Quantized version | File size | Minimum RAM + VRAM requirement | Applicable scenarios |
| --- | --- | --- | --- |
| DeepSeek-R1-UD-IQ1_M | 158GB | ≥200GB | Consumer-grade hardware (e.g., Mac Studio) |
| DeepSeek-R1-Q4_K_M | 404GB | ≥500GB | High-performance servers / cloud GPUs |
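A quick way to sanity-check a quantized file size is to convert it to effective bits per weight (file size × 8 / parameter count). The helper below is an illustrative calculation of my own (treating GB loosely, so results are approximate), not part of any official tooling:

```python
def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    # GB and billions cancel out: (GB * 8 bits) / (billions of params) = bits/weight
    return file_size_gb * 8.0 / params_billion

# 158 GB for 671B parameters -> ~1.9 bits/weight (a dynamic ~1-bit-class quant)
print(round(bits_per_weight(158, 671), 2))   # 1.88
# 404 GB -> ~4.8 bits/weight, consistent with a 4-bit K-quant
print(round(bits_per_weight(404, 671), 2))   # 4.82
```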

Download Address:

  • HuggingFace Model Library
  • Unsloth AI Official Description

2. Hardware configuration recommendations

| Hardware type | Recommended configuration | Performance (short text generation) |
| --- | --- | --- |
| Consumer-grade device | Mac Studio (192GB unified memory) | 10+ tokens/second |
| High-performance server | 4× RTX 4090 (96GB VRAM + 384GB RAM) | 7-8 tokens/second (mixed inference) |

3. Deployment steps (Linux example)

1. Install dependency tools:

# Install llama.cpp (used to merge sharded model files)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install llama.cpp

2. Download and merge the model shards:

llama-gguf-split --merge DeepSeek-R1-UD-IQ1_M-00001-of-00004.gguf DeepSeek-R1-UD-IQ1_M.gguf

3. Install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

4. Create the Modelfile:

FROM /path/to/DeepSeek-R1-UD-IQ1_M.gguf
PARAMETER num_gpu 28  # load 7 layers per RTX 4090 (4 cards total)
PARAMETER num_ctx 2048
PARAMETER temperature 0.6
TEMPLATE "<|end▁of▁thinking|>{{ .Prompt }}<|end▁of▁thinking|>"

5. Create and run the model:

ollama create DeepSeek-R1-UD-IQ1_M -f DeepSeekQ1_Modelfile
ollama run DeepSeek-R1-UD-IQ1_M
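Once the model runs interactively, the same model can also be queried over Ollama's local REST API (port 11434 by default). Below is a minimal standard-library sketch; the model name is whatever you passed to `ollama create`, and the server must already be running:

```python
import json
import urllib.request

def build_generate_payload(prompt: str, model: str = "DeepSeek-R1-UD-IQ1_M") -> dict:
    # stream=False makes Ollama return one JSON object instead of a token stream
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, host: str = "http://localhost:11434") -> str:
    payload = json.dumps(build_generate_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a running Ollama server):
# print(ollama_generate("Explain MoE routing in two sentences."))
```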

4. Performance tuning and testing

  • Low GPU utilization: upgrade to high-bandwidth memory (e.g., DDR5-5600 or faster).
  • Extended swap space:
sudo fallocate -l 100G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

5. Full 671B model deployment commands

  • vLLM:
vllm serve deepseek-ai/deepseek-r1-671b --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
  • SGLang:
python3 -m sglang.launch_server --model deepseek-ai/deepseek-r1-671b --trust-remote-code --tp 2
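Both vLLM and SGLang expose an OpenAI-compatible HTTP API (vLLM defaults to port 8000, SGLang to 30000), so any OpenAI-style client can talk to the served model. A minimal standard-library sketch; the base URL and model name below are assumptions to adjust for your own deployment:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "deepseek-ai/deepseek-r1-671b") -> dict:
    # OpenAI-compatible /v1/chat/completions request body
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        "temperature": 0.6,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.loads(resp.read())
    return data["choices"][0]["message"]["content"]

# Example (requires a running vLLM or SGLang server):
# print(chat("Summarize the deployment requirements for DeepSeek R1."))
```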

VI. Notes and Risks

1. Cost alerts:

  • 70B model: requires three or more large-VRAM GPUs (e.g., RTX A6000 48GB); not feasible for single-card users.
  • 671B model: requires an 8× H100 cluster; only suitable for supercomputing-center deployment.

2. Alternatives:

  • Individual users are advised to use cloud APIs (e.g., SiliconFlow), which are maintenance-free and compliant.

3. Domestic hardware compatibility:

  • Requires customized framework versions (e.g., Ascend CANN, MetaX MXMLLM).

VII. Appendix: Technical support and resources

  • Huawei Ascend: Ascend cloud services
  • MetaX GPU: free API trial
  • Li Xihan's blog: full deployment tutorial

VIII. Heterogeneous GPUStack solutions

GPUStack Open Source Project

https://github.com/gpustack/gpustack/

Model Resource Measurement Tool

  • GGUF Parser (https://github.com/gpustack/gguf-parser-go) can be used to manually estimate VRAM requirements.
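The 32K-context VRAM figures in the table below exceed the weight sizes alone because the KV cache grows linearly with context length. A generic estimate is 2 (keys and values) × layers × context length × KV heads × head dimension × bytes per element; the formula and the sample configuration below are illustrative assumptions, and the real values must be read from each model's config:

```python
def kv_cache_gib(num_layers: int, ctx_len: int, num_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    # Keys + values: 2 tensors per layer, each ctx_len x num_kv_heads x head_dim
    total_bytes = 2 * num_layers * ctx_len * num_kv_heads * head_dim * bytes_per_elem
    return total_bytes / 2**30

# Hypothetical GQA config: 32 layers, 8 KV heads of dim 128, FP16 cache, 32K context
print(kv_cache_gib(32, 32768, 8, 128))  # 4.0
```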


DeepSeek Full Platform Private Deployment

| Model | Context Size | VRAM Requirement | Recommended GPUs |
| --- | --- | --- | --- |
| R1-Distill-Qwen-1.5B (Q4_K_M) | 32K | 2.86 GiB | RTX 4060 8GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-1.5B (Q8_0) | 32K | 3.47 GiB | RTX 4060 8GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-1.5B (FP16) | 32K | 4.82 GiB | RTX 4060 8GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-7B (Q4_K_M) | 32K | 7.90 GiB | RTX 4070 12GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-7B (Q8_0) | 32K | 10.83 GiB | RTX 4080 16GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-7B (FP16) | 32K | 17.01 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Llama-8B (Q4_K_M) | 32K | 10.64 GiB | RTX 4080 16GB, MacBook Pro M4 Max 36G |
| R1-Distill-Llama-8B (Q8_0) | 32K | 13.77 GiB | RTX 4080 16GB, MacBook Pro M4 Max 36G |
| R1-Distill-Llama-8B (FP16) | 32K | 20.32 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-14B (Q4_K_M) | 32K | 16.80 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-14B (Q8_0) | 32K | 22.69 GiB | RTX 4090 24GB, MacBook Pro M4 Max 36G |
| R1-Distill-Qwen-14B (FP16) | 32K | 34.91 GiB | RTX 4090 24GB x2, MacBook Pro M4 Max 48G |
| R1-Distill-Qwen-32B (Q4_K_M) | 32K | 28.92 GiB | RTX 4080 16GB x2, MacBook Pro M4 Max 48G |
| R1-Distill-Qwen-32B (Q8_0) | 32K | 42.50 GiB | RTX 4090 24GB x3, MacBook Pro M4 Max 64G |
| R1-Distill-Qwen-32B (FP16) | 32K | 70.43 GiB | RTX 4090 24GB x4, MacBook Pro M4 Max 128G |
| R1-Distill-Llama-70B (Q4_K_M) | 32K | 53.41 GiB | RTX 4090 24GB x5, A100 80GB x1, MacBook Pro M4 Max 128G |
| R1-Distill-Llama-70B (Q8_0) | 32K | 83.15 GiB | RTX 4090 24GB x5, MacBook Pro M4 Max 128G |
| R1-Distill-Llama-70B (FP16) | 32K | 143.83 GiB | A100 80GB x2, Mac Studio M2 Ultra 192G |
| R1-671B (UD-IQ1_S) | 32K | 225.27 GiB | A100 80GB x4, Mac Studio M2 Ultra 192G |
| R1-671B (UD-IQ1_M) | 32K | 251.99 GiB | A100 80GB x4, Mac Studio M2 Ultra 192G x2 |
| R1-671B (UD-IQ2_XXS) | 32K | 277.36 GiB | A100 80GB x5, Mac Studio M2 Ultra 192G x2 |
| R1-671B (UD-Q2_K_XL) | 32K | 305.71 GiB | A100 80GB x5, Mac Studio M2 Ultra 192G x2 |
| R1-671B (Q2_K_XS) | 32K | 300.73 GiB | A100 80GB x5, Mac Studio M2 Ultra 192G x2 |
| R1-671B (Q2_K/Q2_K_L) | 32K | 322.14 GiB | A100 80GB x6, Mac Studio M2 Ultra 192G x2 |
| R1-671B (Q3_K_M) | 32K | 392.06 GiB | A100 80GB x7 |
| R1-671B (Q4_K_M) | 32K | 471.33 GiB | A100 80GB x8 |
| R1-671B (Q5_K_M) | 32K | 537.31 GiB | A100 80GB x9 |
| R1-671B (Q6_K) | 32K | 607.42 GiB | A100 80GB x11 |
| R1-671B (Q8_0) | 32K | 758.54 GiB | A100 80GB x13 |
| R1-671B (FP8) | 32K | 805.2 GiB | H200 141GB x8 |

Concluding remarks

Local deployment of DeepSeek R1 demands very high hardware investment and technical expertise: individual users should proceed with caution, and enterprise users should fully assess needs and costs. Localized adaptation and cloud services can significantly reduce risk and improve efficiency. Technology has no limits; rational planning cuts costs and increases efficiency!

Global enterprise and personal channel schedule

  1. Metaso AI Search (Secret Tower)
  2. 360 Nano AI Search
  3. SiliconFlow
  4. ByteDance Volcano Engine
  5. Baidu Cloud Qianfan
  6. NVIDIA NIM
  7. Groq
  8. Fireworks
  9. Chutes
  10. GitHub
  11. POE
  12. Cursor
  13. Monica
  14. Lambda
  15. Cerebras
  16. Perplexity
  17. Alibaba Cloud Bailian

Some of these channels require a VPN or overseas corporate network access.

Chip Business Support Schedule

Table 1: Cloud Vendors Supporting DeepSeek-R1

| Date | Name / website | Announcement |
| --- | --- | --- |
| January 28 | Infinigence AI (Wuwen Xinqiong) | A great combination of heterogeneous clouds |
| January 28 | PPIO Cloud | DeepSeek-R1 goes live on PPIO computing cloud! |
| February 1 | SiliconFlow x Huawei | First release! SiliconFlow x Huawei Cloud jointly launch the DeepSeek R1 & V3 inference service based on Ascend Cloud! |
| February 2 | ZStack (Cloud Axis Technology) | ZStack supports DeepSeek V3/R1/Janus Pro, with multiple domestic CPUs/GPUs for private deployment! |
| February 3 | Baidu Intelligent Cloud Qianfan | Baidu Intelligent Cloud Qianfan fully supports DeepSeek-R1/V3 calls at ultra-low prices |
| February 3 | Supercomputing Internet | Supercomputing Internet goes live with the DeepSeek model series, providing superintelligent fusion computing support |
| February 4 | Huawei (Ascend Community) | New DeepSeek models officially launched on the Ascend Community |
| February 4 | Luchen Technology x Huawei Ascend | Luchen x Huawei Ascend jointly launch DeepSeek R1 series inference APIs and cloud mirror services based on domestic computing power |
| February 4 | QingCloud Technology | Free for a limited time, one-click deployment! Keystone Smart Computing officially launches the DeepSeek-R1 model series |
| February 4 | Iluvatar CoreX (Tianshu Zhixin) | One-day adaptation! DeepSeek R1 model service together with Gitee AI |
| February 4 | Moore Threads | Tribute to DeepSeek: igniting a spark for China's AI ecosystem with domestic GPUs |
| February 4 | Hygon Information | DeepSeek V3 and R1 complete Hygon DCU adaptation and go live |
| February 5 | MetaX (Muxi) | The full DeepSeek-V3 goes live in a domestic MetaX GPU premiere experience |
| February 5 | Hygon Information | Hygon DCU successfully adapts the DeepSeek-Janus-Pro multimodal large model |
| February 5 | JD Cloud | One-click deployment! JD Cloud goes fully live with DeepSeek-R1/V3 |
| February 5 | Biren Technology | DeepSeek R1 released on the Biren domestic AI computing platform; the full model range empowers developers one-stop |
| February 5 | Unicom Cloud (China Unicom) | "Nezha stirring the sea"! Unicom Cloud shelves the DeepSeek-R1 model series! |
| February 5 | Mobile Cloud (China Mobile) | Full version, all sizes, full functionality! Mobile Cloud goes fully live with DeepSeek |
| February 5 | UXTECH | UXTECH adapts DeepSeek's full model range based on domestic chips |
| February 5 | Taichu | Based on the Taichu T100 accelerator card, DeepSeek-R1 series models adapted in 2 hours, with one-click experience and free API service |
| February 5 | Intellifusion (Yuntian Lifei) | DeepEdge10 has completed DeepSeek-R1 series model adaptation |
| February 6 | Tianyi Cloud (China Telecom) | New breakthrough in the domestic AI ecosystem! "Xirang" + DeepSeek, a blockbuster! |
| February 6 | Suwon Technology | Realizes deployment of full DeepSeek inference services in smart computing centers across the country |
| February 6 | Kunlunxin | Domestic AI cards fully adapted for DeepSeek training and inference; excellent performance, one-click deployment awaits! |
| February 7 | Inspur Cloud | Inspur Cloud first to release a 671B DeepSeek large-model all-in-one solution |
| February 7 | Beijing Supercomputing | Beijing Supercomputing x DeepSeek: dual engines ignite, driving a storm of hundred-billion-scale AI innovation |
| February 8 | China E-Cloud | China E-Cloud goes live with the full DeepSeek-R1/V3 model range, opening a new chapter of private deployment |
| February 8 | Kingsoft Cloud | Kingsoft Cloud supports DeepSeek-R1/V3 |
| February 8 | SenseTime SenseCore | SenseTime's SenseCore shelves the DeepSeek model series with limited-time experience and upgraded services! |

Table 2: Enterprises Supporting DeepSeek-R1

| Date | Name / website | Announcement |
| --- | --- | --- |
| January 30 | 360 Nano AI Search | Nano AI Search launches the full-power "DeepSeek-R1" large model |
| February 3 | Metaso AI Search | Metaso AI integrates the full-power DeepSeek R1 inference model |
| February 5 | Xiaoyi Assistant (Huawei) | Huawei's Xiaoyi Assistant integrates DeepSeek, after Huawei Cloud announced the DeepSeek R1/V3 inference service based on Ascend Cloud |
| February 5 | Writer's Assistant (Yuewen Group) | Industry first! Yuewen deploys DeepSeek; "Writer's Assistant" upgrades three assisted-creation functions |
| February 5 | Wondershare (Wanxing Technology) | Wondershare: completed DeepSeek-R1 large-model adaptation and landed it in multiple products |
| February 6 | NetEase Youdao | Embracing DeepSeek-style reasoning models, NetEase Youdao accelerates the landing of AI education |
| February 6 | Yunxuetang | Yunxuetang integrates DeepSeek; product AI capabilities comprehensively upgraded |
| February 7 | DingTalk | DingTalk AI Assistant integrates DeepSeek, with support for deep thinking |
| February 7 | SMZDM (What's Worth Buying) | SMZDM products integrate the DeepSeek model |
| February 7 | Tonghuashun (Royal Flush) | Wencai 2.0 upgrade: injecting "slow thinking" wisdom to create a more rational investment decision assistant |
| February 8 | Tiangong AI (Kunlun Tech) | Kunlun Tech's Tiangong AI officially launches DeepSeek R1 + connected search |
| February 8 | Xingji Meizu | Flyme AIOS has completed DeepSeek-R1 large-model integration! |
| February 8 | Honor | Honor integrates DeepSeek |

Table 3: Summary of enterprises supporting DeepSeek-R1

| Name / website | Announcement |
| --- | --- |
| DeepSeek | DeepSeek-R1 released, performance benchmarked against the official OpenAI o1 |
| Infinigence AI (Wuwen Xinqiong) | Infini-AI heterogeneous cloud now offers DeepSeek-R1-Distill: a great combination of domestic models and heterogeneous clouds |
| PPIO Cloud | DeepSeek-R1 goes live on PPIO computing cloud! |
| SiliconFlow x Huawei | First release! SiliconFlow x Huawei Cloud jointly launch the DeepSeek R1 & V3 inference service based on Ascend Cloud! |
| ZStack (Cloud Axis Technology) | ZStack supports DeepSeek V3/R1/Janus Pro, with multiple domestic CPUs/GPUs for private deployment |
| Baidu Intelligent Cloud Qianfan | Baidu Intelligent Cloud Qianfan fully supports DeepSeek-R1/V3 calls at ultra-low prices |
| Supercomputing Internet | Supercomputing Internet goes live with the DeepSeek model series, providing superintelligent fusion computing support |
| Huawei (Ascend Community) | New DeepSeek models officially launched on the Ascend Community! |
| Luchen Technology x Huawei Ascend | Luchen x Huawei Ascend launch DeepSeek R1 series inference APIs and cloud distribution services based on domestic computing power |
| QingCloud Technology | Free for a limited time, one-click deployment! Keystone Smart Computing launches the DeepSeek-R1 model series |
| JD Cloud | One-click deployment! JD Cloud goes fully live with DeepSeek-R1/V3 |
| Unicom Cloud (China Unicom) | "Nezha stirring the sea"! Unicom Cloud shelves the DeepSeek-R1 model series! |
| Mobile Cloud (China Mobile) | Full version, all sizes, full functionality! Mobile Cloud goes fully live with DeepSeek |
| UXTECH | UXTECH adapts the full DeepSeek model range based on a domestic chip |
| Tianyi Cloud (China Telecom) | New breakthrough in the domestic AI ecosystem! "Xirang" + DeepSeek, a blockbuster! |
| Digital China | 3-minute deployment of the high-performance AI model DeepSeek; Digital China helps enterprises with intelligent transformation |
| Kaipu Cloud | Kaipu Cloud's Enlightened large-model application and edge all-in-one machine fully integrate DeepSeek |
| Kingdee Cloud Cangqiong | Kingdee fully integrates the DeepSeek large model to help enterprises accelerate AI applications! |
| Parallel Technology | Server busy? Parallel Technology helps you achieve DeepSeek freedom! |
| Capital Online | Capital Online cloud platform goes live with the DeepSeek-R1 model family |
| Inspur Cloud | Inspur Cloud first to release a 671B DeepSeek large-model all-in-one solution |
| Beijing Supercomputing | Beijing Supercomputing x DeepSeek: dual engines ignite, driving a storm of hundred-billion-scale AI innovation |
| Rhinoceros Enablement (Unisplendour) | Unisplendour: the Rhinoceros Enablement platform completes adaptation and launch of the DeepSeek V3/R1 models |
| China E-Cloud | China E-Cloud goes live with the full DeepSeek-R1/V3 model range, opening a new chapter of private deployment |
| Kingsoft Cloud | Kingsoft Cloud supports DeepSeek-R1/V3 |
| SenseTime SenseCore | SenseTime's SenseCore shelves the DeepSeek model series with limited-time experience and upgraded services! |
| 360 Nano AI Search | Nano AI Search launches the full-power "DeepSeek-R1" large model |
| Metaso AI Search | Metaso AI integrates the full-power DeepSeek R1 inference model |
| Xiaoyi Assistant (Huawei) | Huawei's Xiaoyi Assistant integrates DeepSeek, after Huawei Cloud announced the DeepSeek R1/V3 inference service based on Ascend Cloud |
| Writer's Assistant (Yuewen Group) | Industry first! Yuewen deploys DeepSeek; "Writer's Assistant" upgrades three assisted-creation functions |
| Wondershare (Wanxing Technology) | Wondershare: completed DeepSeek-R1 large-model adaptation and landed it in multiple products |
| NetEase Youdao | Embracing DeepSeek-style reasoning models, NetEase Youdao accelerates the landing of AI education |
| Yunxuetang | Yunxuetang integrates DeepSeek; product AI capabilities comprehensively upgraded |
| DingTalk | DingTalk AI Assistant integrates DeepSeek, with support for deep thinking |
| SMZDM (What's Worth Buying) | SMZDM products integrate the DeepSeek model |
| Feishu (Lark) | Summary of Feishu x DeepSeek AI capabilities (public version) |
| Tonghuashun (Royal Flush) | Wencai 2.0 upgrade: injecting "slow thinking" wisdom to create a more rational investment decision assistant |
| Tiangong AI (Kunlun Tech) | Kunlun Tech's Tiangong AI officially launches DeepSeek R1 + connected search |
| Xingji Meizu | Flyme AI OS has completed DeepSeek-R1 large-model integration! |
| Honor | Honor integrates DeepSeek |