Affiner le modèle DeepSeek R1 pour permettre des questions-réponses médicales de précision : libérer le potentiel de l'IA open source

Tutoriels pratiques sur l'IAPosté il y a 5 mois Sharenet.ai

1K 0

DeepSeek 推出一系列先进推理模型，挑战 OpenAI 行业地位，且完全免费、无使用限制，惠及所有用户。

本文将介绍如何使用 Hugging Face 的医学思维链数据集，对 DeepSeek-R1-Distill-Llama-8B 模型进行微调。这款精简版 Profondeur de l'eau-R1 模型，通过在 DeepSeek-R1 生成的数据上微调 Llama 3 8B 模型而得，展现出与原模型相近的卓越推理能力。

精微调校 DeepSeek R1 模型，赋能医疗精准问答：开源 AI 的潜力释放

DeepSeek R1 解密

DeepSeek-R1 与 DeepSeek-R1-Zero 在数学、编程及逻辑推理任务中，性能均可比肩 OpenAI 的 o1 模型。值得一提的是，R1 和 R1-Zero 均为开源模型.

DeepSeek-R1-Zero

DeepSeek-R1-Zero 作为首个完全采用大规模强化学习 (RL, Reinforcement Learning) 训练的开源模型，有别于传统以监督微调 (SFT, Supervised Fine-Tuning) 为初始步骤的模型。这种创新方法赋予了模型独立探索思维链 (CoT, Chain-of-Thought) 推理的能力，使其能够解决复杂问题并迭代优化输出结果。然而，此方法也带来了一些挑战，例如推理步骤可能出现重复、可读性降低以及语言风格不统一等问题，进而影响模型的清晰度和实用性。

Profondeur de l'eau-R1

DeepSeek-R1 的发布旨在克服 DeepSeek-R1-Zero 的不足。通过在强化学习之前引入冷启动数据，DeepSeek-R1 为推理和非推理任务奠定了更坚实的基础。这种多阶段训练策略使得 DeepSeek-R1 在数学、编程和推理基准测试中，能够达到与 OpenAI-o1 匹敌的领先水平，并显著提升了输出内容的可读性与连贯性。

DeepSeek 蒸馏模型

DeepSeek 还推出了蒸馏模型系列。这些模型在保持卓越推理性能的同时，体积更小、效率更高。虽然参数规模从 1.5B 到 70B 不等，但这些模型均保留了强大的推理能力。其中，DeepSeek-R1-Distill-Qwen-32B 在多项基准测试中，性能超越了 OpenAI-o1-mini 模型。更小规模的模型继承了大型模型的推理模式，充分证明了蒸馏技术的有效性。

DeepSeek R1 微调实战

1. configuration de l'environnement

在本次模型微调实践中，选用 Kaggle 作为云端 IDE，原因在于 Kaggle 提供了免费的 GPU 资源。最初选择了两块 T4 GPU，但最终仅使用了一块。若用户希望在本地计算机上进行模型微调，则至少需要配备一块具备 16GB 显存的 RTX 3090 显卡.

首先，启动一个新的 Kaggle notebook，并将用户的 Hugging Face jeton répondre en chantant Weights & Biases token 添加为密钥。

完成密钥设置后，安装 unsloth Python 包。Unsloth 是一款开源框架，旨在将大型语言模型 (LLM) 的微调速度提升一倍，并显著提高内存效率。

%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

接下来，登录 Hugging Face CLI。此步骤对于后续下载数据集以及上传微调后的模型至关重要。

from huggingface_hub import login
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("HUGGINGFACE_TOKEN")
login(hf_token)

然后，登录 Weights & Biases (wandb)，并创建一个新项目，以便跟踪实验过程和微调进度。

import wandb
wb_token = user_secrets.get_secret("wandb")
wandb.login(key=wb_token)
run = wandb.init(
project='Fine-tune-DeepSeek-R1-Distill-Llama-8B on Medical COT Dataset',
job_type="training",
anonymous="allow"
)

2. 模型与 tokenizer 加载

在本文的实践中，加载了 Unsloth 版本的 DeepSeek-R1-Distill-Llama-8B 模型。

https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B

为了优化内存使用和提升性能，选择了以 4-bit 量化的方式加载模型。

from unsloth import FastLanguageModel
max_seq_length = 2048
dtype = None
load_in_4bit = True
model, tokenizer = FastLanguageModel.from_pretrained(
model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
max_seq_length=max_seq_length,
dtype=dtype,
load_in_4bit=load_in_4bit,
token=hf_token,
)

3. 微调前模型推理能力初探

为了构建模型的提示模板，定义了一个系统提示，并在其中加入了问题和答案生成的占位符。此提示旨在引导模型逐步思考，并最终生成逻辑严谨且准确的回答。

prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{}
### Response:
<think>{}"""

在此示例中，向 prompt_style 提供了一个医学问题，并将其转化为 tokens，随后将这些 jetons 传递给模型以生成答案。

question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
FastLanguageModel.for_inference(model)
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=1200,
use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

上述医学问题的核心内容是：

一位 61 岁女性，长期在咳嗽或打喷嚏等活动中出现非自愿性漏尿，但夜间无漏尿现象。她接受了妇科检查和 Q-tip 测试。基于这些检查结果，膀胱测压最可能揭示其残余尿量和逼尿肌收缩状态的哪些信息？

即使在未经微调的情况下，该模型也成功生成了思维链，并在给出最终答案前进行了严谨的推理，整个推理过程被封装在 <think></think> 标签内。

那么，为何仍需进行微调？尽管模型展现出了详细的推理过程，但其表达略显冗长，不够简洁。此外，最终答案以项目符号列表的形式呈现，这与期望微调的数据集的结构和风格存在偏差。

4. 数据集加载与预处理

对提示模板进行了微调，以适应数据集的处理需求，具体方法是在提示模板中为复杂的思维链 (Complex Chain-of-Thought) 列添加了第三个占位符。

train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.
### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.
### Question:
{}
### Response:
<think>
{}
</think>
{}"""

编写了一个 Python 函数，用于在数据集中创建 “text” 列。该列的内容由训练提示模板构成，并将占位符分别填充为问题、思维链和答案。

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN
def formatting_prompts_func(examples):
inputs = examples["Question"]
cots = examples["Complex_CoT"]
outputs = examples["Response"]
texts = []
for input, cot, output in zip(inputs, cots, outputs):
text = train_prompt_style.format(input, cot, output) + EOS_TOKEN
texts.append(text)
return {
"text": texts,
}

从 Hugging Face Hub 加载了 FreedomIntelligence/medical-o1-reasoning-SFT 数据集的前 500 个样本。

https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT?row=46

随后，使用 formatting_prompts_func 函数对数据集的 “text” 列进行了映射处理。

如上图所示，"text" 列已成功整合了系统提示、指令、思维链以及最终答案。

5. 模型配置

通过设定目标模块，采用低秩适配器 (Low-Rank Adapter) 技术对模型进行配置。

model = FastLanguageModel.get_peft_model(
model,
r=16,
target_modules=[
"q_proj",
"k_proj",
"v_proj",
"o_proj",
"gate_proj",
"up_proj",
"down_proj",
],
lora_alpha=16,
lora_dropout=0,
bias="none",
use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
random_state=3407,
use_rslora=False,
loftq_config=None,
)

接下来，配置了训练参数和训练器 (Trainer)。通过向训练器提供模型、tokenizer、数据集以及其他关键训练参数，以优化模型的微调过程。

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
model=model,
tokenizer=tokenizer,
train_dataset=dataset,
dataset_text_field="text",
max_seq_length=max_seq_length,
dataset_num_proc=2,
args=TrainingArguments(
per_device_train_batch_size=2,
gradient_accumulation_steps=4,
# Use num_train_epochs = 1, warmup_ratio for full training runs!
warmup_steps=5,
max_steps=60,
learning_rate=2e-4,
fp16=not is_bfloat16_supported(),
bf16=is_bfloat16_supported(),
logging_steps=10,
optim="adamw_8bit",
weight_decay=0.01,
lr_scheduler_type="linear",
seed=3407,
output_dir="outputs",
),
)

6. 模型训练

trainer_stats = trainer.train()

模型训练过程耗时 22 分钟。训练损失 (loss) 逐渐降低，表明模型性能有所提升，这是一个积极的信号。

用户可以登录 Weights & Biases 网站，查看完整的模型评估报告。

7. 微调后模型推理能力评估

为了进行对比分析，再次向微调后的模型提出了与微调前相同的问题，以观察模型性能的变化。

question = "A 61-year-old woman with a long history of involuntary urine loss during activities like coughing or sneezing but no leakage at night undergoes a gynecological exam and Q-tip test. Based on these findings, what would cystometry most likely reveal about her residual volume and detrusor contractions?"
FastLanguageModel.for_inference(model)  # Unsloth has 2x faster inference!
inputs = tokenizer([prompt_style.format(question, "")], return_tensors="pt").to("cuda")
outputs = model.generate(
input_ids=inputs.input_ids,
attention_mask=inputs.attention_mask,
max_new_tokens=1200,
use_cache=True,
)
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])

实验结果表明，微调后模型的输出质量得到了显著提升，答案更加精准。思维链的呈现更为简洁明了，最终答案也更加直接，仅用一段话便清晰作答，表明此次模型微调取得了成功。

8. 模型本地存储

现在，将适配器 (adapter)、完整模型以及 tokenizer 保存至本地，以便在其他项目中使用。

new_model_local = "DeepSeek-R1-Medical-COT"
model.save_pretrained(new_model_local)
tokenizer.save_pretrained(new_model_local)
model.save_pretrained_merged(new_model_local, tokenizer, save_method="merged_16bit")

9. 模型上传至 Hugging Face Hub

还将适配器、tokenizer 和完整模型推送至 Hugging Face Hub，旨在使 AI 社区能够充分利用这一微调模型，并将其便捷地集成到各自的系统中。

new_model_online = "realyinchen/DeepSeek-R1-Medical-COT"
model.push_to_hub(new_model_online)
tokenizer.push_to_hub(new_model_online)
model.push_to_hub_merged(new_model_online, tokenizer, save_method="merged_16bit")