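DistilQwen2.5-R1 Released: Knowledge Distillation Brings Deep Thinking to Small Models

Authors: Wenrui Cai (清素), Chengyu Wang (熊兮), Junbing Yan (玖燭), Jun Huang (臨在)

Introduction

With the open-sourcing of large language models built for deep reasoning, such as DeepSeek-R1 and QwQ-32B, "large model + slow thinking" has become the standard recipe for pushing the limits of LLM intelligence. These models, however, remain hard to deploy on resource-constrained mobile devices and in edge-computing scenarios. Academia and industry therefore urgently need effective ways to use knowledge distillation to transfer the knowledge of these very large reasoning models into small models, improving computational efficiency and lowering deployment cost. To this end, building on the DistilQwen2.5 series of distilled small models (see the technical articles listed in the references), we are releasing the more capable DistilQwen2.5-R1 series of deep reasoning models.

The DistilQwen2.5-R1 series starts from a small amount of chain-of-thought data distilled from DeepSeek-R1 and applies a set of novel distillation strategies to strengthen the deep-thinking ability of small models. In our evaluation, the small models in the series perform well across benchmarks (see the figure below). For example, DistilQwen2.5-R1-7B clearly outperforms other open-source distilled models, including OpenThinker-7B.

To make the DistilQwen2.5-R1 models easy for developers and enterprises to use in practice, all checkpoints are publicly available on Hugging Face and ModelScope. This article describes the distillation algorithm and performance evaluation of DistilQwen2.5-R1 and provides a usage guide for Alibaba Cloud's Platform for AI (PAI) along with download instructions.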

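Knowledge Distillation Techniques in DistilQwen2.5-R1

This section describes the data augmentation and knowledge distillation techniques used to train the DistilQwen2.5-R1 models.

Because of the large gap in parameter count, large and small models do not always follow the same cognitive and reasoning trajectories. Take math problems as an example: limited by its capacity, a small model tends to solve some problems with more basic methods, whereas a large model, drawing on its stronger reasoning ability, uses more advanced ones. In the classic chickens-and-rabbits problem, for instance, a small model tends to enumerate candidates by trial and error, while a large model sets up and solves equations directly.

Because of this gap in cognitive trajectories, a small model sometimes cannot effectively understand a large model's chain of thought (CoT), and distilling that CoT directly into the small model often works poorly. We therefore designed a training framework for small reasoning models that removes the negative effect of this gap. In later training we also exploit the mismatched data to further improve the small model's reasoning ability, and the DistilQwen2.5-R1 series is the result. The framework has two stages: an "evaluate-improve-verify" mechanism for CoT data, and a preference optimization algorithm based on data with different cognitive trajectories. The overall distillation framework of DistilQwen2.5-R1 is shown in the figure below:

Given an original set of large-model CoT data, for example a dataset distilled from DeepSeek-R1, stage one first rates the difficulty of each sample, then improves the sample according to its difficulty level, and finally verifies the result. We run SFT on the improved and verified CoT dataset to give the model its basic reasoning ability. In stage two, we build a preference dataset from the CoT data of different difficulty levels already available from stage one and further improve the small model's reasoning ability on top of the stage-one model.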

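The CoT Data "Evaluate-Improve-Verify" Mechanism

As noted above, the cognitive and reasoning trajectories of large and small models sometimes differ significantly, so a small model cannot fully understand the large-model CoT dataset to be distilled. Stage one optimizes the dataset with this gap in mind: following the LLM-as-a-Judge paradigm, it evaluates and improves the large model's reasoning process.

Given a question, the large model's reasoning process, and the answer, we use a model to judge whether the reasoning process is easy, medium, or hard. The core criterion is whether the small model can follow the given reasoning process to reach the answer. The difficulty levels are defined as follows:

· Medium: the small model can follow the reasoning process to reach the answer.

· Easy: the reasoning process is too terse and omits steps the small model needs. The large model can rely on its strong reasoning ability to bridge the gaps, but the small model cannot follow the process to the answer.

· Hard: the reasoning process is too complex or too difficult for the small model to follow to the answer.

A collection of large-model questions and CoTs can thus be split into easy, medium, and hard subsets. We keep the medium samples as they are. For samples rated easy or hard, we use a model to improve the CoT: for easy samples, we expand the reasoning process until the small model can follow the expanded version to the answer; for hard samples, we simplify the reasoning process until the small model can follow the simplified version to the answer.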
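
The rating-and-routing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `CoTSample`, `route`, and `toy_judge` are hypothetical names, and `toy_judge` rates by word count purely so the example runs without a model; in the framework described here the rating comes from an LLM judge.

```python
from typing import Callable, Dict, List, NamedTuple


class CoTSample(NamedTuple):
    question: str
    reasoning: str
    answer: str


# Action taken for each difficulty rating, as described in the text.
ACTION_BY_LEVEL = {"medium": "keep", "easy": "expand", "hard": "simplify"}


def route(samples: List[CoTSample],
          judge: Callable[[CoTSample], str]) -> Dict[str, List[CoTSample]]:
    """Group samples by the action their difficulty rating calls for."""
    buckets: Dict[str, List[CoTSample]] = {"keep": [], "expand": [], "simplify": []}
    for sample in samples:
        level = judge(sample)  # expected to be "easy", "medium", or "hard"
        buckets[ACTION_BY_LEVEL[level]].append(sample)
    return buckets


def toy_judge(sample: CoTSample) -> str:
    """Hypothetical stand-in for the LLM judge: rates by word count only."""
    n_words = len(sample.reasoning.split())
    if n_words < 5:
        return "easy"
    if n_words > 50:
        return "hard"
    return "medium"
```

Passing the judge in as a callable keeps the routing logic independent of whichever model and prompt are used for rating.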

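We then verify the improved results in two ways: we rate the difficulty of the improved CoT again to check that it is now classified as medium, and we check that the small model can actually solve the problem by following the improved CoT. If the improved CoT passes verification, the improvement is effective, the sample can be understood by the small model, and we keep it. If it fails, the improvement is ineffective, and we return to the improvement step and try again until verification passes. The resulting optimized CoT dataset consists of:

· Samples initially rated medium.

· Samples initially rated easy that were expanded, re-rated as medium, and passed verification.

· Samples initially rated hard that were simplified, re-rated as medium, and passed verification.

At this point every CoT in the dataset is rated medium, meaning the small model can understand all of them and follow them to solve the corresponding reasoning problems. The gap in cognitive trajectories between large and small models described above is addressed in the improved dataset, and its potential negative effects are removed. We run supervised fine-tuning (SFT) on the Qwen2.5 base models with this optimized CoT dataset to obtain the stage-one DistilQwen2.5-R1 models.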
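
The improve-and-verify loop described above might look like the following sketch. The three callables are hypothetical stand-ins for model calls (a difficulty rater, a rewriter, and a check that the small model reaches the answer). The `max_rounds` cap is an assumption added here so the loop always terminates; the article itself only says the improvement is repeated until verification passes.

```python
from typing import Callable, Optional


def refine_until_medium(
    reasoning: str,
    rate: Callable[[str], str],          # returns "easy", "medium", or "hard"
    rewrite: Callable[[str, str], str],  # (reasoning, direction) -> new reasoning
    can_solve: Callable[[str], bool],    # does the small model reach the answer?
    max_rounds: int = 3,                 # assumed retry budget, not from the article
) -> Optional[str]:
    """Return a CoT rated medium and verified, or None if the budget runs out."""
    direction = rate(reasoning)
    if direction == "medium":
        return reasoning  # initially medium samples are kept unchanged
    for _ in range(max_rounds):
        # "easy" means expand the reasoning, "hard" means simplify it.
        reasoning = rewrite(reasoning, direction)
        new_level = rate(reasoning)
        if new_level == "medium" and can_solve(reasoning):
            return reasoning
        if new_level != "medium":
            direction = new_level  # e.g. an over-expanded CoT may now be "hard"
    return None
```

Samples for which the function returns `None` would simply be left out of the SFT set in this sketch.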

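Preference Optimization Based on Data with Multiple Cognitive Trajectories

In stage two, we further improve the model using the data of different difficulty levels obtained in stage one.

Specifically, CoTs rated medium in stage one are both correct and suited to the small model, which can understand them and solve the problem. CoTs rated easy or hard are still correct; they are simply not suited to the small model. On top of this, we use a model to rewrite a correct reasoning process into an incorrect one. The incorrect reasoning process is illogical and misleading, so the small model cannot solve the problem by following it.

We combine the rewritten incorrect CoTs with the easy, medium, and hard CoTs pairwise to form several kinds of preference pairs. Some pairs have a large gap between the two sides and others a small one. For each kind of pair, we use a parameter configuration tailored to its characteristics and apply the DPO algorithm on top of the stage-one model to further optimize the small model's reasoning ability.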
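
As a rough illustration of how such pairs could be assembled, the sketch below builds records in the prompt/chosen/rejected layout commonly used by DPO training libraries. The ranking in `RANK` (medium preferred over easy and hard, and every correct CoT preferred over the incorrect one) and the "large"/"small" gap labels are assumptions made for this example; the article does not spell out the exact pairing rules or the per-pair parameter values.

```python
from itertools import combinations
from typing import Dict, List

# Assumed preference ranking (higher is preferred); not stated in the article.
RANK = {"medium": 2, "easy": 1, "hard": 1, "incorrect": 0}


def build_preference_pairs(prompt: str, cots: Dict[str, str]) -> List[dict]:
    """Pair up the CoT variants of one question; `cots` maps kind -> text."""
    pairs = []
    for kind_a, kind_b in combinations(cots, 2):
        if RANK[kind_a] == RANK[kind_b]:
            continue  # no assumed preference between easy and hard
        if RANK[kind_a] > RANK[kind_b]:
            chosen, rejected = kind_a, kind_b
        else:
            chosen, rejected = kind_b, kind_a
        pairs.append({
            "prompt": prompt,
            "chosen": cots[chosen],
            "rejected": cots[rejected],
            # Correct-vs-incorrect pairs differ a lot; correct-vs-correct only a little.
            "gap": "large" if rejected == "incorrect" else "small",
        })
    return pairs


def group_by_gap(pairs: List[dict]) -> Dict[str, List[dict]]:
    """Split pairs so each group can be trained with its own configuration."""
    groups: Dict[str, List[dict]] = {"large": [], "small": []}
    for pair in pairs:
        groups[pair["gap"]].append(pair)
    return groups
```

Grouping by gap is one simple way to apply a different DPO configuration to each kind of pair, as the text describes.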

In the end, using the cognitive-trajectory (CoT) data of different difficulty levels and the base models obtained in stage one, we arrive at the DistilQwen2.5-R1 series.
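
For reference, the standard DPO objective (as formulated in the original DPO paper, not taken from this article) is given below. Here $x$ is the question, $y_w$ and $y_l$ are the preferred and rejected CoTs of a pair, $\pi_\theta$ is the model being trained, and $\pi_{\mathrm{ref}}$ is a frozen reference model, which in this setting would presumably be the stage-one SFT model. The temperature $\beta$ is one natural candidate for the per-pair-type parameter configuration mentioned above, though the article does not say which parameters are varied.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```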

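Evaluation of DistilQwen2.5-R1

In this section we evaluate the DistilQwen2.5-R1 distilled small models from several angles and compare them with current state-of-the-art models.

Overall Capability Evaluation

We tested the DistilQwen2.5-R1 models on several reasoning benchmarks covering three mainstream reasoning domains: mathematics, code, and science questions.

For mathematics we use two benchmarks, AIME2024 and MATH-500. AIME2024 is the 2024 test set of the American Invitational Mathematics Examination; it contains 30 difficult math problems and is used to evaluate large language models on complex mathematical reasoning and problem solving, with an emphasis on combining areas such as algebra and geometry. MATH-500 is a mathematical reasoning benchmark of 500 test problems that measures a model's accuracy across a broader range of math problems.

For code we use LiveCodeBench, a continuously updated benchmark for evaluating large language models in complex coding scenarios. It collects difficult programming tasks from top competition platforms to test code generation, self-repair, code execution, and test output prediction, and is designed to be a comprehensive, contamination-free benchmark. In this evaluation we use LiveCodeBench V2, which contains 511 coding problems from May 2023 to May 2024.

For science questions we use GPQA-Diamond, a subset of GPQA (Graduate-Level Google-Proof Q&A Benchmark) released jointly by researchers from New York University, Cohere, and Anthropic. It contains 198 questions, is the highest-quality subset of GPQA, and is used to evaluate a model's ability to solve expert-level science questions.

The figures below compare DistilQwen2.5-R1 with the original Qwen2.5 models at four parameter scales: 3B, 7B, 14B, and 32B. The training framework for small reasoning models described in this article substantially improves the reasoning ability of existing language models and yields consistent, clear gains across benchmarks.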

AIME2024 results:

MATH-500 results:

GPQA Diamond results:

LiveCodeBench V2 results:

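Comparison with Other Models

To compare reasoning models of different parameter scales released in the same period, the tables below report the results of DistilQwen2.5-R1 and other state-of-the-art reasoning models at each scale on the four benchmarks above. We focus on comparing DistilQwen2.5-R1 with the OpenThinker and DeepSeek-R1-Distill-Qwen series.

The 7B comparison follows. DistilQwen2.5-R1-7B outperforms Bespoke-Stratos-7B and OpenThinker-7B. Notably, it scores higher than OpenThinker-7B on all benchmarks while using less training data (105k versus 114k samples). DeepSeek-R1-Distill-Qwen-7B was trained on 800k closed-source samples, whereas DistilQwen2.5-R1-7B was trained on open-source data (a subset obtained by filtering and rewriting the OpenThoughts dataset), which puts it in a leading position among models trained on open-source data.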

Model	Training data size	AIME2024	MATH-500	GPQA Diamond	LiveCodeBench V2
DeepSeek-R1-Distill-Qwen-7B (reported)	800k	55.5	92.8	49.1	-
Bespoke-Stratos-7B (reported)	17k	20.0	82.0	37.8	36.1
OpenThinker-7B (reported)	114k	31.3	83.0	42.4	39.9
DistilQwen2.5-R1-7B	105k	43.33	88.4	42.93	46.38

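The 32B comparison follows. Likewise, DistilQwen2.5-R1-32B outperforms Sky-T1-32B-Preview on every benchmark for which results are reported, and outperforms OpenThinker-32B on all benchmarks except LiveCodeBench V2 (65.95 versus 68.9).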

Model	Training data size	AIME2024	MATH-500	GPQA Diamond	LiveCodeBench V2
DeepSeek-R1-Distill-Qwen-32B (reported)	800k	72.6	94.3	62.1	-
Sky-T1-32B-Preview (reported)	17k	43.3	86.4	56.8	-
OpenThinker-32B (reported)	114k	66.0	90.6	61.6	68.9
DistilQwen2.5-R1-32B	105k	70.0	93.8	62.12	65.95

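Multi-Sample Inference Evaluation

We also tested the DistilQwen2.5-R1 models on the four benchmarks above with multiple samples per question: the model generates k answers to the same question and is scored on them, i.e., the Pass@k metric. The Pass@k results of DistilQwen2.5-R1-7B and DistilQwen2.5-R1-32B on the four benchmarks (k = 2, 4, 8, 16, 32, 64) are shown below.

As the number of samples k increases, the accuracy of both models rises substantially on all benchmarks. Notably, as k grows, DistilQwen2.5-R1-7B improves sharply on MATH-500 and GPQA-Diamond and keeps approaching the level of DistilQwen2.5-R1-32B. This suggests that our training framework has considerable potential for small models: with multiple samples, a 7B model can approach the capability of a 32B model, which can reduce the compute needed for inference.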
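
For readers who want to reproduce this kind of measurement, the sketch below computes Pass@k with the widely used unbiased estimator: draw n samples per question, count the c correct ones, and estimate the chance that at least one of k randomly chosen samples is correct. Whether the numbers reported here were computed with this estimator or by sampling exactly k answers is not stated in the article, so treat this as one common convention; the function names are illustrative.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples (out of n, c correct) is correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # fewer than k wrong samples: any k-subset contains a correct one
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average Pass@k over questions; correct_counts[i] is c for question i."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

With n = 64 samples per question, the same set of generations can be reused to report every k from 2 to 64.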

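Model Output Examples

For the same math problem, we compared the outputs of DistilQwen2.5-R1 models at the 7B and 32B scales with those of models of the same size. The outputs show that the DistilQwen2.5-R1 models hold a leading position among reasoning models of the same size.

Example 1: comparison of 7B models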

Question	Peter has $70 and wishes to spend an equal amount each day for one week. From Sunday through Wednesday, he spent his money on wooden action figures which cost $5 each. For the rest of the week, he will buy plastic action figures which cost $2 each. How many total action figures will he have by the end of the week?
Answer	There are 7 days in a week so if he wishes to spend an equal amount of $70 each day that would be $70/7 = $<<70/7=10>>10\nFor four days (Sunday through Wednesday) he will spend 4*$10 = $<<4*10=40>>40\nHe will spend $40 on wooden action figures of $5 each which gives $40/$5 = <<40/5=8>>8 wooden action figures\nFor the remaining 3 days he will spend 3*$10 = $<<3*10=30>>30\nHe will spend $30 on plastic action figures of $2 each which gives $30/$2 = <<30/2=15>>15 plastic action figures\nAltogether, he will have 8+15 = <<8+15=23>>23 action figures.\n#### 23
DistilQwen2.5-R1-7B	<\|begin_of_thought\|> Okay, let's try to figure out how many action figures Peter can buy. First, I need to understand the problem step by step. So, Peter has $70 and wants to spend an equal amount each day for one week. That means seven days total. From Sunday through Wednesday, he spends money on wooden action figures that cost $5 each. Then, for the rest of the week (which would be Thursday and Friday, I guess, since the week ends on Saturday or Sunday?), he buys plastic ones that cost $2 each. The question is asking how many total action figures he'll have by the end of the week. Wait, the problem says "from Sunday through Wednesday," so that's four days. Then "for the rest of the week," which would be Thursday and Friday, making two days. So total days: 4 + 2 = 6 days? Wait, a week is seven days. If he's spending from Sunday through Wednesday (4 days), then the rest would be Thursday and Friday (2 days), but that's only 6 days. Wait, maybe the "rest of the week" includes Saturday as well? Because Sunday to Wednesday is four days, then Thursday to Saturday would be three days left. Hmm, maybe I need to clarify that. Wait, the problem says "for the rest of the week," which could be interpreted differently. Let me check again. The problem states: "From Sunday through Wednesday, he spent his money on wooden action figures..." So that's four days. Then "For the rest of the week, he will buy plastic action figures which cost $2 each." So "rest of the week" would be the remaining days of the week after Sunday through Wednesday. Since Sunday is the first day, the week would be Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday. So from Sunday to Wednesday is four days. The rest would be Thursday, Friday, Saturday. That's three days. So total days: 4 + 3 = 7 days, which is a week. Okay, so that makes sense. So he spends four days on wooden figures ($5 each) and three days on plastic ones ($2 each). 
The total amount he spends is $70, and he wants to spend an equal amount each day. So first, we need to figure out how much he spends each day, then calculate how many action figures he can buy each day, and sum them up. Wait, but the problem says he has $70 and wishes to spend an equal amount each day for one week. So the total amount he spends is $70, divided equally over seven days. So each day he spends $70 / 7 = $10 per day. So $10 per day total for the week. But then, from Sunday through Wednesday (four days), he spends on wooden figures ($5 each), and the remaining three days (Thursday, Friday, Saturday) he spends on plastic figures ($2 each). So the total amount he spends on wooden figures is 4 days * $10/day = $40. Then the remaining amount is $70 - $40 = $30, which he spends over three days at $2 each. So $30 / 3 days = $10 per day, which matches the total daily spending of $10. So that checks out. Now, how many wooden action figures does he buy? Each costs $5, and he spends $10 on them each day for four days. Wait, but if each day he spends $10 on wooden figures, then each day he can buy $10 / $5 = 2 action figures per day. So over four days, that's 2 * 4 = 8 wooden action figures. Then, for the plastic figures, he spends $10 per day for three days. Each plastic figure is $2, so per day he can buy $10 / $2 = 5 action figures per day. Over three days, that's 5 * 3 = 15 plastic action figures. Total action figures: 8 + 15 = 23. Wait, but let me verify this again. Let's check the total money spent. Wooden: 8 figures * $5 = $40. Plastic: 15 figures * $2 = $30. Total: $40 + $30 = $70. Which matches the total he has. So that's correct. Alternatively, another way to look at it: since he spends $10 each day, total $70 over 7 days. So each day he spends $10. For the first four days, he uses $10 each day on $5 action figures. So each day, $10 / $5 = 2 per day. 4 days: 8. Then the remaining three days, $10 each day on $2 action figures: $10 / $2 = 5 per day. 
3 days: 15. Total 23. Yes, that seems to be the answer. <\|end_of_thought\|> <\|begin_of_solution\|> Peter has $70 to spend over 7 days. He spends $10 each day ($70 ÷ 7 = $10/day). Wooden action figures (Sunday–Wednesday, 4 days): Each costs $5. Daily purchase: $10 ÷ $5 = 2 figures/day. Total: 2 × 4 = 8 figures. Plastic action figures (Thursday–Saturday, 3 days): Each costs $2. Daily purchase: $10 ÷ $2 = 5 figures/day. Total: 5 × 3 = 15 figures. Total action figures: 8 (wooden) + 15 (plastic) = 23. \boxed{23} <\|end_of_solution\|>
DeepSeek-R1-Distill-Qwen-7B	<think> Okay, let's see. Peter has $70 and wants to spend an equal amount each day for one week. So first, I need to figure out how much he spends each day. A week has seven days, right? From Sunday through Wednesday, he buys wooden action figures that cost $5 each. Then, for the rest of the week, which would be Thursday and Friday, he buys plastic ones that cost $2 each. The question is asking how many total action figures he'll have by the end of the week. Wait, hold on. Wait, a week is seven days. Sunday through Wednesday is four days, and then the rest is Thursday and Friday, that's two days. So total of six days? Wait, Sunday, Monday, Tuesday, Wednesday is four days. Then Thursday and Friday, that's two days. So total of six days he's spending money, leaving Saturday out? The problem says "for one week," so maybe he starts on Sunday and goes through Saturday? But the problem mentions spending from Sunday through Wednesday, and then the rest of the week. Hmm. Wait, the problem says: "From Sunday through Wednesday, he spent his money on wooden action figures which cost $5 each. For the rest of the week, he will buy plastic action figures which cost $2 each." So Sunday to Wednesday is four days, then the rest of the week would be Thursday to Saturday, which is three days. Wait, that's three days. So total of seven days? Wait, no. Sunday to Saturday is a week. If Sunday through Wednesday is four days, then the remaining days are Thursday, Friday, Saturday, which is three days. So total of seven days. So he spends money on four days buying wooden figures and three days buying plastic ones. But the problem says he spends an equal amount each day for one week. So total spending per day is consistent. So first, I need to find out how much he spends each day. Let me denote the daily amount as x dollars. So total spending for the week would be 7x because it's one week. But he has $70. So 7x = $70. Therefore, x = 10. So he spends $10 each day. 
Wait, but that seems high. Let me check. If he spends $10 each day for seven days, that's $70 total. That makes sense. So each day he spends $10. Then, from Sunday through Wednesday, which is four days, he spends 4 days * $10/day = $40 on wooden action figures. Then, for the rest of the week, which is three days (Thursday, Friday, Saturday), he spends 3 days * $10/day = $30 on plastic action figures. So total $40 + $30 = $70, which matches his total money. That checks out. Now, the problem is asking for the total number of action figures he will have by the end of the week. So we need to calculate how many wooden figures he bought during the four days and how many plastic figures he bought during the three days, then sum them up. First, wooden action figures cost $5 each. He spent $40 on them. So the number of wooden figures is $40 divided by $5 per figure. That's 40 / 5 = 8 figures. Then, plastic action figures cost $2 each. He spent $30 on them. So the number of plastic figures is $30 divided by $2 per figure. That's 30 / 2 = 15 figures. Adding them together: 8 wooden + 15 plastic = 23 total action figures. Wait a second, but let me verify again to make sure I didn't make a mistake. So total days are seven. He spends $10 each day. Total $70. From Sunday to Wednesday: four days. 4 days * $10 = $40. Then Thursday, Friday, Saturday: three days * $10 = $30. Correct. Then wooden figures: $40 / $5 = 8. Plastic: $30 / $2 = 15. Total 23. That seems right. But wait, the problem says "for the rest of the week, he will buy plastic action figures which cost $2 each." So does "rest of the week" mean the remaining days after Sunday through Wednesday? Sunday through Wednesday is four days, so the remaining three days would be Thursday, Friday, Saturday. So three days. So three days of $10 each day, totaling $30. Correct. Therefore, the total number of action figures is 8 + 15 = 23. So the answer is 23. </think> Peter spends $10 each day for 7 days, totaling $70. 
From Sunday to Wednesday (4 days), he buys wooden action figures at $5 each: \[ \frac{4 \text{ days} \times \$10}{\$5/\text{figure}} = \frac{\$40}{\$5} = 8 \text{ figures}. \] For the remaining 3 days (Thursday to Saturday), he buys plastic action figures at $2 each: \[ \frac{3 \text{ days} \times \$10}{\$2/\text{figure}} = \frac{\$30}{\$2} = 15 \text{ figures}. \] Total action figures: \[ 8 + 15 = \boxed{23}. \]

Example 2: comparison of 32B models

Question	A shoe store was having a weekend sale on a brand of popular tennis shoes. On Friday the store sold 14 pairs of tennis shoes. The next day they sold double that number of shoes. On the last day of the sale they sold one-half the amount that they did the day before, but six people returned their pairs because they didn't fit. How many pairs of tennis shoes were sold by the end of the sale?
Answer	On Saturday 14*2=<<14*2=28>>28 shoes were sold\nOn Sunday 28/2=<<28/2=14>>14 shoes were sold\nBefore the returns were made 14+28+14=<<14+28+14=56>>56 pairs of shoes were sold\nAfter the returns were made 56-6=<<56-6=50>>50 pairs of shoe were sold\n#### 50
DistilQwen2.5-R1-32B response	<\|begin_of_thought\|> Okay, let me try to figure out this problem step by step. So, the shoe store had a sale on tennis shoes over the weekend. Let's break down the information given. First, on Friday, they sold 14 pairs of tennis shoes. That's straightforward. Then, the next day, which would be Saturday, they sold double that number. Double of 14 is 28, right? So, 28 pairs on Saturday. Then, on the last day of the sale, which is Sunday, they sold half the amount they did the day before. The day before Sunday is Saturday, so half of 28 is 14. So, 14 pairs on Sunday. But wait, there's a catch here. Six people returned their pairs because the shoes didn't fit. So, we need to subtract those 6 returned pairs from the Sunday sales. That means instead of 14, they effectively sold 14 minus 6, which is 8 pairs on Sunday. Now, to find the total number of tennis shoes sold by the end of the sale, we need to add up the sales from Friday, Saturday, and Sunday, taking into account the returns on Sunday. So, Friday: 14 pairs. Saturday: 28 pairs. Sunday: 14 - 6 = 8 pairs. Adding those together: 14 + 28 + 8. Let me calculate that. 14 plus 28 is 42, and then 42 plus 8 is 50. So, total pairs sold would be 50. Wait, but let me double-check to make sure I didn't make a mistake. Let's verify each step again. Friday: 14 pairs. Correct. Saturday is double, so 14*2=28. Correct. Sunday is half of Saturday's sales, which is 28/2=14. But then subtract the 6 returned pairs. So 14-6=8. Then total is 14+28+8=50. Yeah, that seems right. Hmm, another way to look at it: Total sold without returns would be 14 + 28 +14 = 56. Then subtract the 6 returned pairs, which would be 56 -6=50. Same result. So, that confirms it. I think that's all. So the final answer should be 50 pairs of tennis shoes sold. <\|end_of_thought\|> <\|begin_of_solution\|> The shoe store sold 14 pairs on Friday. On Saturday, they sold double that number, which is $14 \times 2 = 28$ pairs. 
On Sunday, they sold half of Saturday's sales, which is $28 \div 2 = 14$ pairs, but then subtract the 6 pairs returned. Therefore, the effective sales on Sunday are $14 - 6 = 8$ pairs. Adding up the sales from all three days: $14$ (Friday) + $28$ (Saturday) + $8$ (Sunday) = $14 + 28 + 8 = 50$. The final answer is $\boxed{50}$. <\|end_of_solution\|>
DeepSeek-R1-Distill-Qwen-32B response	<think> Okay, let's try to figure out how many tennis shoes were sold by the end of the sale. So, the problem is about a shoe store that had a weekend sale. I'll need to break it down day by day to make sure I understand each part correctly. First, on Friday, the store sold 14 pairs of tennis shoes. That seems straightforward. So, Friday sales are 14 pairs. Next, the problem says that on the next day, which would be Saturday, they sold double that number. Hmm, double of Friday's sales. So, double of 14 is 14 times 2. Let me calculate that: 14 * 2 = 28. So, they sold 28 pairs on Saturday. Now, moving on to the last day of the sale, which is Sunday. The problem states that on Sunday, they sold half the amount they did the day before. The day before Sunday is Saturday, so half of Saturday's sales. Saturday was 28 pairs, so half of that would be 28 / 2. Let me do that math: 28 divided by 2 is 14. So, they sold 14 pairs on Sunday before any returns. But wait, the problem mentions that six people returned their pairs because they didn't fit. So, we need to subtract those returns from Sunday's sales. So, Sunday's total after returns would be 14 minus 6. Let me calculate that: 14 - 6 = 8. So, they effectively sold 8 pairs on Sunday after the returns. Now, to find the total number of pairs sold by the end of the sale, I need to add up the sales from each day. That would be Friday's 14, plus Saturday's 28, plus Sunday's 8. Let me add them up step by step. First, 14 (Friday) + 28 (Saturday) = 42. Then, adding Sunday's 8 to that: 42 + 8 = 50. Wait, does that make sense? Let me double-check each step to make sure I didn't make a mistake. Friday was 14, Saturday was 28, Sunday was 14 before returns, which became 8 after subtracting 6. So, 14 + 28 is indeed 42, and 42 + 8 is 50. Yeah, that seems correct. But just to be thorough, maybe I should think about whether the returns affect the total sold or just the net sales. 
The problem says "they sold one-half the amount that they did the day before, but six people returned their pairs." So, I think the six returns are subtracted from Sunday's sales, so the total sold would be the initial sales minus the returns. Therefore, the calculation I did is correct. So, adding all three days together: 14 + 28 + (14 - 6) = 14 + 28 + 8 = 50. Yep, that looks right. </think> The total number of pairs sold by the end of the sale is 50. The final answer is 50.

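Model Download and Usage

Using DistilQwen2.5-R1 on Alibaba Cloud's Platform for AI (PAI)

Taking the Hugging Face transformers library as an example, this section briefly shows how to use DistilQwen2.5-R1 on PAI-DSW. First make sure the transformers version in the PAI-DSW image is at least 4.37.0; otherwise loading the model fails with: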

KeyError: 'qwen2'

Taking DistilQwen2.5-R1-7B as an example, the model can be called with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "alibaba-pai/DistilQwen2.5-R1-7B"

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "xxxxx"  # placeholder: replace with your own question
messages=[
    {"role": "system", "content": "Your role as an assistant involves thoroughly exploring questions through a systematic long thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution. In the Thought section, detail your reasoning process using the specified format: <|begin_of_thought|> {thought with steps separated with '\n\n'} <|end_of_thought|> Each step should include detailed considerations such as analisying questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The solution should remain a logical, accurate, concise expression style and detail necessary step needed to reach the conclusion, formatted as follows: <|begin_of_solution|> {final formatted, precise, and clear solution} <|end_of_solution|> Now, try to solve the following question through the above guidelines:"},
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512  # raise this if long reasoning traces get cut off
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
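
Because the system prompt asks the model to wrap its reasoning and its final answer in `<|begin_of_thought|>` and `<|begin_of_solution|>` sections, a small helper can separate the two. The sketch below is an illustration, not part of the released code: `split_response` is a hypothetical name, and it assumes the tags survive decoding as plain text, as they do in the output examples above.

```python
import re

THOUGHT_RE = re.compile(r"<\|begin_of_thought\|>(.*?)<\|end_of_thought\|>", re.DOTALL)
SOLUTION_RE = re.compile(r"<\|begin_of_solution\|>(.*?)<\|end_of_solution\|>", re.DOTALL)
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def split_response(response: str) -> dict:
    """Split a model response into thought, solution, and boxed final answer."""
    thought = THOUGHT_RE.search(response)
    solution = SOLUTION_RE.search(response)
    # If generation was cut off before the closing tag, fall back to the raw text.
    solution_text = solution.group(1).strip() if solution else response.strip()
    boxed = BOXED_RE.findall(solution_text)
    return {
        "thought": thought.group(1).strip() if thought else "",
        "solution": solution_text,
        "final_answer": boxed[-1] if boxed else None,
    }
```

Calling `split_response(response)` on the string produced by the generation code above returns the three parts as a dictionary; `final_answer` is `None` when no `\boxed{...}` expression is found.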

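Downloading DistilQwen2.5-R1 from Open-Source Communities

We have open-sourced the distilled models on Hugging Face and ModelScope: DistilQwen2.5-R1-3B, DistilQwen2.5-R1-7B, DistilQwen2.5-R1-14B, and DistilQwen2.5-R1-32B. Taking Hugging Face as an example, the four models can be downloaded with the following code: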

from huggingface_hub import snapshot_download

model_name = "alibaba-pai/DistilQwen2.5-R1-3B"
snapshot_download(repo_id=model_name, cache_dir="./DistilQwen2.5-R1-3B/")

model_name = "alibaba-pai/DistilQwen2.5-R1-7B"
snapshot_download(repo_id=model_name, cache_dir="./DistilQwen2.5-R1-7B/")

model_name = "alibaba-pai/DistilQwen2.5-R1-14B"
snapshot_download(repo_id=model_name, cache_dir="./DistilQwen2.5-R1-14B/")

model_name = "alibaba-pai/DistilQwen2.5-R1-32B"
snapshot_download(repo_id=model_name, cache_dir="./DistilQwen2.5-R1-32B/")

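Summary and Future Work

This article introduced the DistilQwen2.5-R1 series of deep reasoning models, which build on a small amount of chain-of-thought data from DeepSeek-R1 and use novel distillation strategies to strengthen the deep-thinking ability of small models. Experiments show that the series performs well on multiple benchmarks; in particular, DistilQwen2.5-R1-7B outperforms other distilled models trained on open-source data, such as OpenThinker-7B and Bespoke-Stratos-7B, on all four benchmarks. For practical use, the model checkpoints are publicly available on Hugging Face and ModelScope, and this article provides a guide to using them on Alibaba Cloud's Platform for AI (PAI). As large language models and knowledge distillation continue to advance, we plan to release DistilQwen models for more domains and in more sizes, helping reduce cost and improve efficiency when large language models are used in practice.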

References

Related Publications

1. Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang. Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud. COLING 2025

2. Yuanhao Yue, Chengyu Wang, Jun Huang, Peng Wang. Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning. EMNLP 2024

Technical Articles

1. DistilQwen2.5 Released: Another Upgrade for the Tongyi Qianwen Distilled Small Models: https://developer.aliyun.com/article/1653842

2. DistilQwen2: Knowledge Distillation in Practice for the Tongyi Qianwen Large Models: https://developer.aliyun.com/article/1633882

3. Training, Evaluation, Compression, and Deployment of DistilQwen2 Distilled Small Models: https://help.aliyun.com/zh/pai/user-guide/training-evaluation-compression-and-deployment-of-distilqwen2

4. LLM Data Augmentation and Model Distillation Solution: https://help.aliyun.com/zh/pai/user-guide/llm-data-enhancement-and-model-distillation-solution
