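Comparing the APIs of each LLM vendor's best model, DeepSeek-V2 currently leads on price/performance by a wide margin. Its performance-to-output-price ratio is about 0.447, versus about 0.091 for the runner-up, Qwen1.5-72B.
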
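Relative performance is a model's leaderboard score divided by GPT-4-Turbo-0409's score on the same leaderboard. The sources are used in this order of priority:

1. Arena Elo, if the model has one.
2. OpenCompass, if there is no Arena Elo. The objective and subjective composite scores are summed before dividing.
3. SuperCLUE, if neither of the above is available.
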
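The fallback rule above can be sketched in a few lines of Python. The reference values are GPT-4-Turbo-0409's scores from the table: Arena Elo 1256, OpenCompass 63.1 + 49.9, and SuperCLUE 77.02.

Summing the two OpenCompass composites is an assumption inferred from the table's figures. For example, DeepSeek-V2's (56.5 + 44.5) / 113 ≈ 0.893805 matches its listed value. The function name `relative_performance` is hypothetical.

```python
from typing import Optional

# GPT-4-Turbo-0409 reference scores, copied from its row in the table.
REF_ARENA_ELO = 1256
# Assumption: OpenCompass objective + subjective composites are summed,
# which reproduces the table's relative-performance column.
REF_OPENCOMPASS = 63.1 + 49.9
REF_SUPERCLUE = 77.02


def relative_performance(
    arena_elo: Optional[float] = None,
    opencompass_objective: Optional[float] = None,
    opencompass_subjective: Optional[float] = None,
    superclue: Optional[float] = None,
) -> Optional[float]:
    """Score relative to GPT-4-Turbo-0409 from the first available leaderboard.

    Priority: Arena Elo, then OpenCompass (needs both composites),
    then SuperCLUE. Returns None when no usable score is given.
    """
    if arena_elo is not None:
        return arena_elo / REF_ARENA_ELO
    if opencompass_objective is not None and opencompass_subjective is not None:
        return (opencompass_objective + opencompass_subjective) / REF_OPENCOMPASS
    if superclue is not None:
        return superclue / REF_SUPERCLUE
    return None


# Qwen1.5-72B has an Arena Elo, so the OpenCompass and SuperCLUE scores are ignored.
print(round(relative_performance(arena_elo=1147, superclue=68.04), 6))  # 0.913217
# DeepSeek-V2 has no Arena Elo, so OpenCompass is used.
print(round(relative_performance(opencompass_objective=56.5, opencompass_subjective=44.5), 6))  # 0.893805
# Baichuan4 only has a SuperCLUE score.
print(round(relative_performance(superclue=80.64), 6))  # 1.047001
```

Note that dividing Elo ratings treats the Arena scale as if it were a ratio scale. The sketch keeps that convention only because it reproduces the table's numbers.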
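Price/performance is the relative performance score divided by the output price (CNY per million tokens). Output price is used instead of input price because API usage generally produces far more output tokens than input tokens.
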
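The ratio and the resulting ranking can be sketched the same way. The four rows below are copied from the table, and `price_performance` is a hypothetical helper.

Raising an error for a zero output price is a design choice of this sketch. Several models in the table, such as ERNIE Speed and ERNIE Lite, are listed at 0 CNY, and the ratio is undefined for them.

```python
# (model, relative performance vs GPT-4-Turbo-0409, output price in CNY per 1M tokens)
# A few rows copied from the table.
rows = [
    ("DeepSeek-V2", 0.893805, 2),
    ("Qwen1.5-72B", 0.913217, 10),
    ("GLM-4", 0.93551, 100),
    ("GPT-4-Turbo-0409", 1.0, 217),
]


def price_performance(relative: float, output_price: float) -> float:
    """Relative performance per CNY of output price (per million output tokens)."""
    if output_price <= 0:
        # Free models would divide by zero; refuse rather than return inf.
        raise ValueError("output price must be positive")
    return relative / output_price


# Highest price/performance first.
ranked = sorted(rows, key=lambda r: price_performance(r[1], r[2]), reverse=True)
for name, rel, out_price in ranked:
    print(f"{name:<18} {price_performance(rel, out_price):.4f}")
```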
No. | Model | Company / provider | Input price (CNY / 1M tokens) | Output price (CNY / 1M tokens) | Arena Elo | OpenCompass objective composite | OpenCompass subjective composite | SuperCLUE | Relative performance vs. GPT-4-Turbo-0409 | Performance / output price |
---|---|---|---|---|---|---|---|---|---|---|
1 | DeepSeek-V2 | DeepSeek | 1 | 2 | | 56.5 | 44.5 | | 0.893805 | 0.446903 |
2 | Qwen1.5-72B | Alibaba Cloud | 5 | 10 | 1147 | 54.5 | 37.5 | 68.04 | 0.913217 | 0.091322 |
3 | Moonshot-v1-8k | Moonshot AI | 12 | 12 | | 53.7 | 43.1 | 70.42 | 0.856637 | 0.071386 |
4 | Qwen1.5-110B | Alibaba Cloud | 7 | 14 | 1163 | 56.8 | 47.4 | | 0.925955 | 0.06614 |
5 | yi-large | 01.AI | 20 | 20 | 1239 | | | 74.29 | 0.986465 | 0.049323 |
6 | abab6.5 | MiniMax | 30 | 30 | | 57.8 | 45.8 | | 0.916814 | 0.03056 |
7 | Spark3.5 Max | iFlytek | 30 | 30 | | 50.3 | 48.1 | 69.43 | 0.870796 | 0.029027 |
8 | Llama-3-70b-Instruct | Not specified | 27 | 82 | 1153 | | | | 0.917994 | 0.011195 |
9 | Baichuan4 | Baichuan | 100 | 100 | | | | 80.64 | 1.047001 | 0.01047 |
10 | Mistral-large | Mistral | 29.8 | 88.6 | 1153 | 53.4 | 28.8 | | 0.917994 | 0.010361 |
11 | GPT-4o | OpenAI | 36.2 | 108.6 | 1287 | 64.2 | 51.1 | | 1.024682 | 0.009435 |
12 | Hunyuan-pro | Tencent | 30 | 100 | | | | 72.12 | 0.93638 | 0.009364 |
13 | GLM-4 | Zhipu AI | 100 | 100 | 1175 | 57.8 | 44.2 | 72.58 | 0.93551 | 0.009355 |
14 | Claude 3 Opus | Anthropic | 74.47 | 108.6 | 1246 | 60.5 | 48.1 | 74.47 | 0.992038 | 0.009135 |
15 | Qwen-Max | Alibaba Cloud | 40 | 120 | 1186 | 55.6 | 50 | 72.45 | 0.944268 | 0.007869 |
16 | ERNIE 4.0-8K | Baidu | 120 | 120 | | 54.7 | 46.6 | 71.9 | 0.89646 | 0.007471 |
17 | Gemini 1.5 Pro | Google | 50.7 | 152 | 1248 | | | | 0.993631 | 0.006537 |
18 | GPT-4-Turbo-0409 | OpenAI | 72 | 217 | 1256 | 63.1 | 49.9 | 77.02 | 1 | 0.004608 |
19 | GPT-4-Turbo-1106 | OpenAI | 72 | 217 | 1251 | 62 | 50 | | 0.996019 | 0.00459 |
20 | Claude 3 Sonnet | Anthropic | 21.7 | 108.6 | 1199 | | | | | |
21 | Claude 3 Haiku | Anthropic | 2 | 9 | | | | | | |
22 | abab6.5s | MiniMax | 10 | 10 | | | | | | |
23 | abab6.5g | MiniMax | 5 | 5 | | | | | | |
24 | GPT-4-0613 | OpenAI | 217 | 434 | | | | | | |
25 | GPT-3.5 | OpenAI | 11 | 14 | | | | | | |
26 | qwen-long | Alibaba Cloud | 0.5 | 2 | | | | | | |
27 | qwen-turbo | Alibaba Cloud | 2 | 6 | | | | | | |
28 | qwen-plus | Alibaba Cloud | 4 | 12 | | | | | | |
29 | qwen-max-0428 | Alibaba Cloud | 40 | 120 | | | | | | |
30 | qwen-max-0403 | Alibaba Cloud | 40 | 120 | | | | | | |
31 | qwen-max-0107 | Alibaba Cloud | 40 | 120 | | | | | | |
32 | qwen-max-1201 | Alibaba Cloud | 120 | 120 | | | | | | |
33 | qwen-max-longcontext | Alibaba Cloud | 40 | 120 | | | | | | |
34 | Baichuan3-Turbo | Baichuan | 12 | 12 | | | | | | |
35 | Baichuan3-Turbo-128k | Baichuan | 24 | 24 | | | | | | |
36 | Baichuan2-Turbo | Baichuan | 8 | 8 | | | | | | |
37 | Baichuan2-Turbo-192k | Baichuan | 16 | 16 | | | | | | |
38 | Baichuan2-53B | Baichuan | 20 | 20 | | | | | | |
39 | ERNIE Speed | Baidu | 0 | 0 | | | | | | |
40 | ERNIE Lite | Baidu | 0 | 0 | | | | | | |
41 | yi-large-turbo | 01.AI | 12 | 12 | | | | | | |
42 | yi-large-rag | 01.AI | 25 | 25 | | | | | | |
43 | yi-medium | 01.AI | 2.5 | 2.5 | | | | | | |
44 | yi-medium-200k | 01.AI | 12 | 12 | | | | | | |
45 | yi-spark | 01.AI | 1 | 1 | | | | | | |
46 | yi-vision | 01.AI | 6 | 6 | | | | | | |
47 | Hunyuan-lite | Tencent | 0 | 0 | | | | | | |
48 | Hunyuan-standard | Tencent | 4.5 | 5 | | | | | | |
49 | Hunyuan-standard-256k | Tencent | 15 | 60 | | | | | | |
50 | Mixtral 8x22B | Not specified | 14 | 43 | | | | | | |
51 | Moonshot-v1-128k | Moonshot AI | 60 | 60 | | | | | | |
52 | Moonshot-v1-32k | Moonshot AI | 24 | 24 | | | | | | |
53 | GLM-4-0520 | Zhipu AI | 100 | 100 | | | | | | |
54 | GLM-4-Air | Zhipu AI | 1 | 1 | | | | | | |
55 | GLM-4-AirX | Zhipu AI | 10 | 10 | | | | | | |
56 | GLM-4-Flash | Zhipu AI | 0.1 | 0.1 | | | | | | |
57 | GLM-4V | Zhipu AI | 50 | 50 | | | | | | |
58 | GLM-3-Turbo | Zhipu AI | 1 | 1 | | | | | | |
59 | Doubao-pro-4k | ByteDance | 0.8 | 2 | | | | | | |
60 | Doubao-pro-32k | ByteDance | 0.8 | 2 | | | | | | |
61 | Doubao-pro-128k | ByteDance | 5 | 9 | | | | | | |
62 | Doubao-lite-4k | ByteDance | 0.3 | 0.6 | | | | | | |
63 | Doubao-lite-32k | ByteDance | 0.3 | 0.6 | | | | | | |
64 | Doubao-lite-128k | ByteDance | 0.8 | 1 | | | | | | |
65 | Doubao-embedding | ByteDance | 0.5 | 0.5 | | | | | | |
Author: tsingk
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under BY-NC-SA. Please credit the source when reposting.