LM Benchmarks

Beta
Model | Size | Release | Updated | LBCode | Aider | HumEval | HumEval+ | ArenaCode | LBReason | MMLU Pro | MMLU | GPQA | MATH | LBMath | Arena | AA Score | In Cost | Out Cost | Cutoff | Context | Org | Type
Claude 3.5 Sonnet | Latest | Oct 22, 2024 | 67.1% | 84.2% | 93.7% | 85.9% | 1321 | 58.7% | 78.0% | 88.0% | 65.0% | 78.3% | 51.3% | 1282 | 80 | $3 | $15 | Apr 24 | 200K | Anthropic | Proprietary
Claude 3.5 Sonnet | Old | Jun 20, 2024 | 60.9% | 77.4% | 92.0% | 1295 | 58.7% | 76.1% | 88.7% | 59.4% | 71.1% | 53.3% | 1268 | $3 | $15 | Apr 24 | 200K | Anthropic | Proprietary
Qwen 2.5 Coder | 32B | Latest | Sep 19, 2024 | 56.9% | 73.7% | 92.7% | 87.2% | 1264 | 47.3% | 79.0% | 41.0% | 72.0% | 46.0% | 1220 | 70 | $0.2 | $0.2 | Sep 24 | 128K | Alibaba | Open Source
Qwen 2.5 | 72B | Latest | Sep 19, 2024 | 56.6% | 55.6% | 59.1% | 51.2% | 1283 | 46.0% | 58.1% | 86.1% | 45.9% | 62.1% | 52.4% | 1259 | 75 | $0.38 | $0.57 | Sep 24 | 128K | Alibaba | Open Source
Gemini Exp 1114 | Old | Nov 14, 2024 | 52.4% | 60.9% | 1324 | 54.7% | 54.9% | 1343 | 32K | Google | Proprietary
GPT 4o | Old | Aug 6, 2024 | 51.4% | 71.4% | 90.0% | 87.2% | 1274 | 54.7% | 74.7% | 89.0% | 51.0% | 78.0% | 48.2% | 1265 | 77 | $2.5 | $10 | Oct 23 | 128K | OpenAI | Proprietary
Claude 3.5 Haiku | Latest | Oct 22, 2024 | 51.4% | 75.2% | 88.1% | 29.3% | 65.0% | 81.0% | 41.6% | 73.0% | 35.5% | 69 | $1 | $5 | Apr 24 | 200K | Anthropic | Proprietary
o1 Preview | Latest | Sep 12, 2024 | 50.8% | 79.7% | 92.4% | 89.0% | 1354 | 68.0% | 91.0% | 73.3% | 85.0% | 62.9% | 1333 | 85 | $15 | $60 | Oct 23 | 128K | OpenAI | Proprietary
Gemini Exp 1121 | Latest | Nov 21, 2024 | 50.4% | 57.9% | 1341 | 45.3% | 62.7% | 1365 | 32K | Google | Proprietary
QwQ Preview | 32B | Latest | Nov 28, 2024 | 50.0% | 75.0% | 65.2% | Alibaba | Open Source
Gemini 1.5 Pro | Latest | Sep 24, 2024 | 48.8% | 66.9% | 84.1% | 79.3% | 1290 | 46.0% | 75.8% | 86.0% | 61.0% | 86.5% | 57.4% | 1301 | 80 | $1.25 | $2.5 | Nov 23 | 2M | Google | Proprietary
o1 mini | Latest | Sep 12, 2024 | 48.0% | 70.7% | 92.4% | 89.0% | 1364 | 77.3% | 85.0% | 60.0% | 90.0% | 59.2% | 1308 | 82 | $3 | $12 | Oct 23 | 128K | OpenAI | Proprietary
Mistral Large 2 | 123B | Old | Jul 24, 2024 | 47.1% | 60.2% | 91.3% | 62.2% | 1269 | 42.0% | 85.0% | 48.0% | 72.0% | 43.7% | 1250 | 74 | $2 | $6 | Jul 24 | 128K | Mistral | Proprietary
GPT 4o | 1.8T | Latest | Nov 20, 2024 | 46.1% | 72.9% | 90.0% | 1351 | 53.3% | 74.7% | 86.0% | 39.0% | 69.0% | 42.5% | 1361 | 71 | $2.5 | $10 | Oct 23 | 128K | OpenAI | Proprietary
DeepSeek 2.5 | 236B | Latest | Sep 12, 2024 | 45.5% | 72.2% | 89.0% | 83.5% | 1288 | 39.3% | 65.8% | 81.0% | 42.0% | 55.0% | 1221 | 66 | $0.14 | $0.28 | Nov 23 | 128K | DeepSeek | Proprietary
Llama 3.1 | 405B | Latest | Jul 23, 2024 | 43.8% | 66.2% | 89.0% | 1284 | 53.3% | 73.3% | 87.0% | 51.1% | 73.8% | 40.5% | 1266 | 72 | $5 | $15 | Dec 23 | 128K | Meta | Open Source
GPT 4o mini | 8B | Latest | Jul 18, 2024 | 43.1% | 55.6% | 87.2% | 83.5% | 1284 | 35.3% | 63.1% | 82.0% | 43.0% | 75.0% | 35.6% | 1272 | 71 | $0.15 | $0.6 | Oct 23 | 128K | OpenAI | Proprietary
Gemini 1.5 Flash | Latest | Sep 24, 2024 | 41.9% | 52.6% | 84.0% | 75.6% | 1254 | 50.0% | 67.3% | 81.0% | 50.0% | 79.0% | 47.2% | 1269 | 73 | $0.075 | $0.3 | Nov 23 | 1M | Google | Proprietary
DeepSeek Coder 2 | 236B | Latest | Jul 22, 2024 | 41.5% | 72.9% | 90.2% | 82.3% | 1267 | 45.3% | 63.6% | 80.0% | 42.0% | 58.0% | 1213 | 67 | $0.14 | $0.28 | Nov 23 | 128K | DeepSeek | Proprietary
Gemini 1.5 Pro | Old | Aug 27, 2024 | 40.9% | 66.9% | 1262 | 49.3% | 56.1% | 1260 | Nov 23 | 2M | Google | Proprietary
Gemini 1.5 Flash | Old | Aug 27, 2024 | 40.6% | 52.6% | 1232 | 47.3% | 1227 | Nov 23 | 1M | Google | Proprietary
Claude 3 Opus | Latest | Mar 4, 2024 | 38.6% | 68.4% | 84.9% | 77.4% | 1250 | 41.3% | 68.5% | 84.0% | 50.4% | 60.1% | 43.4% | 1248 | 70 | $15 | $75 | Aug 23 | 200K | Anthropic | Proprietary
Mistral Large 3 | 123B | Latest | Nov 18, 2024 | 65.4% | 90.0% | 84.0% | 47.0% | 72.0% | 74 | $2 | $6 | 128K | Mistral | Proprietary
Grok-2 mini | Latest | Aug 13, 2024 | 37.5% | 54.9% | 85.7% | 1262 | 42.0% | 72.0% | 86.2% | 51.0% | 73.0% | 1266 | Aug 24 | 128K | xAI | Proprietary
Grok 2 | Latest | Aug 13, 2024 | 36.2% | 58.6% | 88.4% | 80.5% | 1287 | 36.2% | 75.5% | 87.5% | 56.0% | 76.1% | 1290 | 70 | $5 | $15 | Aug 24 | 128K | xAI | Proprietary
Llama 3.1 | 70B | Latest | Jul 23, 2024 | 32.7% | 58.6% | 80.5% | 1244 | 40.7% | 62.8% | 84.0% | 43.0% | 60.0% | 34.4% | 1242 | 65 | $1 | $3 | Dec 23 | 128K | Meta | Open Source
Gemini 1.5 Flash | 8B | Latest | Sep 24, 2024 | 28.7% | 38.3% | 1163 | 33.3% | 58.7% | 38.4% | 58.7% | 1177 | $0.0375 | $0.15 | Nov 23 | 1M | Google | Proprietary
Mistral Nemo | 12B | Latest | Jul 24, 2024 | 28.7% | 33.1% | 25.3% | 68.0% | 33.0% | 40.0% | 52 | $0.15 | $0.15 | Apr 24 | 128K | Mistral | Proprietary
Claude 3 Haiku | Old | Mar 7, 2024 | 24.5% | 47.4% | 75.9% | 1189 | 25.9% | 75.2% | 33.3% | 40.9% | 22.9% | 1179 | 54 | $0.25 | $1.25 | Aug 23 | 200K | Anthropic | Proprietary
Llama 3.1 | 8B | Latest | Jul 23, 2024 | 19.7% | 37.6% | 72.6% | 62.8% | 1174 | 15.3% | 71.0% | 27.0% | 64.0% | 16.6% | 1171 | 53 | $0.3 | $0.6 | Dec 23 | 128K | Meta | Open Source
Pixtral Large | 123B | Latest | Nov 18, 2024 | 82.0% | 85.0% | 52.0% | 72.0% | 73 | $2 | $6 | 128K | Mistral | Proprietary
Llama 3.2 | 3B | Latest | Sep 25, 2024 | 26.3% | 1048 | 63.4% | 21.0% | 49.0% | 1093 | 46 | $0.08 | $0.1 | Dec 23 | 128K | Meta | Open Source
Llama 3.2 | 1B | Latest | Sep 25, 2024 | 1009 | 49.3% | 14.0% | 27.0% | 1044 | 28 | $0.04 | $0.08 | Dec 23 | 128K | Meta | Open Source
Llama 3.2 (Vision) | 90B | Latest | Sep 25, 2024 | 86.0% | 43.0% | 68.0% | 67 | $0.9 | $0.9 | Dec 23 | 128K | Meta | Open Source
Llama 3.2 (Vision) | 11B | Latest | Sep 25, 2024 | 73.0% | 26.0% | 51.9% | 54 | $0.19 | $0.19 | Dec 23 | 128K | Meta | Open Source
R1 Lite Preview | Latest | Nov 20, 2024 | 58.5% | 91.6% | DeepSeek | Proprietary
36 models