## MMLU (Massive Multitask Language Understanding)

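MMLU is a benchmark that measures the knowledge an AI model has acquired. It is a multiple-choice test covering 57 subjects across areas such as STEM, the humanities, and the social sciences. It is designed to evaluate models specifically in zero-shot and few-shot settings.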
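To make the zero-shot and few-shot settings concrete, here is a minimal sketch of how such an evaluation could be set up. The prompt layout is loosely modeled on the style used for MMLU, but the function names (`format_item`, `build_prompt`, `accuracy`), the `Item` type, and the sample questions are all hypothetical and made up for illustration; they are not taken from the dataset or the official code. It also assumes four answer options labeled A to D.

```python
from typing import List, Tuple

CHOICE_LABELS = ["A", "B", "C", "D"]

# Hypothetical item type: (question, four answer options, correct label).
Item = Tuple[str, List[str], str]


def format_item(item: Item, include_answer: bool) -> str:
    """Render one question; solved examples include the answer label."""
    question, options, answer = item
    lines = [question]
    for label, option in zip(CHOICE_LABELS, options):
        lines.append(f"{label}. {option}")
    lines.append(f"Answer: {answer}" if include_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(subject: str, shots: List[Item], target: Item) -> str:
    """Build a k-shot prompt; an empty `shots` list gives the zero-shot case."""
    header = (
        "The following are multiple choice questions (with answers) "
        f"about {subject}."
    )
    blocks = [header]
    blocks += [format_item(shot, include_answer=True) for shot in shots]
    blocks.append(format_item(target, include_answer=False))
    return "\n\n".join(blocks)


def accuracy(predictions: List[str], items: List[Item]) -> float:
    """Fraction of items whose predicted label matches the correct label."""
    if not items:
        raise ValueError("items must not be empty")
    correct = sum(pred == item[2] for pred, item in zip(predictions, items))
    return correct / len(items)


# Toy questions invented for this sketch (not real MMLU items).
shots = [("Which number is prime?", ["4", "6", "7", "8"], "C")]
target = ("What is 2 + 3?", ["4", "5", "6", "7"], "B")

print(build_prompt("elementary mathematics", shots, target))
```

The model's reply to the final `Answer:` is compared with the correct label, and the benchmark score is the accuracy over all questions. Passing an empty `shots` list yields the zero-shot prompt.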

Scores for existing models are listed at the link below. At the time of writing, GPT-4 holds the top score at 86.4%.

[Papers with Code - MMLU Benchmark (Multi-task Language Understanding)](https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu)

The official GitHub repository is available here.

[GitHub - hendrycks/test: Measuring Massive Multitask Language Understanding | ICLR 2021](https://github.com/hendrycks/test)

![](https://server.tilnote.io/images/pages/df65c70a-299b-4e89-b417-44cd82926458.png)

Image source: Papers with Code website

The Chatbot Arena leaderboard shows results that also include Claude.

[Chatbot Arena Leaderboard - a Hugging Face Space by lmsys](https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard)

![](https://server.tilnote.io/images/pages/b7c0bcc4-623e-465a-a9a3-704a6aa1b7d4.png)
