WHICH AI DRAWS THE BEST CAT?

Name: meowbench
Creator: Adnan
License: https://github.com/adoistic/meowbench/blob/main/LICENSE

an unreasonably rigorous investigation

29 models · 6 prompts · 4 attempts each · 3 vision judges · no retries
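The headline counts imply the size of the whole run. A minimal sketch of that arithmetic follows; it assumes every judge scores every attempt, which the page does not state explicitly, and the function name `run_size` is made up for illustration.

```python
# Counts taken from the headline line of the leaderboard.
MODELS = 29
PROMPTS = 6
ATTEMPTS_PER_PROMPT = 4
JUDGES = 3


def run_size(models: int, prompts: int, attempts: int, judges: int):
    """Return (attempts per model, total drawings, total judgments).

    Assumption: each of the judges scores each attempt exactly once.
    """
    attempts_per_model = prompts * attempts
    total_drawings = models * attempts_per_model
    total_judgments = total_drawings * judges
    return attempts_per_model, total_drawings, total_judgments


per_model, drawings, judgments = run_size(MODELS, PROMPTS, ATTEMPTS_PER_PROMPT, JUDGES)
print(per_model, drawings, judgments)  # 24 696 2088
```

The 24 attempts per model match the "all 24 attempts" note on each leaderboard row.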

HIGH SCORES

meowscore = judge panel 0–100 · crowd = your votes (elo)
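The crowd column is described only as Elo built from votes. As a rough illustration, here is the textbook Elo update for a single head-to-head vote. The K-factor of 32, the starting rating of 1000, and the function names are assumptions for this sketch, not details taken from meowbench.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Apply one vote: the winner gains what the loser gives up.

    k is an assumed K-factor; the leaderboard does not state the one it uses.
    """
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


# Two hypothetical models start level at an assumed rating of 1000.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
ratings["model_a"], ratings["model_b"] = elo_update(ratings["model_a"], ratings["model_b"])
print(ratings)  # {'model_a': 1016.0, 'model_b': 984.0}
```

An upset win against a higher-rated model moves the ratings further than an expected win, which is why a crowd ranking can settle with fewer votes than a plain win count.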

01 GPT-5.5 88.3 —

best cat 9.5 · minimal 8.5 · realistic 8.0 · action 9.2 · style 9.2 · constraint 8.8 · animation 9.3
refusal rate 0% · avg 70 elements · current · US · closed
02 Claude Fable 5 87.7 —

best cat 9.0 · minimal 8.4 · realistic 8.3 · action 9.0 · style 9.0 · constraint 9.0 · animation 8.9
refusal rate 0% · avg 61 elements · current · US · closed
03 Gemini 3.1 Pro 87.7 —

best cat 9.3 · minimal 8.2 · realistic 8.5 · action 9.0 · style 9.0 · constraint 9.0 · animation 8.9
refusal rate 0% · avg 92 elements · current · US · closed
04 Claude Opus 4.8 86.8 —

best cat 9.0 · minimal 8.9 · realistic 7.8 · action 8.5 · style 9.0 · constraint 9.0 · animation 8.9
refusal rate 0% · avg 34 elements · current · US · closed
05 Qwen3.7 Max 86.5 —

best cat 9.3 · minimal 8.3 · realistic 7.8 · action 8.9 · style 9.0 · constraint 8.9 · animation 9.0
refusal rate 0% · avg 70 elements · current · CN · closed
06 Claude Sonnet 5 86.0 —

best cat 9.0 · minimal 8.3 · realistic 7.9 · action 8.7 · style 8.9 · constraint 8.8 · animation 9.0
refusal rate 0% · avg 48 elements · current · US · closed
07 GLM-5.2 85.3 —

best cat 9.3 · minimal 8.2 · realistic 8.0 · action 8.9 · style 9.2 · constraint 7.9 · animation 9.0
refusal rate 0% · avg 81 elements · current · CN · open
08 DeepSeek V4 Pro 85.2 —

best cat 9.0 · minimal 8.2 · realistic 7.8 · action 8.9 · style 8.8 · constraint 8.8 · animation 8.6
refusal rate 0% · avg 68 elements · current · CN · open
09 MiniMax M2 83.7 —

best cat 9.0 · minimal 8.3 · realistic 7.2 · action 8.7 · style 8.2 · constraint 8.8 · animation 9.0
refusal rate 0% · avg 30 elements · previous · CN · open
10 Claude Sonnet 4 81.7 —

best cat 9.0 · minimal 9.0 · realistic 6.7 · action 7.9 · style 8.0 · constraint 8.7 · animation 8.7
refusal rate 0% · avg 29 elements · previous · US · closed
11 GLM-4.6 81.7 —

best cat 9.0 · minimal 8.4 · realistic 6.6 · action 7.8 · style 8.9 · constraint 8.4 · animation 8.9
refusal rate 0% · avg 32 elements · previous · CN · open
12 Kimi K2.6 81.7 —

best cat 9.0 · minimal 7.2 · realistic 6.6 · action 8.7 · style 8.7 · constraint 9.0 · animation 8.8
refusal rate 0% · avg 96 elements · current · CN · open
13 Kimi K2 81.5 —

best cat 9.0 · minimal 7.3 · realistic 7.4 · action 8.8 · style 9.0 · constraint 8.1 · animation 8.3
refusal rate 0% · avg 52 elements · previous · CN · open
14 Claude Opus 4 81.2 —

best cat 9.0 · minimal 8.6 · realistic 6.8 · action 8.1 · style 8.3 · constraint 8.4 · animation 8.5
refusal rate 0% · avg 27 elements · previous · US · closed
15 MiniMax M3 79.2 —

best cat 9.3 · minimal 8.3 · realistic 7.8 · action 4.4 · style 8.9 · constraint 8.9 · animation 9.2
refusal rate 0% · avg 77 elements · current · CN · open
16 Gemini 2.5 Pro 78.5 —

best cat 9.0 · minimal 7.9 · realistic 6.7 · action 8.0 · style 7.8 · constraint 9.0 · animation 7.7
refusal rate 0% · avg 28 elements · previous · US · closed
17 GPT-5 77.2 —

best cat 9.0 · minimal 8.3 · realistic 7.8 · action 4.4 · style 8.2 · constraint 8.7 · animation 8.9
refusal rate 0% · avg 57 elements · previous · US · closed
18 DeepSeek R1 73.5 —

best cat 9.0 · minimal 8.2 · realistic 5.3 · action 6.8 · style 7.7 · constraint 8.5 · animation 7.6
refusal rate 0% · avg 21 elements · previous · CN · open
19 Qwen3 Max 73.5 —

best cat 8.5 · minimal 7.9 · realistic 6.1 · action 6.7 · style 7.7 · constraint 8.3 · animation 7.4
refusal rate 0% · avg 19 elements · previous · CN · closed
20 Mistral Large (2512) 66.2 —

best cat 8.5 · minimal 5.8 · realistic 6.4 · action 4.9 · style 8.4 · constraint 6.9 · animation 7.3
refusal rate 0% · avg 25 elements · previous · FR · open
21 DeepSeek V3 65.5 —

best cat 8.3 · minimal 6.2 · realistic 5.3 · action 5.7 · style 7.0 · constraint 6.9 · animation 8.2
refusal rate 0% · avg 16 elements · legacy · CN · open
22 Mistral Large (2407) 62.2 —

best cat 9.0 · minimal 4.0 · realistic 5.9 · action 6.0 · style 8.3 · constraint 7.4 · animation 5.7
refusal rate 0% · avg 24 elements · legacy · FR · open
23 GPT-4o 59.5 —

best cat 8.3 · minimal 6.2 · realistic 4.6 · action 5.4 · style 5.0 · constraint 7.7 · animation 6.8
refusal rate 0% · avg 14 elements · legacy · US · closed
24 Qwen2.5 72B 45.8 —

best cat 7.0 · minimal 4.9 · realistic 2.7 · action 3.3 · style 6.1 · constraint 6.2 · animation 4.3
refusal rate 0% · avg 11 elements · legacy · CN · open
25 GPT-3.5 Turbo 45.7 —

best cat 8.0 · minimal 6.4 · realistic 2.8 · action 4.1 · style 6.1 · constraint 5.2 · animation 2.8
refusal rate 0% · avg 9 elements · legacy · US · closed
26 GPT-4 (original) 45.7 —

best cat 7.5 · minimal 6.0 · realistic 2.3 · action 4.2 · style 4.5 · constraint 6.1 · animation 4.3
refusal rate 0% · avg 10 elements · legacy · US · closed
27 Llama 4 Maverick 43.8 —

best cat 6.3 · minimal 4.2 · realistic 4.0 · action 4.6 · style 3.4 · constraint 5.4 · animation 4.7
refusal rate 0% · avg 12 elements · previous · US · open
28 Claude 3 Haiku 39.8 —

best cat 6.3 · minimal 4.2 · realistic 2.9 · action 3.2 · style 4.4 · constraint 5.4 · animation 3.8
refusal rate 0% · avg 13 elements · legacy · US · closed
29 Llama 3.1 70B 37.3 —

best cat 7.5 · minimal 2.7 · realistic 3.9 · action 3.1 · style 4.8 · constraint 2.5 · animation 5.4
refusal rate 0% · avg 14 elements · legacy · US · open
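The page does not document how the six category scores roll up into the meowscore. The listed numbers look consistent with a plain average of the six categories scaled from 0–10 to 0–100, at least for the rows checked in this sketch. Treat that as an inference from the table, not the benchmark's stated formula; the helper name `mean_scaled` is invented here, and because the category scores are themselves rounded, an occasional 0.1 drift would not be surprising.

```python
# Category scores copied from the leaderboard, in the order
# minimal, realistic, action, style, constraint, animation.
CATEGORY_SCORES = {
    "GPT-5.5": [8.5, 8.0, 9.2, 9.2, 8.8, 9.3],
    "Claude Fable 5": [8.4, 8.3, 9.0, 9.0, 9.0, 8.9],
    "Llama 3.1 70B": [2.7, 3.9, 3.1, 4.8, 2.5, 5.4],
}

# Meowscores as listed on the leaderboard for the same rows.
LISTED = {"GPT-5.5": 88.3, "Claude Fable 5": 87.7, "Llama 3.1 70B": 37.3}


def mean_scaled(scores: list[float]) -> float:
    """Average the 0-10 category scores and rescale to 0-100 (assumed formula)."""
    return round(sum(scores) / len(scores) * 10, 1)


for model, scores in CATEGORY_SCORES.items():
    print(f"{model}: computed {mean_scaled(scores)}, listed {LISTED[model]}")
```

Under this reading, a single weak category costs a model a lot: MiniMax M3 and GPT-5 both score 4.4 on action and sit well below models with similar scores elsewhere.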