20 points | by Brajeshwar 5 hours ago
2 comments
If we didn’t have the previous example, I would interpret this as pretty solid evidence that labs were training on the Pelican “benchmark”.
I just can’t imagine a model dropping so significantly from one version to the next on such a silly task.
GLM-5.2 is the new leading open weights model on Artificial Analysis
https://news.ycombinator.com/item?id=48567759