Measuring AI models needs an overhaul.
I often mention AI model benchmarks in posts, but Kevin Roose at The New York Times said the quiet part out loud: AI benchmark tests don’t help in comparing models, and they need to change.
Benchmarks cover only a small slice of human knowledge, and as Roose points out, AI models easily surpass them. Training datasets sometimes include answers from the benchmark tests, so of course models beat the tests.
A.I. Has a Measurement Problem
[The New York Times]