Why AI Metrics Are Misleading
What is a benchmark? In the world of AI, we often find it helpful to talk about how well a model performs a task. You might, for example, hear a friend gush over how much better GPT-4 is than GPT-3 because its answers are more accurate and coherent. To assess model performance more rigorously, researchers have devised a systematic approach called “benchmarking”. Benchmarks help us evaluate how good our AI models are, for example:...