What did Elon Musk say about throw pillows? Most LLMs don’t know. But GPT does. And that’s a problem. Large language models typically perform so similarly that their differences can be measured in millimeters. But in some scenarios, these models are separated by miles.