Within hours I paused an ongoing Opus 4.7 benchmark, swapped the API keys, and ran the exact same methodology on ...