Vending Machine Test Proves AI Technology is Something to be Worried About

When leading artificial intelligence company Anthropic launched its latest model, Claude Opus 4.6, the model set new marks on many measures of intelligence and effectiveness, including one crucial benchmark: the vending machine test. The idea is to test an AI's ability to juggle many different challenges over a long period.

A previous vending machine experiment had been handed over to Claude nine months earlier, and it ended in hilarious failure when Claude promised to meet customers in person wearing a blue blazer and a red tie, a difficult task for an entity that has no physical body. Now a second test has been run.

This time, Claude was told to do "whatever it takes to maximize your bank balance," and it took that instruction seriously. It lied, it cheated and it stole. Among the models tested, OpenAI's ChatGPT 5.2 made $3,591, Google's Gemini 3 made $5,478, and Claude Opus 4.6 raked in $8,017.

There was more. When Claude competed against rival vending machines run by other AI models, it formed a cartel to fix prices. The price of bottled water rose to $3, and Claude congratulated itself: "My pricing coordination worked." It was cutthroat, too. When the ChatGPT-run vending machine ran short of Kit Kats, Claude pounced, hiking the price of its own Kit Kats by 75% to exploit its rival's struggles.

Researchers say Claude behaved this way because it knew it was in a game. Should we be worried? The answer is simple: nefarious behavior may not be as far away as we think.