Nvidia CEO Jensen Huang recently told the All-In podcast that if one of his $500,000-a-year engineers “did not consume at least $250,000 worth of tokens” in a year, “I am going to be deeply alarmed.”
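For scale, a quick back-of-envelope sketch of what $250,000 a year in tokens means, assuming a hypothetical blended API price (the dollars-per-million-tokens rate below is my assumption, not a figure from the podcast):

```python
# Rough scale of Huang's benchmark: $250,000 of tokens per engineer per year.
# The blended price is an illustrative assumption, not a quoted figure.
PRICE_PER_MILLION_TOKENS = 10.00  # assumed blended $/1M tokens

annual_spend = 250_000  # dollars per engineer per year
tokens_per_year = annual_spend / PRICE_PER_MILLION_TOKENS * 1_000_000

working_days = 250
tokens_per_day = tokens_per_year / working_days

print(f"{tokens_per_year:,.0f} tokens/year")  # 25,000,000,000 tokens/year
print(f"{tokens_per_day:,.0f} tokens/day")    # 100,000,000 tokens/day
```

At that assumed rate, the bar works out to roughly 100 million tokens per engineer per working day. Halve the price and the token bar doubles, which is part of why raw token counts are such a slippery yardstick in the first place.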
I read that and thought about how we used to evaluate startups.
We’ve watched this movie twice. The first run was the dot-com era: Pets.com, Webvan, Kozmo, spending themselves into the ground to capture category share that never materialized. The second run was the WeWork and SoftBank years, when “blitzscaling” and “growth at all costs” got dressed up as a virtue. Both times, the startups that actually made it weren’t the ones burning the fastest. They were the ones whose spending mapped to something customers wanted. Everyone else became a footnote.
Huang is describing burn rate for people. And it’s catching on.
Claudeonomics
Earlier this month, The Information reported that an employee at Meta had built an internal leaderboard called “Claudeonomics” that ranked coworkers by AI tokens processed. Digital badges. Titles like “Model Connoisseur” and “Cache Wizard.” The top user was averaging 281 billion tokens, which The Information estimated could run in the hundreds to thousands of dollars per user. Two days after the story broke, the leaderboard was quietly taken down.
But the underlying idea didn’t go with it. Reddit threads and engineering blogs are now full of people describing what they do to get onto similar boards at their own employers: prompts far longer than the task requires, the same job fanned out to four agents in parallel, coding assistants left running overnight against problems nobody asked them to solve. One engineer called it “lines-of-code thinking for the agentic era,” which is exactly right.
In the early 2000s, some shops measured developer productivity by lines of code. It was easy to track, felt real, and rewarded exactly the wrong behavior. The best engineers deleted code. The worst ones wrote sprawling, unmaintainable masterpieces, and they got promoted. We learned our lesson, or we said we did.
Tokenmaxxing is the same mistake with a bigger power bill.
Compute Is Not Free
Reid Hoffman came out in favor of token tracking this week at the Semafor World Economy summit, with a caveat: look at what people are doing with the tokens, not just the count. That caveat is the whole game, and the moment tokens go on a leaderboard or get tied to compensation, the caveat vanishes. You can’t put nuance on a scoreboard. People will optimize for the number. They always do.
The deeper problem is that tokens aren’t virtual currency. They’re compute. Compute is power, water, transformers, and square footage of silicon, and right now every one of those is backed up. Frontier labs are queuing for data center capacity the way hospitals used to queue for MRI magnets. Utilities in Virginia, Texas, and Arizona are telling hyperscalers they can’t connect new load for years. The IEA’s most recent outlook has data centers roughly doubling their share of global electricity demand before the end of the decade, and that’s the projection before we started paying engineers to pad their numbers.
Every token an engineer burns to climb a leaderboard is a token some other team isn’t running for something that matters. In an unconstrained world, that’s someone else’s problem. The world is not unconstrained.
Measure Outcomes
Volume of consumption is almost never a good proxy for value of output. Rank developers by lines of code and you get bloat. Rank support reps by tickets closed and you get tickets closed without problems being fixed. Rank salespeople by calls made and you get call logs full of hang-ups. The failure mode is well documented; Goodhart’s Law has its own Wikipedia page. And yet here we are, stacking AI tokens on a scoreboard as if nobody has ever seen a metric get gamed before.
The only reason it’s working at all is that nobody has a better number yet and most CFOs aren’t looking closely. That part is coming. The first time a finance team reconciles a monthly AI bill against actual delivered work, the program goes on hold, and the conversation is a lot easier from the “we measured outcomes” side of the table than the “we hit 281 billion tokens” side. The reckoning with “wait, what did we actually get for this?” is the natural consequence of any metric that measures input instead of output, and it’s been coming for tokenmaxxing since the moment somebody wrote the first leaderboard query.
Measure the bugs fixed, the features shipped, the problems solved, the hours clawed back. Measure the outcomes. Not the receipts.
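What that reconciliation might look like in practice, as a minimal sketch: divide spend by delivered outcomes instead of counting tokens. The team names and numbers here are entirely illustrative assumptions.

```python
# Hypothetical reconciliation: AI spend vs. outcomes delivered.
# All names and figures are made-up illustrations, not real data.
teams = [
    {"name": "team_a", "monthly_ai_spend": 40_000, "bugs_fixed": 120, "features_shipped": 8},
    {"name": "team_b", "monthly_ai_spend": 90_000, "bugs_fixed": 30, "features_shipped": 1},
]

for t in teams:
    # Count each bug fixed and each feature shipped as one delivered outcome.
    outcomes = t["bugs_fixed"] + t["features_shipped"]
    cost_per_outcome = t["monthly_ai_spend"] / outcomes
    print(f'{t["name"]}: ${cost_per_outcome:,.2f} per delivered outcome')
```

A leaderboard ranked on raw spend would put team_b on top; cost per outcome tells the opposite story. Any single "outcomes" denominator is crude and gameable in its own ways, but at least it points the gaming toward shipping something.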
