May · 2026

The Clay Jar and the Data Center

Jensen Huang describes Nvidia's business in a single phrase: electrons in, tokens out.[1] The phrase is doing more work than it appears to. It implies that the production of intelligence is, at its core, the conversion of electrical energy into useful computation, and that the rate of that conversion is what matters: the tokens per joule, the joules per dollar, the dollars per token.
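The three ratios chain together arithmetically, and a minimal sketch makes the dependency explicit. The function name and the example inputs (a power price of $0.05/kWh and an efficiency of 0.5 tokens per joule) are hypothetical placeholders chosen for illustration, not measurements of any real system.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 million joules


def energy_cost_per_million_tokens(price_usd_per_kwh: float,
                                   tokens_per_joule: float) -> float:
    """Electricity cost in USD to produce one million tokens.

    Chains the three ratios: joules per dollar (from the power price),
    tokens per joule (hardware and model efficiency), and dollars per token.
    """
    if price_usd_per_kwh <= 0 or tokens_per_joule <= 0:
        raise ValueError("price and efficiency must be positive")
    joules_per_dollar = JOULES_PER_KWH / price_usd_per_kwh
    tokens_per_dollar = tokens_per_joule * joules_per_dollar
    return 1e6 / tokens_per_dollar


# Hypothetical inputs: $0.05/kWh and 0.5 tokens per joule.
print(round(energy_cost_per_million_tokens(0.05, 0.5), 4))  # 0.0278
```

Under these assumptions the energy cost per token scales linearly with the price of the electron at the moment it is consumed, so halving the price paid halves the energy component of cost per token.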

I think this framing is correct. I also think it points to a problem that most of the current AI infrastructure buildout is structured to avoid rather than solve.

The dominant response to the AI energy problem, judged by where capex is flowing, is to treat energy primarily as a procurement problem. Sign longer PPAs.[2] Restart nuclear.[3] Buy your own gas turbines.[4] Microsoft and Three Mile Island,[5] Amazon and Talen,[6] Meta and Clinton,[7] and the surge of new gas turbine orders backing up data center campuses across Texas and the Mountain West are each rational responses to real constraints such as scarce firm capacity, long lead times, and the strategic need to lock in resources before competitors do. But they share an implicit assumption, that AI workloads are continuous, massive, and inflexible, and therefore demand continuous, massive, inflexible supply.

I think the procurement view is correct as far as it goes, and incomplete in a way that will matter a great deal in the next few years. The reason is that the demand side of an AI data center is not flat, and treating it as flat leaves most of the available economic value on the table.

Orchestrating energy end to end, from how it is procured and stored through how it is allocated to workloads in real time, will be one of the most strategic layers of the inference era. This holds regardless of the generation mix. Renewables-plus-storage, nuclear-powered, and gas-fired campuses all face the same underlying problem: a demand side that varies through the day, and a supply side whose value depends on the moment. Orchestration doesn't go away when the generation mix is firm.

Why inference is not flat

There is an assumption embedded in the procurement view that deserves to be made explicit: that AI workloads are continuous and inflexible, and therefore the right job is to procure continuous and inflexible supply. The first half of this is becoming less true every year.[8]

A training run is a single coherent computation, fragile and tightly synchronized, that cannot be paused without significant cost.[9] Inference is not. Inference is millions of independent requests with wildly varying latency requirements. Some are real-time chat, where every additional second of latency degrades the product. Some are batch document processing that can wait six hours without anyone noticing. Some are agent workflows where the user has already gone to lunch and will not check the result for a few hours. Some are scheduled analyses that could run any time before the morning report. In aggregate, inference has substantial temporal flexibility, and the proportion of inference that is genuinely latency-bound is shrinking as agentic and long-running workloads grow.

The market is already pricing this. Both OpenAI and Anthropic offer a 50% discount on their batch APIs, which run with a roughly 24-hour turnaround.[10] That is the going rate for temporal flexibility in inference today, and the fact that the two largest model providers converged on the same number is not an accident. It is the market revealing that a meaningful share of inference does not need to run immediately. Agentic workloads are accelerating the shift: Gartner projects that 40% of enterprise applications will integrate task-specific AI agents by the end of 2026, up from less than 5% a year earlier,[11] and inference workloads, broadly, are projected to rise from roughly a third of total AI compute in 2023 to two-thirds by 2026.[12] The category that is growing fastest is the category with the most temporal flexibility.

This flexibility is, today, almost entirely unexploited at the energy layer. The systems that schedule inference jobs don't see the price of electricity. The systems that manage electricity don't see the value of the next token. The two control loops run in separate buildings, optimized for separate objectives. A facility that could see both at once would, on most days, find itself with hours of cheaper electrons and hours of more valuable workloads, and the gap between them is pure margin that nobody is currently capturing.
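What a scheduler that could see both signals might do can be sketched in a few lines. This is a deliberately simple greedy heuristic under stated assumptions, not a description of any production system: the `Job` class, the `schedule` function, the hourly price list, and the one-slot-per-hour capacity are all hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    hours_needed: int  # whole hours of compute; may be split across hours
    deadline: int      # job must run in hours strictly before this index


def schedule(jobs, prices, slots_per_hour):
    """Greedy sketch: earliest deadline first, each job takes the
    cheapest hours still free before its deadline."""
    free = [slots_per_hour] * len(prices)
    plan = {}
    for job in sorted(jobs, key=lambda j: j.deadline):
        horizon = min(job.deadline, len(prices))
        candidates = [h for h in range(horizon) if free[h] > 0]
        if len(candidates) < job.hours_needed:
            raise ValueError(f"cannot place {job.name} before its deadline")
        # Cheapest hours first; ties broken by the earlier hour.
        chosen = sorted(candidates, key=lambda h: (prices[h], h))[:job.hours_needed]
        for h in chosen:
            free[h] -= 1
        plan[job.name] = sorted(chosen)
    return plan


# Hypothetical hourly prices in $/MWh for a six-hour window.
prices = [50, 20, 10, 40, 80, 30]
jobs = [Job("batch", hours_needed=2, deadline=6),
        Job("chat", hours_needed=1, deadline=1)]
print(schedule(jobs, prices, slots_per_hour=1))
# {'chat': [0], 'batch': [1, 2]}
```

The latency-bound job runs immediately at whatever the price is, while the flexible job slides into the two cheapest remaining hours. A real system would also have to handle forecast error, preemption, and jobs that need contiguous hours, none of which this sketch attempts.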

The clay jar

Ten thousand years ago, food was not scarce. Hunter-gatherers were often well-fed. The constraint was time, not production. Everything had to be consumed almost immediately, because there was no reliable way to carry today's abundance into tomorrow. Without storage, there was no surplus. And without surplus, there was no progress.

The invention that broke through that ceiling was not farming. It was the clay jar. The sealed grain pit. The granary. These technologies decoupled eating from finding, so that a good harvest could carry a community through a bad season. People could settle. Individuals could stop farming and become potters, soldiers, priests, traders. The jar was the unlock, but the jar alone wasn't enough. What mattered just as much was the knowledge of when to open it. The granary needed a granary-keeper.

The AI energy problem has the same structure. Solar and wind are now genuinely cheap. The global weighted average levelized cost of utility-scale solar fell roughly 90% between 2010 and 2024, from $0.417/kWh to $0.043/kWh.[13] Nuclear, where it can be brought online, provides something different and also valuable: firm power at scale, indifferent to weather and time of day. Natural gas, the actual workhorse of most new AI campuses in the United States, sits between them, dispatchable and relatively cheap but exposed to fuel price volatility and emissions constraints. These are not competing solutions. They are different inputs to the same machine.

What none of them gives you, on its own, is the matching of supply to a demand curve that is itself highly variable. In 2024, ERCOT curtailed more than 8 TWh of wind and solar, energy that existed, was generated, and was thrown away because nothing could use it in that moment.[14] A nuclear-powered data center, meanwhile, runs its reactors flat through hours when global inference load is at its trough and the marginal token is worth a fraction of what it would be worth six hours later. A gas-fired campus has a different problem: the turbines cannot ramp fast enough to follow GPU demand swings, and forcing them to do so shortens their life and increases emissions, because the equipment is being used in a way it was not designed for.[15] These are three forms of the same waste: generation and consumption that are not coordinated through time.

Storage is already infrastructure

A modern AI data center already runs at least six distinct storage technologies simultaneously, across timescales spanning ten orders of magnitude. Ceramic capacitors on the GPU board absorb microsecond voltage spikes. Rack-level supercapacitors handle millisecond fluctuations. Nvidia's Vera Rubin NVL72 ships with 400 joules of storage per GPU, six times the prior generation, cutting peak current demand by up to 25%.[16] UPS systems bridge seconds-long outages. Facility-level battery energy storage systems buffer the volatile power signatures of GPU clusters from the stability requirements of the grid. Storage is no longer insurance. It is infrastructure, embedded in the machine itself.
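The buffering role of a facility-level battery can be illustrated with a toy simulation. This is an idealized sketch under loud assumptions: the battery is lossless, has no power rating limit beyond its stored energy, and follows a fixed cap on grid draw. The function name `smooth` and all the numbers are hypothetical.

```python
def smooth(load_mw, cap_mw, capacity_mwh, soc_mwh, dt_h=1.0):
    """Hold grid draw at cap_mw where the battery allows it.

    Above the cap the battery discharges to cover the excess; below the
    cap it recharges from the headroom. Assumes a lossless battery.
    Returns the grid draw per step and the final state of charge.
    """
    grid = []
    for load in load_mw:
        if load > cap_mw:
            # Discharge limited by the energy currently stored.
            discharge = min(load - cap_mw, soc_mwh / dt_h)
            soc_mwh -= discharge * dt_h
            grid.append(load - discharge)
        else:
            # Charge limited by the remaining room in the battery.
            charge = min(cap_mw - load, (capacity_mwh - soc_mwh) / dt_h)
            soc_mwh += charge * dt_h
            grid.append(load + charge)
    return grid, soc_mwh


# Hypothetical GPU cluster swinging between 80 MW and 120 MW each hour.
grid, soc = smooth([80, 120, 80, 120], cap_mw=100, capacity_mwh=40, soc_mwh=20)
print(grid, soc)  # [100.0, 100.0, 100.0, 100.0] 20.0
```

In this toy case the grid sees a flat 100 MW even though the cluster swings by 40 MW. The same mechanism operates, at very different scales, in each storage tier described above.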

What does not yet exist, at any data center I am aware of, is the layer that coordinates across all of it. The capacitors are governed by the GPU firmware. The UPS is run by the facility team. The grid-scale battery, where one exists, is dispatched by an energy trading desk optimizing for day-ahead prices. The workload scheduler is owned by the inference team, which has no visibility into any of the above. Each control loop is locally sensible and globally incoherent.

The interesting object is the system that closes this gap, that can see generation, storage, grid prices, thermal limits, workload demand, and revenue per token at the same time, and make second-by-second decisions about all of them. Storage is already part of the stack. What is missing is the coordination on top of it. This is a software problem, not a hardware problem.

The orchestration layer forecasts energy prices and grid conditions, tracks the state of charge of every battery, the temperature of every rack, and the priority of every queued workload, and decides in real time which inference job runs where, which battery discharges, and which token gets produced now versus which one gets queued. The pattern has analogues. High-frequency trading firms built it for markets, cloud providers built it for compute, grid operators built it for reliability and stability. Nobody has built it for the intersection of all three.

When a passing cloud cuts solar output across West Texas, an orchestration layer would not just issue a dashboard alert. It would shift non-urgent agent workloads to a region with cheaper power, draw down on-site batteries to buffer the local GPUs, and preserve the most economical electrons for latency-sensitive inference. The point is not backup power or conventional demand response. It is the joint optimization of energy and computation as a single economic system.

ERCOT as the leading indicator

Texas is the most instructive case because the dynamics are visible there earlier than anywhere else. The state has more data center pipeline than any other, the most aggressive battery buildout in the country, the largest queue of new gas turbines, and a deregulated market structure that surfaces the tension between them in real time.[17]

ERCOT entered 2026 with 13.9 GW of installed battery storage, nearly doubling from 7.8 GW at the start of 2025, and Modo Energy projects 40 to 55 GW operational by 2029.[18] The data center pipeline looks larger on paper. ERCOT is processing roughly 226 GW of large-load interconnection requests, 72.9% of them from data centers, and the queue grew 270% in 2025 alone.[19] Yet only 5.3 GW of large loads had actually been energized as of November 2025.[20] A more conservative range for real buildout is 40 to 65 GW by 2030.[21]

The conventional reading is that storage is dramatically undersupplied. It isn't, because batteries don't need to power data centers around the clock. They need to firm an underlying generation mix that is already, on most hours of most days, more than capable of meeting demand. ERCOT's nameplate capacity sits well above its all-time peak demand of 85.5 GW (set in August 2023) and its summer 2025 peak of 83.9 GW.[22] The state is not short of electrons. It is short of electrons in the shape a data center needs them: smooth, instantaneous, and reliable.

The hardware is not the bottleneck. The bottleneck is that batteries and data centers are nominally in the same market and operationally in different worlds. What Texas makes visible, every other grid will eventually surface in its own way.

Looking ahead

By 2028, I think at least one major AI lab will be publicly reporting that a meaningful share of its inference cost is being managed by a combination of storage orchestration and workload routing to regions and times of day where firmed power is cheapest. By 2030, I think the gap in $/token between the best and worst data centers will depend as much on energy orchestration as on the underlying silicon. And I think the unified control plane across procurement, storage, dispatch, and workload will become a recognized layer of the AI infrastructure stack. The granary-keeper.

I could be wrong in specific ways. If inference demand turns out to be much more latency-sensitive in aggregate than I expect, the temporal flexibility I am leaning on shrinks, and the procurement view holds up longer. If grid-scale storage costs stall, whether from a chemistry floor, supply chain shocks, or policy changes, the economics of coordinated systems become less compelling against simply buying more firm supply.

What I am more confident about is the underlying shape. Energy is not solved by procurement alone, regardless of whether the supply is nuclear, gas, or pure renewables. The demand side of an AI data center is variable, the supply side has options, and the value of a joule depends on the moment it is used. A system that doesn't coordinate across these is leaving money on the table every second it operates. By 2030, the gap between the facilities that coordinate well and the ones that don't will be a major source of variance in $/token.

I would extend Huang's framework to include the middle term. Electrons in, orchestration, tokens out. The middle term is not just a scheduling problem. It is the control layer that decides, second by second, when an electron is worth more stored, when it is worth more as grid revenue, and when it is worth most as intelligence.
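The three-way choice in that middle term can be written down as a comparison of marginal values. This is a sketch of the decision rule only, under assumptions I am introducing for illustration: all values are in $/MWh, the function name `best_use` is hypothetical, the future value of stored energy is taken as a given forecast, and the 0.85 round-trip efficiency is a placeholder rather than a measured figure.

```python
def best_use(compute_value, grid_price, expected_future_value,
             round_trip_efficiency=0.85):
    """Pick the highest-value use for the next MWh.

    compute_value: revenue from tokens that MWh would produce now
    grid_price: revenue from selling that MWh to the grid now
    expected_future_value: forecast best value of a MWh later, discounted
        here by the battery's round-trip losses (assumed 0.85)
    """
    options = {
        "compute": compute_value,
        "sell": grid_price,
        "store": round_trip_efficiency * expected_future_value,
    }
    return max(options, key=options.get)


print(best_use(compute_value=120, grid_price=40, expected_future_value=100))  # compute
print(best_use(compute_value=30, grid_price=40, expected_future_value=100))   # store
print(best_use(compute_value=30, grid_price=90, expected_future_value=100))   # sell
```

The hard part is not the comparison but the inputs: the value of the next token and the forecast of what a stored joule will be worth later are exactly the signals that today sit in separate systems.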

I think the ceiling on what AI can do for the world in the next decade will be set less by how good the models get and more by how many tokens we can afford to spend on the problems worth solving. Cheaper inference is not just a margin improvement. It is the difference between intelligence that gets rationed and intelligence that gets applied. Every joule we coordinate well is a token that gets to do useful work somewhere it could not before.

References

  1. Jensen Huang has used this formulation repeatedly in keynotes and investor communications, framing AI factories as systems that convert electricity into tokens. See for example Nvidia's Vera Rubin NVL72 announcement, where the company explicitly frames AI as "token-driven."
  2. A power purchase agreement (PPA) is a long-term contract for electricity, typically 15 to 25 years, between an off-taker and a generation source. PPAs are the primary mechanism through which hyperscalers procure dedicated power for AI data centers.
  3. The "restart nuclear" move is most visible in the ongoing effort to bring Three Mile Island back online under its new name (Crane Clean Energy Center). As of May 2026, FERC may decide as early as June on the grid-injection rights that determine when the plant can restart. See Laila Kearney, US may decide on Three Mile Island restart in June, Constellation execs say, Reuters, May 11, 2026. The pattern extends beyond Three Mile Island: Palisades in Michigan is targeting restart in 2025-2026, and several other shuttered reactors are under reactivation discussion.
  4. The "buy your own gas turbines" move is most dramatically visible in xAI's Colossus buildout. In March 2026, the Mississippi Department of Environmental Quality approved a permit for 41 natural gas turbines (1.2 GW of self-generated capacity) at xAI's Southaven site to power the Colossus 2 and Colossus 3 data centers across the state line in Memphis. See Musk's xAI gets go-ahead for 41 natural gas turbines in Mississippi to power Colossus data centers, Data Center Dynamics, March 2026. xAI is the most visible case but not the only one: a Senate investigation announced in April 2026 identified 12 new gas plants under development across eight AI companies to power data center expansion.
  5. Constellation Energy, Constellation to Launch Crane Clean Energy Center, September 2024. Constellation signed a 20-year power purchase agreement with Microsoft to restart Three Mile Island Unit 1 (to be renamed the Crane Clean Energy Center), with the plant expected back online in 2028. Constellation will invest $1.6 billion in the restart and Microsoft will purchase the entire output. See also CNBC coverage and EIA analysis.
  6. In March 2024, Amazon Web Services acquired a data center campus from Talen Energy adjacent to the Susquehanna nuclear plant in Pennsylvania, with an initial 960 MW supply arrangement. In June 2025, AWS expanded the relationship into a 17-year power purchase agreement for 1.92 GW from the two-unit plant. See NucNet, Constellation Secures $1 Billion Federal Loan For Three Mile Island Restart, November 2025, for the updated terms, and EIA's original analysis for the initial transaction.
  7. In June 2025, Meta signed a 20-year power purchase agreement with Constellation Energy for 1,100 MW of capacity from the Clinton Clean Energy Center in Illinois, with deliveries starting in June 2027. The arrangement covers the entire output of the single-unit nuclear facility. See NucNet, November 2025.
  8. The flexibility of AI workloads is now quantified in academic and operator-side research. A widely-cited Duke University study, the most-downloaded publication in the history of Duke's Nicholas Institute, found that the 22 largest US balancing authorities (representing 95% of national peak load) could accommodate 76 to 126 GW of new flexible load with curtailment of just 0.25% to 1% of annual uptime, with average curtailment events lasting 1.7 to 2.5 hours. In ERCOT specifically, the analysis suggests up to 15 GW of new data center load could be added without expanding generation. See Tyler Norris, Tim Profeta, Dalia Patino-Echeverri, and Adam Cowie-Haskell, Rethinking Load Growth: Assessing the Potential for Integration of Large Flexible Loads in US Power Systems, Duke Nicholas Institute, February 2025. Operator practice is following: at DTECH 2026, Google's head of market innovation for advanced energy described an expansion from "limited demand response" to structured utility agreements oriented toward AI and machine learning workloads, designed to accelerate grid connections for new data centers. See How automation and load flexibility are helping manage data center growth, Renewable Energy World, February 2026.
  9. Distributed training at frontier scale is structurally fragile: thousands of GPUs must stay in synchronization, and individual hardware or network failures can destabilize a run. Epoch AI estimates that a 100,000-GPU cluster faces a failure roughly every 30 minutes, and Meta's Llama 3 training run on 16,000 GPUs experienced more than 400 hardware failures over 54 days. Modern frameworks rely on continuous checkpointing to limit lost progress, which makes pausing operationally possible but costly. See Epoch AI, Hardware Failures Won't Limit AI Scaling, November 2024.
  10. OpenAI and Anthropic both offer a 50% discount on their Batch APIs for workloads that tolerate approximately 24-hour turnaround. See Anthropic's Message Batches API announcement ("Each batch is processed in less than 24 hours and costs 50% less than standard API calls") and OpenAI's Batch API documentation ("50% lower costs, a separate pool of significantly higher rate limits, and a clear 24-hour turnaround time"). The convergence on identical pricing across both providers is itself meaningful: it is the rate at which the market is currently pricing temporal flexibility in inference.
  11. Gartner, Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026, Up from Less Than 5% in 2025, August 2025. Gartner's projection covers task-specific AI agents (systems that operate end-to-end on bounded workflows) and explicitly distinguishes this from the simpler AI assistants of the previous generation.
  12. Deloitte, Why AI's next phase will likely demand more computational power, not less, February 2026. Inference workloads are projected to account for roughly two-thirds of all AI compute by 2026, up from roughly one-third in 2023 and one-half in 2025. The shift reflects both the proliferation of inference-optimized chips and the growth of agentic and long-running workloads, which are inherently more inference-heavy than chat.
  13. International Renewable Energy Agency, Renewable Power Generation Costs in 2024, July 2025. The global weighted-average LCOE for utility-scale solar PV fell from $0.417/kWh in 2010 to $0.043/kWh in 2024, representing a 90% decline. Total installed cost fell 87% over the same period, from $5,283/kW to $691/kW. See also coverage in PV Magazine.
  14. Modo Energy, The Curtailment Crisis: Saving wind and solar investments in ERCOT, October 2025. Combined wind and solar curtailment in ERCOT exceeded 8 TWh in 2024, equivalent to approximately 1.2 GW of average hourly curtailment. The West and Panhandle zones bore the brunt, with the West zone alone experiencing 3.1 TWh of wind and 2.2 TWh of solar curtailment. See also ITK Research and Amperon.
  15. Mark Chediak, Michelle Ma, and Bloomberg, Data centers are finding a surprising way to deploy batteries, Fortune, April 2026. The article documents that gas turbines used in behind-the-meter data center applications cannot ramp quickly enough to follow AI compute demand swings, and that the start-stop cycles required to keep up shorten turbine life and increase emissions. A Williams Cos. spokesperson confirmed that "smoothing load swings and reducing start-ups and ramping... are the most emissions-intensive conditions." BloombergNEF has tracked 4.9 GW of co-located energy storage at on-site data center power plants, approximately 32% of announced global on-site data center battery capacity, evidence of the scale at which the operational problem is now being addressed.
  16. Nvidia Developer Blog, Inside the Nvidia Vera Rubin Platform: Six New Chips, One AI Supercomputer, January 2026; NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI Supercomputer, April 2026. The MGX rack architecture introduces Intelligent Power Smoothing with 6× more rack-level energy storage (400 J per GPU) than Blackwell Ultra, reducing peak current demand by up to 25%.
  17. Texas leads the US in data center pipeline, battery storage, and gas plant development. As of Q1 2026, Aterio projects 962 data center sites in the Texas pipeline, the largest of any state. See Ranked: Which U.S. States Will Be Data Center Hotspots?, Visual Capitalist, April 2026. On batteries, ERCOT surpassed CAISO in Q2 2025 to become the US region with the most operating battery storage capacity, reaching 14.2 GW. See S&P Global, ERCOT surpasses CAISO in Q2 for most operating battery storage capacity in US, September 2025. On gas, the Texas interconnection queue contained roughly 64 GW of gas projects as of mid-2026, up over 400% in three years, and Global Energy Monitor has called Texas "the global epicenter of a gas power buildout." See Brandon Mulder, Gas power making a comeback in Texas' grid connection queue, Texas Tribune, May 2026.
  18. Modo Energy, ERCOT Annual Buildout Report: Battery capacity reaches 14 GW entering 2026, February 2026. Texas entered 2026 with 13,888 MW of operational battery capacity, nearly doubling from 7.8 GW at the start of 2025. Average duration rose from 1.5 hours at the start of 2025 to 1.65 hours by year-end. Modo's central forecast is 40 to 55 GW of operational ERCOT battery storage by 2029, depending on attrition assumptions. See also ESS News coverage.
  19. Blockspace Media, Texas large load requests up 270% in 2025 on data center growth: ERCOT, December 2025; ERCOT, Report on Existing and Potential Electric System Constraints and Needs, December 2025; Latitude Media coverage. ERCOT's large-load interconnection queue reached 225.8 GW by November 2025, of which 72.9% (164.5 GW) was data centers, with crypto mining accounting for an additional 8.8%. The queue grew 270% in 2025.
  20. ERCOT, Large Load Interconnection Process Q&A, December 2025. As of November 18, 2025, ERCOT had identified 5,302 MW of large loads in the "Observed Energized" category: projects that have received approval to energize and have been observed operational.
  21. Aurora Energy Research, commissioned by ERCOT, models scenarios with approximately 35 GW of data center load by 2030. Industry consensus places the realistic conversion range between 20 and 65 GW. See Dave Friedman, Inside Texas's AI Data Center Queue, January 2026, for a synthesis of competing forecasts.
  22. ERCOT Fact Sheet and Sierra Club Texas, October 2025. The all-time ERCOT peak demand record of 85,508 MW was set on August 10, 2023. The summer 2025 peak reached 83,900 MW on August 18, 2025, in a milder summer that ERCOT noted could have set new records under more typical heatwave conditions. Entering 2025, ERCOT's nameplate generation capacity exceeded 150 GW across natural gas, wind, solar, nuclear, and storage, well above peak demand even before accounting for capacity additions during 2025. Note that nameplate capacity and firm capacity differ meaningfully for variable resources; the relevant point for the argument is that on most hours of most days, available supply exceeds demand. See also EIA's ERCOT analysis.