Agentic Elasticity

[Editor’s note: Halcyon’s Chief Data Officer Alex Klaessig will be presenting at CERAWeek’s Agora Innovation program on March 25. Let us know if you’ll be there: sayhi@halcyon.io]

Last week, I wrote about the signals that Virginia’s Data Center Alley sends to power market participants: our particular cluster of data centers is effectively always-on, because the services it provides are also always-on. (This week, Halcyon re-launched its “Follow a docket” capability; anyone who wants to follow Virginia State Corporation Commission Docket PUR-2024-00144 can do so here.) To put the Northern Virginia utility and data center operator position more plainly: we run your 911 calls, and you don’t want those interrupted for a notional benefit from flexing data center demand when grid capacity is tight. Flex your training loads all you want, grid operators, but please be careful with the backbone of today’s telecommunications system.

That evidence-based post kicked off some entertaining discussion internally, and as often happens, the conversation went from empirical to conjectural pretty quickly. We have the good fortune, in that regard, of a new colleague with real grid operator experience (hi Jake!) who asked the right next question: what would a highly agentic AI demand paradigm look like to the grid?

Jake said it better than I can, so here’s a chunk of the back-and-forth:

You might be able to think about AI data center power demand in relation to the typical generator supply stack. We start with inflexible power production (like nuclear power plants which run for 18-24 months straight at relatively constant output levels) matched to relatively inflexible data center demand, like training runs. The flexibility afforded by training demand likely comes from coordinating with the grid operator to schedule when the runs occur; much like how nukes coordinate now to schedule refueling outages.

As you move up the stack, you move from training to something like “base AI inference,” similar operationally to what was mentioned in PUR-2024-00144. This will be a stable load with a high load factor and so can be matched quite easily to base-to-intermediate generation. Then there is inference that scales with human work cadence – the same familiar shape, just magnified. Grid operators have tools for that intermediate range.

But what if you introduce agents?

Agents introduce a disconnect in the typical daily demand profile. Agents are not bound to the typical 9-5 work window, and so you could imagine them being deployed in low-priced hours (e.g., overnight, or the random five-minute intervals where prices drop to -$150/MWh). This is, in a sense, the demand-side inverse of a generation outage: instead of a sudden, unexpected loss of supply, you get a sudden, unexpected surge of load — swarms of price-responsive agents all flocking to the same cheap electrons at the same moment. The stabilizing read is that you're filling in the valleys, keeping generation from bumping up against its minimum run limits. The destabilizing read is the flash-crash problem: do we end up with the same cascading, self-reinforcing dynamics that have periodically broken equity markets?
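That flash-crash worry can be sketched in a few lines. This is a toy model with made-up numbers, not a market simulation: the function name, fleet size, and price trigger are all illustrative. The point is that if thousands of agents share the same price trigger, the load they add arrives all at once.

```python
# Toy sketch of the "swarm" risk: price-responsive agents that all watch the
# same signal with the same trigger fire simultaneously. All names and
# numbers here are hypothetical.

def agent_load_mw(price_per_mwh: float, threshold: float, draw_mw: float) -> float:
    """An agent runs its deferred work whenever the price drops below its threshold."""
    return draw_mw if price_per_mwh < threshold else 0.0

# 10,000 agents, each drawing 0.1 MW, all tuned to the same cheap-power trigger.
agents = [(20.0, 0.1)] * 10_000  # (threshold $/MWh, draw MW)

for price in [45.0, 30.0, 19.9]:  # a price trajectory dipping below the trigger
    total = sum(agent_load_mw(price, t, d) for t, d in agents)
    print(f"price=${price}/MWh -> swarm load {total:.0f} MW")
```

At $45 and $30 the swarm contributes nothing; the moment price crosses $20, a gigawatt appears. Heterogeneous thresholds or randomized delays would smooth this out, which is exactly the kind of design question the grid has never had to ask of its loads.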

There's also a deeper reliability framing here. Today's grid is architected around N-1 and N-1-1 contingency standards: you have to be able to recover from your single largest contingency (N-1) within a defined window, and your second-largest (N-1-1) in another. This is the foundation of how ISOs set reserve requirements. But there is no demand-side equivalent — no N+1 standard — and, critically, no established method for even sizing what that "+1" contingency would look like when the load in question is a coordinated swarm of autonomous agents rather than a factory or a data center with a known, stable profile.

If Amazon makes different types of chips for different purposes, we can imagine different types of data centers for training and for inference. We can also imagine data centers not just shifting inference demand over time, but physically shifting it to other jurisdictions with lower prices, and doing so instantaneously. That could even be part of the agentic workflow itself: an agent classifying both the urgency and cost of its own inference needs.

That kind of spatial, not temporal, shift is interesting from a grid operator perspective for a couple of reasons. Remember that in the energy market, generators and DR get paid what is known as a locational marginal price (LMP). So what happens when the load doesn’t have a fixed location, when it can quickly shift across RTO boundaries from Virginia to Texas?
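The routing decision itself could be mechanically simple. Here is a hypothetical sketch — the region names and LMP values are illustrative, and a real agent would consume live RTO price feeds rather than a static snapshot:

```python
# Hypothetical sketch: an agent classifying its own inference job and routing
# it to the region with the lowest locational marginal price (LMP).
# Prices below are made up for illustration.

lmps = {"PJM (Virginia)": 62.0, "ERCOT (Texas)": 18.5, "MISO": 34.0}  # $/MWh

def route_job(urgency: str, lmps: dict[str, float], home: str) -> str:
    """Urgent jobs run where they are; deferrable jobs chase the cheapest region."""
    if urgency == "urgent":
        return home
    return min(lmps, key=lmps.get)

print(route_job("urgent", lmps, "PJM (Virginia)"))      # stays in its home region
print(route_job("deferrable", lmps, "PJM (Virginia)"))  # moves to the cheapest LMP
```

From the grid operator's seat, the troubling part is not the logic but the speed: the load can cross an RTO boundary faster than any existing market mechanism can see it coming.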

Traditional demand response never posed this problem because it has always been physically bounded. If a utility wanted a fish finger factory to stop production, this meant hours of lost labor and inedible meals. There was no way to instantly shift production to an identical fish finger factory somewhere else and pick up the slack. For this reason, barring extremely high-priced hours, DR typically does not meaningfully participate in the market (at least in New England, the market I am most familiar with).

All this to say, the introduction of AI poses very significant challenges for a grid (and a regulatory system) that was designed decades ago to meet very different demands (the reliably predictable production demands of, say, an aluminum smelter). Likewise, because all of this is so unprecedented, managing it requires a certain kind of creativity and original thinking that you may not expect in the bring-your-lunchbox-to-work world of the utilities industry. Imagine a reliability engineer doing a 10-year assessment in 2016 and trying to forecast the world we have today: there was no taxonomy for even one type of AI load, never mind several.

This time- and location-shifting demand paradigm full of billions of autonomous compute consumers suggests that we think not just in terms of flexibility, but of elasticity. Every Econ 101 student is familiar with the concept of elasticity: how much the demand or supply of thing X moves in response to signal Y. Residential electricity, for instance, typically has a low price elasticity of demand. Power is the bill you pay until you absolutely cannot pay it; it is quite inelastic. Ultra-fancy chocolate has a high price elasticity of demand.
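The textbook definition is just a ratio of percentage changes. A quick sketch, with made-up numbers chosen only to illustrate the inelastic and elastic cases:

```python
# Price elasticity of demand: percent change in quantity demanded divided by
# percent change in price. Quantities and prices below are invented examples.

def price_elasticity(q0: float, q1: float, p0: float, p1: float) -> float:
    """Elasticity between two (price, quantity) observations."""
    return ((q1 - q0) / q0) / ((p1 - p0) / p0)

# Residential power: price doubles, usage barely drops -> inelastic (|E| << 1).
print(price_elasticity(100, 95, 0.12, 0.24))

# Fancy chocolate: a 25% price hike halves purchases -> elastic (|E| > 1).
print(price_elasticity(100, 50, 10.0, 12.5))
```

The first case comes out around -0.05, the second at -2.0; the open question for agentic compute is where on that spectrum any given workload will land.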

So, what sort of elasticity should we ascribe to a highly agentic inference paradigm? It all depends on what those agentic processes are doing, really. Non-essential? Highly elastic, and ready to shift in time or space to capture power prices. Essential? Maybe as always-on as Loudoun County’s data centers today. In this sense, the biggest concern will be scale. Will there be gigawatts of agentic elastic compute? Or tens of gigawatts? Or more?

But there is another elasticity to consider too: call it ‘compute elasticity of agency.’ If one of today’s existing workloads is going to become agentic, it could be more elastic than today’s always-on inference and serve as an instrument of flexibility. We could see previously-inflexible demand becoming flexible (and we have started to see energy companies experiment with this). We could see meaningful demand-side participation in the energy market.

Last year, Brookfield published a chart on AI compute demand, which it expects to be 75% inference by 2030. Here is an adaptation of that chart to capture the spirit of the two paragraphs above:

[Chart: Halcyon’s adaptation of Brookfield’s AI compute demand projection]

Yes, it’s cheeky, but it’s in good spirit and for good reason. The best time for compute demand, electricity supply, and system regulation to begin thinking about agentic elasticity is now, and the best way to do so is to ask the right questions, creatively.

It just so happens that Halcyon can be very helpful to people asking the right questions. If you’re tracking how elastic compute demand intersects with grid planning and regulation, these Halcyon Curated Alerts provide real-time visibility into the proceedings shaping the outcome: