The Largest Infrastructure Arms Race in Tech History Is Playing Out Right Now
Anthropic just committed $100 billion to Amazon. OpenAI made its own $100 billion commitment to Amazon two months ago. Both are also spending tens of billions with Microsoft, Google, and Nvidia at the same time.
Here is what it means for anyone building AI products right now.
The hyperscalers are the substrate
AWS, Azure, and Google Cloud are no longer just cloud providers. They are the physical substrate of the AI economy. Nearly every model, agent, and API call runs on infrastructure controlled by three companies.
When Anthropic locks in up to 5 gigawatts of compute with Amazon for a decade, it is not just buying chips. It is choosing its lane in an industry that is rapidly consolidating around physical infrastructure. The deal secures Trainium2, Trainium3, Trainium4, and rights to every generation after that. That is a ten-year architectural commitment, not a procurement decision.
OpenAI's February deal was structured almost identically. A $100 billion commitment to AWS over eight years, on top of an existing $38 billion agreement, with Amazon investing $50 billion into OpenAI in the process. Two gigawatts of Trainium capacity. Exclusive distribution of OpenAI Frontier on AWS.
For builders: the platforms you build on top of are becoming more dependent on fewer providers. The "cloud agnostic" strategy is getting harder to execute as frontier AI locks into specific silicon ecosystems.
Multi-cloud still matters, and Monday proved it
Anthropic is still multi-cloud. AWS remains primary, but the stack also runs on Google TPUs and Microsoft Azure. Claude is the only frontier model available on all three hyperscalers' AI platforms: Bedrock, Vertex AI, and Azure Foundry.
That diversification is not theoretical. It protected them during Monday's AWS outage. A DynamoDB software update in us-east-1 cascaded across the internet, taking down ChatGPT, Perplexity, Coinbase, Robinhood, Signal, and Fortnite. Claude stayed online. Not by accident. By architecture.
On top of AWS, Anthropic has signed a $30 billion Azure commitment with Microsoft plus up to a gigawatt of additional capacity, and a multi-gigawatt TPU deal with Google and Broadcom for 2027 and beyond. The lesson for anyone building on top of a single provider: the resilience you want is the resilience your foundation was engineered for. If the platform is single-homed, so are you.
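For an application team, the same idea can be sketched as ordered failover across providers. This is a minimal illustration, not a description of how Anthropic or any hyperscaler implements resilience. The names `complete_with_failover`, `primary_down`, and `secondary_ok` are hypothetical, and the stub functions stand in for whatever client code you already use to reach each platform.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (label, callable) pair. The callables are hypothetical
# stand-ins for the client code you already use for each platform.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the list raised an error."""


def complete_with_failover(
    prompt: str, providers: Sequence[Provider]
) -> Tuple[str, str]:
    """Try providers in order; return (label, response) from the first success."""
    errors = []
    for label, call in providers:
        try:
            return label, call(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(f"{label}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary_down(prompt: str) -> str:
    # Stub that simulates a regional outage on the primary platform.
    raise ConnectionError("primary region unavailable")


def secondary_ok(prompt: str) -> str:
    # Stub that simulates a healthy secondary platform.
    return f"echo: {prompt}"


if __name__ == "__main__":
    used, reply = complete_with_failover(
        "hello", [("primary", primary_down), ("secondary", secondary_ok)]
    )
    print(used, reply)
```

The sketch only helps if the secondary path is exercised regularly. A fallback that has never served production traffic is an assumption, not a guarantee.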
What happens next
Google unveiled its next-gen inference TPU this week at Cloud Next. Ironwood is the first TPU Google has built specifically for inference, nearly 2x more power-efficient than the previous generation, scaling to 9,216 chips per pod and 42.5 exaflops of compute. The timing is not accidental. Google is betting that the next wave of AI spending will be on serving models, not training them, and it wants to own that workload.
Nvidia is watching two of its biggest customers lock their roadmaps to Amazon's custom silicon. Microsoft just committed billions to Anthropic on Azure while still holding its $250 billion OpenAI contract. No hyperscaler can afford to lose a frontier AI workload, and no lab can afford to depend on a single provider.
The hyperscalers are about to fight each other for every frontier AI workload that exists. Pricing will get aggressive. Performance benchmarks will get political. Marketing will get louder.
What this means for your business
If you are building on top of these platforms, here are three things to think about right now.
1. Know your dependency graph. Which models do you use, through which APIs, running on which silicon, in which region? If you cannot answer that in a sentence, you are exposed. The Monday outage took down companies that did not even realize they were single-homed in us-east-1.
2. Architect for substitution, not just redundancy. The providers are going to compete on price, latency, and capability for years. Building your stack so you can move workloads between Bedrock, Vertex, and Azure is not paranoia. It is leverage.
3. Watch the silicon, not just the model. Trainium, TPU, and Nvidia GPUs are not interchangeable for every workload. As the labs tune their models to specific chips, the downstream performance and cost profile of the models you consume will shift. The builders who understand this will price, plan, and ship better than the ones who do not.
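One way to make the first point concrete is to keep the dependency graph as data and check it for single-homing. The sketch below is illustrative only: the `Dependency` fields mirror the four questions above (model, API, silicon, region), and every value in `INVENTORY` is a made-up placeholder, not a statement about where any real model runs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Dependency:
    feature: str   # product feature that makes the call
    model: str     # model consumed (placeholder names below)
    api: str       # platform the call goes through
    silicon: str   # accelerator family, if you know it
    region: str    # cloud region serving the call


def single_homed(deps: Iterable[Dependency]) -> Dict[str, Tuple[str, str]]:
    """Return features whose every dependency sits in one (api, region) pair."""
    homes = defaultdict(set)
    for dep in deps:
        homes[dep.feature].add((dep.api, dep.region))
    return {
        feature: next(iter(pairs))
        for feature, pairs in homes.items()
        if len(pairs) == 1
    }


# Hypothetical inventory; all values are assumptions for illustration.
INVENTORY = [
    Dependency("support-chat", "model-a", "Bedrock", "unknown", "us-east-1"),
    Dependency("search-summaries", "model-a", "Bedrock", "unknown", "us-east-1"),
    Dependency("search-summaries", "model-a", "Vertex AI", "unknown", "us-central1"),
]

if __name__ == "__main__":
    for feature, (api, region) in single_homed(INVENTORY).items():
        print(f"{feature} is single-homed on {api} in {region}")
```

In this made-up inventory, `support-chat` is flagged because its only path runs through one platform in one region, while `search-summaries` has a second home. Leaving `silicon` as "unknown" is itself a finding worth writing down.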
The infrastructure layer of AI is being decided this year. Builders who map their dependencies and keep their workloads portable will have room to maneuver as it settles.
---
*Raptor Tech builds custom software and AI systems for businesses that need to move fast without getting locked in. If you need help architecting an AI stack that can weather the hyperscaler wars, book a free consultation or call (561) 786-7926.*
Sources
- Anthropic and Amazon Expand Collaboration for Up to 5 Gigawatts of New Compute (Anthropic)
- Anthropic Takes $5B From Amazon and Pledges $100B in Cloud Spending (TechCrunch)
- Amazon Invests $50B in OpenAI, Deepens AWS Partnership with $100B Cloud Deal (GeekWire)
- How Amazon's Massive Stake in OpenAI Could Boost Its AI and Cloud Businesses (CNBC)
- Anthropic to Purchase $30bn in Microsoft Azure Credits (Data Center Dynamics)
- Anthropic Expands Partnership with Google and Broadcom (Anthropic)
- Inside Anthropic's Multi-Cloud AI Factory (Data Center Frontier)
- AWS Outage Exposes 'Dangerous' Over-Reliance on US Cloud Giants (Data Center Knowledge)
- Ironwood: The First Google TPU for the Age of Inference (Google)