SemiAnalysis

Microsoft's AI Strategy Deconstructed - From Energy to Tokens

"The Big Pause", AI Tokens Factory Economics Stack, OpenAI, Neocloud Renting, GitHub Copilot, MAI and Maia

Dylan Patel, Jeremie Eliahou Ontiveros, Myron Xie, and 4 others
Nov 12, 2025

Microsoft was at the top of AI in 2023 and 2024, but a year ago they changed course drastically. They significantly paused their datacenter construction and slowed down their commitments to OpenAI. We called this out to Datacenter Model clients a year ago and later wrote a newsletter piece about it.

Microsoft's Datacenter Freeze - 1.5GW Self-Build Slowdown & Lease Cancellation Misconceptions
Dylan Patel, Jeremie Eliahou Ontiveros, and Maya Barkin · Apr 28

2025 was the story of OpenAI diversifying away from Microsoft, with Oracle, CoreWeave, Nscale, SB Energy, Amazon, and Google all signing large compute contracts with OpenAI directly.

This seems like a dire situation. Today we have a post dissecting Microsoft's fumble, as well as a public interview with Satya Nadella alongside our dear friend Dwarkesh Patel, where we challenged him on Microsoft's AI strategy and execution.

Now Microsoft's investments in AI are back, and the AI giant has never seen such high demand for accelerated computing. The Redmond titan has woken up to the fact that it went down the wrong path and has dramatically shifted course. With the newly announced OpenAI deal, Azure growth is set to accelerate in the upcoming quarters, as forecast by our Tokenomics Model.

Source: SemiAnalysis Tokenomics Model

Microsoft plays in every single part of the AI Token Factory Economics Stack and is witnessing accelerated growth, a trend we expect to continue in the coming quarters and years.

Source: SemiAnalysis Tokenomics Model

The firm is actively looking for near-term capacity and pulling the trigger on everything it can get its hands on. Self-build, leasing, Neocloud, middle-of-nowhere locations – everything is on the table to accelerate near-term capacity growth (exact numbers available to our Datacenter Model subscribers).

Source: SemiAnalysis Tokenomics Model

On the hardware side, Microsoft even has access to OpenAI's custom chip IP, the most exciting custom ASICs currently in development. Given that the trajectory of OpenAI's ASIC development looks much better than that of Microsoft's Maia, Microsoft may end up using OpenAI's chip to serve OpenAI models. This dynamic mirrors Microsoft's situation with OpenAI models: while they have access to OpenAI models, they are still trying to train their own foundation model with Microsoft AI. We believe they're attempting to become a truly vertically integrated AI powerhouse, eliminating most of the 3rd-party gross margin stack and delivering more intelligence at lower cost than peers.

OpenAI Chip Team Is Now Serious
Dylan Patel · June 3, 2024

In this report, we will dive into all aspects of Microsoft's AI business. We begin by reviewing the history of the OpenAI relationship, covering the historic surge in Microsoft's datacenter investments in 2023-24, as well as the mooning scale of their OpenAI training clusters – from tens of MWs to gigawatts. We then analyze the "Big Pause" and the spectacular return to the datacenter market. Much of this is motivated by a drastic simplification of the OpenAI ownership structure as well as Microsoft's laser focus on serving the infrastructure needed for converting model capabilities into product use cases (and revenue) via stateless APIs.

Source: SemiAnalysis Tokenomics Model, Company Disclosures
Source: SemiAnalysis Tokenomics Model, Company Disclosures

We then analyze every part of Microsoft's positioning in the AI Token Factory Economics Stack:

  • Applications

  • LLMs

  • PaaS

  • IaaS

  • Chips

  • System architecture

Source: SemiAnalysis sales@semianalysis.com

In each section, we’ll dive deep into Microsoft’s product portfolio, competitive positioning and outlook. It’s not all good news for Microsoft as the software behemoth faces a slew of new entrants and challengers to their dominant productivity suite and AI compute platform.

Microsoft & OpenAI in 2023-25: From All-In on AI to the Big “Pause”

2023-24: self-build, leasing, and building the world's largest datacenters for OpenAI

ChatGPT's release in November 2022 changed the world. Microsoft was the first hyperscaler to react to the "ChatGPT moment", and they did it in a spectacular way. While Microsoft had invested $1B in OpenAI back in 2019, they 10x'd the investment in January 2023. Simultaneously, they engaged in the most aggressive datacenter buildout in history – primarily driven by their key AI partner.

The chart below depicts datacenter pre-leasing activity, one of the best leading indicators of capacity growth and CapEx. Microsoft's pre-leasing activity from Q1'23 to Q2'24 dwarfed that of the other hyperscalers combined. In Q3'23, Microsoft alone leased nearly as much capacity as the entire North American market leased in all of 2022.

Source: SemiAnalysis Datacenter Model

And datacenter leasing is only a portion of the picture. Our building-by-building Datacenter Industry Model highlights unprecedented growth in MWs of self-built capacity added over the course of 2024 and 2025. In addition, Microsoft contracted billions of dollars of capacity from CoreWeave and Oracle.

Microsoft & OpenAI training clusters – from a fraction of a building to the world’s largest facility

Perhaps the most iconic symbol of this buildout is the "Fairwater" program. In 2023-24, Microsoft planned and simultaneously constructed the two largest datacenters on Earth. Let's briefly go back in time to get a sense of the scale of Microsoft's 2023-24 buildouts. We show below their first major training cluster, in Iowa, where GPT-3.5 was trained. We believe it hosts ~25k A100 chips. While the campus shown below is fairly large, we believe OpenAI only uses two data halls of one Ballard building, i.e. ~19MW.

Source: SemiAnalysis Datacenter Model

The second major cluster was built in Arizona. It scaled up over time, building by building, with the first H100 building completed in 2023, followed by H200s in 2024 in a separate facility, and two more datacenters hosting GB200 in 2025. In total, we estimate ~130k GPUs across four buildings.

Source: SemiAnalysis Datacenter Model

Microsoft's next-generation clusters for OpenAI are called Fairwater and are significantly larger. Each "Fairwater" comprises two buildings – a standard CPU & storage facility of 48MW, and an ultra-dense GPU building. The latter, with two stories and a total ~800k sqft area, boasts ~300MW, i.e. the equivalent power consumption of >200k American households. That represents over 150k GB200 GPUs per building. We show below the Wisconsin facility – fully dedicated to OpenAI.

Source: SemiAnalysis Datacenter Model
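A quick back-of-envelope check on those figures (the per-household draw is our own illustrative assumption, not a Microsoft disclosure):

```python
# Back-of-envelope check on the Fairwater GPU building figures above.
# Assumptions (ours, not Microsoft disclosures): an average US household
# draws ~1.2 kW continuously (~10,500 kWh/year), and the building's ~300MW
# of IT power is spread across ~150k GB200 GPUs.

building_it_power_mw = 300          # ~300MW per Fairwater GPU building
avg_household_kw = 1.2              # assumed continuous household draw
gpus_per_building = 150_000         # ~150k GB200 GPUs per building

households_equivalent = building_it_power_mw * 1_000 / avg_household_kw
watts_per_gpu_all_in = building_it_power_mw * 1_000_000 / gpus_per_building

print(f"Household equivalent: ~{households_equivalent:,.0f}")   # ~250,000 households
print(f"All-in power per GPU: ~{watts_per_gpu_all_in:,.0f} W")  # ~2,000 W per GPU
```

The ~2kW all-in figure per GPU is roughly what one would expect for GB200-class racks once cooling and networking overheads are allocated.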

In Georgia, QTS built a "sister" facility for Microsoft, again for OpenAI. While the cooling system is different, the GPU building is also ~300MW. The picture below shows the scale of the facility – no other building in the world has as many air-cooled chillers! The size of the onsite substation is also impressive.

Source: SemiAnalysis Datacenter Model

Not only are individual buildings the largest on Earth, they sit in even bigger campuses. In Atlanta, a second Fairwater is already well under construction.

Source: SemiAnalysis Datacenter Model

In Wisconsin, while a second Fairwater will imminently start construction, there’s more to the story as Microsoft is preparing for an even larger 3rd phase. We believe Microsoft designed two >600MW individual buildings, with each facility having 2x the amount of CPU/storage and diesel generators, relative to a standard ~300MW Fairwater. We show below the site plan associated with these 600MW buildings. These would be the largest individual datacenters in the world if they are built on time.

Source: SemiAnalysis Datacenter Model, Local Disclosures

At full buildout, this will be one of the world’s largest campuses, with over 2GW of IT capacity.

Source: SemiAnalysis Datacenter Model

As the cherry on top, Microsoft planned for all these major AI regions to be connected via an ultra-fast AI WAN, running at over 300Tb/s with the ability to scale above 10Pb/s. We called that out over a year ago in our piece "Multi-Datacenter Training: OpenAI's Ambitious Plan To Beat Google's Infrastructure".

Multi-Datacenter Training: OpenAI's Ambitious Plan To Beat Google's Infrastructure
Dylan Patel, Daniel Nishball, and Jeremie Eliahou Ontiveros · September 4, 2024
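To put the 300Tb/s figure in context, here is a rough illustrative calculation of how quickly model state could be moved between regions at that rate; the 2T-parameter model size and bf16 checkpoint format are our assumptions for scale, not a disclosed OpenAI configuration.

```python
# Illustrative only: time to move a large model checkpoint between two AI
# regions over the AI WAN. Model size and precision are assumed for scale.

params = 2e12                      # assumed 2T-parameter model
bytes_per_param = 2                # bf16 weights
checkpoint_bits = params * bytes_per_param * 8

for wan_tbps in (300, 10_000):     # ~300Tb/s today, ~10Pb/s scaling ceiling
    seconds = checkpoint_bits / (wan_tbps * 1e12)
    print(f"{wan_tbps:>6} Tb/s -> ~{seconds:.3f} s for a {params * bytes_per_param / 1e12:.0f}TB checkpoint")
```

At bandwidths in that range, cross-site synchronization stops being the obvious bottleneck for multi-datacenter training.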

We show below a representation of the network design for a hypothetical 5GW distributed cluster. We will discuss all aspects of the Fairwater networking architecture later in the report, building on our AI Networking Model.

The Multi-Gigawatt “pause”

After firing on all cylinders, Microsoft suddenly decided to hit the brakes – in spectacular fashion. Looking at the total datacenter pre-leased balance, Microsoft alone accounted for over 60% of leasing contracts at peak! But new leasing activity froze after Q2'24 (calendar) while other hyperscalers materially picked up. Microsoft is now below 25% of total hyperscaler pre-leased capacity.

Source: SemiAnalysis Datacenter Model

At that time, Microsoft also walked away from multiple gigawatts of non-binding LOIs across multiple locations, including:

  • Major US markets, such as Phoenix and Chicago.

  • Major European markets, including the UK and the Nordics, among others.

  • In the rest of the world, Microsoft’s pause impacted Australia, Japan, India, as well as LatAm.

These sites went to other major competitors such as Oracle, Meta, CoreWeave, Google, Amazon, etc. Microsoft forever ceded a large share of AI infrastructure due to tepidness and a lack of belief in AI.

In addition, Microsoft also materially slowed down its self-build program. We show some pictures below, listing ~950MW of "frozen" IT capacity. This doesn't include multiple other datacenters in Virginia, Georgia, Arizona, as well as internationally.

In the same time frame, major datacenters from other players went from ground broken to running workloads. In total, Microsoft paused over 3.5GW of capacity that would have been built by 2028. The details are in the Datacenter Model.

Microsoft’s AI portfolio decomposed: IaaS, PaaS, Models, Applications

To understand the causes and consequences of the Big Pause, let’s dig into all parts of Microsoft’s AI portfolio. Our preferred framework to analyze the margins at different layers is our “AI Token Factory Economics Stack”:

  • From chip to tokens, a broad range of suppliers are exposed to the AI infrastructure buildout.

  • Currently, the single largest margin stack is, of course, at the chip level, driven by Nvidia's 75% GPM.

  • There remain intense debates on the end-state margin profile of the following four layers:

  • Application layer (e.g. ChatGPT, Microsoft Copilot, Claude Code…)

  • Model layer (e.g. Claude 4.5 Sonnet, GPT5-Pro, DeepSeek R1,…)

  • IaaS layer (e.g. CoreWeave renting a bare-metal GPU cluster to Meta, Oracle renting GPUs to OpenAI, Nebius renting SLURM and K8s in a multitenant cluster to startups…)

  • PaaS layer (e.g. AWS selling tokens to a Fortune 500 enterprise via Bedrock, Nebius selling a fraction of a GPU cluster to a startup with SLURM and K8s…)

At current pricing, we see leading model makers making 60%+ margins on their direct API business.

Source: SemiAnalysis
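A minimal sketch of how gross margins compound through the stack; only the ~75% chip GPM and the 60%+ model-layer API margin come from the discussion above, while the other layer margins are placeholder assumptions for illustration.

```python
# Illustrative margin stacking through the AI Token Factory Economics Stack.
# Each layer's selling price becomes the next layer's cost basis.
# Only the ~75% chip GPM and ~60% model-API margin are from the text;
# the IaaS/PaaS/application margins are placeholder assumptions.

def price_at_margin(cost: float, gross_margin: float) -> float:
    """Selling price implied by a cost basis and a target gross margin."""
    return cost / (1.0 - gross_margin)

silicon_cost = 1.0                                    # normalized fab cost of the accelerator
chip_price = price_at_margin(silicon_cost, 0.75)      # ~75% GPM at the chip level
iaas_price = price_at_margin(chip_price + 0.6, 0.30)  # + assumed datacenter/power opex, ~30% GPM
paas_price = price_at_margin(iaas_price, 0.15)        # assumed thin PaaS markup
model_price = price_at_margin(paas_price, 0.60)       # 60%+ margin on direct API tokens
app_price = price_at_margin(model_price, 0.20)        # assumed application-layer markup

for layer, price in [("Chip", chip_price), ("IaaS", iaas_price), ("PaaS", paas_price),
                     ("Model API", model_price), ("Application", app_price)]:
    print(f"{layer:<12} cumulative price: {price:5.2f}x silicon cost")
```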

Azure’s AI Bare Metal services – stepping away from $150B of OpenAI Gross Profit dollars, poor execution and ROIC concerns

Successful players in the business of building massive Bare Metal GPU/XPU clusters have mastered the art of constructing large-scale infrastructure. It is a mix of various ingredients such as speed of execution, understanding of the market and end-user requirements, site selection, and financing, among other things.

Our deep dive on Oracle called out their material strategic change to win over the market. Outside of tech giants, CoreWeave is a case study of a player with no initial scale winning over the market by executing perfectly on the above criteria. Let's now look at Microsoft's execution.

Disappointing execution, loss of the Stargate contract

To gauge Microsoft’s bare metal efforts, it is useful to dig into the Fairwater projects. Early in 2024, rumors swirled around Microsoft’s $100 billion “Stargate” project for OpenAI. We believe they planned to host the cluster at the Wisconsin datacenter campus. As discussed earlier, the roadmap would’ve taken the site to over 2GW of capacity.

Of course, the first Stargate $100B contract ended up going to Oracle and Abilene, TX. Microsoft's slow execution played a key role, in our view. Over two years after breaking ground, Phase 1 is still not operational. By contrast, Oracle broke ground on its Abilene, Texas datacenter in May 2024 and began operations in September.

We also believe Microsoft poorly planned the 1.5GW expansion. Full capacity is slated for at least mid-2027 delivery from a power transmission standpoint, one year after Oracle’s Abilene cluster breaks the 1GW mark. Microsoft couldn’t keep up with OpenAI’s request to scale as fast as possible – demonstrating a misunderstanding of the market. The AI lab had no option but to look for other partners to serve its insatiable need for near-term compute.

Source: SemiAnalysis Datacenter Model

Stepping away from $150B of OpenAI Gross Profits

As we know today, Oracle has become OpenAI's main GPU partner. They've signed over $420B of contract value over the last twelve months, translating into ~$150B of gross profit dollars. Detailed modeling of every neocloud compute contract and its cost/margin breakdown is available in the AI TCO Model.
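A quick sanity check on those headline figures; the ~5-year average contract duration is an assumption consistent with the discussion below rather than a disclosed term.

```python
# Rough check on the Oracle-OpenAI figures cited above. The gross margin is
# implied by the $420B / ~$150B figures; the ~5-year average term is assumed.

contract_value_bn = 420
gross_profit_bn = 150
avg_duration_years = 5

implied_gross_margin = gross_profit_bn / contract_value_bn
annual_gross_profit_bn = gross_profit_bn / avg_duration_years

print(f"Implied blended gross margin: ~{implied_gross_margin:.0%}")         # ~36%
print(f"Implied annual gross profit:  ~${annual_gross_profit_bn:.0f}B/yr")  # ~$30B/yr
```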

Given a typical 5-year duration, the ~$30B of annual gross profits would've increased Microsoft's $194B of annual gross profit (FY25) by over 18%. To be fair, the loss of the OpenAI contract is not just about execution. It's also, to some extent, a conscious decision. From Microsoft's perspective, landing all OpenAI contracts would have deteriorated the quality of their Azure business, since:

  • OpenAI would’ve represented close to 50% of Azure’s revenue within a few years.

  • The margin and return on capital profile aren’t nearly as attractive as Azure’s historical Cloud profile.

Compared to Microsoft's overall business, Oracle's AI ROIC is indeed lower, coming in at ~20% versus 35-40% today for total MSFT. However, we see that Microsoft's own AI ROIC is not much higher than ORCL's once you strip out the revenue share from OpenAI, which is set to end between 2030 and 2032.

Source: SemiAnalysis Tokenomics Model

However, Microsoft seems to be forgetting their own recent history lesson: they moved from a heavily bare-metal AI revenue mix toward more of an API and token-factory business model, resulting in continuous gains in their ROIC profile. They may have just allowed a competitor to fund its own entry into the AI factory business!

For a comprehensive model of the economics of OpenAI, Oracle and Microsoft, refer to our industry-first Tokenomics Model. Built on top of our Datacenter and Accelerator tracking, Tokenomics tracks every single major compute contract and unpacks all relevant financial metrics: growth, profits, ROIC, funding, and much more.

Underestimating demand, RPO share loss, desperate need for Neocloud capacity

A key lesson from Microsoft’s Pause is how much they underestimated the scale of XPU Cloud demand coming from other players, such as Meta. We are currently witnessing the impact of their miscalculation. Other players have been booking materially more RPO than Microsoft.

Source: SemiAnalysis Tokenomics Model

Microsoft is now firmly back in the market, but they're running out of options to expand near-term capacity. They're forced to go with the worst option: renting GPUs from Neoclouds and reselling them to 3rd parties, either as bare metal or as tokens through Foundry. We'll discuss Foundry below. Of course, the business of renting bare metal to resell bare metal will yield materially lower-than-usual margins for Azure.

Source: SemiAnalysis Tokenomics Model

Microsoft shied away from building their own DCs, only to pay margin to Neoclouds once they realized they had messed up.

PaaS layer – Not All GPUs Are Deployed Equally

Gold-rated cloud… at risk of downgrade

In ClusterMAX 1.0, released in March of this year, we discussed how Azure was leading the way on networking performance, security, and availability of the latest GPUs, and had already captured the lion's share of OpenAI's compute buildout. This put them clearly in the Gold tier of our rankings, right behind CoreWeave and next to companies like Nebius, Oracle, and Crusoe. However, by the time ClusterMAX 2.0 rolled around in early November, it had become clear that the pace of development of new CycleCloud and AKS features targeting AI workloads had stagnated.

In our research talking to 140+ buyers of compute, ranging from scaled-up AI companies such as OpenAI, Meta, Snowflake, and Cursor to startups like Periodic Labs, AdaptiveML, Jua, Nous Research, DatologyAI, and Cartesia, it is clear that Azure is not a significant player for managed clusters or on-demand VMs. Azure's GPU capacity for large-scale clusters seems to go straight to OpenAI, with the leftovers being gobbled up by individual developers at legacy businesses in the Fortune 500. These companies that love working on internal RAG chatbots typically have enterprise agreements in place to purchase all their IaaS exclusively from Azure.

ClusterMAX™ 2.0: The Industry Standard GPU Cloud Rating System
Jordan Nanos, Daniel Nishball, and 5 others · Nov 6

During our hands-on testing, the reason why Azure is not selling managed Slurm or Kubernetes clusters for AI became clear: we found significant gaps in ease of use, monitoring, reliability, and health checks on CycleCloud Slurm clusters. The wholesale bare metal experience that Azure provides OpenAI when renting out entire data halls at a time is very different from what a provider like CoreWeave, Nebius, or Fluidstack gives to their end users.

The typical GPU compute buyer in the industry is still looking for H100, H200, B200, or B300 HGX servers at the scale of 64 to 8,000 GPUs. It is much less common for a buyer to look for GB200, GB300, or anything from AMD. However, Microsoft has invested enormous amounts of time and attention into AMD GPUs and GB200/GB300 NVL72 rack-scale systems for their largest customers (aka OpenAI). You can measure this in terms of OPEX on engineers' salaries or CAPEX on GPU purchases and new facilities.

An additional way to look at this is via the open source community. According to Hugging Face, the de facto place for any company to publish and download open source models, IPs associated with Microsoft account for 5x fewer daily model downloads than Amazon's and 3x fewer than Google's.

Microsoft shooed away the OpenAI business, but it's not like they are capturing enterprises or the long tail either. They are significantly behind the other hyperscalers on this metric.

The result of all of this is clear: AI companies that are actively looking for capacity are looking elsewhere. These deals can range anywhere from a 1-year 64-GPU contract valued around $1M all the way up to 3-year 8,000-GPU contracts valued at over $500M. We are seeing startups that bought 256 H100s in March looking for 9,000 GB300 NVL72 GPUs in November. Right now, Azure is missing out on all of this upside.
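Backing out the implied hourly rates from those contract sizes is straightforward; the assumption of fully reserved GPU-hours over the term is ours.

```python
# Implied $/GPU-hour from the contract sizes mentioned above, assuming the
# full term is reserved (GPU-hours are paid for whether or not they're used).

HOURS_PER_YEAR = 8760

contracts = [
    ("1y x 64 GPUs",    64,    1, 1.0e6),    # ~$1M contract
    ("3y x 8,000 GPUs", 8_000, 3, 500.0e6),  # ~$500M+ contract (treated as a floor)
]

for name, gpus, years, value_usd in contracts:
    gpu_hours = gpus * years * HOURS_PER_YEAR
    print(f"{name}: ~${value_usd / gpu_hours:.2f}/GPU-hour implied")
```

Both land in the low single digits of dollars per GPU-hour, roughly in line with reserved rates for this class of hardware.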

To address this customer base, we believe that Azure must revamp its CycleCloud and AKS offering for AI, simplifying the current cluster deployment and monitoring experience. They need to build health checks, deploy them to clusters by default, and proactively recover from hardware failures. And they need to build out GTM and support organizations staffed to deliver these clusters to end users. We mentioned in ClusterMAX 2.0 that Azure is at risk of being demoted to Silver due to its poor user experience for startups from Series A to AI unicorns.

ClusterMAX 2.0 Rankings
Source: SemiAnalysis ClusterMAX 2.0 Ratings, November 2025 http://clustermax.ai/

“Fungible fleet” and Sovereign AI – a bet on the direction of inference workloads

With that said, Azure clearly has a foundation built for success. They have 70 regions with over 400 datacenters globally. They run the biggest SaaS business ever seen, with experience selling to the largest organizations all over the world: from “Azure Government Secret” for US intelligence agencies to Windows PCs for consumers in China.

Key to Azure’s strategy is to bring AI closer to the enterprise customers, by having a broad geographical footprint. This is a directional bet on the future shape of AI workloads:

  • Today's largest inference use-cases, namely ChatGPT and coding agents, are not latency-sensitive, and will become even less so as time horizons keep increasing. They also don't interact with sensitive enterprise data, for the most part. As such, latency and data locality don't matter much – the name of the game is ramping up capacity as fast as possible to sell more tokens to the world.

  • In the future, enterprise use-cases will likely be a large source of growth. They'll have to comply with strict security requirements, data locality laws, and the typical environments & constraints favored by large enterprises. They'll also be co-processed with non-AI workloads, e.g. Cosmos DB storage in a specific Azure region. The drawback is that the datacenter site selection process is more complex, due to power constraints impacting most major metros in the world. They won't be able to ramp as fast relative to others building in "middle-of-nowhere" locations with excess power.

Constructing and leveraging a global footprint is key to Microsoft's leitmotif of building a "fungible" fleet. They've already seen some success. For example, ByteDance Seed trains their video models in Arizona, USA, not China or Malaysia. We believe ByteDance Seed rents from all major US hyperscalers on American soil. While this particular run was on a competitor's cloud, it shows that locality is not a requirement for such workloads and that a fungible fleet can serve them.

That infrastructure strategy is quite different from that of leading AI labs like OpenAI. Given the most power-intensive workloads take several minutes to respond (e.g. Deep Research, reasoning models), adding a few milliseconds of networking latency is irrelevant to them.

As the time horizon of AI tasks extends, the locality to the user matters less and less.

The datacenters can be placed wherever possible and serve global traffic. This is further driven by the fact that post-training workloads are quickly increasing in compute; they are similarly latency-insensitive and do not necessitate large amounts of centralized compute.

Depreciation Schedules and the Future of GPUs in Azure

Cracking open this fungible fleet surfaces an important consideration, one that has received a lot of coverage lately: depreciation.

The infamous Michael Burry recently claimed that all hyperscalers (Meta, Google, Oracle, Microsoft, Amazon) are artificially boosting earnings by extending the useful life of their IT assets, which has moved from 3-5 years back in 2020 to 5-6 years in the present day.

Source: Michael J Burry on X, the everything app

Dr. Burry’s claim is predicated on an assumption that the NVIDIA product cycle is now 2-3 years, which is far lower than the useful life of the assets. We believe this is a fatal flaw in the argument. The new accounting, while beneficial to the companies in the short-term, is also predicated on real operational experience in datacenters.
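To see why the schedule matters so much to reported earnings, consider a simple straight-line depreciation example; the $10B fleet figure is purely illustrative.

```python
# Straight-line depreciation: the same hardware cost spread over more years
# means a smaller annual expense. The $10B asset base is illustrative only.

asset_cost_bn = 10.0

for useful_life_years in (3, 4, 5, 6):
    annual_expense_bn = asset_cost_bn / useful_life_years
    print(f"{useful_life_years}-year useful life: ${annual_expense_bn:.2f}B/year depreciation expense")

# Moving from a 3-year to a 6-year schedule halves the annual charge on the
# same fleet, boosting reported operating income in the early years - which
# is exactly the accounting effect being debated.
```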

Back in 2020, when Microsoft, Meta, and Google increased the useful life from 3 to 4 years, we were still in the year 2 BC (Before ChatGPT). Now, in present-day 3 AD (After Da Launch of ChatGPT), the increases in useful life have proved beneficial to CAPEX-hungry hyperscalers. What began changing in IT equipment in 2020 that has continued through to 2025? The answer is reliability and incentives.

Server OEMs such as Dell, SuperMicro, HPE, Lenovo, and Cisco have long sold servers with a standard warranty of 3 to 5 years. The 5-year warranty is more expensive, of course, and extended warranty options exist for 6 and 7 years. The price goes up, but all it takes is the vendor stocking enough spare parts to make service calls on worn-out nodes. Meanwhile, networking equipment vendors such as Cisco, Arista, Aruba, and Juniper have experimented with lifetime warranties on their switches. Storage vendors have offered the same - just pay a yearly support contract and they'll keep swapping worn-out drives. Think of it like a car: the high end of the market might lease and upgrade their Benz every 2 years, while others drive their 20-year-old beaters for the price of gas and insurance.

The point is proven when looking at the largest HPC clusters and Supercomputers in the world. These leading edge systems run the biggest, baddest, hottest (and sometimes most efficient) processors on the market. Supercomputing centers were the first to adopt liquid cooling, and have experience building a datacenter around the system, rather than fitting a system into a datacenter.

IBM Summit at Oak Ridge National Laboratory was for a long time the world’s fastest supercomputer on the Top500. It went into production in June 2018, and was decommissioned in November 2024 after 6.5 years of continuous operation. Summit used IBM Power9 processors that were launched in 2016, with the procurement being completed as far back as 2014.

Fugaku was installed at RIKEN in Japan in 2020 and is still running, sitting at #7 on the Top500. Sierra was installed at LLNL in 2018 and is still running, sitting at #20. Sunway TaihuLight was installed in 2016 in the National Supercomputing Center in Wuxi, China. It is still running and sits at #21. Exascale systems like El Capitan, Frontier, and Aurora (#1, #2, and #3 on the list respectively) went into service anywhere from 2021-2025, and are expected to run until 2027-2032.

Finally, Eagle - a Microsoft NDv5 system featuring 14,400 H100s - was installed in 2023 and is sitting at #5. We expect this system is highly utilized and will be running for years to come.

Turning to current cloud providers, we can find p3.16xlarge for sale in AWS with 8x V100 GPUs. We can find similar instances available through marketplaces like Shadeform, Prime Intellect, and Runpod, from underlying providers such as DataCrunch, Paperspace, and Lambda Labs.

V100 was announced in May 2017, with volume shipments happening in the fall of 2017, and final product shipments from NVIDIA occurring in January 2022. In other words, NVIDIA shipped spare parts for over 5 years from the time they launched the new GPU. Hyperscalers and OEMs had plenty of time to stock their spares, and keep instances running through to the present day - a full 8 years after those V100 GPUs started shipping.

Finding some V100s for sale on our AWS console

Of course, V100s today aren't a great business on a pure revenue-per-MW basis. So much so that we are aware of hyperscalers ripping out V100, A100, and even older H100 GPUs from their older datacenters in order to make room for the latest and greatest. The point is not that they are doing this because the GPUs are wearing out and dying of old age. Rather, they are ripping out revenue-generating assets in favour of higher-revenue-generating assets due to power and floorspace constraints.

Key to optimizing GPU cloud economics is maximizing the GPUs' economic life. Our AI Cloud TCO Model provides a useful framework. Analyzing the TCO of an H100 cluster, we see that after stripping out capital costs, there remain operating costs of $0.30-0.40/GPU/hour. The question is whether, after 5 years, a GPU can still monetize above that rate.

Source: SemiAnalysis AI Cloud TCO Model
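A minimal sketch of that framework; the capex, power, and hosting inputs below are rough illustrative assumptions, not the actual AI Cloud TCO Model values.

```python
# Illustrative H100 cost split: what does a GPU-hour cost once capex has been
# fully depreciated? All inputs are rough assumptions for illustration.

HOURS_PER_YEAR = 8760

capex_per_gpu = 32_000          # assumed all-in capex per H100 (server + networking share)
depreciation_years = 5
power_kw_per_gpu = 1.4          # assumed all-in draw incl. cooling overhead
power_price_per_kwh = 0.08      # assumed industrial electricity price
hosting_opex_per_gpu_hr = 0.15  # assumed colocation, staffing, maintenance

capex_per_gpu_hr = capex_per_gpu / (depreciation_years * HOURS_PER_YEAR)
power_per_gpu_hr = power_kw_per_gpu * power_price_per_kwh
operating_cost_per_gpu_hr = power_per_gpu_hr + hosting_opex_per_gpu_hr

print(f"Capex, amortized over {depreciation_years} years: ${capex_per_gpu_hr:.2f}/GPU-hr")
print(f"Operating cost only (power + hosting):  ${operating_cost_per_gpu_hr:.2f}/GPU-hr")
# Once the GPU is written off, it only needs to rent above the operating-cost
# line (roughly $0.25-0.40/GPU-hr depending on assumptions) to stay worth running.
```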

Our AI Cloud TCO Model provides an answer by forecasting GPU rental prices. Our historical hit rate has been remarkably high, which is why our model is trusted by most of the largest buyers of GPUs, as well as their financial sponsors, to make billion-dollar investments.

Source: SemiAnalysis AI Cloud TCO Model

While our long-term forecast points to rapid declines, the economic life will be much longer than 2-3 years. From the perspective of a PaaS company like Microsoft, the key is to sell GPUs at good enough pricing in years 5-6 to maximize economic returns. In our view, Azure has a shot at accomplishing this, given:

  • Deep enterprise relationships, powered by their broad “fungible fleet”, which could lead to an ability to sell at higher prices than most.

  • An ability to upsell: even if Azure can't monetize 6-year-old bare metal GPUs above operating costs, these GPUs might enable additional upselling, such as non-AI workloads (e.g. databases) or additional AI layers like tokens or applications. We discuss these two layers in more depth below.

To be clear, it remains uncertain at this stage, and will depend on the scale and pace of enterprise adoption, and on whether new entrants like CoreWeave are successful at developing tight enterprise relationships.

Source: SemiAnalysis AI Cloud TCO Model

So what will the future hold? Will Vera Rubin deliver on its performance claims and motivate hyperscalers to rip out perfectly good, revenue-generating GPUs after just 2-3 years in service, as Dr. Burry claims? Or will we see the bottom of our H100 pricing data hold strong into the future? These questions remain to be answered, but we attempt to provide the best insights via our unique tracking of the industry. In addition to ClusterMAX, our industry-standard GPU cloud rating system, we recently launched InferenceMAX, the best free & open-source benchmark of token throughput by GPU. While system-level innovations like Nvidia GB200 NVL72 push the frontier of capabilities at an impressive rate, we don't see any case for a 2-3 year useful life.

Source: SemiAnalysis InferenceMAX - inferencemax.ai

Azure Foundry: the enterprise Token Factory

Azure Foundry is where GenAI meets Microsoft's enterprise-grade fleet. Foundry is a "Token-as-a-Service" business enabling Microsoft to monetize a variety of models at different margin profiles in the form of a platform business. This includes a very large range of open-source models. Crucially, it also includes OpenAI models – an Azure exclusive (besides OpenAI's own platform, dubbed the OpenAI API platform). We'll discuss the IP sharing in more depth below.

Currently, the majority of GPT API tokens are processed directly via OpenAI, but we expect Foundry to be a major growth driver going forward.

Source: SemiAnalysis Tokenomics Model

However, we believe that the business of selling tokens to enterprises remains nascent. Alphabet’s Sundar Pichai gave an interesting disclosure confirming our stance during the Q3’25 earnings call:

Over the past 12 months, nearly 150 Google Cloud customers each processed approximately 1 trillion tokens with our models for a wide range of applications.

This disclosure suggests that sales of Gemini tokens to enterprises represent total revenue below 0.5% of GCP's business.

Converting tokens to revenue is much more complicated than it might seem. We often see analysts making huge mistakes in areas like input/output ratios, failing to account for cached tokens, or miscalculating pricing. Our Tokenomics Model provides a comprehensive understanding of the economics of generating tokens and translating tokens into revenue, watts, RPO, and ROIC.
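To illustrate one of those pitfalls, here is a hedged sketch of the difference between naive and blended token-revenue math; the per-token prices and traffic mix are hypothetical, loosely patterned on published list prices rather than any specific provider's actuals.

```python
# Why naive token-revenue math goes wrong: input, output, and cached tokens
# carry very different prices, and real traffic is input-heavy with a large
# cached share. All prices and mix ratios below are illustrative assumptions.

total_tokens = 1e12     # ~1T tokens/year, the per-customer scale disclosed above

price_input, price_cached, price_output = 1.25, 0.125, 10.00  # hypothetical $/M tokens

# Naive estimate: treat every token as a full-priced output token
naive_revenue = total_tokens / 1e6 * price_output

# Blended estimate: assumed input-heavy mix with a large cached share
share_output, share_cached = 0.10, 0.50
share_input = 1.0 - share_output - share_cached

blended_price = (share_input * price_input
                 + share_cached * price_cached
                 + share_output * price_output)
blended_revenue = total_tokens / 1e6 * blended_price

print(f"Naive estimate:   ${naive_revenue / 1e6:.1f}M per 1T-token customer")
print(f"Blended estimate: ${blended_revenue / 1e6:.1f}M per 1T-token customer")
# ~150 such customers at the blended rate is on the order of a couple hundred
# million dollars a year - a small fraction of Google Cloud's annual revenue.
```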

Application layer: GitHub Copilot Moat, Meet Cursor

In the application layer, Microsoft enjoyed total dominance through GitHub Copilot. Microsoft had the first in-line code model, commonly referred to as "tab" models now, and was early to integrate GPT-4 into Copilot.

From afar, Microsoft's fort seemed impenetrable. They owned VS Code and GitHub, the industry-standard tools, had access to OpenAI IP to build products on top of, and had an enormous enterprise footprint to sell products into.

However, a closer look revealed defects: the integration of OpenAI's models into the product was not as strong as it could be, and much of the core infrastructure, like VS Code, was open source.

Players like Cursor swooped in, forked VS Code, built tighter and improved integrations between models and the codebase, and delivered a superior experience. Microsoft's execution was frankly poor.

Slack integrations, one-click MCP installs, agent planning and execution, etc. were all features that Cursor added weeks or months before Microsoft. One is a startup and the other is a multi-trillion-dollar corporation, but in a circumstance where switching costs are low and the customers are early adopters of new technologies, it was no surprise that Cursor's revenue exploded.

The labs also developed products themselves. These products take away the optionality of choosing other models, but the models offered are trained on the very harness and environment used in production. This delivered an optimized experience that, as the revenue ramp of Codex and Claude Code shows, is very popular.

Source: SemiAnalysis Tokenomics Model, Company Disclosures/Announcements

Microsoft has diversified into an ecosystem bet, with features like Agent HQ, which plugs in agents from a variety of labs, including Google and smaller players like Cognition. This will grow the GitHub ecosystem and retain developers, but it is less profitable than owning the coding agents themselves.

Source: GitHub

Revenue will continue to grow faster for apps like Claude Code and Codex than for Copilot. While Microsoft does plan on expanding their efforts via MAI, which we will dive into shortly, we think they will have more success through their ecosystem bet than through building their own coding agents.

Microsoft’s homemade models: MAI

Microsoft has released 3 MAI models across text, images, and voice. The text model, MAI-1, currently sits around rank 38 on LMArena but is not yet publicly available through either chat or API. The model is a large MoE trained on 15,000 H100s, and the next model will be a much larger multi-modal LLM.

The other two are an image model and a voice model. The image model is still a top-10 model on LMArena, and both are available in Copilot.

To Microsoft, the latter two models represent use cases where models can be served cheaply and with decent quality. But fundamentally, usage in the industry is dominated by LLMs, and the MAI LLM leaves much to be desired. The release did not even contain evaluation numbers.

The explicit goal for MAI-2 is to scale up the pre-training run and data to produce a larger, more capable multi-modal model, but we expect it not to match the capabilities of frontier OpenAI or Anthropic models.

Microsoft's explicit strategy is not to spend excessive compute on MAI because they have access to OpenAI models until 2032 or AGI. This is capital efficient: it makes little sense to spend heavily before then. The short-sightedness of this behavior is that Microsoft has to attract top-tier talent and build a top lab now if they want to be a top-5 company in the post-2032 world. They do not have that today.

This does not mean that Copilot products will stagnate in quality, though. Copilot as a whole has many different parts which themselves can drive significant revenue.

Office 365 Copilot

Microsoft Copilot is an overarching umbrella comprising more than just GitHub Copilot. There are Copilots for Sales, Finance, Service, Security, and others. This umbrella has itself surpassed 100M monthly active users and will be how a significant portion of the population interacts with AI.

The latest effort to build out Office 365 Copilot is the Office Agents, and we dive into the Excel one in particular later in the article. The overarching goal of these agents is to take actions across the Microsoft ecosystem in a way that is autonomous, functional, and useful for users. In classic Microsoft fashion, Copilot is now bundled within the Office apps for $20/month.

The implementations themselves can be quite brittle, as shown by the lack of usefulness of most of Office Copilot today. The question is whether Microsoft will execute like they did with GitHub Copilot, better, or worse.

They have access to OpenAI IP and the underlying data that can be used to post-train them.

Microsoft’s edge: OpenAI IP and Office user data

Access to OpenAI’s models, weights, and code base enables Microsoft to distill from the raw Chain of Thought of the OpenAI models. Distillation is more effective than post-training small models, meaning that Microsoft will be able to get significant capabilities without a meaningful compute cost.
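For readers unfamiliar with the mechanics, here is a minimal sketch of one common form of distillation, matching the student to the teacher's full output distribution at the logit level; this is generic PyTorch for illustration, not Microsoft's or OpenAI's actual pipeline.

```python
# Minimal logit-level distillation step: the student learns to match the
# teacher's full output distribution, not just its sampled tokens.
# Generic illustration - not Microsoft's or OpenAI's training code.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

# Toy example: a batch of 4 token positions over a 32k-entry vocabulary
student_logits = torch.randn(4, 32_000, requires_grad=True)
teacher_logits = torch.randn(4, 32_000)   # would come from the frozen teacher model
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(f"distillation loss: {loss.item():.4f}")
```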

What access to the OpenAI IP also enables is fine-tuning the OpenAI models with data Microsoft has access to for applications Microsoft explicitly cares about. As Satya said:

We are absolutely going to use the OpenAI models to the maximum across all of our products.

This will include all forms of post-training, like mid-training, RL, and SFT. There are existing techniques that can be used here, so all that is left is data, which Microsoft has plenty of. Indeed, Microsoft is already doing this.

The Excel Agent is a post-trained version of one of OpenAI's reasoning models. This led to results that Microsoft claims are better than those of the frontier labs.

Source: Microsoft

Given a sufficiently strong baseline, which an OpenAI reasoning model certainly is, the bottleneck to increasing performance across the Office apps is data. This data can be in the form of RL environments to train the model over. In Microsoft's case, they own all of the underlying technology behind Office, i.e. the most faithful versions of these environments.

They do not have to pay for these environments to be built from scratch, as many of the frontier labs are resorting to doing. Just as Gemini models can search Google a couple hundred times in less than a couple of seconds, Microsoft's Excel agents would not be bottlenecked by having to use the same interface a human would, and could operate at a significantly faster pace.
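A toy illustration of what owning the environment means in practice; the spreadsheet environment below is entirely hypothetical and greatly simplified, not Microsoft's actual Excel agent setup.

```python
# Hypothetical, simplified sketch: an RL environment that wraps a spreadsheet
# engine directly, so the agent issues native API calls instead of simulated
# clicks against a rendered UI. Not Microsoft's actual environment.
from dataclasses import dataclass, field

@dataclass
class ToySpreadsheetEnv:
    cells: dict = field(default_factory=dict)
    target: dict = field(default_factory=lambda: {"B1": 42.0})

    def reset(self):
        self.cells = {"A1": 40.0, "A2": 2.0}
        return dict(self.cells)            # observation: current sheet state

    def step(self, action):
        # Action is a native API call, e.g. ("set_formula", "B1", "=A1+A2"),
        # evaluated directly by the engine - no screen parsing, no latency
        # from driving a UI the way a human (or a UI-automation agent) would.
        op, cell, formula = action
        if op == "set_formula" and formula == "=A1+A2":
            self.cells[cell] = self.cells["A1"] + self.cells["A2"]
        reward = 1.0 if self.cells.get("B1") == self.target["B1"] else 0.0
        return dict(self.cells), reward, reward == 1.0, {}

env = ToySpreadsheetEnv()
obs = env.reset()
obs, reward, done, info = env.step(("set_formula", "B1", "=A1+A2"))
print(obs, reward, done)
```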

So while the outlook for the MAI models reaching the frontier is rather bleak, there is significant product upside for Microsoft to capture if they post-train OpenAI's models on their own software stack.

The part which Microsoft does not have access to beyond AGI or beyond 2030 (whichever is first), though, is the research IP. This includes:

“…confidential methods used in the development of models and systems”

There is no company in the world with more advanced "systems" for developing leading AI models than OpenAI. Azure has access to all of it. And in some ways, these systems are more important than the models. The underlying systems are the machine that builds the machine. These systems snowball, getting substantially better over time, whereas model architectures can change in a matter of months.

Having dug deep into Azure’s AI business, we now turn our attention to two key parts of Azure’s AI hardware stack:

  • Microsoft’s actual chip strategy: how they will balance NVIDIA, Maia, OpenAI, AMD and others

  • Azure’s networking architecture, and the impact on a broad range of suppliers

Mama Ma-ia!: Custom ASIC Struggles

In custom silicon development, Microsoft sits in dead last among the hyperscalers, and is not even trying to catch up.

Microsoft showcased their Maia 100 accelerator in late 2023, making them the last of the big 4 hyperscalers to field an AI accelerator ASIC.

Microsoft Infrastructure - AI & CPU Custom Silicon Maia 100, Athena, Cobalt 100
Dylan Patel and Myron Xie · November 15, 2023

As expected of first-generation silicon, Maia 100 was not manufactured in high volume or deployed for production workloads. The chip was architected before the GenAI boom, leaving it short on the memory bandwidth needed for inference. ASIC programs take multiple generations of iteration before meaningful compute can be offloaded from merchant systems.
