AI Gigafactories: From Design to First Token at Scale
As AI moves from design to deployment, infrastructure constraints are becoming the primary bottleneck. Lenovo and IREN explore how gigafactory-scale systems, vertical integration, and time-to-first-token metrics are redefining AI at scale.
The real limit to AI success is how fast you can deploy it.
At NVIDIA GTC 2026 in San Jose, Daniel Newman and Patrick Moorhead sit down with Vlad Rozanovich of Lenovo and Kent Draper of IREN to unpack what it actually takes to move from AI system design to “first token” at gigafactory scale.
As demand shifts from training models to powering real-world applications, infrastructure is becoming the critical path. The challenge is no longer designing systems; it's deploying them at scale with the power, cooling, networking, and operational precision required to sustain performance.
The group explores why many AI initiatives stall between planning and production, and how vertically integrated approaches, combining infrastructure design, deployment, and operations, are emerging as a competitive advantage. At this scale, every constraint matters, from land and energy to networking complexity and system reliability.
As AI gigafactories take shape, the focus is shifting toward measurable outcomes: time-to-first-token, sustained performance, and the ability to deliver consistent value across enterprise and cloud environments.
Key Takeaways:
🔹 Time-to-first-token is becoming a defining metric for AI ROI
🔹 Infrastructure bottlenecks span power, cooling, networking, and deployment timelines
🔹 Vertical integration is emerging as a key advantage in scaling AI infrastructure
🔹 Sustaining performance at scale is more difficult than achieving peak performance
🔹 Enterprise demand is driving AI adoption beyond model training
🔹 AI gigafactories are reshaping competitive dynamics across cloud and enterprise
Subscribe to our YouTube channel for more insights from NVIDIA GTC 2026.
Listen to the audio here:
Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Kent Draper:
Historically, the biggest users of compute were the people training LLMs. Now you're actually seeing a lot of the demand in the industry being driven by real world use cases. And I think that is just a vastly different position from where we were even a year ago and really proves that this is not a bubble.
Patrick Moorhead:
The Six Five is On The Road here at NVIDIA GTC 2026 in San Jose. We are in the Lenovo booth. Gosh, what a keynote yesterday. Everybody's talking about it, asking what the $1 trillion number actually means.
Daniel Newman:
Oh, I mean that, and whether I could finish a marathon before he finishes his dinner. There we go. It went for some time, but it was very good. He's got this arc, and he's become quite prolific on stage. He tells his story. I think it was 75 minutes before we got to the goods, right? And everybody knew what to expect. You and I had written our preamble post: heterogeneous compute, the Grok deal coming to life. We knew Vera Rubin. We knew Feynman was going to be teased. We knew he'd get into robotics and into these industries. He does it each and every time. But it didn't disappoint. And yes, Pat, throwing out a trillion-dollar number by '27 did create this market moment.
Patrick Moorhead:
Yeah.
Daniel Newman:
And then everybody pulled out their calculators, like, well, he already said $500 billion by the end of '26. So I don't know how much of an impact, but a trillion dollars in the next 20 months is...
Patrick Moorhead:
Yeah. So what is all that capital expenditure paying for? It's paying for those gigafactories out there. And it's easy for analysts like us to throw this around, because we're not the ones who actually have to do the work that goes underneath it, right? Fair. But we do have two people on the show today to walk us through that. Vlad from Lenovo, great to see you again. And Kent from IREN, great to see you for the first time on the show. Hearing a lot about you, not only in the tech news, but also on CNBC and places like that.
Kent Draper:
Great to be here, appreciate you having me on.
Patrick Moorhead:
Sounds like you flew a long way.
Kent Draper:
I did, although I'm based in Vancouver these days, so a little closer, but Sydney originally.
Daniel Newman:
There we go. I had to assume. Every time I go to Australia, that's the only place I get sleep issues. We'll have to talk about how I'm going to fix that after the show. But for the audience out here that wants to know more about gigafactories, let's start with you, Vlad. AI design to first token has become this really hot metric. Talk about why time to first token is so hot and how you're enabling it.
Vlad Rozanovich:
Yeah, well, no, Dan, thanks again. Great seeing you guys the few times we've been together. You kind of nailed it. Time to first token is the metric that so many people like IREN look at, because it really is about how you maximize value, how you maximize profit as soon as you can. The ecosystem is moving so rapidly with the introductions from NVIDIA and other partners out there in the market, and if you can catch that front edge... Enterprises and other large language model companies are looking at the best performance per watt per dollar they can actually get, with speed, you know, speed of light. That's really what so many customers are asking for right now. And from a Lenovo perspective, we're really trying to keep on that cadence: how are we ensuring our customers like IREN are getting time to first token so that they can maximize value for their customers?
Patrick Moorhead:
So, Kent, I kind of poked fun at what industry analysts do. I did actually have a real job for over 20 years at real tech companies, so I actually have built and deployed things. And now I can vibe code and have it help me create an entire architecture for a gigawatt-scale facility out there. But there's a big difference between what the design is and actually creating it. Where do most of these designs fail? At what point, and what is making them fail? Is it a certain part of the infrastructure? Is it the land? Is it the power? Is it the water? Is it the cooling? Where is the breaking point?
Kent Draper:
Yeah, I think in short these days, it's kind of all of the above. There are serious pressures throughout the entire value chain. And to some extent, design today is actually the easy part of the equation, because for the actual compute clusters there are very well proven-out reference architectures. But putting it all together, deploying it at scale, having access to the land and the power, being able to build data centers on time, being organized with your long-lead procurement, getting ahead of the curve on that side: you need all of the elements to come together. And that's one of the reasons we like working with Lenovo, because they can bring a lot together. We work with them at the front end on the design side, throughout deployment, and then in ongoing system operation; they're able to provide continued services during the lifecycle of the product. So, in short, delivering infrastructure at this scale is not simple. And one of the reasons we think we have an advantage in this industry is that we're vertically integrated. We're actually attacking all the vectors here: the front-end development of these projects; the build, where we act as our own general contractor when we're building these data centers; and then obviously the operation of the GPUs once they're in the facility. It gives us a lot more control throughout the entire value chain.
Daniel Newman:
Let me ask you a quick follow-up to that, because I've been following your story for some time and have commented on it. That was a pretty big leap. There was a whole market of companies in the mining space that had access to all this grid capacity, made a great pivot, and signed these wonderful contracts with the hyperscalers, and you're one of those companies. But to make the decision to go up the stack: what was the inflection point where you guys said, hey, we're going to move from being kind of a pure plug-in power front-end and general contractor to an all-in, full-stack neocloud with power?
Kent Draper:
Yeah, well, in our minds this was never actually a pivot, in the sense that when the business was founded, our two co-founders had a background in large-scale institutional infrastructure and infrastructure banking. They could see this inflection in the industry where the world was becoming more data-dependent. It was very obvious to them that we were going to need power-dense compute at large scale, particularly with access to renewables, which is a big part of our story. So they always had use cases in mind that encompassed AI and machine learning; right back to our seed decks, those use cases were mentioned. What you see in the digital world is these exponential growth curves, but real-world development cycles to get this infrastructure built are multi-year cycles, and that was the specific area we were targeting. So for us, we had always built very high-quality data centers, and we'd built out a very large portfolio of access to land and power. We thought that was going to be one of the key constraints in the industry, so we were very ready for this moment. In fact, we'd been exploring it for a number of years, but the market just wasn't there to make it commercially viable. Obviously, as soon as you saw that inflection with ChatGPT and the growth, we were ready. So not so much a pivot from our perspective, just a natural extension of what we were already doing.
Patrick Moorhead:
Yeah, so Vlad, back to the earlier question. Kent, you probably could have done the mic drop and didn't need to say anything after what he said, but I think our viewers really want to know, what specifically are you doing? What is Lenovo doing to help facilitate, speed this up, and not be the company or the infrastructure that was the reason that something couldn't get installed or set up and operational?
Vlad Rozanovich:
Yeah. What I see, Pat, is what Kent talked about with vertical integration. A year or two ago, we were talking about air-cooled GPUs in a data center consuming maybe 40 to 60 kilowatts per rack. Now you're consistently in liquid-cooled territory at 120 to 150 kilowatts, and you have to be ready. So based on the infrastructure Kent is putting in, from a vertical integration standpoint of how much power he can secure, what's really important from a Lenovo perspective is how we design around the top-line constraints our customers have, whether we're doing our own products or NVL reference designs. How do we make sure those are the most optimized from a power-performance perspective? How do we make sure we're doing the right liquid cooling techniques in our own builds and designs, so that IREN can maximize their power to compute? For us, it's about having that tight relationship with partners like NVIDIA, understanding their roadmap inside and out, and getting early access to systems so we can test and qualify before we deploy within these large data centers, in the case of IREN. Because that's an important part. The last thing they want is racks of servers showing up that aren't optimized, aren't running efficiently, or don't pass technical qualification.
Daniel Newman:
Makes sense. So as you're driving toward peak performance, Kent, you're trying to get the most out of every system. Temporarily achieving high or peak performance is one thing, but you're building these environments for very demanding customers that expect everything to be running at peak all the time. Given everything he just said, what goes into the engineering thinking to accomplish that? Because it's not trivial.
Kent Draper:
No, it's not trivial at all. These systems are far more complex than traditional cloud computing systems, particularly as it relates to the back-end networking. Running InfiniBand, or even RoCE, which we're starting to see more and more, is incredibly complex. So from our perspective, it's very important to have highly experienced partners like Lenovo that can help us through the front-end engineering of these systems, but also make sure that when they get deployed, they're deployed in the right way, so they start up immediately with very low failure rates, and then partners that are able to support the ongoing operations of these systems. So certainly that's first and foremost in our mind when we're thinking about OEM partners: quality of systems and quality of support.
Patrick Moorhead:
So this AI gigafactory idea sounds like a marketing term, okay? But there's a lot of meat behind it. It's basically a blueprint for how people can light this capability up. And Vlad, it really seems to me that it's changing the competitive dynamics between the hyperscalers, the neoclouds, sovereign AI, and on-prem. Can you comment on that from your vantage point? Obviously, you serve everybody, right? So how are you looking at this right now?
Vlad Rozanovich:
Yeah, it's interesting. In these early stages of frontier models and large language models, people were looking at hyperscalers and neoclouds to support the overflow that was happening. But now we're starting to see a transition into how enterprises are actually taking advantage of AI. Are they looking at it for inference? Are they looking at it to do their own extensions of language models? How are they creating their enterprise stack? What we know is that different customers are going to have different requirements. Sovereign cloud is a big topic right now: how do I keep my data secure within the confines of a particular government? But things like latency and privacy are also so important as part of that. So we're working with customers like IREN to ask: what is the stack you're providing for? How are you looking to deploy these GPU compute servers? And what does the storage infrastructure look like on the back end? There's a big topic now about the persistency of some of the data you're running in these models. Trying to store it in memory on the GPUs is not an efficient way to do it. So you're starting to see architectures where the storage capabilities and the CPU compute capabilities are really important as more enterprises start to access these large GPU systems, because you have to keep that persistency of information so you can continue whatever you're doing within your enterprise.
Patrick Moorhead:
Well, it makes sense. And what we're tracking, too, is how the neoclouds provide stickiness to those enterprises in the future. They're re-looking at, as you mentioned, storage, and also at basic stuff like Slurm, and having a Kubernetes capability that the hyperscalers don't offer, since they bring their own. That's how you turn your company into a long-term entity. But it's funny, though. I don't see the need for capacity going away any time soon. I know people are confused right now, and I think Jensen might get up next year and talk about a $1.5 trillion number out there. These new models are absolutely incredible, and the swarms of agents. I think people just have to use it to believe it, and then they're like, oh my gosh, you're right. Downstream, it's not going to be an issue; "we need more compute" is what we're going to hear for a long time.
Daniel Newman:
Every forecast has been wrong, and they've all been wrong on the side of underestimating. People struggle to see the future when they can't fully grasp it, and then they tend to underestimate its impact. It's that pattern of overestimating one year out and underestimating five, and I think we're doing that in some sort of exponential way. So, guys, as a way to close this thing out: what is the gigafactory really enabling that wasn't possible? I jokingly say the Blackwell era unleashed a bunch of things recently, and now we're entering the Rubin era, sorry, the Rubin and Feynman era. What are these gigafactories unleashing?
Vlad Rozanovich:
Well, I think one of the things we're seeing from a Lenovo perspective is that we're finally delivering value to consumers and enterprises, and that's what it's unleashing: data analytics, business proposals, understanding. It's been so talked about that if you're not on the AI curve, you may not be around as a corporation five years from now. So from our perspective, those are some of the things being unleashed. And for me, when we first engaged with IREN, seeing the rapid, exponential growth that they have not only deployed but actually have consumption on, it proves that organizations and corporations are actually taking advantage of this.
Kent Draper:
Yeah, that's absolutely right. And from our side, the enterprise adoption is the real proof point. Historically, the biggest users of compute were the people training LLMs. Now you're actually seeing a lot of the demand in the industry being driven by real-world use cases: agents, internal enterprise models. That is just a vastly different position from where we were even a year ago, and it really proves this is not a bubble. This is ongoing demand that extends out into the future. At our end, we are still seeing just huge demand, and for us it's really about how quickly you can build. That is the benefit of our vertically integrated model: we can scale rapidly, we can provide really good customer service, and we do only one thing, which is power-dense computing. We're not setting up data centers with 20 different use cases and 1,000 different customers in them. So I think it's a very differentiated offer for people who are very focused on AI computing.
Patrick Moorhead:
I love that you said it's not a bubble. I've been saying that. I mean, half of this guy's social media feed is "we're not in a bubble."
Daniel Newman:
Everybody's out there saying bubble, bubble, bubble. But I've said you can't have a bubble when AI is eating every industry. You're too good.
Vlad Rozanovich:
It's not. It makes sense.
Daniel Newman:
I think in the last three months the bubble talk has stopped, because people are seeing the utility that you and I have talked about. So, gentlemen, thank you so much for being part of this. Sam, thanks. Pat, good to see you. Good to see you guys. And thank you, everybody, for being part of this Six Five. We are on the road here at GTC 2026. Is that right? It is. It's going fast. I appreciate you being part of our community. Hit subscribe, and join us for all of our coverage here at GTC and, of course, all of the coverage on The Six Five. Got to go for now. See you all later.