Rethinking AI Infrastructure: Why Memory Now Drives Performance

As AI workloads scale, memory is emerging as the defining constraint on performance. This conversation explores how innovations in HBM and system integration are reshaping the future of AI infrastructure.

AI infrastructure is shifting fast, and compute is no longer the only performance driver.

At NVIDIA GTC, hosts Patrick Moorhead and Daniel Newman sit down with Paul Cho, President of Samsung Semiconductors, and VP Indong Kim, to explore how memory, system design, and packaging are redefining AI performance.

The conversation focuses on how AI architectures are evolving beyond GPU-centric models toward more specialized compute environments, where memory bandwidth, latency, and integration are becoming critical constraints. As organizations scale AI across training and inference, innovations like HBM4, HBM4E, next-generation HBM roadmaps, and tighter system-level integration are reshaping how performance is achieved. Samsung’s perspective highlights how memory is moving from a supporting role to a central pillar of AI infrastructure design.

Key Takeaways

🔹 AI infrastructure is shifting toward diverse, workload-specific architectures
🔹 Memory bandwidth and latency are emerging as primary performance bottlenecks
🔹 HBM4 and HBM4E innovations are enabling next-generation AI scalability
🔹 System-level integration across memory, logic, and packaging is becoming critical
🔹 Collaboration across the ecosystem is accelerating AI infrastructure evolution

As AI systems scale, performance will be defined not just by compute, but by how effectively memory and system architecture are designed together.

Subscribe to our YouTube channel for more insights from NVIDIA GTC 2026.

Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript

Paul Cho:
That's how we interact, set the high goal, we exceed, and along the line, you know, we have many, many engineering discussions going, solving all the possible problems ahead of time, and finally help NVIDIA introduce the world's best products on time.

Patrick Moorhead: 

The Six Five is on the road here at NVIDIA GTC 2026 in San Jose. We are in the Samsung booth and we are talking memory. Daniel, it's like every six months some new thing comes up that there's a shortage of to drive all this AI. A trillion dollars of AI that Jensen talked about yesterday.

Daniel Newman: 

Trillion in revenue, by the way. I think a lot of people have messed that up, because he talked about order bookings and then he's talking about revenue. And by the way, every forecast that's come out from every analyst besides me, maybe you, has been too low. So some of us said a trillion last year, and some of us got lectured by the pundits across social media. But you know what, Pat? I love when a prediction comes true. But it's an exciting time. And of course, those constraints came compute first, with the GPUs. But now memory is having its moment.

Patrick Moorhead: 

I know, isn't that great? And, you know, Samsung's crushing it. I mean, whether it's HBM, whether it's DRAM, all these announcements, handshakes, visits out to Korea, it's really been great. And to break this down, I want to introduce Paul, who's been on the show many times. Great to see you again. 

Paul Cho: 

Thank you for having me. 

Patrick Moorhead: 

Indong, we have met in person before, but welcome to The Six Five.

In Dong Kim:

Thank you. Thanks for having me. 

Daniel Newman: 

So, Paul, let's start with you. I know we're going to spend a lot of time on memory, but let's also talk about what's happening here. Jensen was on stage. He clearly showed that heterogeneous computing is the future, doing their own sort of not custom, but high-scale inference chip with the Groq deal, next generations of GPU. We know that Vera and Grace are having moments. I'm just kind of curious what's standing out to you in terms of this iterative innovation pivot to a more heterogeneous computing world, and where do you think AI infrastructure is heading?

Paul Cho: 

Well, Dan, thank you for saying all those. You're like Jensen. You talk about so many things.

Patrick Moorhead: 

Yes. Eight questions. Daniel is known for the eight-question… 

Daniel Newman: 

Eight-part questions. 

Patrick Moorhead:

Eight-part questions. Sorry.

Paul Cho: 

Well, I'll just focus on one thing now. So that's more AI. Every day, on a daily basis, new applications are being added. Now, look at the enterprise environments, where all these CEOs want to increase productivity. And they are so open now to adopting new applications. You know, Jensen was saying every CEO will need to have access to an AI factory. They'll have to worry about the number of tokens made available to their employees. And they'll also have to worry about the cost of those tokens. New models are added, again, every day. And the infrastructure needs to be scaled. That's what I call more AI. We need to have more AI to be able to sustain this growth. More AI and more insights.

Patrick Moorhead: 

There we go. Speaking of more insights, four years ago, people would ask me, hey, how do you think this compute game is going to end up? And I very clearly said, I believe that we will have heterogeneous compute. And they said, Patrick, you're crazy. It's going to be only GPUs. But here we are today. We have multiple variants of GPUs, higher power, lower power optimized. We have a bevy of XPUs from the hyperscalers and more, and then CPUs are cool again. Imagine that, for reinforcement learning and orchestration. How are you reading this shift to heterogeneous computing?

Paul Cho: 

Right, so again, if you come back to, if you allow me, more AI.

Patrick Moorhead: 

Please.

Paul Cho: 

Right? So we need to scale our AI infrastructure to be able to serve more people, more applications, more models. But there's increasing realization that to do so, we actually need to diversify the ideas behind those infrastructures, how those infrastructures are built. And that's where this heterogeneous computing comes in. I would call that better AI. Right. Better AI infrastructure, composed of different things. GPU, very high performance GPU, with very high performance memory like Samsung HBM. There are other architectures. The Groq LPU was announced big time at this GTC. The idea is that they want to run your model in the shortest period of time, and that allows you to run certain important applications like no other alternatives. So that's a great idea. It's an addition to what Jensen is calling these different inference applications, where it's like a fast taxi that you can get on versus a nice bus where hundreds of people can fit in. You know, a taxi is really fast. In Korea, we used to have this bullet taxi. I don't know if you've heard of the bullet taxi.

Patrick Moorhead: 

I haven't, and I've been to Suwon a couple times. Right, right, right. But maybe I didn't ride in that.

Paul Cho: 

You know, the driver will get you to where you want to be in the shortest amount of time. OK. And you have to pay a lot, though. OK. And only you or one friend can fit. OK. It's a taxi. So you need the taxi, you need the bus. So it was great to see these diverse applications in the inference domain. And Jensen did a fantastic job incorporating all those different ideas. And that's heterogeneous computing and better AI.

In Dong Kim: 

Yeah, I think we've seen diversification, more fine-tuned system architecture. You know, we're seeing it in AI, but we've actually been witnessing such evolution in other applications, such as mobile. When it comes to big applications, there's a specific memory for that. Here, we're looking at how the whole memory hierarchy is equally evolving, starting from SRAM, DRAM, HBM, and then all the way down to storage. Everything actually complements each other. It's not really replacing anything. So we're very welcoming of and excited about all the proliferation of different sorts of memory and storage.

Patrick Moorhead: 

Listen, I think diversification is good for Samsung and good for the industry, because we have to find the best solution. And given all the power that they were driving, that's what this is all about: how much power, and how to get power and performance. And therefore, here we are at heterogeneous computing. I like to use the analogy of a golf club set, right? You don't bring your driver out to putt, okay? And you don't bring out, you know, vice versa. So now we're putting in the right compute and the right memory subsystems and storage systems to support those.

Daniel Newman: 

So let's talk a little bit about the relationship that exists between Samsung and NVIDIA. You have collaborations across several technologies; we're talking about a number of them here. I'd like to hear from both of you, but maybe you first, Paul. How do you see the partnership reflected in all the things we just talked about in terms of where AI is going?

Paul Cho: 

That's a great question. We constantly have conversations with the NVIDIA team and also with the CEO of NVIDIA. We always want to match or exceed the expectations of this AI compute led by NVIDIA. So that means we need to bring the best of Samsung into the products. When it comes to HBM4, for example, we had to use the best available technology for the logic base die. In our case, we used Samsung Foundry's 4-nanometer technology, which turns out to be great. It gives us an edge in power, signal integrity, performance, bandwidth, everything. And then we had to bring in D1C technology in the DRAM core dies. And that's, again, leading-edge technology. And we put them together using our advanced packaging technology, which makes this HBM4 so special and the world's best product as we speak. And I told Jensen just two days ago, hey, Jensen, this is the world's best HBM4 product for you. This is the world's best base die, built with Samsung 4-nanometer foundry technology, for you. And this is a D1C, world's best DRAM die, for you. And putting all this together based on their requirements and expectations, I think we exceeded their expectations this time. And that's how we interact: set the high goal, we exceed. And along the line, we have many, many engineering discussions going, solving all the possible problems ahead of time, and finally help NVIDIA introduce the world's best products on time.

Patrick Moorhead: 

Yeah, Jensen talked a lot about the value of integration, his five-layer cake. And you do the same. Other people outsource their base dies, as an example. And it seems to be tough in the future for folks who do that, just from my point of view. But what do I know? I'm just a pundit. That's a great point.

In Dong Kim: 

I mean, exactly. I mean, having our own base die in-house is extremely beneficial and valuable. We clearly started to see a lot of problems that we didn't even think about as we introduced this type of new, different collaboration model. And we're feeling super lucky that we're getting maximum benefit out of it. Especially because, as you may have seen in some of Jensen's announcements, the HBM roadmap and the bar are going ever higher, which means the technical engagement and the complexity of the problems are getting more challenging. And we're in a very good position to be able to have a quicker resolution, faster reaction, to make sure that we keep the TAT, the turnaround time, which is very important when it comes to this AI.

Patrick Moorhead: 

So let's drill down into HBM a little bit more. The market was really fixated on how many FLOPS, how many TOPS. But what a lot of people missed was the bandwidth story, and bandwidth is key here. Indong, can you educate people on HBM4, HBM4E, and some of the problems you're solving, and what's the net result here?

In Dong Kim: 

So from HBM3E to HBM4, the significant improvement is the total number of I/Os, which doubles. HBM3E has a thousand I/Os. Now HBM4 has 2,000 I/Os. That naturally gives twice the bandwidth. In addition, the per-pin I/O speed from 3 to 4 also significantly increased. So we're looking at over 3 terabytes per second of memory bandwidth from just a single stack. From 4 to 4E, it's a more revolutionary path, where density is the main key feature, going from 24 gigabit to 32 gigabit. More memory. It's a great thing, because the model sizes keep growing, and we have to find a way to hold everything in the HBM as much as we can. And then the pin speed is also going from 13 to 16, so overall the performance increase is over 40%. And then we're also eventually looking at customization, which Jensen also showed some notion of with the Feynman slide that he mentioned, this customization path, which we are also very excited about.
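For readers who want the arithmetic behind those figures, here is a rough back-of-the-envelope sketch. The I/O counts and per-pin data rates come from the numbers mentioned in the conversation or are assumed for illustration; they are not official product specifications.

```python
# Rough, illustrative HBM per-stack bandwidth math (assumed figures, not specs).

def stack_bandwidth_tbps(io_count: int, pin_speed_gbps: float) -> float:
    """Per-stack bandwidth in TB/s: I/O count x per-pin Gb/s, converted to bytes."""
    return io_count * pin_speed_gbps / 8 / 1000  # Gb/s -> GB/s, then GB/s -> TB/s

hbm3e = stack_bandwidth_tbps(1024, 9.8)   # ~1,024 I/Os at an assumed ~9.8 Gb/s -> ~1.25 TB/s
hbm4  = stack_bandwidth_tbps(2048, 13.0)  # ~2,000 I/Os at ~13 Gb/s -> ~3.3 TB/s ("over 3 TB/s")
hbm4e = stack_bandwidth_tbps(2048, 16.0)  # same width at ~16 Gb/s -> ~4.1 TB/s

print(f"HBM3E ~{hbm3e:.2f} TB/s | HBM4 ~{hbm4:.2f} TB/s | HBM4E ~{hbm4e:.2f} TB/s")
```

Doubling the interface width and raising the pin speed is what turns one HBM generation's bandwidth into the next's, which is why both levers matter for AI workloads.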

Patrick Moorhead: 

Yes, that's how you and I met, talking custom HBM. That's correct.

Daniel Newman: 

Yes, so I'd love to hear from both of you on this. Indong, maybe I'll start with you. We've seen basically years of innovation happen in months, maybe weeks. Pat and I talk about this a lot. We went from six months ago, people saying AI maybe doesn't have a lot of utility and this whole thing's a bubble. You probably heard some of that talk. And then we saw Claude Opus 4.6 launch, and Codex 5.3, and GPT 5.4, and Perplexity Computer, and OpenClaw, all happened. And now everybody's saying maybe AI's too good. It's so good that it's going to disintermediate every industry. And now we don't even know if anyone's going to need to work, because we're just going to have AIs running our business. So what does this mean, though, in terms of how does memory, how does the system integration, how does it continue to need to advance and evolve? Because this is the Blackwell era. We're talking Vera, and he's up on stage talking about Feynman. And what I'm saying is, what we're seeing now is the slowest the pace of innovation is ever going to be.

In Dong Kim: 

The great thing about this seemingly complex architecture, what we are seeing as a huge advantage, is, as you pointed out before, the diversification. All the different solutions need to work together, need to be integrated together. For example, we're not just talking about the HBM here. What we're showing here is LPDDR5-based SOCAMM, where traditionally LPDDR5 was only meant for the mobile application. But now, with the significant advantage of power saving, LP is finding its way into the high-end server application, which is a totally different approach compared to the previous days. And obviously, there is a storage portion. As the AI gets smarter, it seems like the software, all the people, start to decide, let's try to memorize everything so that we find better fits for a specific person or a more specialized application, et cetera, which requires a tremendous amount of storage. So, from our perspective, it's no longer just about the HBM. HBM is, of course, still the most important piece of the AI when it comes to memory bandwidth. However, you pointed out the CPU's importance, which comes down to DDR5, the high-capacity DDR5 especially, where we have a leadership position with our 32-gigabit, and I mentioned LPDDR5 SOCAMM, and then all the way down to the storage, where we've also been leading the industry. So it's really exciting to be able to see the seamless integration of all different layers of memory and storage going together.

Patrick Moorhead: 

Yeah, five years ago, if somebody had said there's going to be a CPU server on the NVIDIA stage just for AI, people would have said you're crazy, it's all about the GPU. Now we've got the CPU server, and now we have basically the Groq server, right, using the LPU, and disaggregation has occurred, and memory needs to change. And by the way, it's not that they create the chip and then you go create the memory. You're doing all of this in parallel, actually before you even know the compute that you're going to be attaching to. And that, to me, not enough people are talking about.

Paul Cho: 

Well, coming to your question, I don't worry about the… shortage here. The AIs are growing. And more and more users are being added every day. So it will go on. I'm very pleased to tell you that my wife is a big user of AI. So she calls me: Hi, Paul. I asked this question on ChatGPT. I got this answer. What do you think? So I have more work to do, actually. You need to spawn an agent to answer her question.

Patrick Moorhead: 

Our wives might be talking.

Paul Cho: 

You add agents, and more and more compute is needed. More and more memory is needed. More and more bandwidth is needed. The thing is, to enable this more AI, we need better AI. As Indong was pointing out, we need a much better way to integrate memory with the logic. That's why at Samsung we are looking into this direction where you integrate HBM in a 3D way, to enable better integration between compute and memory. That's the best way to move forward on the cost effectiveness of AI infrastructure. And also we look at the storage bits. Don't forget about the flash bits. They are so important. The key value cache is growing so fast. You need DRAM and flash to capture those things. So all in all, my objective and Samsung's objective is to remain the most preferred partner to enable the best AI infrastructure possible using Samsung's broad semiconductor technologies, including memory, logic foundry, flash, and design capabilities as well.
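To illustrate why the key value cache pushes beyond on-package memory into DRAM and flash tiers, here is a rough sizing sketch. The model shape, context length, and concurrency are hypothetical, chosen only to show the scaling, not tied to any specific product.

```python
# Rough, illustrative KV-cache sizing for transformer inference.
# All figures below are hypothetical examples.

def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Keys + values cached for every layer, KV head, and token position."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Example: a 70B-class model (80 layers, 8 KV heads of dimension 128) serving
# 64 concurrent requests at a 128K-token context window, stored in FP16.
size = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                      seq_len=128_000, batch=64)
print(f"~{size / 1e12:.1f} TB of KV cache")  # roughly 2.7 TB, far beyond one GPU's HBM
```

Even modest concurrency at long context lengths lands in the terabyte range, which is why the conversation turns to DRAM and flash alongside HBM.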

Daniel Newman: 

And it's really great, and I'll wrap it up here, but you're in an industry where margins are expanding, demand is expanding, revenue is expanding. With each generation of new AI, the need for more memory continues to expand. And maybe for the first time in, well, you've been around a long time, all these boom and bust cycles, there is no bust. There's a very real probability, and I've said it here, and I've said it numerous times, this may be the first time in history that we have a memory cycle that does not have that tough dip, which is great for you and what you're doing. Congratulations, Paul and Indong. Thank you so much for being with us.

In Dong Kim: 

Pleasure.

Daniel Newman: 

Thank you, everybody, for being part of this Six Five On The Road. We are here at GTC 2026 in San Jose. You can see this place is absolutely packed. It is jammed. Big moment for, of course, for NVIDIA, but a big moment for memory as well. Stick with us for all our coverage. Be part of our community. We appreciate you. Bye for now.
