Powering the AI Engine: AMD Instinct, ROCm and Real-World Wins - Six Five On The Road
Andrew Dieckmann and Anush Elangovan from AMD join Patrick Moorhead and Daniel Newman to share their insights on enhancing AI development through open source, rapidly evolving hardware, and AMD's strategic moves in the AI infrastructure space.
How are key players standing out in the AI arms race? 🚀
From open-source advantages to lightning-fast product rollouts and full-stack innovation, AMD is making bold moves for AI infrastructure.
At AMD Advancing AI 2025, hosts Daniel Newman and Patrick Moorhead are joined by AMD's Andrew Dieckmann, CVP & GM of the Data Center GPU Business Unit, and Anush Elangovan, VP of AI Software, for a thought-provoking discussion of AMD's business updates and leadership in AI, focusing on Instinct, ROCm, and customer progress.
Key takeaways include:
🔹Open Source as AMD's Strategic Edge: Discover how ROCm is fundamentally differentiating AMD, fostering an open ecosystem that champions customer flexibility and liberates them from vendor lock-in.
🔹Seizing the AI Market Tidal Wave: Explore how AMD is uniquely positioned to capitalize on the explosive growth within the AI Total Addressable Market (TAM), fueled by soaring demand for inference and an expanding Instinct customer base.
🔹Unleashing ROCm's Latest Innovations: Get an insider's look at the exciting advancements in ROCm, designed to match the blistering pace of AI software enhancement. This includes the expansion of AMD's Developer Cloud and a commitment to making cutting-edge AI tools more accessible than ever.
🔹Accelerated Product Velocity: The MI325X and MI355X Impact: Learn about AMD's rapid-fire product development strategy, highlighted by the swift launch of the MI325X and MI355X GPUs, ensuring AMD remains at the forefront of the dynamic AI landscape.
🔹The Strategic Vision Behind the Helios AI Rack: Gain critical insights into the development and profound strategic significance of the Helios AI rack, powered by the forthcoming MI400 series. This is AMD's bold move to establish itself as a dominant force in full-stack AI infrastructure.
Learn more at AMD.
Watch the full video at Six Five Media, and be sure to subscribe to our YouTube channel, so you never miss an episode.
Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Patrick Moorhead: The Six Five is On The Road here in San Jose at AMD's Advancing AI event. Daniel, it's been quite the event. I mean, the walk-ons from xAI, Sam Altman from OpenAI, and I heard people using the words "training" and "Instinct" in the same sentence.
Daniel Newman: Well, we always knew it was going to get to that point. But we also know that we're at this massive inflection right now where, yes, we are still talking pre-training and the training era, but we've also entered this exciting inference era, and this is going to create an explosive TAM. It's going to grow this AI space so much, and it really opens the door to a whole new share shift, a market share change, Pat. And it's a really good moment if you're AMD and you're entering this space right here, right now.
Patrick Moorhead: Yeah, it really is. So hey, let's dive in here. We have Andrew and Anush from AMD. Guys, welcome to Six Five for the first time. This is going to be great.
Andrew Dieckmann: Thank you. Happy to be here.
Anush Elangovan: Yep.
Daniel Newman: Yeah, we can cover off on a few different things. You heard us in the preamble talking about the walk-ons and what's going on with inference and training. But another big thing that we definitely glommed onto today was software. Anush, I'll start with you: you've been very focused on making open source a big part of your software approach. Why is that so important? I heard Lisa reiterate at least three or four times that you're open, that it's not a walled garden. Why is that so important in shaping adoption of Instinct and ROCm, and in potentially really growing that market for you?
Anush Elangovan: Yeah, that's a good question. The way we look at it is that the pace of innovation in AI is only going to accelerate, and to keep up with that, you need the entire ecosystem with you. Having an open platform allows you to bring in others and lift all boats. You can bring in the best networking, the best CPUs, the best GPUs, but then layer it with software that brings in the best serving infrastructure: vLLM, SGLang. These are all very fast-moving innovations happening in real time, compared to proprietary stacks where the flagship product right now is lacking FP8 support, which is mind-blowing. All of the open source ones are already there. So it's about speed.
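For readers curious what the open serving stack Anush mentions looks like in practice, here is a minimal, illustrative vLLM sketch; the model name, prompt, and sampling settings are assumptions, not details from the episode. vLLM publishes ROCm builds, so a script like this can also target Instinct GPUs.

```python
# Minimal offline-inference sketch with vLLM (illustrative only; the model
# name below is an assumption, not one referenced in the episode).
from vllm import LLM, SamplingParams

# Load an open-weights model; vLLM handles weight download, batching,
# and KV-cache management behind this one object.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

# Sampling parameters: temperature and maximum output length.
params = SamplingParams(temperature=0.7, max_tokens=128)

# Generate completions for a batch of prompts.
outputs = llm.generate(
    ["Why do open-source serving stacks tend to iterate quickly?"], params
)
for out in outputs:
    print(out.outputs[0].text)
```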
Patrick Moorhead: Yeah, I appreciated Vamsi kind of going through that. And if I look historically at what has driven growth in the long run, let's look at Unix versus Linux; I think that's probably the best example. Containers are probably another one. Pretty much everything ends up standards-based. There are some examples where proprietary goes quicker and gets big, but in the long run it definitely equalizes. So Andrew, I want to turn to you. Seven or eight years ago we saw this transition in machine learning, from an 80/20 training-to-inference split to 20/80, and naturally we're seeing that transition here as well, from training to inference. Not that we're not training, we're still training, but a lot of this is going into the actual use of these models in applications. How is AMD positioning itself to win in this massively growing, and I would argue eventually the largest, opportunity for AI compute?
Andrew Dieckmann: Yeah, you mentioned you heard the words "training" and "Instinct" a few times in the same sentence. That continues to be an incredibly important segment of the market, but even faster growing is the inferencing segment. We're seeing that inflection point happen now, essentially this year, where people make money on these models by inferencing. Right? And so that's been a key focus. Most of the deployments we did last year were really focused on inferencing. This year, with several large customers, we're doing significant training deployments to augment that. So we're participating in both segments of the market. Both are growing rapidly, inferencing even more so off a smaller base, and we see that actually being the larger segment of the market as we go forward.
Patrick Moorhead: Yeah, just a follow-up to that. I heard one of your customers, I think it was Microsoft, talk about the flexibility they like in being able to do training and inference with the same piece of silicon. It's one of those debates: do you need to do inference on the same chip you trained on? The answer is no, right? But there are some built-in advantages, like the flexibility to move the workload around your data center. Is that true here?
Andrew Dieckmann: That is absolutely true. Flexibility is a key buying attribute for our customers. Flexibility of infrastructure, and the programmability of the GPU, have to this point carried the day, given how rapidly the workloads are changing. So flexibility of the actual compute engine, as well as the overall infrastructure, is important, and we see that trend continuing.
Patrick Moorhead: Makes sense.
Daniel Newman: And it's definitely good that you have the flexibility. Anush, I'm going to ask you something about ROCm in a moment, but it's also, I think, a significant door opener. And I would be missing a big opportunity to clarify some things to the market if I didn't say that the software stack has been a moat for your biggest competitor. I know a lot of work's being done with ROCm, but inference changes the whole calculus, because inference is where you can get in. Training has carried more developer lock-in; inference doesn't have the same software requirements, so customers can go all in with AMD, and it's a new opportunity to come in and compete. So you now have an improved platform, and we'll talk about ROCm here, but these chips are also really built for inference: the memory, the throughput, the tokens per dollar. You showed that 40% number. There's some really compelling data that says, yes, you can do training on this thing, but if you're doing both training and inference, this thing could be better.
Andrew Dieckmann: Absolutely. I mean, for a lot of our major customers, we get started on inferencing, they get to know our software stack, they get to know our platform, and then we expand to other use cases. Meta was a perfect example. The initial use cases there were all inferencing; now they're doing recommendation training and other training workloads. And as that familiarity builds, we do more together.
Daniel Newman: We know land and expand. It's how cloud was built, and I'm sure that's how the agentic-era ecosystem will initially be built too. So let's get back to the software innovation, though. ROCm has, and you actually just gave a perfect analogy, kind of come through in stages: we're going to do a little bit more, then a little bit more. And now we're seeing companies really starting to go more all in. What are the innovations you think are driving that, and what are you most excited about that's going to make ROCm more and more competitive for that developer ecosystem?
Anush Elangovan: Yeah, I think it goes back to the original question of an open ecosystem. Today, ROCm is fully open source, and we've gotten to a point where external developers are contributing critical pieces of enablement. For example, ROCm is really good on Instinct, but we had developers come in and contribute support for Strix Halo and the client platform on Windows. Now ROCm runs on Windows. That gives us a pervasive software story: a college student can pick up their Ryzen laptop, program in PyTorch, and take that same program and deploy it at scale on Instinct. So we are starting to layer a pervasive software story on top of the pervasive hardware story, and that brings a good flow of users and developers up into the core of our data center business and ROCm itself. I think the cool part about it is, what do we like about the Linux kernel? It's open, and you can go write a new file system, Btrfs, because, hey, I think we need a new file system. ROCm is getting to that same level: if you have to do training and you need some new framework that allows for checkpointing, restartability, and reliability, we can do it. But more importantly, anyone in the ecosystem can do it. So of the 350 systems that we sent out to Stanford, Berkeley, et cetera, there could be a research project that just says, you know what, I'm going to build the best reliability for training. All they have to do is take this. They have the hardware, we have the developer cloud that we just launched, and the software is open source. We don't need to be in the loop between innovation and unlocking the value.
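To make that laptop-to-Instinct portability concrete, here is a minimal, illustrative PyTorch sketch; the toy model is an assumption, not code from AMD. On ROCm builds of PyTorch, AMD GPUs are exposed through the same torch.cuda device interface, so device-agnostic code like this runs unchanged on a Ryzen laptop or an Instinct accelerator.

```python
# Device-agnostic PyTorch sketch (illustrative toy model, not AMD code).
# ROCm builds of PyTorch surface AMD GPUs through the familiar torch.cuda
# interface, so the same script runs on a laptop GPU, an Instinct part,
# or plain CPU without modification.
import torch
import torch.nn as nn

# Pick whatever accelerator is present; fall back to CPU otherwise.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).to(device)

x = torch.randn(32, 512, device=device)
logits = model(x)  # the forward pass runs on whichever device was selected
print(logits.shape, "computed on", device)
```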
Patrick Moorhead: Yeah, ROCm has been on a rocket ship. I mean, I remember years ago when ROCm was an HPC play, right? And some people think, oh, HPC must mean AI. Well, that's not the case; the feature set is completely different. Hats off to you on the rate of code commits as well. Do you do a weekly update?
Anush Elangovan: Correct. Yes, we actually release nightly.
Patrick Moorhead: Oh, okay.
Anush Elangovan: But we test and validate the entire software suite, and so we commit to a bi-weekly release. So every two weeks you get a release of ROCm with the latest inference support, the latest Dockers. The two weeks are just for us to let it bake a little bit, but in reality we are building every night, and we are on the path to getting every commit green. Right? The ethos of this comes from when I was working on Chrome and Chrome OS: you have to be shippable every day; at any time you should be able to ship code. Bringing that mindset gives us immense acceleration of software capability. And if something's not ready, it just gets backed out, right? So that means you're always ready to ship. That's a software mindset. AMD has traditionally built very good hardware; now we have very good hardware with software being released like a software company.
Patrick Moorhead: No, that's great. I became a believer last year when Meta got up on stage, and, you know, possibly the hardest grader out there, we can debate that, but I was wondering what they would actually say. And they said: big improvements, we're working hard together. Right? Even that they would say anything nice meant a lot to me. Then they got up on stage today and had even better things to say, and that again made me a believer. And the reason to believe is people like you being involved. You educated me on "nightly"; I thought it was weekly. And seeing the performance figures go, the velocity is going up as well.
Daniel Newman: But in the AI era, Meta really is sort of one of the arbiters of what is really open. They've really committed to that particular path.
Anush Elangovan: That's right.
Patrick Moorhead: Python, PyTorch.
Anush Elangovan: Yep, yep. And to that point, you know, when they came out with the 405B, the halo model they had just released, it was being served exclusively on the MI300.
Daniel Newman: Right?
Anush Elangovan: Yeah, that's kind of an inflection point of like, okay, that's ready to serve the best model at the time on AMD's hardware.
Patrick Moorhead: So talk about velocity, right? Hardware is on a velocity curve too. It used to be cool to say, hey, let's bring in a new product every two years, okay? Then it was, okay, now we're going to commit every year. But here we have the MI325X and MI355X very close together. Talk to me about why you're doing this, and a little bit about the journey.
Andrew Dieckmann: Well, the pace of innovation is relentless, Pat. We actually pulled in our schedule for the MI355X by probably close to two months from our initial target; the team executed fantastically, and we're bringing to market, we think, something very competitive for the second half of this year to serve our customers on both inferencing and training. Very competitive numbers versus the state-of-the-art alternatives in the market. Our message around great products is compelling, the TCO is resonating, so that's what we're all about. We have the pedal to the metal, so to speak, and we'll continue on that very accelerated release cadence.
Daniel Newman: So, Andrew, I was posting feverishly on X throughout the keynote, and as I started flashing the 355 and the 400, I had some people banging on me: well, they don't do rack systems, so are we even really comparing apples to apples? And then, voila, the rack system, Helios, shows up. Big moment. There's been a lot of dealmaking in the background; there was the ZT Systems acquisition. Talk a little bit about this pivot, because it changes your whole position in the market. It gives you a lot more TAM, a lot more opportunity. It also maybe changes the ecosystem and the relationships. What is the strategic importance of the rack system, and how does it drive AMD going forward?
Andrew Dieckmann: Yeah, so the rack system is about the tight integration of all the key components, CPU, GPU, networking, all co-designed in an architecture that can be rapidly deployed to market. That was the thinking behind the ZT Systems acquisition. About 1,000 engineers have joined the AMD family from that, with great thermal, mechanical, and rack-scale design expertise, and we can extend that help to our customers, our OEM partners, our ODMs, and our hyperscale customers to bring the Helios system to market. That scale-up networking, combined with the scale-out networking we've announced with our Pollara NICs, means those systems are going to provide the best TCO in the market and get to market in a rapid fashion. They're complex. We're allowing innovation for our customers within this reference architecture we're providing, and it's really a key part of our strategy moving forward.
Daniel Newman: Well, it's very... let me just say, those same people, some were pleased and some were less pleased that you were so quick to respond to them with your own solution. But it was in real time. I mean, look, the world was watching. This was one of those days the world was watching. We hear numbers in the trillions; it's funny how we shrug off numbers in the trillions now because they've become so commonplace. But trillions of tokens and trillions of dollars are at stake, and today was a day when AMD showed a number of different ambitions to get more of that market, more of that TAM. Pat and I get to be the arbiters of who's good and who's not; that's being analysts. I think we're in a good position to say that AMD is making a lot of great progress. So Anush and Andrew, thank you so much for joining us here today.
Andrew Dieckmann: Thank you, thank you. Appreciate it.
Daniel Newman: Thanks, guys. And thank you, everybody, for being here with us. We're the Six Five, On The Road at AMD's Advancing AI in San Jose, California. Check out all the other coverage from the event; we have a number of great conversations here, and of course, subscribe and be part of our community. For this episode, though, we've got to say goodbye. We'll see you all later.