Inside Dell and Broadcom’s Next-Gen Networking: Tomahawk 6 and Enterprise SONiC Power the AI Fabric - From SC25
Jim Wynia, Director Product Management, Networking at Dell Technologies and Hemal Shah, Distinguished Engineer at Broadcom join host David Nicholson to discuss how Tomahawk 6 silicon, PowerSwitch Z9964, and Enterprise SONiC are advancing AI data center fabrics for scalability, cooling, and manageability.
How are innovations in silicon switch technology and open networking platforms transforming data center infrastructure for evolving AI workloads?
From SC25, host David Nicholson, Global Technology Advisor at The Futurum Group, is joined by Dell Technologies' Jim Wynia, Director Product Management, Networking, and Broadcom's Hemal Shah, Distinguished Engineer and System/Software/Standards Architect, for a conversation on how Dell and Broadcom are redefining the modern AI fabric through the Tomahawk 6 chip, the PowerSwitch Z9964, and Enterprise SONiC open networking software. They highlight the rapid growth of AI and cloud workloads, innovations in cooling approaches, mainstreaming Enterprise SONiC, and delivering enhanced observability and multi-tenancy for scalable, resilient data center networks.
Key Takeaways Include:
🔹AI and cloud workloads are the driving force: New workloads are demanding changes in data center network design, pushing teams to reconsider both architecture and cooling strategies to keep pace.
🔹Direct-liquid cooling is gaining steam: While air cooling remains viable in certain cases, direct liquid cooling in data centers is becoming essential for dense AI network fabrics, offering thermal advantages for high-power switches.
🔹Enterprise SONiC goes mainstream: How this network switching software brings hyperscale-grade flexibility and openness to mainstream IT, with new operational features that lower the adoption barrier for traditional enterprises.
🔹New innovations in capabilities: Multi-tenancy, AI fabric enhancements, and rack-scale visibility via Dell’s OpenManage Networking platform empower IT teams to manage, monitor, and scale large AI networks more efficiently.
🔹Proactive architecture planning: Why investing in future-ready, modular network solutions is critical to ensure that infrastructure can meet evolving demands from next-generation AI workloads.
Learn more at Dell Technologies.
Watch the full video at sixfivemedia.com, and be sure to subscribe to our YouTube channel, so you never miss an episode.
Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
David Nicholson: Welcome to Six Five On The Road at SC25, the supercomputing conference here in St. Louis, Missouri this year. I'm Dave Nicholson, and I am joined by two very special guests, Jim Wynia from Dell and Hemal Shah from Broadcom. Welcome, both of you gentlemen. Thank you.
Jim Wynia: Appreciate it.
David Nicholson: Let's talk about AI and networking. Does networking matter at all in AI? Don't we just need all GPUs all the time? I'm going to throw that to you. Is that a fair assessment? Is networking not necessary anymore?
Hemal Shah: Of course not. But we are seeing what you are describing: the idea that all we need is GPUs. People talk about compute, compute, compute, and it keeps increasing. But as you add more compute capacity, it's hard to pack that into one rack or one server. Your scale is going up. Once you start seeing that kind of scale, you absolutely need networking to take advantage of that compute, and all of these compute elements are talking over the networking infrastructure they're connected to. More innovations are coming as GPUs get higher-speed networking interfaces, just because the compute has expanded so much. They're generating so much data, and they're also exchanging a lot of data. So, quite the opposite of networking not mattering, networking in AI is taking on this notion of scale-up, scale-across, and scale-out. What that means is that within tightly coupled computing, you want to preserve the memory semantics; that's where you build the scale-up networks, and also scale-across. What does that mean? It probably fits within a rack or a few racks, with tightly coupled GPUs using that network. When you go to 100K XPUs or GPUs, you are now talking about scale-out. Within the building, you're building a two-tier or three-tier network. But then you are going geographically across data centers that are kilometers apart, and you are building that kind of scale-out network. So back to your question: networking actually matters more now than it ever used to. Later on I can, well, I can show you here, this is why.
David Nicholson: We're going to make people wait. We're going to make people wait and just be curious about what it is.
Hemal Shah: We'll talk about it more.
David Nicholson: They'll be dying to hear about exactly what it is. Jim, there's a lot of discussion at this show about the constraints that we face as we move into the AI world. One of them is obviously power, which is very tightly coupled with this idea of heat that's generated from all of these things that we do. How do you cool it? How do you dissipate the heat? Is that an issue that affects networking?
Jim Wynia: And what are you doing about it? Absolutely. There's been an ongoing, almost age-old discussion about when we get to the point where we have to be liquid cooled, where we have to have direct liquid cooling. With each generation it's been, oh, it's going to be the next one, we're not going to do it this time. And then somehow they find a way. We thought that with 400 gig, and then 800 gig. Now we're on the cusp of 1.6. Liquid cooling has never been a more important part of the discussion. The servers are transitioning to liquid cooling at the same time networking is coming along. And so with this current generation coming out right now with Tomahawk 6, which we are actually announcing at this show, thank you for setting me up, the Tomahawk 6 product comes in both an air-cooled version and a direct liquid cooled version, because there's still an appetite for both. It depends on whether you're going for a rack-scale solution where everything is really tightly coupled in a rack, or things are a little more spread out and air is still palatable. So liquid cooling becomes super important as part of that discussion. It will not take over the world just yet, but as the server implementations switch over and the networking is co-located with them, it absolutely makes sense that those switch to DLC at the same time.
David Nicholson: Okay, Tomahawk 6 was just, the cat was let out of the bag. What is Tomahawk 6? Tell us what you have here and why it's important.
Hemal Shah: So first I got to show this. So this is the Tomahawk 6. This is showing you a single chip fabric which is capable of 102.4 terabits per second switching capacity.
David Nicholson: Sounds like a lot of terabits.
Hemal Shah: Sounds like a lot of terabits. In today's keynote, I heard that a few terabits of consumer bandwidth were supported by this infrastructure. Just imagine: in the data center, this single chip alone is doing that. And you're building two-tier or three-tier networks with multiples of this. It has high-speed port connectivity, like 1.6-terabit or 800-gig ports. And this is where you connect the AI-optimized NIC, like this Thor Ultra, which we just announced. This is an 800-gig AI NIC, which connects to one of the ports of the Tomahawk.
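As a quick back-of-the-envelope check (the arithmetic here is ours, not something worked through in the conversation), the port counts implied by a 102.4 Tb/s chip with 1.6T and 800G port speeds fall out directly:

```python
# Back-of-the-envelope port math for a 102.4 Tb/s switch chip.
# The capacity and port speeds are the figures quoted above;
# the arithmetic itself is an editorial illustration.

CAPACITY_TBPS = 102.4  # total switching capacity, terabits per second

def port_count(port_speed_gbps: float) -> int:
    """Number of ports of a given speed that fully subscribe the chip."""
    return int(CAPACITY_TBPS * 1000 / port_speed_gbps)

print(port_count(1600))  # 64 ports at 1.6 Tb/s
print(port_count(800))   # 128 ports at 800 Gb/s
```

That is, one chip can front 128 endpoints at 800G, which is why a single-chip fabric in a 1U box is meaningful.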
David Nicholson: OK, so let's pause for a second for the sheet metal fans out there who want to understand. So these devices, first of all, the networking, the switching device. This is the switching device. So what does the form factor look like as delivered? How many of those are in an enclosure, and what does that enclosure look like?
Hemal Shah: So I think it depends on the vendors, but you can see a single chip built around one U or even smaller.
David Nicholson: OK, this would be in a Dell networking switch.
Hemal Shah: It will be in a Dell networking switch, and in other branded switches, but underneath they will be using this. And then, depending on whether you're using co-packaged optics, or just DAC cables, or standard optics, you will have all those ports coming out the front or back.
David Nicholson: That's how you connect to the… And can you turn it around to the other side? The other side, yeah. So what I'm fascinated by, and I know this might just be me being a nerd, is the packaging. It's going to be really tough for anyone to actually see, but these are thousands of tiny, tiny, perfect little connectors in order to actually access the chip when you turn it back over. The real thing is just the little shiny part in the middle, which is completely crazy. It's like we have to take this thing that is at almost microscopic scale and bring it out to human scale for the thing to work. Okay, so we have the switch, we have the NIC.
Hemal Shah: And this is the end point, basically.
David Nicholson: So that would go in a network interface card? Correct. And running at what speed now? This is 800 gig. OK, 800 gig. All right.
Hemal Shah: And then, I think previously we were talking about scale up. So by the way, this switch that I was showing, you can use this as an ethernet scale up switch or scale out.
David Nicholson: OK.
Hemal Shah: But what we are also seeing recently is that for a larger scale-up domain, you want a really low-latency, high-packet-rate switch. That's where we created the Tomahawk Ultra family. That one has less than 250 nanoseconds of forwarding latency. So that allows you, as I was saying, to do tightly coupled computing: to scale up a bunch of XPUs connected together via an Ethernet fabric. So you have these choices. Again, it's all based on open standards. There are more innovations happening around the transport in the NIC, and in the switches the load balancing and the congestion control. All those things are needed for these large-scale AI networks, and they're all coming together.
David Nicholson: Now, Hemal and the folks at Broadcom, at least parts of Broadcom, they love their shiny objects. They love the toys, and we love their shiny objects too. I love stuff I can touch. But there's a software element to this. Certainly. So talk to us about SONiC, and how that has evolved, if it has, into something that is ready for the most robust data center deployment.
Jim Wynia: Oh, you hit it on the head. So SONiC has been around for, what, seven or eight years, the brainchild of Microsoft. And I know you have more sessions where you get a really deep dive into it, so I won't go too deep. But the reality is that it has already gone through a trial by fire. It is ready. It's running in AI applications in large fabric data centers as well as in the enterprise. Dell has been fully committed to SONiC and to moving the state of the industry forward on that, and we had some announcements at this show already. Some great things are happening there. The important thing is that it is based on open source, right? The community is really behind SONiC; they're moving the state of the OS forward, and they're all contributing. So we take that and we harden it, and we have a version where customers don't have to worry about what's in there. We know what's in there, we fully support it, and we do development on it. And then you layer on top of that SFM, Switch Fabric Manager. So now you can control beyond the box and schedule things like maintenance windows to make sure it's all clean.

But I want to come back to DLC real quick. Absolutely. We talked about DLC and why it's important, but we don't really focus on the benefits so much. Yes, it's important because it helps you cool, but it's more than that. Today you spend so much power on fans; if you walk into a data center, you're going to go deaf, because the fans are spun up so high and the pitch is just incredible. It's not uncommon to fail NEBS testing because the acoustics are so out of control. So the ability to remove fans, or, if you still need fans, to turn them way down because you don't need a lot of air, is a huge reduction in power. And DLC brings you more than that. Instead of having a three- or four-U switch, you can put it into a two-U switch, because the cold plates you put right on top of the elements are right there, so you don't need the extra air chamber. So on space, you can put more in a rack, you get better cooling, and the power comes down. As an example, for a Tomahawk 6 switch, we would expect an air-cooled version to be just south of 5,000 watts, and a DLC-cooled one just above 3,000 watts.
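Taking Jim's rough figures at face value, the implied savings per switch are easy to sketch. The wattages below are the approximate numbers quoted; the 24/7 duty cycle is an illustrative assumption, not something stated in the conversation:

```python
# Rough per-switch savings implied by the air-cooled vs. DLC figures above.
# 5,000 W and 3,000 W are the approximate numbers quoted; continuous
# operation is an assumption for illustration.

AIR_COOLED_W = 5000   # "just south of 5,000 watts" (approx.)
DLC_W = 3000          # "just above 3,000 watts" (approx.)

savings_w = AIR_COOLED_W - DLC_W
savings_pct = savings_w / AIR_COOLED_W * 100

# Annual energy saved per switch, assuming 24/7 operation
hours_per_year = 24 * 365
kwh_saved = savings_w * hours_per_year / 1000

print(f"{savings_w} W saved per switch ({savings_pct:.0f}%)")
print(f"~{kwh_saved:,.0f} kWh/year per switch")
```

On those assumptions, each DLC switch saves roughly 2 kW continuously, around 40% of the air-cooled draw, before counting the facility-side cooling that no longer has to remove that heat.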
David Nicholson: And that's in terms of the heat that needs to be dissipated, measured in watts?

Jim Wynia: Yes. That's the actual wattage used, but it translates directly into heat dissipation.
David Nicholson: OK.
Jim Wynia: Yeah.
David Nicholson: But it's literally using less power because of how it's cooled. Absolutely. In the liquid-cooled state, it's more efficient, and the economics are dramatic. Interesting. Yeah, I went down that rabbit hole years ago by asking the question: how much heat needs to be dissipated from a 1,000-watt GPU? Because, looking into the future, the answer was pretty much 1,000 watts. These things are perfect heat-generating devices.
Jim Wynia: They really are.
David Nicholson: You don't need a space heater if you've got one of these; the compute is sort of a byproduct. It is interesting. We have a long way to go in terms of efficiency, which is a good thing. When somebody mentions OpenManage Networking, is that a kind of open-standards protocol, or is that something specific to Dell, by the way?
Jim Wynia: So OpenManage Networking is Dell-specific, but it layers in all the different components, server, storage, networking, and allows us to have one place to be able to manage all of the specific, the lights-out capabilities that you need in order to do firmware upgrades and manage the iDRAC on the server to monitor things. And so it's a key element for a data center to be able to really have visibility into what's going on.
David Nicholson: I'm going to ask each of you this question. We'll start with you, Hemal. Let's say that I'm an infrastructure leader and I'm trying to plan for the future. Of course, I want to do the best thing possible for my organization. I also want to keep my job by not doing something dumb. Give me your advice on the steps I should be taking, or the things I should be taking into consideration, when I think about building out a fabric design that can scale moving forward.
Hemal Shah: Yep. So I think one of the first things you want to do is decide what kind of compute capacity you are going after, right? First you decide that. Then, for that, what kind of networking will you need? Of course, Ethernet networking is what we do, and I'd say it's the best suited for your network. But you should look at how you're going to split between your scale-up and scale-out domains. And then what you need to work out, most of the time, is how you design the network: the fabric design, how the topologies are designed. And also what the schemes and features are, like what congestion control capabilities you are enabling and what load balancing techniques you are enabling. Those things need to be well thought out beforehand, because that is end to end, and that's where the whole endpoint integration comes into the picture. And going back to your software point, which is very important for all of this: after you have all this, how are you going to run things? So, like OpenManage, which Jim mentioned. With all this silicon you are getting, there's a lot of infrastructure software you will need: basic drivers, libraries on the switches. We support SONiC, built on SAI, the Switch Abstraction Interface. So somebody needs to think at the higher level about enabling those features, all the way through the software stack, and how it's going to work with the given fabric design that you have. That's where some of these tools also come into the picture, and how the tools integrate with the underlying software. So you really need to think end to end.
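Hemal's point about deciding compute capacity first and fabric topology second has a standard first-order sizing exercise behind it. The sketch below uses textbook non-blocking leaf-spine (two-tier folded Clos) math; it is an editorial illustration, not a formula given in the conversation:

```python
# First-order sizing of a non-blocking two-tier (leaf-spine) fabric
# built from switches of radix R. Standard Clos arithmetic, shown here
# as an editorial illustration of the planning step Hemal describes.

def two_tier_endpoints(radix: int) -> int:
    """Max endpoints in a non-blocking leaf-spine fabric of radix-R switches.

    Each leaf splits its ports evenly: R/2 down to endpoints, R/2 up to
    spines. The fabric supports up to R leaves, so R * (R/2) endpoints.
    """
    return radix * radix // 2

# A 102.4 Tb/s chip gives 128 ports at 800G, so two tiers reach:
print(two_tier_endpoints(128))  # 8192 endpoints at 800G
```

So two tiers of 128-port switches already cover several thousand GPUs; it's when you push toward the 100K-XPU scale Hemal mentions that a third tier, or going across buildings, becomes necessary.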
David Nicholson: One of the leaders at Broadcom, Charlie Kawwas, talks about some key features that you need to consider for AI moving forward. Open standards and scalability are two of those things. That was in your answer in part, I know, but I want to hear it from you. Dell offers a variety of ways to achieve something. Same question: when you're going in and someone is saying, hey, I think I need three of these now, but I might need 300 of them in the future, help me future-proof my environment. What's your advice?
Jim Wynia: Great question. And I think you both nailed it: going for a standards-based, open-standards solution is really important, especially if you're thinking about growing into the future. As a seller, yeah, come back to me; I want you to buy from me because I delivered. But if I didn't deliver and you want to go elsewhere while sticking with that technology, you should have that option, in my opinion. So going with standards-based, like Ethernet, which has clearly won the battle in networking, and not proprietary, I think there's a big story for why that's the right direction to go. And of course with SONiC, it's the same thing. Those are very important reasons to stick with those solutions and be able to grow with them as you grow.
David Nicholson: Yeah, it's interesting. The word proprietary is sometimes used as sort of a derogatory term. You could think of proprietary as something worth paying money for, a differentiator that's worth paying for. But to your point, if there is a Dell lock-in, it should be because customers love what you're offering. Absolutely. So we're locked in because we're very satisfied. So, as a distinguished engineer, if you were to follow this advice, would that prevent you from quickly becoming an extinguished engineer?
Hemal Shah: Great question. One thing I was going to add to what Jim said: when you build something on an open standard, the general misconception is that you're not differentiating, that you're not innovating. We have always done this. The core infrastructure is built on standards, but how you enable the features, the solution, that's where the differentiation comes in. And that's what you want to give to your customer, rather than just dealing with proprietary infrastructure where open standards are good enough, or more than good enough. The distinguished part is staying on top of innovation. Otherwise, it will get extinguished.
David Nicholson: No chance that you will be extinguished; none of us will be, by AI. I'm not worried about AI taking our jobs. Back to the headlines. Leave us with some final thoughts. We have Tomahawk 6, delivered by Dell in the form of Dell networking technology. We have Thor Ultra at 800 gig; I can't even say it with a straight face, it's so ridiculous, the speeds and capacities we're getting to right now. But wrap us up with what you would want the audience to come out of SC25 understanding.
Jim Wynia: So I'm going to phrase that as what I like to think of as my prediction for next year when we get together. Because last year when we got together, you asked that question, and I said, next year we're going to be talking about 1.6. Ta-da! Next year, we're going to be talking about more 1.6, at unprecedented volumes and price points, to the point where things that are just now turning on will be flying. Data center capacities will be unlocked to a degree that we didn't even see coming.
David Nicholson: OK, and I'll get stock market advice off camera from you.
Hemal Shah: For me, my thought is that this is just the beginning of AI networking. Whatever speed and scale we are at, it's going to keep scaling to higher and higher speeds, and more and more nodes. And the key here is the innovation in the feature set; you will see that keep driving the AI market requirements. This is how you optimize AI workloads with the networking. It's a very exciting time, as you can see. I never imagined that we'd be here, and look at where we are going.
David Nicholson: So here we are, amazed by where we are, but we are in the early stages of all of this. I think we'd all agree. Hemal, Jim, it's always great to see Broadcom and Dell working together on this stuff, because it really does make a difference for the folks, the customers at the other end that are trying to make money and save money with AI. That's the big struggle. Absolutely. Thanks for joining us today here at SC25. Thank you, Dave. Thank you, Dave. For Six Five On The Road. I'm Dave Nicholson.
Thanks for joining us and stay tuned for more exciting content.