Why SONiC Is Powering the Modern Data Center: Dell’s Journey to Open, Scalable Networking - Six Five On The Road

Saurabh Kapoor, Director, Product Management & Strategy at Dell Technologies, joins host David Nicholson for an in-depth look at SONiC and open networking’s impact on data center scalability, reliability, and AI readiness.

How are open, modular networking solutions like SONiC advancing data centers to keep pace with new AI and cloud architectures?

From SC25, host David Nicholson is joined by Dell Technologies' Saurabh Kapoor, Director, Product Management & Strategy – AI Compute & Networking Solutions, for a conversation on how SONiC is powering the modern data center. The discussion explores Dell’s journey to open, scalable networking and focuses on how SONiC is addressing the demands of modern AI and cloud environments.

Key Takeaways Include:

🔹Open, Modular Networking: Rapid growth in AI and cloud workloads is compelling enterprises to move away from traditional, closed networking stacks, opting for flexible solutions like SONiC.

🔹Operational Simplicity: SONiC’s API-driven automation, microservices architecture, and full-fabric observability are streamlining network operations, debunking myths that open solutions are harder to manage.

🔹Dell’s Real-World Experience: Running thousands of SONiC switches has given Dell deep insight into scaling open networking across diverse environments, including cloud, telco, enterprise, and AI clusters.

🔹Strategic Guidance: CIOs and network leaders should prioritize open, automated networks to ensure infrastructure agility and readiness for the next wave of AI and cloud advancements.

Learn more at Dell Technologies.

Watch the full video at sixfivemedia.com, and be sure to subscribe to our YouTube channel, so you never miss an episode.

Or listen to the audio here:

Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript

David Nicholson:

Welcome to Six Five On The Road from SC25, the supercomputing conference to end all supercomputing conferences, this year in St. Louis, Missouri. A little trivia: this is where the supercomputing organization is based and was founded, so this is a bit of a special event this year. I'm joined by a very special guest, Saurabh Kapoor. Saurabh, welcome.

Saurabh Kapoor:

Thank you, Dave. Thank you for having me here.

David Nicholson:

What do you do at Dell, Saurabh?

Saurabh Kapoor:

So I lead product management and strategy for software solutions in the AI Compute & Networking business unit. I also represent Dell on the Linux Foundation governing board for the SONiC project, championing some of our open source initiatives and making sure we bring the best of that world to our customers and partners.

David Nicholson:

Very interesting, because it brings up a timely subject: this idea of open standards as people are building out their AI infrastructure. So start us off with a little 101. What is SONiC? Where did it come from? And why is it important that Dell embraces and extends SONiC's capabilities?

Saurabh Kapoor:

Absolutely. Well, SONiC stands for Software for Open Networking in the Cloud. It's a brainchild of Microsoft. Back when they were building out Azure services, they were dealing with pockets of different proprietary vendor networking stacks. They decided that, just as Windows had harmonized software in the compute world, with common software running across multiple vendor platforms making application portability and manageability easier, they wanted to do something similar on the networking side. So they took Debian Linux, added networking functionality as containers on top of it, and called it SONiC. The goal was to open source the project. They brought it to the community, got every major merchant silicon vendor who wanted to participate in that ecosystem to enable the stack, and had ODMs and OEMs as part of the journey. That was real open source networking in the industry, and seven years later, SONiC has become the Linux of networking. We started our own journey about seven years back. We embraced open networking, which is aligned with how Michael started the company: choice and flexibility are key to our portfolio, and we want to bring open standards to our customers. So we got onto the SONiC bandwagon. The entire portfolio is SONiC-based, from 1 gig to 800 gig, and we're launching 1.6 terabit next year. SONiC is the way forward.
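The containerized design described here can be pictured as a small set of independent services on a Debian base, each in its own Docker container talking to a shared Redis database. A minimal illustrative sketch in Python (the container list reflects well-known core SONiC services; the descriptions are paraphrased summaries, not generated from a device):

```python
# Illustrative map of core SONiC containers and the function each provides.
# On a real switch these appear as separate Docker containers ("docker ps"),
# all coordinating through a shared Redis database.
SONIC_CONTAINERS = {
    "database": "Redis instance holding CONFIG_DB, APPL_DB, STATE_DB",
    "swss": "Switch State Service: translates intent into switch state",
    "syncd": "Syncs switch state down to the SAI/ASIC driver",
    "bgp": "FRRouting stack providing BGP and other routing protocols",
    "lldp": "Link Layer Discovery Protocol daemon",
    "teamd": "Link aggregation (LAG) management",
    "pmon": "Platform monitoring: fans, PSUs, transceivers",
    "snmp": "SNMP agent for legacy monitoring",
}

def describe(container: str) -> str:
    """Return a one-line description, or flag an unknown service name."""
    return SONIC_CONTAINERS.get(container, f"{container}: not a core service")

if __name__ == "__main__":
    for name, role in SONIC_CONTAINERS.items():
        print(f"{name:10s} {role}")
```

Because each function lives in its own container, an individual service such as bgp can be upgraded or restarted without taking down the whole switch, which is what makes the stack feel like microservices rather than a monolithic network OS.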

David Nicholson:

You mentioned choice, and flexibility and choice are key hallmarks of what Dell offers. Is AI a headwind when it comes to open standards and SONiC, in the sense that people are absolutely frantic for time to market and time to value? Or can you make the case that if I'm a CIO building out infrastructure, it actually makes a lot of sense to go with this more open environment? What are your thoughts on that?

Saurabh Kapoor:

That's a great one, and you're spot on with the time-to-value, time-to-AI framing. AI is no longer a technology trend; it's a technology revolution happening now. AI is rolling out in smart cities, helping with climate control, and in pharma it's improving predictive analytics, health outcomes, drug discovery, and things like that. It's happening in real time all around us. When you talk to some of the largest AI deployments, the realization is that three big things matter: time to value, trust in the technology, and ease of management. By time to value I mean they want to make sure the next-gen technologies they bring in give them the right capacity to build infrastructure on and move faster; time to first token is the objective. They want to build on partners they can trust, because these are massive investments and large clusters, so they want technologies that are proven at scale. And finally, ease of management: you can get an infrastructure up and humming, but you also need day-to-day operations and manageability to be easy.

SONiC is a checkmark on all three objectives. It's a technology proven at scale in the hyperscaler world; at Microsoft alone you're talking about 600,000-plus switches in production, and then think of the broader hyperscaler ecosystem that has deployed SONiC at scale and proven it for AI from a feature-functionality perspective. The trust element comes from the scale at which the technology has been deployed. And on ease of management, because the stack is modular, a microservices architecture, everything is API-centric: you can pull the northbound APIs into in-house tools, third-party tools, or vendor-provided tools, so there's a lot of flexibility there. That goes all the way up to streaming telemetry and silicon telemetry, so you have access to what's happening with flow analysis, buffer statistics, and capabilities like mirror-on-drop, giving you better visibility into the stack. SONiC checks all those boxes. It's been proven at scale and is now championing some of the largest AI production environments.
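The API-centric manageability described above starts with SONiC's declarative configuration: switch state lives in a Redis-backed CONFIG_DB, commonly seeded from a config_db.json file of flat tables. A minimal sketch using only the Python standard library; the port names and attribute values below are made up for illustration:

```python
import json

# A minimal, made-up fragment in the shape of SONiC's config_db.json:
# top-level keys are tables (PORT, VLAN, ...), entries are key/value maps.
CONFIG_DB_FRAGMENT = """
{
  "PORT": {
    "Ethernet0":  {"admin_status": "up",   "speed": "400000", "mtu": "9100"},
    "Ethernet8":  {"admin_status": "up",   "speed": "400000", "mtu": "9100"},
    "Ethernet16": {"admin_status": "down", "speed": "100000", "mtu": "9100"}
  },
  "VLAN": {
    "Vlan100": {"vlanid": "100"}
  }
}
"""

def ports_up(config: dict) -> list[str]:
    """Return the names of ports whose admin_status is 'up'."""
    return sorted(
        name for name, attrs in config.get("PORT", {}).items()
        if attrs.get("admin_status") == "up"
    )

config = json.loads(CONFIG_DB_FRAGMENT)
print(ports_up(config))  # ['Ethernet0', 'Ethernet8']
```

Because the whole configuration is plain tables of key/value pairs, the same structure is what northbound tooling, whether in-house scripts or standard management interfaces, reads and writes, which is what makes automation against SONiC straightforward.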

David Nicholson:

When you talk about proven at scale, you gave the example of Microsoft. What about Dell? Are you drinking your own champagne, as they say? You are deploying SONiC, so what have you learned from deploying it at scale in your own environments, where you can't run and hide if something goes wrong? You can't say, well, it's Dell's fault. No, no, you are Dell.

Saurabh Kapoor:

Yes, you're absolutely right. This is a journey we took a couple of years back, as we were bringing this technology to our customers, from tier-one cloud service providers to large enterprises and telcos. Dell IT is a big ecosystem. We're talking about 12,000-plus switches in production, actively growing and scaling, spanning data centers, manufacturing sites, customer solution centers, executive briefing centers, and branch offices. So you see the kind of mix we have with respect to connectivity. We started rolling out SONiC across those different environments, 23 global data centers, and there was a lot of learning in the process: feature functionality, the kind of hardening and testing we needed to do, and the use cases we added as we extended SONiC from data centers to manufacturing locations and environments that need Power over Ethernet, port security, and capabilities like that. Enabling SONiC for these use cases led to a lot of ease of management. Dell IT teams could manage those edge locations just like they manage data centers, so they could scale quickly. One common software stack running across the board was the big mantra. While we were doing this, we were also working on AI, bringing next-gen technologies to our customers, rolling out these infrastructures and the different AI solutions we were building and basing them on SONiC, so all of the AI optimizations make their way into SONiC. That was another big initiative in-house. We're on the path to 100% SONiC: we have about 3,000 switches in production on it today, and we're working towards all 12,000 switches in the next couple of years.

David Nicholson:

Some people, when they hear open, get a chill that runs up and down their spine, because they think: oh no, open, that must mean I'm going to have network administrators who bring their pet birds to work, who are the only people who understand it, and if they leave, we're in big trouble. Reassure us that just because it's open doesn't mean it's not a real, enterprise-ready, hardened solution. Help us out there.

Saurabh Kapoor:

You're right. This is what we hear from customers who have been using proprietary stacks for years; there's always hesitation around whether or not to make the move. What we've found is that once users adopt SONiC into their environment, they love it, and it's the only thing they want to do next. With SONiC, and especially with Dell championing the technology, you get the best of both worlds: open standards and the pace of innovation happening in the community, plus an enterprise partner enabling 24x7 support globally. You have a partner you can call, like 1-800-DELL, for all things SONiC: infrastructure, manageability, new capabilities, a predictable roadmap, enabling services, support, training, certifications, whatnot. We've helped with 1,600-plus production deployments across tier-one and tier-two cloud service providers, large enterprises, and telcos, and we're just getting started. It's very encouraging to see others in the vendor community also embracing SONiC and the technology going mainstream. It has been proven at scale, we're extending it from the hyperscaler ecosystem to everybody else, from the core to AI fabrics, and the feedback has been mind-blowing. We love the customer feedback, we keep our ears to the ground to incorporate it into our product lifecycles, and we just continue to innovate and evolve.

David Nicholson:

So, suppose someone is concerned about possibly painting themselves into a corner, in an era where we talk about things, for all practical purposes, in terms of infinite scale. When someone says 10-gigawatt data center, we've sort of gotten used to that, but it's mind-boggling. What would your advice be to someone charged with building out infrastructure that isn't going to paint them into a corner? First and foremost, I guess one decision point is around Ethernet. A lot has been made of the power-consumption advantage that comes with Ethernet, and at scale that becomes incredible, with power being such a constraint. So what would your advice be to someone who says: hey, I don't want to paint myself into a corner, I'm not as nervous about the whole open thing, I think you're going to be there as a backstop to support me, but what do I do? Where do I get started?

Saurabh Kapoor:

Right. Well, you spoke about Ethernet, a technology that has been proven at scale. Some say it's 50 years old; we say 50 years young. Every 18 months you see the speeds double, and there's a rich ecosystem around Ethernet. Cloud was nothing but distributed computing, all connected over Ethernet at planet scale, and you see a similar trend happening in the AI world: some of the largest supercomputers are now being built with Ethernet as the core infrastructure. With all the optimizations that have made their way into Ethernet and the network operating system, it delivers performance similar to the traditional technologies that championed HPC environments: higher-radix switching, congestion-management capabilities, better load balancing, adaptive routing, telemetry-based congestion management, and RDMA over Converged Ethernet (RoCE) to address lossless fabrics. All of that, packaged with higher-radix switching at 800 gig, with 1.6 terabit coming soon, allows data movement within a highly optimized fabric. What users need to understand is that with these AI infrastructures, the workloads and their characteristics are different. You're looking at elephant flows, bursty traffic, links that can get saturated in microseconds. So you need an infrastructure with all these AI optimizations, and we've brought all of those capabilities into SONiC. So first, make sure you're not just leveraging your traditional proprietary networking solutions; bring in AI-optimized infrastructure to champion those environments. And second, make sure you're not looking at networking independently. AI is all about solutioning, with compute, storage, and networking coming together toward a common objective and the outcomes and use cases you need, from training and fine-tuning to inferencing.

At Dell, we brought in the concept of the Dell AI Factory: bringing in AI-optimized infrastructure, bringing AI to the data rather than hauling the data to AI, and enabling that infrastructure for a set of outcomes and use cases, simplifying the journey toward AI with T-shirt-sized packages for different workloads and use cases toward an end objective. Those are some of the best practices we're seeing, and some of the largest rollouts are happening with them.
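The claim that links can saturate in microseconds holds up to back-of-the-envelope arithmetic. A quick sketch, assuming a hypothetical 64 MiB shared packet buffer (a figure chosen only for illustration) behind a single congested 800 Gb/s port:

```python
# How long does one congested 800 Gb/s port take to fill a 64 MiB buffer?
BUFFER_BYTES = 64 * 1024 * 1024   # hypothetical shared buffer size (64 MiB)
LINE_RATE_BPS = 800e9             # 800 Gb/s port, in bits per second

fill_time_s = (BUFFER_BYTES * 8) / LINE_RATE_BPS
print(f"{fill_time_s * 1e6:.0f} microseconds")  # prints "671 microseconds"
```

At sub-millisecond timescales like this, congestion control has to react in hardware (priority flow control, ECN marking in RoCE fabrics) rather than on software polling intervals, which is why telemetry-based congestion management is baked into the switch itself.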

David Nicholson:

So we hear a lot about rack-scale deployments from Dell, because these things are becoming largely rack-scale deployments, and when Dell talks about a rack-scale deployment, they're talking about Dell racks. But when you talk about SONiC from an open networking perspective, correct me if I'm wrong, you can have this Dell SONiC Ethernet network into which, heaven forbid, a customer could in fact drop racks of gear that aren't Dell racks, correct? That's one of the benefits of open standards. It goes back to that idea of being locked in because you love what you're getting, not because of a proprietary standard. So the whole Dell networking story is open in that sense.

Saurabh Kapoor:

100%, 100%. And the stack will remain open. As I said, this is how Michael started the company, and choice and flexibility are big, because we want to give customers control over how they build that infrastructure. And as you rightly said, they're looking for a good heterogeneous environment. SONiC gives them a platform where they can pick whatever hardware technology they want. It's an operating system that runs across the board, follows RFC and IEEE standards with every protocol running on it, and allows the choice and mix you want for your environment. So you're not locked into a vendor stack, a proprietary stack, where you have to use certain hardware, a certain operating system, a certain set of vendor-provided tools, or a certain vendor roadmap to deliver those capabilities. You're basing yourself on an open-source stack that gives you flexibility and economies of scale when it comes to hardware. You seek a partner that gives you a promise of predictability and support as you champion those open source technologies in your environment, and then you just scale and build on top of it.

David Nicholson:

Saurabh, final question for you, as we look forward to SC26, because increasingly the supercomputing conference represents a date on the calendar more important to me than even holidays. Any predictions you want to go on the record with? They can be crazy predictions about the direction we're going. What's going to happen in the next year? And don't just tell me that, oh, speeds, NIC speeds, are going to get faster. OK, we expect that, right? But what do you think? Anything crazy going to happen in the next year?

Saurabh Kapoor:

Well, what we are seeing is, yes, we've championed some of the largest hyperscalers in the world adopting AI and building the largest supercomputers. The next wave of evolution is going to be enterprises rolling out AI. Those are not going to be massive clusters; we're seeing variability from 256-GPU clusters to 2K-GPU clusters, built for the use cases and outcomes they're looking at in banking, healthcare, and different functions within enterprises. I think that's going to be a big thing, so creating those smaller, T-shirt-sized validated architectures is something we're laser-focused on. The other thing we're seeing is more of a liking for open-standards-based architectures, because that helps enterprises grow faster and brings down the cost of building those infrastructures. Power is going to be another big element. We're looking at the next generation of switches now with DLC, because capabilities like liquid cooling are going to be important: these are power-hungry devices, and you have to manage the cost and consumption of power with liquid. So those would be the big ones: enterprise AI, ease of management, power consumption, and validated architectures.

David Nicholson:

Well, Saurabh Kapoor from Dell, thank you for joining us. And by the way, your comments on the age of enterprise AI deployment in 2026 map nicely with a study that Wharton recently commissioned, interviewing a lot of customers of folks like Dell, so that seems to ring true. But we'll see how your prediction bears out at SC26. From SC25, thanks for joining us. I'm Dave Nicholson for Six Five Media On The Road.
