
Network Topology Analysis & Scaling Considerations - Webcast

Seamus Jones, Director, Technical Marketing Engineering at Dell Technologies, joins the webcast to discuss how AI advances like LLMs and Agentic AI are redefining network topology analysis and scaling for data centers—essential insights for AI infrastructure leaders.

For more insights from the AI Lab, visit:
https://signal65.com/ai-lab/

Download the full report:
https://signal65.com/wp-content/uploads/2025/08/Signal65-Insights_Network-Topology-Analysis-Scaling-Considerations-for-Training-and-Inference-1.pdf

The explosion of large language models (LLMs), agentic AI, and high-concurrency inference workloads has placed unprecedented demands on data-center networking infrastructure. Many networks designed for compute-bound workloads are now hitting a communication-bound barrier — where the network fabric, not the GPUs, becomes the limit.

This report from Signal65 examines how network topology choices — particularly the shift from traditional leaf-spine (CLOS) architectures to rail-based (RAIL) designs — can deliver measurable performance and scalability advantages for both training and inference.
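
To make the topology contrast concrete, here is a minimal, purely illustrative Python sketch of the hop counts involved. The GPUs-per-server and servers-per-leaf values are assumptions chosen for the example, not configurations from the report:

```python
# Purely illustrative: compares switch hops for same-rank GPU traffic in a
# classic leaf-spine (CLOS) fabric versus a rail-based design. The
# GPUs-per-server and servers-per-leaf values are assumptions.

GPUS_PER_SERVER = 8   # assumed: one back-end NIC per GPU
SERVERS_PER_LEAF = 4  # assumed, arbitrary


def server_of(gpu: int) -> int:
    return gpu // GPUS_PER_SERVER


def rail_of(gpu: int) -> int:
    return gpu % GPUS_PER_SERVER  # GPU index within its server = its rail


def clos_hops(gpu_a: int, gpu_b: int) -> int:
    """Leaf-spine: servers on different leaves talk leaf -> spine -> leaf
    (3 switch hops); servers on the same leaf need only 1 hop."""
    same_leaf = server_of(gpu_a) // SERVERS_PER_LEAF == server_of(gpu_b) // SERVERS_PER_LEAF
    return 1 if same_leaf else 3


def rail_hops(gpu_a: int, gpu_b: int) -> int:
    """Rail design: same-rank GPUs (e.g. GPU 0 on every server) hang off the
    same rail switch, so same-rank traffic is a single hop; cross-rail traffic
    takes a longer path (or moves over NVLink/NVSwitch inside the server)."""
    return 1 if rail_of(gpu_a) == rail_of(gpu_b) else 3


if __name__ == "__main__":
    # GPU 0 on server 0 vs GPU 0 on server 37: a same-rank pair, the kind of
    # traffic that dominates all-reduce / all-gather collectives.
    a, b = 0, 37 * GPUS_PER_SERVER
    print("CLOS hops:", clos_hops(a, b))  # 3
    print("Rail hops:", rail_hops(a, b))  # 1
```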

Key Findings

🔹 Training workloads with large GPU counts (hundreds to thousands) are dominated by synchronized collective communication operations (such as all-reduce and all-gather), where same-rank GPU pairs communicate heavily. These patterns strain CLOS topologies that rely on three-hop leaf-spine routing rather than optimizing for same-rank locality (a simple cost-model sketch follows this list).

🔹 RAIL-optimized and RAIL-only topologies provide significantly higher efficiency for AI workloads. Single-hop intra-rail communication reduces latency and increases bandwidth utilization compared to CLOS architectures.

🔹 As AI model architectures evolve (e.g., mixture-of-experts and sparse routing), cross-rail communication decreases, making rail-only fabrics increasingly advantageous for scale-out training and inference.

🔹 For inference workloads, low latency and high concurrency are essential. Network design must support high throughput of independent requests — not just raw GPU compute — making topology and congestion control critical.

🔹 Modern Ethernet-based fabrics (400 Gb/s and 800 Gb/s) with hardware offloads, lossless protocols (RoCEv2), and intelligent congestion management (ECN, PFC) achieve near-linear multi-node scaling, keeping the network from becoming the limiting factor on GPU utilization.
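
As a rough illustration of the first finding (the sketch referenced above), here is a simple ring all-reduce cost model in Python. It uses the textbook 2(N-1)/N bandwidth term plus a per-hop latency term; the gradient size, GPU count, and latency value are illustrative assumptions, not measurements from the Signal65 report:

```python
# A minimal ring all-reduce cost model. All numbers below are illustrative
# assumptions, not measurements from the report.

def ring_allreduce_seconds(num_gpus: int,
                           message_bytes: float,
                           link_gbps: float,
                           per_hop_latency_s: float = 5e-6) -> float:
    """Estimate the time to all-reduce `message_bytes` across `num_gpus`
    connected by links of `link_gbps` (Gb/s), assuming a ring algorithm."""
    link_bytes_per_s = link_gbps * 1e9 / 8
    # Each byte crosses the ring roughly 2*(N-1)/N times (reduce-scatter + all-gather).
    bandwidth_term = 2 * (num_gpus - 1) / num_gpus * message_bytes / link_bytes_per_s
    latency_term = 2 * (num_gpus - 1) * per_hop_latency_s
    return bandwidth_term + latency_term


if __name__ == "__main__":
    grad_bytes = 10e9  # ~10 GB of gradients per step, purely illustrative
    for gbps in (400, 800):
        t = ring_allreduce_seconds(num_gpus=1024, message_bytes=grad_bytes, link_gbps=gbps)
        print(f"{gbps} Gb/s links: ~{t * 1000:.0f} ms per all-reduce")
```

The point of the model is that once GPU math gets fast enough, this communication term dominates step time, which is why topology and per-hop latency matter so much.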

Why It Matters

Enterprises deploying AI infrastructure can no longer treat networking as an afterthought. The fabric is now a first-class performance and cost lever. By choosing architectures aligned with modern AI communication patterns — such as rail-based fabrics — organizations can:

🔹 Support high-scale training and inference without network-induced bottlenecks.
🔹 Deliver predictable performance as clusters grow, ensuring GPUs stay fully utilized.
🔹 Simplify operations by using open-standard Ethernet interconnects instead of proprietary fabrics.
🔹 Reduce total cost of ownership by avoiding over-provisioned CLOS designs and implementing fabrics built for actual traffic patterns.
🔹 Future-proof deployments by adopting next-generation Ethernet (800 Gb/s, 1.6 Tb/s) and flexible topologies that evolve with emerging model types (e.g., sparse MoE, hybrid training/inference).

Watch the full video at sixfivemedia.com, and be sure to subscribe to our YouTube channel so you never miss an episode.

Disclaimer: Signal 65 Webcast is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript

David Nicholson: I'm Dave Nicholson from The Futurum Group. Welcome to this installment of insights from the AI Lab. I'm joined by Brian Martin from Signal 65 and Seamus Jones from Dell Technologies. Gentlemen, good to see you again. Welcome.

Brian Martin: Great to see you again.

David Nicholson: So today we're going to dive into this topic that's interesting because topic and topology are kind of related in a way, but the question is why is networking even more important than it's ever been in the past? And we want to dive into some of the questions about how network topologies affect the kind of performance you can get, the kind of scalability you can get for both inferencing and training environments. You've collaborated to come up with a kind of a case study, white paper, and a lot of real world testing in Signal 65 labs. Seamus, what was the point originally going into this? What were you hoping to discover?

Seamus Jones: Yeah, I think it's no surprise, right? Historically we've had topologies that were set up for a compute-bound framework, ensuring performance for those compute architectures. With AI, what we're seeing is more of a communication-bound workload, and that has had a much broader impact on the performance of these AI clusters when you have these highly performant GPU frameworks in place. If you don't have the right topologies, it really can cause bottlenecks that were unforeseen. So you'd be spending all this money, effort, time, and energy developing topologies, or developing systems in the data center, and then throttling them artificially with your topology. Brian and I discussed it, and we're like, you know, this is a topic that continually comes up with customers. How can we give some good, sound advice based in fact?

David Nicholson: So what did the configuration look like? What did the stack look like that you tested, Brian?

Brian Martin: Well, we looked at three stacks predominantly. We looked at traditional leaf-and-spine CLOS networks configured in what's called a fat tree, so full bisectional bandwidth between all endpoints. That used to be sort of the premier, high-end, best network you could build. Then we looked at what's evolved, called rail-optimized, where researchers noticed that most GPU traffic stays within a particular GPU stack, or rail. So say GPU zero on all the servers tends to talk to other GPU zeros far more often than it does GPUs one through seven on other servers. So they were able to take the concept of a CLOS fat tree and do a rail optimization on it: instead of all GPUs talking to any other GPU equally, they said, let's focus on GPUs in a specific rail talking to each other. Eight GPUs per server, and suddenly we have eight times as many servers in a two-tier network topology. So we can go from a thousand servers and 8,000 GPUs to 8,000 servers and 64,000 GPUs in a single two-tier fabric. It's nuts.
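
For readers who want to sanity-check the scaling arithmetic Brian describes, here is a rough Python sketch. The 128-port switch radix and the non-blocking two-tier formula are assumptions chosen so the totals land near the figures quoted in the conversation; consult the report for the configurations actually tested:

```python
# Back-of-the-envelope scaling arithmetic for the two-tier comparison.
# The 128-port radix is an assumption for illustration only.

def two_tier_endpoints(radix: int) -> int:
    """Max endpoints in a non-blocking two-tier leaf-spine fabric:
    each leaf uses half its ports down and half up, and the spine layer
    can absorb up to `radix` leaves, giving radix**2 / 2 endpoints."""
    return radix * radix // 2


def max_servers(radix: int, gpus_per_server: int = 8, rail_optimized: bool = False) -> int:
    endpoints = two_tier_endpoints(radix)
    if rail_optimized:
        # Each GPU rail gets its own two-tier fabric, so a server consumes
        # only one endpoint per rail fabric.
        return endpoints
    # Classic CLOS: every GPU NIC lands in the same fabric.
    return endpoints // gpus_per_server


if __name__ == "__main__":
    radix = 128
    for rail in (False, True):
        servers = max_servers(radix, rail_optimized=rail)
        label = "rail-optimized" if rail else "classic CLOS  "
        print(f"{label} -> ~{servers:,} servers / ~{servers * 8:,} GPUs")
    # classic CLOS   -> ~1,024 servers / ~8,192 GPUs
    # rail-optimized -> ~8,192 servers / ~65,536 GPUs
```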

David Nicholson: Anything surprised you, Seamus, about the results or were they kind of in line with what the math showed before you actually ran tests?

Seamus Jones: Yeah, I mean, when you look at the math, the costs didn't completely line up on the optics. One hidden cost that I didn't expect, especially in the framework, was the fact that you can drive a lot of power consumption and a lot of cost through the optics that you choose. I know it sounds crazy, but the second you get up into the 6,000, 60,000 optic numbers, you're looking at a large amount of cost and a large power draw. So by going to things like LPOs versus copper, you're actually reducing costs straight away. The other thing that we looked at that I thought was really interesting: the sweet spot for most customers today is really how they can get the lowest cost and the highest scale, in a reasonable fashion. I think the rail-optimized really pointed that out, or excuse me, rail, not rail-optimized. So the rail framework: having a dense LLM for training, and even having a framework for mixture-of-experts inferencing, that rail architecture really made a big difference.

David Nicholson: So when you say MoE, model of experts, mixture of experts. We love our TLAs, three-letter acronyms. But exactly, exactly what you're saying. Brian, just give us 20 seconds on what it means to have a model that is leveraging a bunch of experts, or a mixture of experts.

Brian Martin: Got it. So the big thing with a mixture of experts: a traditional LLM, say it has 72 billion parameters, has all of those parameters active at once, which means the GPU is processing tensor vectors through all of them for every operation it does. A mixture of experts you can think of as a collection of smaller LLMs being called upon when it's appropriate. So imagine having a room with 16 experts, and I have a question about medicine or a question about law, and I activate the lawyer or the doctor. Or I have a civil engineering question, and I activate the civil engineer expert. So experts activate, and then we see across systems one of two effects. Either a much lower GPU impact, because we have only, you know, 8 or 10 or 12 billion parameters active instead of all 72. Or, on the flip side, we can run a much larger model, hundreds of billions of parameters, because only 72 billion are active at the same time. So it really opens up that field.
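
As a toy illustration of the expert routing Brian describes, here is a minimal top-k mixture-of-experts sketch in Python with NumPy. The expert count, hidden size, and top-k value are invented for the example and do not correspond to any real model:

```python
# A toy top-k mixture-of-experts router, purely to illustrate the
# "only a few experts are active per token" idea. Sizes are invented.

import numpy as np

NUM_EXPERTS = 16
TOP_K = 2
HIDDEN = 64

rng = np.random.default_rng(0)
router_weights = rng.standard_normal((HIDDEN, NUM_EXPERTS))
experts = rng.standard_normal((NUM_EXPERTS, HIDDEN, HIDDEN))  # one matrix per expert


def moe_forward(token: np.ndarray) -> np.ndarray:
    """Route one token to its top-k experts and mix their outputs."""
    logits = token @ router_weights            # score every expert
    top = np.argsort(logits)[-TOP_K:]          # indices of the k best experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over the winners
    # Only TOP_K of NUM_EXPERTS expert matrices are touched for this token,
    # which is why "active" parameters (and GPU work) are a fraction of the total.
    return sum(g * (token @ experts[e]) for g, e in zip(gates, top))


if __name__ == "__main__":
    out = moe_forward(rng.standard_normal(HIDDEN))
    active = TOP_K * HIDDEN * HIDDEN
    total = NUM_EXPERTS * HIDDEN * HIDDEN
    print(f"active expert params this token: {active:,} of {total:,}")
```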

Seamus Jones: That sparsity, or task routing, really means that you're able to get much more efficiency in that mixture of experts. Thank you for the correction there, Dave.

David Nicholson: Oh, that wasn't a correction, because people use it both ways.

Seamus Jones: I mean, the Llama 4 Maverick model is a perfect example of a mixture-of-experts model. We're able to see a dramatic difference in performance if you're able to actually have those tasks routed to the right expert, like what Brian was talking about.

David Nicholson: Well, the big difference is the lawyer model, all it needs to do is just generate the response "No" over and over and over again. It's got the lowest power draw. But from a Dell Technologies perspective, Seamus, do you think that this whole mixture-of-experts idea is something that is likely to be kind of omnipresent in a lot of environments that you guys are working on?

Seamus Jones: Absolutely. What's happening is that we're in the age of agentic AI, right? And having a model, or mixture-of-experts models (that's a tough one to say over and over again), really means that we're able to dedicate that traffic through dedicated hardware, like the NICs and Tomahawk-based switches in the Dell networking portfolio. And it means that we can use these lossless Ethernet fabrics to get the best performance possible, instead of having just broadcast within other topologies. We're able to use this framework to make sure we get the best performance. So it will not go away. It's actually going to get even more robust over time, and as the fabrics become even more performant, we're going to see it become critically important as well.

David Nicholson: Yeah, Brian, there's sort of an interesting parallel here, because it's not like this concept of fit for function, the right tool for the right job, is new. Right? But we're finally embracing it in the world of AI. From a Dell Technologies perspective, you can sort of see that it's analogous to the decisions about hardware that are being made. And this is a question for you, Brian, because you're a little more objective and unbiased than Seamus, since he's part of Dell Technologies. But the idea of Dell being kind of the Switzerland of infrastructure for these environments, how important is that? How important in this testing was it to know that the Dell networking that was involved adheres to open standards, as an example?

Brian Martin: I was just going to say, Dave, that's a great lead-in. Being open-standards based and scalable means a couple of things. One, it really shows up on the TCO cost line, not just in the hardware purchase but also in operations and management. Managing well-known switch technologies, Ethernet technologies, is something that our workforce has experience with, and that really contributes to lower management and operational costs in addition to capital acquisition costs. These are the things we've known and loved, or at least known and worked with, for years at scale.

Seamus Jones: What did you find interesting about working with SONiC specifically? Because SONiC, as a framework, plays into those operational costs. I mean, we have unified instruction sets and we work on SONiC-based standards. But I wanted to see what your take there was.

Brian Martin: Standards-based, especially when we're testing multiple topologies and configurations in a dynamic lab. The simplicity around management and programmability, whether it's API-based, CLI-based, or configuration-based, to configure, set up, reconfigure, reload, and save. SONiC is a great environment to work in.

David Nicholson: You know, I asked Seamus the question about whether anything surprising came out of this. I'll throw this question out to both of you: what are some misconceptions that people might have who are starting to take a look at networking in the area of AI? One thing we haven't mentioned yet is this idea of training something like a frontier model and how that might be different from the inferencing side specifically. But what would you expect people to not understand very well?

Seamus Jones: One thing that literally came up last week: their perception was that they could invest in some of these high-performance NVLink-based architectures, like the XE9680 with H200 GPUs, and they wouldn't necessarily even consider re-architecting their networking to optimize that system, because they didn't even consider the GPU fabric on the back end. They thought, well, we could just implement it into our existing fabrics today. And there's definitely a shock factor that comes into place when you start talking about, no, it's critically important. I mean, think about each of those GPUs having its own connection, 10 connections out of the back of that box, and you can have multiple high-performance, low-latency fabrics off the back of that. That builds up a number of ports immediately, and if you start to scale those numbers out, it becomes really interesting. Frontier models, what you're talking about, that's like, what do they call it in the Tesla, insanity mode, where it goes full performance, full connectivity all the time, on rail-optimized architectures. So it's even the next step beyond rail, and it means that we can scale to exceptionally high levels. What were the numbers on that one?

Brian Martin: Rail-optimized for a two-tier network is 8,000 servers and 64,000 GPUs.

Seamus Jones: Exactly. Yeah, 64,000 GPUs. I mean, I wish we were selling more opportunities with 64,000 GPUs. Right.

David Nicholson: I want to be in the power business. I'm trying to figure out how to invest in the where-the-electricity-comes-from business. Brian, I don't know if you wanted to follow up on what Seamus said.

Brian Martin: I did. You asked what was really surprising. What struck me through this project, as I was working on things like drawing and architecting, is that just keeping track of everything is almost mind-numbing. But at one point it dawned on me that, after spending over a decade in storage, fine-tuning storage area networks and dealing with the criticality of storage networks, in the GPU world these networks are orders of magnitude bigger and faster, higher bandwidth, more important. What used to be the premier network is now just called the front-end network. It's like storage is an also-ran in these environments. Yeah, it was a little mind-boggling.

David Nicholson: Well, I've got kind of a flyer of a question to wrap on. I think I calculated that between the three of us, we have something like 275 years of experience working with IT infrastructure, plus or minus two or three. And so I think we would all agree that data is at the center of what we do with all of these tools, AI or not. I want to start with your thoughts on this, Seamus. People are really concerned about the difference between having data available to be retrieved versus training a model on their data, and specifically where their data lives. Dell has been delivering infrastructure to the largest cloud providers on earth and the smallest companies on earth, so you don't necessarily have a specific dog in the fight when it comes to that.

David Nicholson: But are you seeing more people want to run these things within their own walled gardens or on premises in a way that may have shocked proponents of cloud from five years ago? Long question, but I think you get the point. What are you seeing out there?

Seamus Jones: Yeah, I mean, to your point, five years ago there were a lot of companies that had a cloud-first and cloud-only objective. We're definitely seeing a change in that perspective. A lot of workloads are still being deployed in cloud architectures, and I don't think that will change for things like Salesforce, SFDC, things like that. But a lot of customers' proprietary data has gravity, and we always talk about that fact. Just to point it out, we've done a lot of customer POCs, or what we would sometimes call POVs, proofs of value. What we do is implement what the customer is trying to achieve in their data center, on prem. And what they're finding is that if I were to do these POCs using cloud architectures, because it's very easy, quick, and capable to deploy these in the cloud, I could spin up a GPU in the cloud on a hosted service very inexpensively and very quickly. However, when you're not using your own data set, and you're using cloud-specific tool sets to build your AI applications and workloads, it doesn't necessarily translate to the same proof of value as when you're on prem, dealing with your own data and doing that at scale. So a critical piece that we're seeing is that a lot of these customers not only are deploying production-level AI deployments on prem, but they're actually looking to do POCs, or what we would call proof-of-value POVs, on prem as well, just to show: does this deliver value to our business or not? Because quite often, if you do one in the cloud and then translate that to an on-prem architecture, it doesn't translate. So it's really critically important, and that's why we developed this lab, this architecture, this infrastructure, so that customers can do an on-prem POV remotely in the lab that Signal 65 has helped us deploy. We have physical architecture that we can transport data to, and a framework that replicates and simulates a customer's environment. That's the objective, so we're showing a better POV. It's a long answer, but the bottom line is yes: people are moving to an on-prem framework, especially for these data sets that have gravity. There might be security concerns, there might be constant changes, all these things. But yes, on-prem has definitely become cool again. On-prem is definitely seeing a resurgence.

David Nicholson: Brian Martin, why do you hate hardware so much?

Brian Martin: It's just been my life. It's great that it's cool again. Organizational data has been and continues to be the crown jewels. And in this world of AI, access to meaningful value from that data is growing exponentially. So having the compute close to the data and within the security zone definitely makes sense.

David Nicholson: And just a little sanity check here. None of us has signed over power of attorney to an AI agent.

Seamus Jones: Yet?

David Nicholson: Correct?

Seamus Jones: Yeah. Hold on, let me. Let me turn off my AI agents and respond to that.

Brian Martin: Yeah.

Seamus Jones: Exactly.

David Nicholson: Exactly. Any final thoughts on this? Anything we missed? We're going to have links to the white paper.

Brian Martin: Yeah. One parting thought, Dave, for what I'll call modest environments. Seamus mentioned earlier that the optics and the connectivity on these switches contribute a lot more than we expected to cost and power. If you only need 1,000 servers and 8,000 GPUs, then a single-tier rail architecture can do that with eight switches instead of the 96 switches that would be required in a traditional leaf and spine. That's a dramatic savings.

David Nicholson: And it adds up. It adds up. I was actually using an AI assistant to help me run some scenarios on power. We're hearing a lot about 10-gigawatt data centers. If you were to try to power one of those 10-gigawatt data centers with only photovoltaic solar, you would require 400,000 acres of solar panels. The state of Rhode Island is 420,000 acres. Then, on top of that, because the sun doesn't shine all day long, you would need the equivalent of 1 million Tesla Cybertrucks' worth of batteries. And the estimate to build that out is between $80 and $100 billion to power one of those 10-gigawatt data centers. So at scale, when you start talking about the efficiencies that you're referencing, it's extremely meaningful. And yeah, offline, let's figure out how we can all get rich investing in power. But with that, I want to thank both of you for being here. Another great conversation. We'll have a link to the white paper and supporting documentation for those who want to dig in more deeply. For the Futurum Group, Signal 65 Labs, and Dell Technologies, I'm Dave Nicholson. Thanks for tuning in to this episode of Insights from the AI Lab.
