Managing Intelligent Fleets: How HPE Is Redefining Compute Ops at Scale - Signal65 Webcast

Signal65’s Ryan Shrout and Russ Fellows discuss HPE’s unified ProLiant compute stack with Ganesh Subramanian, exploring cloud-native fleet management, AI-assisted operations, edge resilience, and how policy-driven orchestration is redefining enterprise infrastructure.

Enterprise compute has changed dramatically, but compute management is struggling to keep pace.

Ryan Shrout and Russ Fellows sit down with Ganesh Subramanian, Head of Product Management, Compute & Software at HPE, to discuss the evolution of unified compute infrastructure and the management layer required to operate modern fleets.

As enterprises shift from single data centers to distributed fleets across colocation sites and the edge, complexity has increased exponentially. Network variance, fewer on-site personnel, tighter security requirements, and inconsistent environmental conditions demand a new operational model.

This Signal65 conversation explores how HPE’s Compute Ops Management (COM) platform enables policy-driven, cloud-native fleet management at scale.

Key Takeaways:

🔹 Compute management is now fleet management: Enterprises are managing thousands of distributed systems, not single locations.

🔹 Cloud-native management scales elastically: COM eliminates the need for regional appliances and delivers centralized policy control.

🔹 Compute Copilot changes workflows: Natural language interaction replaces manual navigation, accelerating troubleshooting and compliance tasks.

🔹 Resilient edge systems are imperative: Stress testing of the DL145 Gen11 demonstrated consistent AI inference performance under extreme edge conditions.

🔹 Security cannot be an afterthought: Support for FIPS 140-3 Level 3 ensures hardened, enterprise-grade deployments.

Watch the full discussion and subscribe to our YouTube channel for more Signal65 infrastructure insights.

To learn more about HPE Compute Ops Management: https://www.hpe.com/us/en/hpe-compute-ops-management.html

To learn more about the HPE ProLiant DL145 Gen11: https://www.hpe.com/us/en/compute/hpe-proliant-compute/dl145-gen11.html

Read the Signal65 research paper: https://signal65.com/wp-content/uploads/2026/01/Signal65-Insights_Unified-HPE-ProLiant-Compute-Infrastructure-Stack.pdf

Disclaimer: Signal65 is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.

Transcript

Ryan Shrout:
Hey, everybody. Welcome to Signal65 Video Insights. I'm your host, Ryan Shrout, president at Signal65, joined by my cohort, Russ Fellows, VP of the Data Center Labs group at Signal65. Russ, how are you doing today?


Russ Fellows: 

Great. Hello and welcome, everyone. 

Ryan Shrout: 

And we are joined by a special guest. We've got Ganesh, who is the head of product management of the compute and software group at HPE. Ganesh, thank you so much for joining us today. It's going to be an interesting conversation.

Ganesh Subramanian: 

My pleasure, Ryan. And again, thanks for having me join you guys.

Ryan Shrout: 

It's going to be good. So recently, Signal65 published a report looking at the unified HPE ProLiant compute infrastructure stack, a combination of analyzing some of the software layers for management as well as the hardware itself behind this. If anybody hasn't seen that report, I encourage you to go to Signal65.com and look for it. But we're bringing in Ganesh here to talk through some of the details of the report, and really to get an overview of the ecosystem and the environment that makes some of these solutions that HPE is providing so valuable to the market, right? So one of the things that was still relatively new to me, which is why Jonathan and Russ, who worked on this project together, were able to educate me on it, was enterprises that are managing compute across data centers, colo facilities, edge devices, branch offices of different businesses. And now you've got this weird, amorphous, AI-at-the-edge type of idea as well. From where you sit at HPE, how has the complexity of compute management changed over the last few years because of that?

Ganesh Subramanian: 

Great question, Ryan. So at HPE, we've seen this industry transition. Over the last few years, compute management has shifted from one data center, one team, one toolset to more of fleet operations across many sites: data centers, colos, branches, and now edge AI locations as well. So the complexity we are seeing is not just more servers. It's the variance: different network conditions, fewer hands on site, tighter security expectations, and a stronger need for standardization. And that is why we've been moving from box-by-box administration to more policy-driven fleet management, where onboarding, compliance, updates, and observability are handled consistently at scale through a cloud-native model. Customers are prioritizing remote management, and one cited industry data point, which is part of the Signal65 report, says that remote management is a top requirement for 60% of the enterprises deploying hybrid and edge workloads.
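
To make the box-by-box-to-fleet shift Ganesh describes a bit more concrete, here is a minimal, vendor-neutral sketch of policy-driven management: a single desired-state policy is evaluated against every server in a fleet, regardless of site. The class and field names here are hypothetical illustrations, not COM APIs.

```python
# A toy, vendor-neutral model of policy-driven fleet management.
# None of these classes or field names come from COM; they are hypothetical.
from dataclasses import dataclass

@dataclass
class ServerPolicy:
    firmware_baseline: str   # desired firmware bundle for every group member
    security_posture: str    # e.g. "fips-140-3"

@dataclass
class Server:
    serial: str
    site: str
    firmware: str

def reconcile(server: Server, policy: ServerPolicy) -> list[str]:
    """Return the remediation steps that bring one server to the policy state."""
    actions = []
    if server.firmware != policy.firmware_baseline:
        actions.append(
            f"update firmware {server.firmware} -> {policy.firmware_baseline}")
    return actions

# The same policy is evaluated for every server, whether it sits in a data
# center, a colo, or an edge closet -- the "box-by-box to fleet" shift.
fleet = [Server("SN001", "colo-east", "1.2"), Server("SN002", "store-117", "1.1")]
policy = ServerPolicy(firmware_baseline="1.2", security_posture="fips-140-3")
for s in fleet:
    for action in reconcile(s, policy):
        print(f"{s.site}/{s.serial}: {action}")
```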

Ryan Shrout:

Yeah, and one of the things that we talked about in the report a little bit, and that we'll get to here in the discussion, right, is not only managing some of these AI edge cases, but utilizing AI for part of that management process too, which I think is really interesting. So I want to start by just asking a question, and I'll get to both of you on this, on the HPE Compute Ops Management platform in general, right? It's described as a cloud-native management platform, kind of to the point you were just getting to. What does that actually mean for server management, to be cloud-native? And does that architecture really matter for customers when you're talking about dozens or even hundreds of devices, the cloud-specific piece of it?

Ganesh Subraman: 

Absolutely. Cloud-native, specifically in this context, means that the management plane is delivered as a service that scales elastically and is built to manage servers regardless of location. And that is done through a single-pane-of-glass, unified, cloud-based console. Practically, this matters because distributed customers don't want to deploy and maintain a stack of, say, regional management appliances. They want centralized policy, fast feature delivery, and the ability to manage hundreds of sites consistently without adding operational overhead.

Ryan Shrout: 

So the idea is basically that we can apply the underlying structure that we have come to understand from cloud compute, generally speaking, to this specific function. Russ, any thoughts on how you have seen that change over the course of your career, either doing some of this kind of operations management for our own labs, right, but also in testing it in different ways?

Russ Fellows: 

Yeah, it's interesting because at first, probably like a lot of IT people, I was a little reticent to use a cloud-only service. It's like, OK, that's fine for an additional layer, but I want something on-premises, right? I want to get my hands on it. And that's all well and good when you're just managing a single site, but you quickly realize when you have multiple sites that, as Ganesh said, it's really just not scalable or even tenable anymore. The software-as-a-service approach just works much better for fleet management. We have a couple of sites ourselves, right? Just doing management across a few sites, we quickly realized that the as-a-service model scales much better.

Ryan Shrout: 

Now, Ganesh, you guys recently launched Compute Copilot inside COM. And I'm very curious how you view that changing IT operations. What does it mean to have AI-assisted management? And how does that change the day-to-day administrative functions for a lot of these users?

Ganesh Subramanian: 

Great question again, Ryan. So Compute Copilot, if you look at it, changes the workflow from click-and-hunt to more like ask-and-act, right? Admins can use natural language to get their questions answered on configuration, inventory, firmware status, and compliance checks. More so, we've seen that more than 30% of interactions are documentation-related use cases, which is where a lot of the interactions happen. People are using Copilot to get the information they want, information they would otherwise have spent about an hour digging out of the documentation that is already there. And on top of the initial Copilot launch, we also have the native OpsRamp integration that turns this into more of an ops motion, where server events can flow into a centralized command center and teams can, say, triage an issue, automatically open incidents, and close the loop without manual correlation or context switching. Net-net, it's faster time to understanding what the issue is, faster time to acting on remediating it, and then faster time to resolution. Altogether, it's a more unified, I would say, operating model across various pan-HPE domains.
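
As a purely illustrative sketch of the ask-and-act pattern, the toy function below maps a natural-language question to a structured inventory query instead of manual console navigation. This is not how Compute Copilot is implemented; the routing logic, data, and names are invented for illustration.

```python
# Toy illustration of "ask and act": route a natural-language question to an
# inventory query. Invented data and routing; not the Compute Copilot internals.
inventory = [
    {"serial": "SN001", "site": "colo-east", "firmware": "1.2", "compliant": True},
    {"serial": "SN002", "site": "store-117", "firmware": "1.1", "compliant": False},
]

def answer(question: str) -> str:
    """Map a question to a structured query instead of console click-and-hunt."""
    q = question.lower()
    if "compliance" in q or "compliant" in q:
        hits = [s for s in inventory if not s["compliant"]]
        return f"{len(hits)} server(s) out of compliance: " + ", ".join(
            f"{s['serial']} at {s['site']} (firmware {s['firmware']})" for s in hits)
    return "Try asking about compliance status."

print(answer("Which servers are out of compliance?"))
# -> 1 server(s) out of compliance: SN002 at store-117 (firmware 1.1)
```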

Ryan Shrout: 

I know in the report that we published, we walked through some of the process of onboarding a new server or a new site, and kind of how long that takes, or rather how fast that can be. And in our experience, we talked a little bit about the hybrid environments of edge and colo and data center and how all that works. But I'm curious, from your perspective, where are customers seeing the most meaningful time savings and/or risk reductions after they actually finalize and start deploying Compute Ops Management?

Ganesh Subramanian: 

If you look at it, we see the biggest gains in, I would say, three larger buckets, right? One is remote ops efficiency: less time is spent managing the servers, because, again, it's all done remotely. The other is around downtime avoidance: faster detection and resolution through, I would say, centralized visibility and automation. And the last item, where we also see the biggest gains for customers, is around travel time and truck roll reduction, which means fewer on-site visits for routine server lifecycle management tasks. For some quantified examples, HPE has pointed people to the Forrester TEI findings, which are referenced as part of the Signal65 findings, wherein customers have realized up to a 95% reduction in time spent managing their servers, up to 4.8, almost five, hours less downtime per server per year, and almost $50,000 worth of travel cost savings over three years. Again, these have all been real-world customer references where they have realized operational efficiencies by leveraging HPE's Compute Ops Management.

Ryan Shrout: That's pretty impressive. I'm going to switch gears a little bit here and talk about the hardware side of things that we looked at in the report. We looked at an HPE DL145 Gen11 system, and we did some performance analysis and testing of this device and found it to be performing incredibly well, very consistently, even under what I would say were difficult conditions meant to mimic an edge environment that Russ and Jonathan were able to set up for us. From HPE's perspective, though, what does it mean to have this kind of validated confidence in your devices when customers are looking to deploy edge compute or edge AI compute in areas where they're not going to fully control the environment, right? You don't really know exactly where the end consumer is going to put some of this gear. We shoved it inside of, like, a checkout register type of environment. How does that validation translate into value for customers?

Ganesh Subramanian: From the HPE side, at a very high level, if you look at it, these servers are designed to be deployed at edge locations. So onboarding is designed to be fairly straightforward. You connect the server securely to COM, you place it in the right group, and you slap on the policies for configuration and firmware compliance. From there, the DL145 system is monitored by COM, and the entire lifecycle of the server is automated within COM, complying with the governance model the customer or the organization has set up. The key value here is that you can repeat the same process across many servers, across the sites, irrespective of where they are. Scaling from that one server to 100 locations is a workflow, not a reinvention. That is one of the key insights we get from the deployment.
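
A minimal sketch of the onboarding flow as described: connect securely, assign a group, apply policies, then repeat the identical workflow per site. The function names here (connect_securely, assign_group, apply_policies) are hypothetical stand-ins, not COM API calls.

```python
# Hypothetical stand-ins for the onboarding steps described above; these are
# not COM API calls. The point: 1 server or 100 sites is the same workflow.

def connect_securely(serial):
    print(f"{serial}: secure connection to the cloud console established")

def assign_group(serial, group):
    print(f"{serial}: joined group '{group}' (group membership drives policy)")

def apply_policies(serial):
    print(f"{serial}: config and firmware policies applied; monitored from here on")

def onboard(serial: str, site: str, group: str) -> None:
    print(f"-- onboarding {serial} at {site}")
    connect_securely(serial)
    assign_group(serial, group)
    apply_policies(serial)

# Scaling to more locations only changes the loop, not the workflow.
for serial, site in [("SN1001", "store-001"), ("SN1002", "store-002")]:
    onboard(serial, site, group="edge-retail")
```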

Ryan Shrout: Interesting. Russ, anything stand out to you in terms of our stress testing, if you will, of the server itself?

Russ Fellows: Yeah, that was interesting. I hadn't done that before, you know, taken a server, purposefully heated it up, stuck it in a cabinet, and closed the door. That's basically a big no-no for any IT person, right? But that's actually how these systems get deployed a lot of the time. Like you said, at a checkout kiosk, or in a hospital it might be in a room, right? So oftentimes there's no specialized cooling equipment. And it performed well. The performance was consistent, but just as importantly, it was quite quiet. I mean, even with the fans going full speed, because we had the door shut and were running a full AI inferencing workload on it, you could just barely hear it with the door closed. So it ran cool enough, and you could hardly even tell it was there.

Ryan Shrout: I bet we've all had the experience of going to a mall where they have some edge compute driving a large multi-display output, and you can walk by it and hear the fans buzzing on it, knowing that they're using some piece of outdated infrastructure for sure. Ganesh, one of the interesting things I get to ask when we have partners like you come on and do videos, right, is to get some more insight into the industry in general. So I'm curious, from an HPE perspective, what industries or use cases are you seeing that are most actively moving to AI inference at the edge today? And how is HPE utilizing the Compute Ops Management and ProLiant server combination to build the best solutions for them?

Ganesh Subramanian: I would say we've seen a wide gamut of industries adopting. We have enterprise customers who are leveraging COM for their data center management. We have financial customers who are now looking at a secure way to manage their, you know, data centers, which have tens of thousands of servers, at scale, again being managed via remote cloud-based management like COM. We also have cruise lines, as an example, where they are looking at remote server management as one of the capabilities, so that they can manage those servers from one single pane of glass, irrespective of where they are, while also keeping the policies and the security posture in check, again irrespective of where they're deployed and the type of environment the servers are going into. So it's, I would say, a wide gamut of industries where we have seen our customers deploy.

Ryan Shrout: It's definitely a pretty broad brush that we've seen happening as well. So I want to ask a couple more quick questions looking ahead at what you and the HPE team have planned. Where does HPE COM go from here, right? How do you make improvements? What capabilities or integrations on the roadmap have you most excited about the future direction of this critical IT environment?

Ganesh Subramanian: I mean, the direction is very clear, right? COM is evolving from managing servers to running distributed compute as an intelligent fleet: more automation, more cross-domain context, and faster closed-loop action as AI infrastructure spreads across the edge and the data center. If I were to pick three key things I am really excited about, one is deeper AI-assisted operations: expanding Copilot-style experiences from mere Q&A into guided troubleshooting, recommended actions, and policy changes with the human in the loop. That way, admins can move from finding to fixing faster, while staying in control. The other item is around stronger cross-domain integrations. We are working on tighter workflows across compute plus other observability or ITOM tools, for example with OpsRamp, so alerts, incidents, and remediation can run in a very orchestrated manner from within COM. Again, the OpsRamp integration with COM is a proof point for this path, and we are working on many other integration opportunities in FY26. The last item is specifically around better ways to operationalize AI inference fleets: capacity planning, utilization forecasting, and sustainability insights, because AI distribution makes knowing what's running where non-negotiable. So at the strategy level, the theme is automation, integration, and leveraging agentic AI with the human in the loop and the human in control, because that is what reduces operational toil as infrastructure scales and distributes.

Ryan Shrout: Interesting. Russ, I'm curious from your view, obviously you're not out there designing the next generation of this, but as a frequent user of it and an observer of the market, what are some of the future integrations or capabilities that you might like to see? And I'm also curious about the point of view of always having a human in the loop: do you view that as a critical piece of it?

Russ Fellows: Yeah, that is a critical piece. As we find more agents being deployed on our desktops, we are in awe of what they're capable of, and in shock and horror at what they're capable of. So human in the loop is quite important. On a related note, though, and it's quite applicable to gen AI as well, it's all about security. One of the things that we found with the HPE DL145 is the high level of security. For people who care about those numbers, it supports FIPS 140-3 Level 3. The Level 3 really means it's hardened hardware. You have to physically crack chips open in order to subvert the security mechanisms that are in place, which is very important for a lot of environments. So if you're in a regulated environment, those levels of security are important. Automation with AI can help, but you still have to have the underlying capabilities there in order to ensure that you meet regulatory compliance. So that's all a part of it. But yeah, on where we go with the agentic stuff, I agree. Human in the loop is paramount. Agents doing research and providing recommendations is perfect, but I think we want to still be pushing the button ourselves for quite a while.

Ryan Shrout: All right, one last question for you, Ganesh. For any IT leader that's watching this or reading our report, who's maybe going up to their leadership and trying to justify the investment in a new cloud-native management platform for their own environment: what is your strongest argument for moving forward now rather than saying, I'm going to wait until the next generation?

Ganesh Subramanian: Another great question. I've been having a lot of discussions with IT leaders around that. The way I would phrase it is, waiting is always expensive, because distributed infrastructure doesn't fail loudly, it fails inconsistently. So the real risk isn't just downtime, it's drift: different firmware levels, inconsistent configurations, uneven security postures, even fragmented visibility across dozens of sites, irrespective of where they are. The sooner you put a fleet operating model in place, drive more consistent policy automation, and have centralized visibility, the less technical debt you accumulate. So if you were to ask me for three practical reasons to move now, one would be that risk compounds with scale. Every new site, every AI workload is going to increase the attack surface and the operational variance, and a cloud-native fleet model is going to help you standardize and verify the posture continuously. Next is time to value, and that is immediate. Once you onboard a server, group the servers together, and slap on a policy, you're going to get quick wins: fewer manual touches, fewer surprises, and faster response without truck rolls. Number three, it's about AI, which is accelerating distribution. Inference is moving outward, and if you wait until the fleet doubles, you are implementing management change under stress instead of controlling it. So in short, implementing cloud-native fleet management early is like putting guardrails on a growing highway. You want them before the traffic explodes.
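
To illustrate the drift argument, here is a small, hypothetical sketch: every site is "up", yet comparing observed state against a desired baseline surfaces the kind of inconsistent, quiet failures Ganesh is describing. The field names and values are invented for illustration.

```python
# Hypothetical drift check: every site is "up", yet two have silently drifted
# from the desired posture. Field names and values are invented for illustration.
desired = {"firmware": "1.2", "tls_min": "1.3", "secure_boot": True}

observed = {
    "colo-east":  {"firmware": "1.2", "tls_min": "1.3", "secure_boot": True},
    "store-117":  {"firmware": "1.1", "tls_min": "1.2", "secure_boot": True},
    "clinic-042": {"firmware": "1.2", "tls_min": "1.3", "secure_boot": False},
}

for site, state in observed.items():
    # Collect (desired, actual) pairs wherever a site disagrees with the baseline.
    drift = {k: (want, state[k]) for k, want in desired.items() if state[k] != want}
    status = "ok" if not drift else f"DRIFT {drift}"
    print(f"{site}: {status}")
```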

Ryan Shrout: That's a good way of describing it, and I think it's very relevant for those people watching and for our discussion. So Ganesh, I want to thank you for joining Russ and me in this conversation, and for the partnership on the report and the testing we were able to publish earlier this year. Thank you for joining us. Russ, thank you for joining me on this. For anybody who hasn't seen the report yet, make sure you go to Signal65.com and look up the paper on the HPE COM and ProLiant analysis that we did. For now, I'm Ryan Shrout, and we will see you in the next Signal65 Video Insights.
