Overcoming AI Bottlenecks: What’s Next for AI Inferencing at Scale?
On this episode of Six Five On The Road, Patrick Moorhead and Daniel Newman are joined by Lenovo’s Anwar Ghuloum to discuss why AI inference, infrastructure, and edge-first design are becoming the defining challenges as manufacturers scale AI beyond pilots.
AI inference isn’t failing the IQ test; it’s failing the stress test once it hits the factory floor.
Recorded live at Lenovo Tech World from the Sphere in Las Vegas, this Six Five On The Road conversation looks at what actually breaks when AI moves out of the lab and into real-world manufacturing environments.
Anwar Ghuloum, Vice President of The AI Center of Excellence in Lenovo’s CTO Office, speaks with Patrick Moorhead and Daniel Newman to break down why centralized architectures struggle once AI leaves the demo stage. As data scales and latency, security, and power constraints kick in, “the pilot worked” quickly becomes “why is this so slow in production?”
The conversation zeroes in on why inference, not training, has become the real bottleneck on the factory floor.
Key Takeaways Include:
🔹 Inference Is the Real Bottleneck: Scaling AI impact on the factory floor requires overcoming latency, memory, and energy constraints, not just improving models.
🔹 Edge Intelligence Is Essential: Real-time manufacturing insights demand inference closer to where data is generated, rather than relying on centralized processing alone.
🔹 Security Must Span Edge to Datacenter: AI pipelines must protect sensitive data while maintaining consistent governance across environments.
🔹 Hybrid Architectures Enable Scale: Training centrally and inferring locally allows organizations to balance performance, cost, and control.
🔹 Efficiency and Sustainability Matter: AI-ready factories must deliver performance without unsustainable power or energy tradeoffs.
Learn more at Lenovo.
Watch the full video at sixfivemedia.com, and be sure to subscribe to our YouTube channel so you never miss an episode.
Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Patrick Moorhead:
The Six Five is On The Road here at Lenovo Tech World in the iconic Sphere in Las Vegas. This event and this structure are amazing.
Daniel Newman:
And Pat, we're the live pregame here. Can you believe this? We are the pregame. It is Tech World, Pat, and it's going to be a killer lineup today. It's going to basically be us here for the next couple of hours with all of you. Very excited, and thank you so much all for joining. Of course, we're going to be talking to some of Lenovo's leadership throughout the next couple of hours, and then it's like a star-studded kind of thing going on here, right? Jensen Huang, of course. YY, CEO of Lenovo, is going to be up on stage. You're going to have Lisa Su, Cristiano Amon. I don't think I can be fair and give credit to everybody here. I'd be reading off the list, but it's a star-studded event, and yes, this venue never disappoints.
Patrick Moorhead:
Yeah, it is a big show, and one of the interesting things about Lenovo is they operate from pocket to cloud, all the way from the smartphone to the largest hyperscaler data centers and everything in between. One of the big conversations related to AI is physical AI, or edge AI. When should you put your AI in certain places? When does it go in your pocket? When does it go in your PC? Where should it go? Now, for the factory floor, it has to operate disconnected from the mothership. And I can't imagine a better person to talk about this than Anwar from Lenovo. Anwar, great to see you.
Anwar Ghuloum:
Great to see you. Thanks for having me.
Daniel Newman:
Absolutely. So Pat said it well, right? I mean, factory scale is just different. When you're thinking about putting AI to work, there are a couple of big differences. Of course, you've got latency, you've got security, and you also have accuracy. I mean, that's one of those things where, I always say, if an LLM comes back and it's wrong, no harm. It's maybe a little annoying. But on the factory floor, when it's not precise, you start to get mis-run batches of gear. You get equipment breaking down that shouldn't have broken down. There's a lot going on. So what are, in your mind, the keys, and the bottlenecks, to doing AI and inference at scale inside of these AI factories, literally?
Patrick Moorhead:
AI factories on the factory floor.
Anwar Ghuloum:
What you touched on there is really important, which is reliability. I don't know of any company that can afford to have their factories down for any reasonable amount of time. So having intelligence locally, running in the edge or in the factory, doing defect detection and so on, and having a fast turnaround time to correct problems is critical, right? If comms go down or power goes down outside the city, it can affect your operations. So that's one of the reasons why hybrid AI on the factory floor just makes a ton of sense. It makes sense in other contexts too that you brought up, personal devices and so on, and we can get into that. But yeah, on the factory side, it's absolutely mission critical. And getting scale performance across all the factories in the world is quite difficult to do in a centralized way. So we think the hybrid AI stories are really strong.
Patrick Moorhead:
So Anwar, there has been AI on the factory floor for years, right? And it was traditionally machine learning, very hard to program until you got it right. Now we're in this age of generative AI, where you can actually take advantage of all the data that's being produced out there. And my impression from previous generations was it was big data, but it was smaller compared to what we can do with the new technologies. What is it architecturally that needs to change to be able to take advantage of the technology, but also all the data that's being created?
Anwar Ghuloum:
Yeah, I mean, it's amazing. These models that we're getting today are extremely capable. They have huge context windows, meaning you can look at a lot of context to answer a question, make a decision, or accomplish what you want to accomplish. I think the challenge, though, has to do with data. You kind of hit the nail on the head here: feeding the right data in is critically important, and there are a lot of results from research and industry showing this. These models are so good these days that even with a distilled or quantized small model, you can get amazing results if you have the right contextual data that you're providing. So one of the challenges that we're building for in hybrid AI is getting the right data to the model or the agent at the right time. There have been a couple of industry blog posts recently that really clearly show what goes wrong when you're just shoving data into this thing blindly, without doing any kind of curating or distillation. So I think that's kind of the next frontier for scaling out models all the way to the edge. And when I say all the way to the edge, I mean in your pocket, like you said, or on your wrist, or whatever.
Daniel Newman:
Yeah, no, that makes sense. And it's happening really quickly, too. Just being here at this event, it's been like watching just how fast model innovation is moving since we came into the year. We saw an interesting thing in a recent presentation, Pat, where they showed the number of models being created. There's so much focus on models, on the big frontier ones, but there are so many models for the real world, for manufacturing use cases, for health care. This is where the volume and all that data get unlocked, because most of what people have experienced with AI right now is what, like 5% of the world's data? And so being able to optimize the factory, for instance, is one of those use cases. There'll be a whole set of models that will have to be created just for these use cases. It's really interesting. But we started by talking about security, low latency, all of those things that drive inference, and hybrid is the thing, right? So you might have those zero-latency needs at the edge for running the day-to-day operations on the floor. At the same time, as all that data is being created, you're not going to have enough compute locally to have all that data stored and managed there. It's going to have to be uploaded out to the cloud, and then you synthesize and analyze. How are you recommending that organizations design to do this in a way that works?
Anwar Ghuloum:
Yeah, I mean, this is part of the mission that our team has. So I run a group called the Lenovo AI Technology Center. We're basically creating a platform to do exactly this: to take data that you've put in your data lake, your lakehouse, whatever data storage system you have, and based on the kind of agentic work you want to do or prompts that you're seeing, we can actually go and pull the relevant data, the relevant subsection of the knowledge graph, and optimize it for context to be used with that work or that prompt. I think the key thing we want our customers to know is that the data systems themselves don't have to change. It's really what we build on top of them, and getting smarter about how we manage that context: what data wants to live at the edge, or can live at the edge for data security reasons or otherwise, and what can live in a lakehouse in the cloud or in an on-prem data center. Our goal is not to be prescriptive about where to move this stuff when you're storing it at rest, but to be prescriptive about how we move it when you're actually working with the data. So really, if you're a customer, your existing installation, your existing storage mechanism, should work. There are so many amazing developments in the ecosystem right now, people making old data storage systems work for the modern era. And we just want to make our stuff work with that as well, so it's turnkey for our customers.
Patrick Moorhead:
Yeah, the industry seems to be focusing on this element of a data fabric. It doesn't matter where the data necessarily sits. The industry's already tried piling all of your data into a data lake. That doesn't work. It's unwieldy, and you're one acquisition away from breaking that paradigm. So coming up with a data fabric that cuts across wherever the data is, and then you add enterprise SaaS data on top of that, right? Is that what we're talking about here?
Anwar Ghuloum:
Yeah, I think so. I mean, it's a fabric in the sense that it's connecting a fairly distributed system, which we have with hybrid AI specifically. But in general, if you look at any enterprise installation, you have people with laptops and phones accessing data, you have the data center, you have smaller server clusters, and they may have a cloud backend that they're using for some things, for some SaaS application. It's everywhere, so we need to be everywhere. I guess one slight nuance is that it's not just knowing where the data is. You actually have to do some comprehension and semantic understanding of the data, build ontologies that help you understand how this data is related to that other data, so that you can actually make sense of it. And this is why, as you mentioned, data lakes in and of themselves don't just work. Just feeding a bunch of data in doesn't work. You really do have to understand the relationships of the data to each other. Is this describing an object? An object on the factory floor, and what is it saying about it? That kind of thing. There are companies working on that, and we're working on that. It's not an easy problem, but it is a problem that we think we can solve. Maybe the good news here is that with data, there's never an end.
Patrick Moorhead:
It's always going to be part of it, always going to be changing, et cetera. So I think we all agree that an AI factory for a factory on the edge is an important thing, especially in this new age of agentic AI. And I'm curious, what does that look like to you? And what steps do your customers need to take to get there?
Anwar Ghuloum:
Yeah, so it's interesting. There's a lot of discussion right now in the ecosystem about ROI and so on. As I said earlier, I think the models are amazing. You can do a lot with these models, even the very optimized ones. But there's a little bit of work you have to put into it, right? We talked about data a fair bit. I think customers capturing and structuring data as much as possible is helpful. However, I do think one of the big gaps we're seeing now is getting enterprise employees using the tools, getting them familiar with the tools. You can get a license for whatever tool X, Y, or Z for your employees, but you're not going to see ROI until they actually use it, until they're seeing that productivity improvement, actually experimenting with it, finding new ways to use it, and so on. On some level, I do think your employees are going to be the ones training the AIs, so it's really important that they do that. But outside of data capture, I would say that's it. I think there's a cultural gap we have right now to getting people using the tools for more than just writing an email to your child's teacher. For sure.
Daniel Newman:
You know, it's really interesting, though, how quickly we evolve. Now AI creates this different relationship, where it can do a lot more of the work we do, and it can make us exponentially more productive, or it can actually replace the productivity, and that changes the relationship between humans and machines. We're still, to your point, to some extent beholden to people participating. Which, by the way, has been a problem with every technological shift in history: do the people use the tools? Because anyone that's run companies like we do, and bought technology tools, and then been frustrated about why they're not paying off, knows the tech works. It's the people, and the relationship is complicated.
Anwar Ghuloum:
My thing is, some people call it a time machine. I think of it as a force multiplier for your human employees. If they're not using it, you're not going to realize that increased productivity. I know for me personally, I've used it. It's uncomfortable at first to play around with these tools, but the increased productivity is amazing. I mean, I don't know how I'd live without it at this point.
Patrick Moorhead:
Yeah, no, I'm in that world too. It's funny, one year I went to my employees saying, if you use AI for certain tasks, I will fire you. And then I moved to, I will fire you if you don't actually use these tools.
Anwar Ghuloum:
So I have a variation on that. We had some candidates we were interviewing for jobs, and some of these interviews are remote, because they live in remote locations and haven't relocated yet. And the question came up: well, what do we do if we think they're using AI? And I said, well, what if we said it's OK to use AI? And not only is it OK to use AI, but we want to see how you're using AI as part of it. And I actually meant it, and that's what we're doing now. It's like, OK, you're going to use AI, great. You might be using AI at work. You probably will, so let's see it.
Daniel Newman:
I think there was a point where you realized there's no turning back. We're not going back to a world before cotton gins or printing presses, and it's not going to be linear television in the future. And we're not going to have a world where AI is not driving productivity, whether it's on the factory floor, in the data center, or out in autonomous vehicles, all these different things. But what I will say is physical AI, and what you're talking about, is a huge opportunity. For sure. We see it as a multi-trillion-dollar TAM expansion that is basically just getting started. This was a great conversation, Anwar. Thanks so much for joining us.
Anwar Ghuloum:
Thank you. Thank you very much.
Patrick Moorhead:
Appreciate it.
Daniel Newman:
All right. And thanks everybody for joining this session here of our Lenovo Tech World program coverage. We've got a lot more expert conversations coming up here. Stick with us. We'll be back in a few.
AI Inference, Edge AI, Manufacturing AI, Industrial AI, Lenovo, Infrastructure