AI Inferencing Everywhere: Scaling Enterprise AI from Core to Edge

As AI moves into production, enterprises must solve for distributed execution across core and edge environments. This conversation explores how infrastructure is evolving to support scalable, real-time AI inferencing.

AI is moving out of the lab and into real-world environments, and the challenge is no longer building models; it’s running them everywhere.

At NVIDIA GTC, hosts Patrick Moorhead and Daniel Newman sit down with Vlad Rozanovich of Lenovo and Jon Alexander of Akamai to explore how distributed infrastructure is enabling enterprise AI inferencing from the core to the edge.

The conversation unpacks how enterprises are shifting from centralized AI architectures to highly distributed environments where performance, consistency, and security must hold across locations. Through Lenovo’s collaboration with Akamai, AI workloads are being deployed on infrastructure that spans from core data centers to edge locations, redefining how organizations think about cloud, latency, and execution. As new performance metrics like time-to-first-token gain importance, the discussion highlights how infrastructure decisions directly shape real-world AI outcomes.
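The episode stays at the strategy level, but for readers curious about what time-to-first-token looks like as a measurable quantity, here is a minimal sketch of how it could be timed against a streaming inference endpoint. The endpoint URL, payload shape, and the assumption that the first non-empty streamed line corresponds to the first generated token are all illustrative placeholders, not details from the conversation.

```python
import time

import requests  # assumes the 'requests' package is installed

# Hypothetical streaming inference endpoint; replace with your own deployment.
ENDPOINT = "https://edge.example.com/v1/generate"
PAYLOAD = {"prompt": "Summarize today's sensor readings.", "stream": True}


def time_to_first_token(url: str, payload: dict) -> float:
    """Return seconds between sending the request and receiving the
    first streamed chunk of the response."""
    start = time.perf_counter()
    with requests.post(url, json=payload, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        for chunk in resp.iter_lines():
            if chunk:  # treat the first non-empty line as the first token
                return time.perf_counter() - start
    raise RuntimeError("Stream ended before any token arrived")


if __name__ == "__main__":
    print(f"time-to-first-token: {time_to_first_token(ENDPOINT, PAYLOAD):.3f}s")
```

Running the same measurement from different locations is one simple way to see how moving inference closer to the edge changes perceived responsiveness.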

Key Takeaways

🔹 AI deployment is shifting from centralized models to distributed inferencing environments
🔹 Time-to-first-token is emerging as a critical performance metric for AI workloads
🔹 Unified infrastructure is key to maintaining consistency across core and edge
🔹 Lenovo and Akamai are redefining what “cloud” means in the AI era
🔹 Edge AI is enabling faster, more secure, real-time decision-making

As AI scales, the ability to execute models consistently across distributed environments will define enterprise success.

Watch the full conversation at sixfivemedia.com and subscribe to our YouTube channel for more insights from NVIDIA GTC 2026.


Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
