Scaling AI at Inference: The Road to Agent-Driven ROI

Roman Chernin joins Patrick Moorhead and Daniel Newman to discuss how AI infrastructure is shifting from training to inference, why Nebius built Token Factory to optimize system-level performance, and how agent-driven ROI will define AI success in 2026 and beyond.

AI has moved beyond model training; inference is the new frontier.

This Six Five Webcast features Patrick Moorhead and Daniel Newman, joined by Roman Chernin, Co-founder & Chief Business Officer at Nebius, to explore how AI infrastructure is evolving from massive training clusters to production-grade inference systems built for agents, open-source models, and real ROI.

Nebius positions itself as an AI-specialized cloud, purpose-built to optimize inference workloads at scale. As AI shifts from research labs to product companies and enterprise agents, performance, cost efficiency, and system-level orchestration have become the defining battleground.

Key Takeaways:

🔹 The shift from training to inference: Why budgets, architectures, and customer priorities are changing.

🔹 The Nebius Token Factory: How full-stack optimization across hardware, software, and orchestration improves unit economics.

🔹 Open-source in the enterprise: Why flexibility, tunability, and cost control matter as much as frontier intelligence.

🔹 Agent-driven ROI: Why 2026 will demand measurable business outcomes, not just model benchmarks.

🔹 Performance beyond GPUs: How CPUs, workload orchestration, caching, quantization, and stack optimization combine to define success.

Nebius combines next-generation silicon access with a purpose-built cloud stack and white-glove technical support to help customers ship AI products that are fast, affordable, and compliant at scale.

The next phase of AI won’t be defined by a model; it will be defined by who can run inference most efficiently.

To learn more about how Nebius is scaling AI for real-world inference and agent-driven ROI, read about it here and explore the full solution here.

Watch the full webcast at sixfivemedia.com or subscribe to our YouTube channel so you never miss an episode.

Disclaimer: Six Five Media is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
