The Inference Inflection: MiTAC on Building Flexible AI Infrastructure for Enterprise Scale
As AI moves into production, infrastructure flexibility, orchestration, and data performance are becoming critical. MiTAC outlines how modular platforms and integrated partnerships are enabling scalable, high-performance AI deployments.
AI infrastructure is shifting from experimentation to production, and that shift is redefining how enterprise systems are built.
At NVIDIA GTC 2026 in San Jose, Brendan Burke sits down with Raymond Huang, GM and VP of Sales & Business Development at MiTAC, to explore what it takes to support AI at scale across training, inference, and emerging RAG workloads.
As organizations move into production, infrastructure must remain flexible by design. Modular platforms like NVIDIA MGX are enabling standardized yet adaptable deployments, while orchestration and data movement are becoming critical to performance. AI systems are no longer just compute-bound; they're constrained by how efficiently data moves and how effectively workloads are managed across complex environments.
Through partnerships with Rafay and DDN, MiTAC is integrating orchestration and high-performance data pipelines directly into its infrastructure stack, helping enterprises simplify deployment and maximize system utilization.
Key Takeaways
🔹 AI is reaching an inference inflection, shifting focus from experimentation to production
🔹 Flexible, modular infrastructure is required to support diverse AI workloads
🔹 Standardized platforms like NVIDIA MGX enable scalable, configurable deployments
🔹 Orchestration is critical for managing complex GPU and AI workloads at scale
🔹 Data movement and high-performance storage are central to AI system performance
🔹 Turnkey solutions are accelerating enterprise AI adoption and deployment speed
For a closer look at how flexible infrastructure is enabling enterprise AI, visit mitaccomputing.com.
Subscribe to our YouTube channel for more insights from NVIDIA GTC 2026.
Disclaimer: Six Five On The Road is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Raymond Huang:
It also underscores MiTAC's leadership in designing AI hardware infrastructure to cover diverse workloads across inference, training, and rack deployment. Fundamentally, we are providing so-called enterprise AI, flexible by design, to cover these kinds of needs.
Brendan Burke:
Hi, and welcome to Six Five in the booth at NVIDIA GTC. Today, we're here at GTC in San Jose, and we're exploring how to build scalable AI infrastructure. I'm joined by Raymond Huang, GM and VP of Sales and Business Development at MiTAC Computing. Welcome to Six Five, Raymond.
Raymond Huang:
Yeah, thanks for having me.
Brendan Burke:
Raymond, it's been a revolutionary week so far. We're hearing about how AI is moving from experimentation to production. Jensen calls it the inference inflection. And from my perspective, everyone in the ecosystem's job got harder this week. What do you think the move to production means for server design and the platform for the next wave of AI?
Raymond Huang:
Yeah, I think GTC represents the pinnacle of AI innovation globally, and it definitely underscores MiTAC's leadership in designing AI hardware infrastructure to cover diverse workloads across inference, training, and rack deployment. Fundamentally, we are providing so-called enterprise AI, flexible by design, to cover these kinds of needs.
Brendan Burke:
Flexibility seems so critical because we saw how a single GPU rack can split out into seven different chips, five different racks. There's a lot of flexibility there. So from your perspective, why is flexibility so important as organizations move from AI pilots to production?
Raymond Huang:
Indeed, that's a very good question. By providing a flexible design, we give customers a platform they can standardize across deployments. For example, we designed an MGX 4U: a dual-socket AMD Venice processor system supporting up to eight double-width GPUs. There's also a 1U single-socket design, the R1917, which is our latest 1U MGX platform built on the Arm architecture. By providing these kinds of designs, we give customers options on a standardized platform supporting HPC and AI inference or training.
Brendan Burke:
That CPU attach seems really well aligned with what we've heard this week, as is leveraging the latest GPUs. And the operational challenge of connecting these systems together is really about managing AI workloads: routing the right workloads to the right systems to get maximum utilization out of them. So how are you thinking about orchestrating these workloads when it comes to networking? And are there any partnerships you use to distribute workloads and get maximum efficiency out of your systems?
Raymond Huang:
Yeah, so for example, we partner with Rafay. We integrate their advanced software stack for seamless GPU management. This kind of solution enables customers to streamline their Kubernetes orchestration and automate HPC and AI workloads, with enterprise-grade governance, as a so-called AI turnkey solution.
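To make the orchestration point concrete, here is a minimal sketch of how a Kubernetes workload requests GPUs so the scheduler routes it to a suitable node. This is the kind of placement decision an orchestration layer such as Rafay automates at fleet scale; the pod name, image, and node label below are illustrative assumptions, not details from MiTAC's stack.

```python
# Sketch: a Kubernetes Pod manifest that asks the scheduler for GPUs.
# Nodes advertise GPU capacity via the NVIDIA device plugin, and the
# scheduler only places this pod on a node that can satisfy the limit.
# All names here (pod, image, node label) are hypothetical examples.

def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a Pod spec requesting `gpus` GPUs from the cluster."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "worker",
                "image": image,
                # The extended resource name nvidia.com/gpu is what
                # the NVIDIA device plugin registers with Kubernetes.
                "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
            }],
            # Optional: pin to a labeled GPU pool in a mixed fleet.
            "nodeSelector": {"gpu-pool": "inference"},
        },
    }

manifest = gpu_pod_manifest("llm-inference-0", "registry.example/llm-server:latest", 2)
print(manifest["spec"]["containers"][0]["resources"]["limits"])
```

In practice an orchestration platform generates and applies manifests like this across clusters, which is what "routing the right workloads to the right systems" looks like at the API level.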
Brendan Burke:
Sounds really well aligned. Kubernetes is the leading orchestrator for scheduling workloads these days, and it's critical to be able to align infrastructure with emerging use cases like agents and RAG. I was stopping by one of your partners' booths earlier today, DDN, and looking at a product called Infinia that you've built with them. It really underscored what we're seeing at this show with the launch of NVIDIA's STX platform for high-speed storage: being able to scale agents and retrieval workloads really comes down to your access to storage, and to being able to separate prefill from decode, ingesting the data and then running it quickly. So how is a solution like the one you have with DDN, called Infinia, helping enterprises build the high-performance data pipelines needed for modern workloads?
Raymond Huang:
Definitely. We are partnered with DDN. As you know, multi-model rack deployments require massive data. That's the reason we integrate DDN's Infinia software onto our platform, to provide a so-called next-generation AI turnkey solution for enterprise. Infinia, DDN's high-performance AI data intelligence platform, helps our customers streamline Kubernetes orchestration and minimize data movement. The solution is packaged hardware and software as a turnkey. That's the idea we're providing to the customer through this partnership.
Brendan Burke:
I saw some really impressive performance gains in that system: 20-plus-times speedups over conventional storage in terms of ingesting data. Are you seeing these kinds of generational gains as we enhance access to flash storage and to tightly integrated memory with high-performance GPUs?
Raymond Huang:
Yes, definitely. Through our partnership on this solution, we provide a streamlined process for storage, and high-performance storage itself. We are leveraging, for example, the Solidigm D7-PS1010 as well as the Micron 9550, in the E3.S PCIe Gen 5 form factor, plus eight 400-gig ports through NVIDIA ConnectX adapters, packaged as a turnkey solution together with DDN's Infinia.
Brendan Burke:
That sounds really well aligned with how PCIe is enabling fast data movement in data centers today, opening up to the new storage racks coming online and the ability to access storage with agentic retrieval. So as we look ahead to building out AI factories that can shift workloads from training to inference, I wonder if you're seeing a change in how data centers are constructed and managed. Just looking this week at the overall shift in superclusters from training to inference, it's clear you need the ability to take an existing cluster, reconfigure it, and in some cases add excess storage to go from a training workload to an inference workload. Given that pace of change, and that there's not necessarily one endpoint in mind for the AI factory, how do you think we can set up a platform on top of this hardware that can adapt to changing AI?
Raymond Huang:
So I think overall there's no one-size-fits-all, but the bottom line is everyone wants simplicity and quicker deployment. That's the trend. What we are trying to do is deliver a so-called AI turnkey solution, whether through Rafay or through DDN. The bottom-line idea is to let customers mix and match configurations, streamline Kubernetes orchestration, and still maintain throughput and low latency while their workloads are running. That's what we're trying to accomplish, and I believe that through our partnerships with Rafay and DDN, we are providing the most responsive rack performance for the enterprise market today.
Brendan Burke:
Fantastic. I've got to ask: is there an announcement this week that you're particularly excited about? Something you think is really going to benefit customers as they work through problems you've seen in the field?
Raymond Huang:
I think, fundamentally, the key message is modular design: the NVIDIA MGX platform, in 1U or 4U form factors, gives customers flexibility. Like I say, we are trying to provide a modular design, the MGX platform, that gives customers one standardized product able to deploy and cover different workloads efficiently. And of course, through the partnerships with DDN and Rafay, we provide an AI turnkey solution that leverages the modular MGX AI platform.
Brendan Burke:
It's clear you're setting up a system that's more than just a GPU: something that can benefit from the latest innovations in storage and networking, which I think are going to continue to evolve quickly as we need more and more data to run longer and longer agentic sessions. Building that in from the start is well aligned with the future of the AI roadmap, and I'm excited to see how organizations use a turnkey solution to speed up the time to value for new data centers. I really appreciate you joining me today, Raymond, and sharing how you're working on this fast-moving trend.

Raymond Huang:
Yeah, always.

Brendan Burke:
Thanks for tuning in to Six Five Media in the booth at NVIDIA GTC. Don't forget to hit subscribe, follow us on socials, and check out our other material at SixFiveMedia.com. Thanks for joining, and have a great day.