Infrastructure Innovations to Accelerate Your AI Solutions Today
What's the secret to truly unleashing AI's potential in the cloud? It all comes down to infrastructure.
At the Six Five Summit, host Daniel Newman is joined by Sachin Gupta, VP/GM, Infrastructure and Solutions Group at Google Cloud as a spotlight speaker for the Cloud Infrastructure track. They delve into guidance on building a secure, efficient, and scalable foundation in the cloud, as well as infrastructure innovations designed to propel AI solutions forward.
Key takeaways:
🔹 The Transition to a Cross-Cloud World: As AI changes the game, Google Cloud is removing barriers to scale by connecting data, users, and models—securely and efficiently—across hybrid and multicloud environments.
🔹 Networking for AI: How Google's backbone delivers "LAN-like" WAN performance and seamless cross-cloud connectivity, addressing the challenges of latency, security, and management complexity across hybrid and multi-cloud setups.
🔹 Security At Scale: Google Cloud's strategies incorporate partner solutions (integrated with Palo Alto Networks, Fortinet, and more) for a comprehensive secure access service edge (SASE) approach.
🔹 Mastering Costs: Strategies for efficient infrastructure utilization, including AI-driven cost optimization for AI workloads on GPU/TPU, and innovations like the GKE Inference Gateway, which delivers up to 60% lower latency and 30% lower cost by utilizing those resources fully.
🔹 Rethink Your AI Storage Strategy: Optimized storage plays a pivotal role in AI performance. Sachin highlights how Anywhere Cache reduces read latency by up to 70 percent.
🔹 Security & Sovereignty: The importance of maintaining robust security and data control without compromising AI innovation, even in air-gapped environments. Google currently delivers sovereignty at scale with local partnerships in France (S3NS), Japan (KDDI), and the US (WWT).
Learn more at Google Cloud.
Watch the full video at Six Five Media, and be sure to subscribe to our YouTube channel, so you never miss an episode.
Or listen to the audio here:
Daniel Newman: Hi, everyone, and welcome to the Six Five Summit: AI Unleashed. For this cloud infrastructure spotlight, I'm joined by Sachin Gupta, Vice President and General Manager of Google Cloud's Infrastructure Solutions Group. We're going to be talking about infrastructure innovations to accelerate AI solutions. Sachin, thanks for being back. Good to see you again.
Sachin Gupta: Yeah. Great to see you again too.
Daniel Newman: So let's talk a little bit about cloud before we dive straight into AI. Tell me a little bit about what you're seeing as some of the most significant trends impacting cloud computing in the current market?
Sachin Gupta: Well, I think you'll get the same answer from everyone. It's all about AI. Generative AI, I mean, it is moving so quickly and we've talked about hundreds and hundreds of unique applications and how customers are already moving to agents and solving problems in many unique, innovative new ways, leveraging the power of Google technology, the Google AI stack. And so innovation at every single layer of that AI stack from infrastructure, AI platform all the way up to models and agent development capabilities is super important. That's what we hear about repeatedly from customers, help them out on that.
But then to get the best out of AI, you need to manage your data. And so how do you get your data into your models and then your users connected to those models, connecting all of these environments, bringing your data to the right places, doing it securely, efficiently, low latency, all super important. And then finally, all this stuff can get super expensive. So how do you manage costs? How do you make sure you're getting the most bang for your buck? There's one plus one equals more than two because of architectural advantages, etc. How do you just save on costs like if there's data you're not accessing frequently, move it to cold storage automatically. So leverage almost AI operational capabilities to reduce cost as well. So it's all about AI. It's all about how do I get my data to the right places and think about networking especially, and then how do I think about cost?
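[Editor's note: the automatic cold-storage tiering Sachin mentions can be sketched as a simple age-based policy. This is an illustrative sketch only, not Google Cloud's implementation; the storage-class names mirror Cloud Storage's tiers, but the thresholds and the function itself are hypothetical.]

```python
from datetime import datetime, timedelta

def pick_storage_class(last_access: datetime, now: datetime) -> str:
    """Map days-since-last-access to a storage class (illustrative thresholds)."""
    idle_days = (now - last_access).days
    if idle_days < 30:
        return "STANDARD"   # hot data stays in the default class
    if idle_days < 90:
        return "NEARLINE"   # accessed less than roughly once a month
    if idle_days < 365:
        return "COLDLINE"   # accessed less than roughly once a quarter
    return "ARCHIVE"        # rarely accessed, cheapest to hold

now = datetime(2025, 6, 1)
print(pick_storage_class(now - timedelta(days=5), now))    # recently read
print(pick_storage_class(now - timedelta(days=400), now))  # long idle
```

A real lifecycle policy would be declared on the bucket rather than computed per object, but the cost logic is the same: data you are not accessing frequently migrates to colder, cheaper tiers automatically.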
Daniel Newman: Yeah. I'm glad you didn't fall for my trick question. I came out the gate and said we're not going to talk about AI, but this event is AI Unleashed. But I think the thing that you really answered very nicely there is that the cloud and AI have become synonymous and symbiotic in many ways. And the other thing that I think is going on that's really... You called it everything AI, but it's also really about a very new stack and the cloud is powering that. If you kind of think about what it looked like before there was this CPU era of cloud and everything over the last few years... And by the way, this has been wonderful for Google, but as the cloud was reborn, I call it the CPU era and the GPU era.
Now, that's not exactly right because there's also the TPU era, but we've had these multiple eras and now the way you build applications is different, what you've built with Vertex and sort of democratizing. Now you've got new languages that allow us to build agents that can talk to each other. And all these things are really made possible by the three things that you mentioned: having an AI-first approach, building your cloud to be hybrid and multicloud, depending on your enterprise's architecture. And then of course, I like that you did... Most cloud providers would never acknowledge that cost is an issue because as far as you're concerned, run the meter, right? I mean, I joke, right? I digress. But the joke is obviously that more use is good, assuming it's productive, right? Assuming it's productive.
Sachin Gupta: Absolutely. You have to get efficient use out of that infrastructure, yes.
Daniel Newman: I mean, look, if we're spending to make, that's a great thing for business. The problem, like you said though, is sometimes it gets very expensive very quickly for development efforts. Sometimes it gets very expensive just to run the business, and this is where cost management has to be thought a lot about. Another thing that has to be thought a lot about, Sachin, is security. Part of bringing all your data to the cloud, trying to deliver AI... And by the way, trying to do it as fast as companies like yours are doing it.
Enterprises in every industry are trying to take advantage of cloud to run the simple, monotonous day-to-day things and also to build, expand, and innovate their companies using AI. There's pace. With that pace comes exposing data, workflows, systems. It creates a lot of risk, especially from the network as you're trying to move all that data around. Talk about how you're thinking about security from that networking context.
Sachin Gupta: Yeah. Let's talk about the network just from a sheer connectivity aspect first, and we can touch on security after that, because you raised some really important points. If you've got your on-prem data center and you're trying to use GPU and TPU resources in one or more cloud providers, connecting these environments at high performance, low latency, and cost-effectively is increasingly complex. I think Google has taken a leadership position in hybrid and multi-cloud where, for example, obviously, you can interconnect into Google Cloud, but we also enabled Cross-Cloud Interconnect so that we will deliver the SLA for you and make it much easier for you to connect multiple other cloud providers directly into your Google Cloud environment, so that you can move data across more cost-effectively and run operations in this environment more easily.
We've now added a few more capabilities. If you want to take your SD-WAN headend and run that in Google Cloud, you can do that. And what that means is the traffic stays on our backbone for as long as possible, giving you a much better experience. We're adding a new capability called Cross-Site Interconnect. You can take two of your own sites, maybe two of your own data centers, and connect them through Google Cloud. There's a new capability that we launched called Cloud WAN. Now, in reality, because your data can be in different locations and you need to bring it all together for training, for ML inferencing, you need to start thinking about that WAN that you had historically, the wide area network, as a new LAN from a performance, cost-effectiveness, and latency point of view. We're now providing the Google backbone, the Google wide area network, if you will, to enterprise customers as their own, with LAN-like performance. And so 40% better performance is what our customers are seeing, with up to 40% lower total cost of ownership.
So completely changing how you think about connectivity, high performance and how you manage those costs in the world of AI. So let me just pause here before we talk a little bit maybe about security as well. So hopefully that makes sense on how we're making our backbone the only backbone that enterprises need to worry about for any of their connectivity needs.
Daniel Newman: Well, I think you hit an important point. Enterprises need to think about where to orchestrate different layers of this new stack. And I think if I'm hearing you, you're really talking about Google is really trying to address that orchestration to simplify network. Same thing you're trying to do with agents and other parts of the AI stack is it's never going to be all in one. I think Google has acknowledged that from day one. Ideally, most enterprises are going to have some on-prem, some edge, some cloud, maybe more than one cloud. They've got to network all these things. Then of course they have to be able to run these things, but you're not going to want orchestration with every one of these things. So it sounds like you're trying to address that. Now, carry me over to the security side of this. How does this impact security?
Sachin Gupta: Look, so we obviously moved compute to the cloud, storage to the cloud. I'm always saying, "Your entire network is actually delivered to the cloud." But if we did that and said, "Hey, now the only security stack you can deploy is the one we offer, and the best-of-breed stack that your security teams might have certified on, you can't deploy anymore," well, that creates friction. You now have to change things around... I mean, I don't want to change my security stack necessarily as an enterprise customer. And so we offer two things in conjunction with Cloud WAN. One is we offer best-of-breed security capabilities ourselves. So for example, our anti-DDoS product, Cloud Armor. We've stopped the largest attacks in the world with that product. It's just built into Cloud WAN. The second one I'll point out is our Cloud Next Generation Firewall. It has up to 20 times the efficacy of competitive solutions.
Now, those are our products. If you are using SASE services from Palo Alto Networks, Check Point, or Fortinet, and you want to use our backbone but keep those security services, we natively integrate them into our backbone. And so with Google, you can bring your third-party services of choice and seamlessly get them delivered through our cloud, instead of being forced to say, "No, no, no. The only things you can use are the first-party services we provide." Now, of course, as I said, our first-party services are also industry-leading.
Daniel Newman: Yeah. And choice is another theme here, right, Sachin? Let's talk a little bit about efficiency. We've hit on all the rate limiters, but in this era you really have compute, network, and power as the three things that matter. In the end, you need enough compute, enough network access, and enough power, but then you want to get as efficient as possible. You want to put the right workload on the right silicon to optimize for cost impact, energy impact, all these things. So when it comes to optimizing for AI workloads, you have silicon diversity: different GPUs and of course TPU. How is Google thinking about this to maximize utilization but optimize for the user?
Sachin Gupta: Yeah, that's a great point. With these expensive GPUs and TPUs, if they're sitting idle, that's a huge problem. And so making sure that you're not just load balancing in the traditional ways, sending flows the way we typically do for web traffic, for example. Instead, you're actually understanding parameters and metrics that come from those GPUs and TPUs, for example, the queue depth, the KV cache, the number of tokens being processed. Because you may find that a new query comes in that requires lower latency.
Hey, what's the best GPU that can service that at this time? How do you make sure that you're not going to put that into something with a very, very large queue? And so making sure that you understand what your different types of ML queries or Gen AI apps are and what their needs are, and then also understanding how the infrastructure is actually performing, with a rich set of metrics like queue depth and KV cache. That's something we pull together in something we call GKE Inference Gateway.
GKE Inference Gateway takes all of that, lets you set up the policy, and therefore gets you much, much better utilization of the infrastructure, which saves cost. So up to 30% savings in cost, for example, but at the same time much, much better performance in terms of lower latency, faster response times. And so we have customers like Snap, for example, already using our Inference Gateway to improve the performance and the utilization of their infrastructure.
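[Editor's note: the metric-aware load balancing Sachin describes, routing each request to the replica whose live accelerator signals suit it best, can be sketched as follows. This is a minimal illustration under assumed weights, not the GKE Inference Gateway API; all names and scoring values are hypothetical.]

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    queue_depth: int      # requests waiting on this GPU/TPU
    kv_cache_util: float  # fraction of KV cache in use, 0.0 to 1.0

def pick_replica(replicas, latency_sensitive: bool) -> Replica:
    """Score replicas on live accelerator metrics instead of round-robin.

    Latency-sensitive queries weight queue depth more heavily, since a
    deep queue directly delays the first token.
    """
    q_weight = 3.0 if latency_sensitive else 1.0
    def score(r: Replica) -> float:
        return q_weight * r.queue_depth + 10.0 * r.kv_cache_util
    return min(replicas, key=score)

pool = [
    Replica("gpu-a", queue_depth=8, kv_cache_util=0.4),
    Replica("gpu-b", queue_depth=1, kv_cache_util=0.9),
    Replica("gpu-c", queue_depth=4, kv_cache_util=0.5),
]
# A latency-sensitive query avoids the deep queues even if KV cache is busy.
print(pick_replica(pool, latency_sensitive=True).name)
```

The point of the sketch is the contrast with classic web load balancing: the routing decision consumes model-serving metrics (queue depth, KV-cache pressure) rather than generic connection counts.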
Daniel Newman: So I have to pivot this, Sachin, because there was a lot there, but I want to pivot this to data and storage. AI is, first of all, it's unleashing applications that are touching so much more data on an average use, right? Of course, you still have your file and your block and your typical storage, but that's kind of the past. We still need it. It's not going to go away. But in the future, it's going to be all about compute, being able to get to all the data, very low latency, structured, unstructured. It's going to be able to discern, utilize, access. Talk about the kind of forgotten piece of storage to the AI puzzle and what is going to be critical for being successful in the AI era as it relates to storage? How do you see it over at Google?
Sachin Gupta: Yeah. I'm actually really glad you brought this up because I think the amount of innovation happening here is tremendous to support these specific use cases. So you said it really, really well, actually. Those compute resources can operate faster and faster and faster. Every six months, every year there's a new TPU version, for example. It just goes faster and it can read and write data really fast. But if your storage becomes your bottleneck, once again that expensive GPU and TPU infrastructure is sitting idle, your training jobs take longer, your inferencing is slower, it's not a great experience. It's not cost-effective.
So there are two things that we do. Once again, staying true to choice and making sure that there are open choices available to our customers. One is, if they're used to using a parallel file system like Lustre on-prem, how do we make sure that we provide the same experience in Google Cloud? We're bringing out a managed Lustre service. Great performance, great scale, cost-effective for AI use cases, and also for high-performance computing use cases. So that's an open, semantically compatible solution. Many of our customers, though, love Google Cloud Storage, which is our object storage. It has massive scale, it's very, very cost-effective, and it's very easy to use. Well, with Google Cloud Storage, there are two enhancements that customers have been asking for and that we've been working with them to innovate on. One is, when I'm trying to read data, can you automatically cache the data that many of my GPUs or TPUs need to read locally?
And we're able to, by the way, automatically do that caching, up to one petabyte, so large amounts of cache, and reduce the latency by 70% or so, so that I don't have to keep waiting. I can read very, very quickly because it's cached locally. That's something we call Anywhere Cache. Then we have another product as part of Google Cloud Storage called Rapid Storage. And Rapid Storage says, "I'm actually going to take the storage you have and put it in a zonal bucket that is right next to your compute, your GPUs and TPUs." And now you can get sub-millisecond reads, writes, and appends, all kinds of great things. So providing both open choices like managed Lustre as well as innovation in Google Cloud Storage specifically for AI has just tremendous benefits. That's why we have customers like Anthropic already using this to get 2.5 terabytes per second of throughput, for example, using our automated caching solution, Anywhere Cache, with their AI models.
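[Editor's note: back-of-envelope math shows why a local cache like the one Sachin describes cuts read latency so sharply: effective latency is a weighted mix of fast local hits and slower remote misses. The latency figures and hit rate below are illustrative assumptions, not measured Google Cloud numbers.]

```python
def effective_read_latency_ms(hit_rate: float, hit_ms: float, miss_ms: float) -> float:
    """Average read latency given a cache hit rate (simple weighted mix)."""
    return hit_rate * hit_ms + (1.0 - hit_rate) * miss_ms

remote_ms = 10.0  # assumed uncached read from regional object storage
local_ms = 1.0    # assumed read served from a local cache

base = effective_read_latency_ms(0.0, local_ms, remote_ms)    # no cache
cached = effective_read_latency_ms(0.78, local_ms, remote_ms) # assumed 78% hit rate
reduction = 100.0 * (base - cached) / base

print(f"{reduction:.0f}% lower average read latency")
```

Under these assumed numbers, an approximately 78% hit rate already yields a latency reduction on the order of the 70% figure cited; the steeper the gap between local and remote reads, the lower the hit rate needed.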
Daniel Newman: I mean, that in itself is... Of course, I'm guessing Google is probably using it for some of its own models, but not just your own. It's one thing to be customer zero. It's another when a company doing what they're doing, with those model sizes and parameter counts, sees you as the right partner for the data storage that makes their experiences good. Because, in the end, you start using these tools and if it's slow... I mean, these are going to be the small differentiations that make us decide whether we're going to pay OpenAI $20 a month, or Anthropic $20 a month, or Google. Of course, we want it to be accurate. We want it to not hallucinate. We want it to have really good insights, but we also want it to be fast. I mean, that's a big part of what people are looking for.
I want to end this conversation, Sachin, by talking a little bit about the complex political landscape and sovereignty. We've heard from a number of companies that that's going to be a big opportunity. We know that data stays... The EU, for instance, is well known for very complex data rules. In this climate, what are you hearing from your customers as it relates to AI, sovereign AI, data sovereignty? And how is Google addressing this to make sure that you meet this opportunity?
Sachin Gupta: Yes. And I think it goes back to choice and having the right set of controls. And so what I mean by choice is having a complete portfolio of sovereignty solutions so that customers, based on their environment, their regulatory needs, their compliance needs, can pick the right solution. And the solution typically involves different types of controls. The first set of controls is around data. So we have something called Google Cloud Data Boundary, which ensures that you can keep data in your region of choice, you control your encryption keys, and you also control who has access to that data. It's available across our regions because we see that need in many, many different countries. We also have specialized offers where you can have operational controls. And for the most stringent needs, if you will, if you are a defense, intelligence, or public-sector type of agency and you simply cannot have your cloud environment connected to anything else, you need something fully air-gapped.
We offer that with Google Distributed Cloud. And so ensuring we've got the full portfolio of solutions is super important for us. And as we work with customers, we help them understand: look, where exactly do you fit on this spectrum? Where do you land? And we have the right sovereign solution for you. One other point I'll just quickly make on that: some of these customers don't want to trade off innovation for sovereignty. So it's not about, "Oh, I get sovereignty, but now you can't do AI." And so it's also been super important for us to make sure that with those controls, you still get access to things like Vertex. You get access to Gemini. You get access to things like Agentspace search. And we support those on both Google Distributed Cloud, which can be fully air-gapped, as well as our public cloud regions. And so it's not a compromise between AI or innovation and sovereignty. You should be able to do both as a customer. And that's exactly what we enable.
Daniel Newman: And for analysts like us to give the seal of approval, it's going to be important that the experience is degraded as little as possible in achieving sovereignty while still gaining access to all the tools that make AI powerful. And I absolutely believe that is what's going to happen. And I do believe sovereign AI is a large growth opportunity for the industry, and it looks like, from what you're saying, Google is ready to meet that opportunity. Sachin, I want to thank you so much for joining us for this cloud infrastructure spotlight at the Six Five Summit.
Sachin Gupta: Thank you so much. It was a great chat.
Daniel Newman: It was great to talk to you. All right, everybody. Let's stay connected. All of our Six Five Summit content is available at sixfivemedia.com/summit. More insights coming up next.
Disclaimer: The Six Five Summit is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Speaker
Sachin Gupta is the Vice President and General Manager of Infrastructure and Solutions Group (ISG) at Google Cloud. Having joined Google in 2020, Sachin brings together Product and Engineering for Storage, Networking, Google Distributed Cloud and Telco, and Product teams for Reliability and Global Expansion.
Through a deep understanding of enterprise and cloud-first customers, Sachin has a 20+ year track record of building solid teams and creating innovative products and services. Before joining Google, Sachin worked at Cisco for 23 years, where he served as SVP for Enterprise Networking. Within his teams, Sachin has been a long-time champion of customer focus and an inclusive team culture. He holds a Bachelor of Science in Electrical Engineering from Purdue, a Master of Science in Electrical Engineering from Stanford University, and an MBA from Wharton.