How Qlik is Enabling Modern AI Initiatives
How are enterprises leveraging open lakehouse architectures to create trusted, cost-effective data foundations for AI?
For Data As Infrastructure: Decoded, host Bradley Shimmin is joined by Qlik’s Sharad Kumar, Field CTO – Data, to discuss how the Qlik Open Lakehouse on Apache Iceberg is becoming a critical enabler for scaling cost-efficient modern AI initiatives.
Key Takeaways Include:
🔹Unified Data Access & Integration: Qlik Open Lakehouse enables organizations to maintain a “single source of truth” for data while allowing them to leverage existing investments in platforms like Snowflake or Databricks, minimizing data movement and duplication.
🔹Accelerating AI Outcomes: The unified motion of real-time data ingestion, transformation, and automated optimization through Apache Iceberg provides a streamlined path from raw data to actionable AI outcomes, reducing time-to-value for machine learning initiatives.
🔹Data Trust, Quality & Lineage: Qlik’s approach ensures data quality and lineage within Apache Iceberg-based data warehouses, supporting robust governance and making AI models both trusted and auditable.
🔹Cost Optimization: Qlik Open Lakehouse delivers significant cost efficiencies, up to 50-80% in some areas, by reducing unnecessary data duplication and movement, with CIO/CFO-relevant financial metrics emphasizing operational savings.
🔹Transformative Power of Iceberg: The open format, ACID compliance, and flexibility of Apache Iceberg are positioned as essential enablers for organizations looking to build and scale governed AI solutions with confidence and agility.
Learn more at Qlik.
Watch the full video at sixfivemedia.com, and be sure to subscribe to our YouTube channel, so you never miss an episode.
Disclaimer: The Six Five Media Decode Summit is for information and entertainment purposes only. Over the course of this webcast, we may talk about companies that are publicly traded, and we may even reference that fact and their equity share price, but please do not take anything that we say as a recommendation about what you should do with your investment dollars. We are not investment advisors, and we ask that you do not treat us as such.
Transcript:
Brad Shimmin: I'm Brad Shimmin, an analyst with The Futurum Group, and I'm here with the Six Five Decode Summit. Today we're going to talk about the enterprise rushing to operationalize AI and the pressures the enterprise comes under in doing so, especially in trying to create what we would call a trusted, cost-effective data foundation that can support both the innovation and the governance that you need to do AI at scale. And to do that, it is my pleasure to welcome to the Decode Summit my partner Sharad Kumar, who is the Field CTO for Data at Qlik. Sharad, welcome.
Sharad Kumar: Hi Brad, how are you doing?
Brad Shimmin: I'm doing really well, and we're going to talk about the Open Lakehouse on Apache Iceberg. I love this idea. I've long been a fan of decoupling storage and compute, bringing your own query engine, and sort of turning the storage layer into a more open, accessible layer as opposed to a wall. So I'm excited to jump in here. Would you mind if I started this out by just talking a little bit about Qlik's lakehouse, the Open Lakehouse on Iceberg? I noticed, in looking through it, that you guys emphasize one copy of truth while still allowing customers to leverage their existing investments in platforms like Snowflake and Databricks and many others, all without, and this is critical, unnecessarily moving or duplicating that data. That's a big promise. Can you tell us a little bit more about what this one copy of truth means to you?
Sharad Kumar: Yeah, Brad, that's a good question. So let's unpack it a little bit. Over the past few years, if you look at how customers have been building these cloud-based data warehouses, you essentially take your data and you load it into the warehouse, whether that's Snowflake or Databricks or BigQuery, and like you said, the compute and the storage in those warehouses are coupled together. Coupled together meaning once you load the data, you have to use the compute within the warehouse to operate on it. Now, I talk to a lot of customers, and very few customers have just one data platform. Some of the larger customers I talk to may have Snowflake, they may have Databricks, they also have some data on S3, they may have Redshift. So a lot of the time there are multiple data stores; it's very rare that all the data is in one place. Also, some customers may say, okay, for data warehousing workloads I want to use Snowflake, because that's the best engine for low-latency, highly concurrent queries, but my data science or machine learning organization wants to use Databricks, or some other tools. So what customers end up doing is either they ingest data twice: take some data into Snowflake because I need it for my BI workloads, and take the same data into Databricks because I have my machine learning workloads. That's one: dual pipelines, duplication of data. Or, if they take the data into Snowflake, they have to unload it out of the warehouse platform to make it available for other compute engines. So that's a big challenge: a lot of duplication of data, a lot of compute power needed to load data and move data around, and, like the phrase you used, no single version of truth. That's the promise of Iceberg, which we are enabling through the open lakehouse. You bring the data in once and you keep that copy of the data in Iceberg format on an object store, and you register it into an Iceberg catalog. So if you're talking about Amazon, we would keep the Iceberg data on S3, and let's say the customer is using Glue as a catalog, you register the Iceberg data there. Now different engines can get to it. If I'm in Snowflake, because that's the best engine for my, let's say, BI workloads, I can still use my Snowflake engine against that copy of the data to further process it or serve it up. If I want to run machine learning, I can use a Databricks engine against that same copy of data to build my machine learning model. Or if I have some ad hoc queries, I could use Athena, or I could use an open source Trino or a Dremio engine against the same copy of the data. So that's what we mean by one copy of the data: eliminating data duplication and one version of truth.
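To make the "one copy of data" pattern concrete, here is a minimal, hypothetical sketch using PyIceberg against an AWS Glue catalog. The catalog, database, and table names are assumptions rather than anything from Qlik's product, and any Iceberg-aware engine (Snowflake, Databricks, Athena, Trino) could read the same S3 files this snippet reads.

```python
# A minimal sketch of the "one copy" pattern: one Iceberg table on S3,
# registered once in a Glue catalog, readable by any Iceberg-aware engine.
# Catalog, database, and table names are hypothetical; AWS credentials and
# region are assumed to come from the environment.
from pyiceberg.catalog import load_catalog

# Load the Glue catalog that tracks the Iceberg tables stored on S3
catalog = load_catalog("glue", **{"type": "glue"})

# One physical copy of the data, registered once in the catalog
table = catalog.load_table("sales_db.orders")

# Read the same files any other engine would read; no copy, no unload
df = table.scan().to_pandas()
print(df.head())
```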
Brad Shimmin: Yeah, I like that. And I think it does something very important that is subtle. I mean, I grew up with databases starting in the 90s, and the truism we all had in mind was that most companies have seven databases, and that was back in 1990. It's much worse now. And yet, you know, the pressure on companies is to get as much value as they can as quickly as they can, speeding up that path from data to outcome, particularly with agentic AI right now, which everyone is selling. I noticed, in looking through your materials and from what we've talked about in the past, that Qlik emphasizes this idea of a unified motion within its open lakehouse, and I wanted to dig into that if you don't mind, really quick. It covers a lot of ground, everything from real-time ingestion to automated optimization. Can you explain a little bit about this streamlined approach that you guys have and how you think that impacts time to value? Because it is always about time to value, isn't it?
Sharad Kumar: Yeah, you're absolutely right. Everybody's asking, how do you get from data to insights to actions quickly, right? So it all starts with the acquisition of the data. If you look within Qlik, the first thing we have is a set of connectors which can, I would say, tap data from anywhere, whether you're talking about data coming from databases, sitting in files, streams, whatever, we can tap it. The first thing we have enabled with Open Lakehouse is being able to get into mainframes, your SAP environment, relational databases, and through change data capture get the data out and put it into an Iceberg format. Now, it's not trivial, because it's not just creating a file. As you know, when you do change data capture you have all these changes coming in, inserts, updates, deletes, so it's about handling that in the most optimal way at the lowest compute cost and creating that Iceberg copy of the data in a way that's suitable for change data capture types of loads. Because with change data capture, data is coming in very frequently, so you constantly have to write, write, write and create new copies. So that's the first thing, being able to handle that. And very soon we'll be announcing streaming ingestion, so you can take data from streams like Kafka or Kinesis and add that data into your Iceberg lakehouse as well. So that's the first part: you constantly hydrate and keep your Iceberg data the latest and greatest, because like you mentioned, to power BI, and even more so for agentic AI, you need the freshest data to act on. Now, the second component is, you have the data, but with Iceberg multiple things happen. One, it's immutable, right? Each time new data comes in, you're writing new files. And especially when you think of change data capture, your changes are coming in fast and furious. You're no longer doing a batch once a day or once a week; data could be coming in every five minutes and you're writing new data all the time. So what happens in the environment? Your number of files continues to grow, so you have to do certain things, almost like, I call it, pruning of the environment, optimization of the environment. We call it optimization, and in the background multiple things run. We may do compaction, right? Because you have all these small, small files, and think of what happens when you have a reader reading them. Each time you open a file, you incur a lot of reads and a lot of I/O, which ultimately leads to degraded query performance. So what we do is this thing called cost-based compaction. That means we have an intelligent engine which runs in the background, constantly looking at what's the best compaction it can do.
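As a rough illustration of the change data capture handling Sharad describes (the general pattern, not Qlik's actual engine), a CDC micro-batch of inserts, updates, and deletes might be merged into an Iceberg table with Spark SQL along these lines; the catalog, table, and `op` column names are assumptions.

```python
# A rough sketch (not Qlik's pipeline) of applying a CDC micro-batch to an
# Iceberg table with Spark SQL MERGE INTO. The 'lake' catalog, the tables,
# and the 'op' column are illustrative; the Iceberg Spark runtime jar is
# assumed to be on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cdc-merge-sketch")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "glue")
    .getOrCreate()
)

# 'staging.orders_changes' holds the latest change rows captured from the
# source, with an 'op' flag of 'I' (insert), 'U' (update), or 'D' (delete).
spark.sql("""
    MERGE INTO lake.sales.orders AS t
    USING lake.staging.orders_changes AS c
    ON t.order_id = c.order_id
    WHEN MATCHED AND c.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED AND c.op != 'D' THEN INSERT *
""")
```

Frequent merges like this are exactly what produces the many small files and stale snapshots that the compaction and cleanup discussed next are meant to address.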
Brad Shimmin: Right.
Sharad Kumar: So what that does is reduce your storage footprint as well as improve your query cost. So that's one. There are multiple other types of optimization built into the engine: dynamic partitioning; snapshots become old because you're keeping history, so over time you have to clean those up; and there are orphan files sitting there but not really referenced anywhere. So there's a lot of constant cleanup, and that's what the engine does. One part of the engine is constantly bringing in this data and managing the changes to create this Iceberg data. Another part of the engine is constantly optimizing it to give you the smallest storage footprint and the best query performance.
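The background optimization Sharad describes corresponds to Iceberg's standard table-maintenance procedures. The sketch below shows the generic Spark SQL calls for compaction, snapshot expiration, and orphan-file removal; it approximates the idea rather than Qlik's cost-based engine, and the catalog name, table name, and thresholds are placeholders.

```python
# Generic Iceberg table-maintenance procedures in Spark SQL, approximating
# the background compaction and cleanup described above (standard Iceberg
# tooling, not Qlik's cost-based engine). 'lake' is the same hypothetical
# Iceberg catalog as in the CDC sketch; thresholds are arbitrary.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact the many small files produced by frequent CDC commits
spark.sql("""
    CALL lake.system.rewrite_data_files(
        table => 'sales.orders',
        options => map('target-file-size-bytes', '536870912')
    )
""")

# Expire old snapshots so metadata and storage don't grow without bound
spark.sql("""
    CALL lake.system.expire_snapshots(
        table => 'sales.orders',
        older_than => TIMESTAMP '2025-06-01 00:00:00'
    )
""")

# Delete files no longer referenced by any snapshot
spark.sql("CALL lake.system.remove_orphan_files(table => 'sales.orders')")
```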
Brad Shimmin: Yeah, it's like the idea or the ideal that we have with query plan optimization applied to the underlying infrastructure. The strategy.
Sharad Kumar: Exactly. Yeah.
Brad Shimmin: So, you know, we can't have this conversation without bringing up governance, can we? It is such a critical aspect for companies right now, especially with AI, because it depends upon high-quality data. And as we have found in our research, the number one concern amongst enterprise practitioners is how do I get qualified, quality data. So how does Qlik address the crucial elements of data quality in particular? And I like, you know, ideas like lineage. So when I'm thinking about an Iceberg-based lakehouse, what does that mean? And specifically, how does Qlik ensure that the data that's powering governance and AI models is consistently trusted and, most important, auditable?
Sharad Kumar: Yeah, I think that's a great question, because everything is about governed data, and we call it more than quality, we use the word trust. How do you have trust in the data? Trust has multiple elements to it, right? Trust has a timeliness element to it. Trust has a quality element to it. Trust has lineage: where did it come from, how did it evolve, how did it change, who changed it, that whole part of it. So let's unpack this a little bit. One, I think Iceberg provides you that first trust layer because of multiple things it provides, like the ACID properties it has. You come from a database world, so you understand that unless you have ACID properties, the data is not trustworthy. You need that transaction integrity, that isolation of readers and writers, and all those things that come with ACID. It's fundamental, and that's why people gravitate toward databases, because they are trustworthy. But a typical data lake was swampy, it was not governed. In the lakehouse environment, Iceberg provides that first layer where you have ACID properties. You have snapshots: each time you write, there's a new snapshot of the data, and it's totally auditable, so you can go back anytime and look at the data. Then you can look at the lineage of it, how that Iceberg data evolved over time and who evolved it. So the first thing, thinking of the infrastructure layer of Iceberg, is that you have trust inherently built in with mechanisms like ACID compliance, snapshots, and time travel. Now comes the next layer, the data layer. You've built your Iceberg tables, and now multiple things come into play. Those tables are, just like any other table, reflected in a Glue catalog or a Snowflake catalog. Now in Qlik, in the cloud environment, we have quality rules, validation rules, and checks you can apply on the data, because ultimately it's just a table, a representation of a table in a catalog. So you can build quality rules and validation rules which are business specific. That's one. Second, let's say you bring that data and ingest it into an Iceberg layer, then you reflect the data and continue your transformation processes within, let's say, a Snowflake engine to further refine the data. All that lineage of the data is automatically maintained. We know what data came from the source, where it landed in Iceberg, which Snowflake table that data is getting reflected in, and how it's further processed. The entire lineage is maintained through it. Then another part of governance, I would say, is access control: how do you control access to the data? That's typically done through the Iceberg catalog. So if you're using Glue or a Snowflake catalog, that's where you define all your policies in terms of who can access the Iceberg data. The Iceberg layer itself doesn't provide the protection, but the catalog layer on top of Iceberg provides that data protection, where you can apply those policies on the tabular data. So I think it's a combination of the Iceberg layer itself providing trust and the Qlik layer on top of it providing things like data quality, lineage, cataloging, and additional business semantics on top of Iceberg.
The Iceberg table has a definition, which is typically technical metadata. Now, you mentioned agentic systems; it's very important to understand the context of the data, which comes from the business side of it. So applying more business semantics to it, all of that is done within the Qlik platform on top of the Iceberg layer to enrich the data and make it more trustworthy: quality, lineage, business semantics, and making sure the data is the freshest data possible. Through change data capture, or through acquiring data as streams, we're making sure that Iceberg data is the latest copy of the data.
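As a generic illustration of the auditability and quality rules discussed here, the snippet below uses plain Spark SQL over an Iceberg table: commit history is exposed through Iceberg's metadata tables, and a business validation rule is ultimately just a query against the same governed table. This is standard Iceberg usage, not Qlik's quality-rule engine, and the names and the rule itself are hypothetical.

```python
# A generic illustration of auditability and quality checks over an Iceberg
# table using plain Spark SQL (not Qlik's rule engine). Table, column names,
# and the validation rule are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Who wrote what, and when: every commit is recorded as a snapshot
spark.sql("""
    SELECT snapshot_id, committed_at, operation, summary
    FROM lake.sales.orders.snapshots
    ORDER BY committed_at DESC
""").show(truncate=False)

# A simple validation rule: flag rows that violate a business constraint
violations = spark.sql("""
    SELECT order_id, order_total
    FROM lake.sales.orders
    WHERE order_total < 0 OR customer_id IS NULL
""")
print(f"Rows failing validation: {violations.count()}")
```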
Brad Shimmin: Yeah, and it's so critical, isn't it, with agentic systems, because there's so much data that's actually created by these systems as they operate, and you need to capture that and make it part of your system of record and your system of insight to support, govern, and improve those over time. This whole idea you're talking about, of having a very flexible, non-monolithic approach, is really the only way you're going to do that. Okay, would you mind, Sharad, I would love to go back and talk about money very quickly.
Sharad Kumar: That's important, right?
Brad Shimmin: I mean, it is. So, you know, you guys have promoted lowering costs, and I think I saw a number that was like a 50 to 80% reduction in select areas. That fascinates me, how just adopting a solution can save you money. I love that. So if you put your CIO or CFO hat on, what are the key metrics that you think they should focus on in evaluating an architecture, and how does that relate to what you guys are doing with building efficiencies within the Qlik Open Lakehouse?
Sharad Kumar: Good question, because cost is at the top. So when I talk to CIOs and CDOs: look, cloud compute, how beautiful it is and the flexibility it gives you, but everybody's worried about the runaway cost, because you have democratized data. Now people are running queries, more of the business is using data, but your costs are rising dramatically. So let's look at where those costs come from. If I'm in a cloud warehouse environment, what do the costs come from? I put the data in, so yes, I have storage costs within the warehouse; Snowflake, BigQuery, like we said earlier, all of them store data internally, so there's a storage cost. Then you have the compute cost, which is the biggest portion of it, and I break that down into two sides. One I call job compute: think of it as the compute to process incoming data. Without a lakehouse or Iceberg, what am I doing? I'm taking data directly into the warehouse, then I'm using the warehouse compute to apply the changes that have come in to create the bronze layer, then I'm using compute within the warehouse to transform the data and finally create that business-ready form of the data. All of that compute I call job compute, because it runs in the background. Think of the classic ELT or ETL type of compute, the data pipeline compute; that's taking up a lot of compute cycles. And then obviously you have query compute, where BI and all these workloads come in, which I would say is good compute. That's a good problem to have, because you want users to use the data.
Brad Shimmin: Right? Right.
Sharad Kumar: The other component of cost comes from data duplication, like we talked about. If I'm loading data into a warehouse and I need it somewhere else, I've got to unload the data, so I'm processing and storing duplicates, or I'm ingesting data twice, and there's a cost to doing that. So those are the four or five areas of cost within the cloud warehouse environment. So what are we doing? What we are doing right now, Brad, is really affecting the first part of compute, which is all the compute tied to ingestion: taking the data and constantly applying and processing those deltas. Actually, earlier in the year at the Snowflake Summit I was listening to one of their customers, Capital One, and they are one of the largest users of Snowflake. They have a huge Snowflake environment, they were early adopters, and they're all in on Snowflake. And somebody there was saying that half their Snowflake cost is tied to data ingestion.
Brad Shimmin: No doubt.
Sharad Kumar: Yeah, no doubt. Right. The query side is fine, because you need to query it and that's the best engine for your BI workload. So you've got query compute, but half the cost is tied to ingestion. So what we are doing is taking that ingestion cost down dramatically.
Brad Shimmin: Right.
Sharad Kumar: So we are providing our own Qlik compute, which is another piece of secret sauce, built on low-cost compute as a cluster which runs in the customer's cloud environment and processes all the change data and all the optimization we talked about; all that compute runs in that cluster, which is much cheaper than the high-powered warehouse compute. So the question becomes, why do I need that expensive horsepower just to do my change data capture processing or my ingestion processing? For a warehouse query, I totally understand, that's absolutely critical. So what we've done is take that ingestion compute outside, and using that, we create that Iceberg bronze layer. So two things happen. One, we're able to reduce that cost dramatically, and that's where the cost savings come from. Compared to what you would spend in a warehouse, the way we process that could be 60, 70, 80% cheaper. So that's where the cost savings really come from: we're addressing that ingestion. And besides creating open, interoperable data in Iceberg form, we're able to reduce the cost as well.
Brad Shimmin: Yeah, I love that. We're seeing this in the AI field quite a bit right now in terms of optimizing spend according to the underlying infrastructure, you know, matching the inference type to the model to the use case, et cetera. So that's brilliant. And it's also interesting when you talk about this problem in general, because we've been seeing a huge run of companies promoting ideas like, you know, zero ETL or zero copy or whatever they want to call it.
Sharad Kumar: Yeah.
Brad Shimmin: To try to cut down on it, because it is a big problem. Okay, so I know we're getting close on time, but I wanted to ask you one more question. And I'm going to start by just saying Iceberg is winning, and actually I think it's already won, due to the stuff we've been talking about, like you mentioned earlier with its ACID properties and flexibility and whatnot. So from your perspective, just blue-skying a bit, what's the single most transformative capability that you see coming out of Iceberg that you think can help enterprises really build scalable, governed, as you said, trusted AI solutions?
Sharad Kumar: Yeah. So if you look at Iceberg, it has multiple properties. One which I feel is probably foundational within Iceberg is ACID. We talked about being able to guarantee a single trusted copy of the data. But I look at that as probably just table stakes and foundational. What excites me personally more is the whole snapshot capability of Iceberg, being able to have and maintain multiple snapshots of the data. Now think of what that means. In the machine learning world, I've been in this space for a long time, and talking to data scientists, a big challenge is: you have your machine learning model, you built it on some data, your data is constantly changing, and it's hard to tell whether something was model drift or your data changed. Now it becomes much easier to bind your model to a snapshot and say, okay, my model is on this snapshot, your model is on a different snapshot of the same data, and now you can prove it very easily. So that trust in the data, in the copy of the data, is absolutely critical. And if we extend that from machine learning to the agentic world, you could have an agent which wants to look at historical data, a certain snapshot, and I could have an agent which wants to look at what new data came in, what new data it may need to act on. So again, it gives you the snapshots, it gives you the capability to tap into the data at any point in time, time travel, give it to me. That's super exciting. The second thing is that this automatically gives you governance and lineage, because as you go from snapshot to snapshot, everything is recorded: how did it evolve, how did the schema change, who changed it, what's new about it. So inherently you get versioning and traceability. And the third thing I like about this whole concept of snapshotting is that operationally it makes things more efficient. What I mean is, let's say you built an application on one snapshot of the data, and if that data was constantly changing and the schema changed on you, your application might break. But the good thing about snapshotting is that my application, my agent, could be built on one snapshot of it. The schema evolves, things change, which is the right thing to do; you want a dynamic environment where the schema can change and you create a new snapshot with a different schema, but my agent is not going to break, because it's working on a snapshot of the data. So I would say ACID properties to me are table stakes, foundational, but what's more transformational within the Iceberg world is this notion of time travel and snapshots, which I think enables a lot of interesting use cases.
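A small sketch of the snapshot and time-travel capability Sharad highlights: a model or agent can be pinned to the exact Iceberg snapshot it was built on, while another reader picks up only the data that arrived afterward. The table name and snapshot ID below are invented for illustration.

```python
# A small sketch of snapshot pinning and time travel on an Iceberg table.
# The table name and snapshot ID are invented; Spark 3.3+ with the Iceberg
# runtime is assumed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Snapshot ID recorded alongside the model when it was trained
training_snapshot_id = 5513165493381268439

# Reproduce exactly the data the model saw, regardless of later changes
training_df = spark.sql(
    f"SELECT * FROM lake.sales.orders VERSION AS OF {training_snapshot_id}"
)

# An agent that only needs the new rows can do an incremental read of the
# append commits made after that snapshot
changes_df = (
    spark.read.format("iceberg")
    .option("start-snapshot-id", str(training_snapshot_id))
    .load("lake.sales.orders")
)
print(training_df.count(), changes_df.count())
```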
Brad Shimmin: Yeah, that last one there is fascinating to me because it's almost counterintuitive in a way, but it's so impactful. Because if you think about it, what we've been striving for for years now in the industry is to get toward this idea of treating data analytics and AI the way we treat software, and to have those CI/CD principles wrapped around how we build this stuff. I hadn't thought of snapshots as enabling that. That's awesome. All right, well, thank you, Sharad, and everyone, that wraps our conversation with Sharad. Thank you for doing this today.
Sharad Kumar: Thank you, Brad. I enjoyed the conversation.
Brad Shimmin: Great. And everyone else, thank you for tuning in to dive deeper into how today's leaders are decoding the next wave of enterprise AI and data innovation. Stay tuned for more sessions from the Six Five Decode Summit here, and you're also welcome to explore more conversations we have at SixFiveMedia.com. And with that, I will say goodbye. This is Brad Shimmin. Have a great day.
Speaker



Sharad is a data & analytics thought leader, strategic thinker, and technologist specializing in GenAI, Data Products / Data Mesh, Data Integration, Data Governance, and Cloud. In his current role as Field CTO for Data in the Americas at Qlik, Sharad Kumar is responsible for technical leadership, customer engagement, sales support, and providing input and feedback to product development around Qlik’s diverse portfolio of data integration and data quality products.
Sharad brings over three decades of extensive experience in the technology sector, with a predominant focus on Data and Cloud solutions. Prior to joining Qlik, Sharad was the founder of Mozaic Data, pioneering the development of an innovative Data Product Experience platform designed to enable organizations to build, secure, govern, deploy, and manage domain-centric data products in the cloud. Prior to that, as Global CTO of Accenture’s Data & AI group, Sharad worked with some of the largest data-driven customers assisting them with their data strategy and transformation to Cloud.



Other Sessions
How Elastic Is Expanding Data Observability for the AI Era - Six Five Decode Summit
Steve Kearns, GM of Search Solutions at Elastic, joins Brad Shimmin to discuss how Elastic is redefining AI search, context engineering, and developer experience to help enterprises build smarter, more relevant AI applications from day one.
How Snowflake Is Building the Future of Data Intelligence with AI - Decode Summit Keynote
Carl Perry, Senior Director of Product Management at Snowflake, joins Brad Shimmin to discuss how Snowflake is blending open data standards, native AI tools, and unified governance to shape the future of enterprise data intelligence.