In July, Microsoft announced its new Sentinel data lake, and rolled it out in Public Preview. In a blog post, the company described it as “giving security teams a powerful, cost-effective way to unify, retain, and analyse all security data… ” The Sentinel data lake was “built to remove data silos, simplify security data management, and deliver AI-ready data & analytics without having to manage complex infrastructure.”

In a detailed Q&A article posted on LinkedIn, Clive Watson, Quorum Cyber’s Solutions Director and Microsoft MVP, together with Jon Shectman, Microsoft Principal Program Manager for Security, shared their expertise on the data lake.

In this blog, Clive summarises the key points. Let’s dive straight in.

  1. What is the difference between the analytics and data lake tiers?

Microsoft Sentinel now offers two distinct data tiers to optimise both performance and cost across different security use cases:

Analytics Tier

Designed for high-value, real-time security operations, the Analytics Tier supports all log types with:

  • Full analytics capabilities including Kusto Query Language (KQL) queries, alerting, and detection rules
  • Real-time insights for threat detection and incident response
  • Ideal for active monitoring and security automation workflows
  • Use cases include high-performance queries, analytics rules, and threat hunting
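To illustrate the kind of real-time threat hunting the analytics tier supports, here is a minimal KQL sketch. It assumes the standard SigninLogs table is being ingested; the table, columns, and threshold are illustrative Microsoft defaults rather than anything prescribed in this article:

```kql
// Hunt for accounts with a burst of failed sign-ins over the last 24 hours
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != "0"              // non-zero ResultType indicates a failed sign-in
| summarize FailedAttempts = count() by UserPrincipalName
| where FailedAttempts > 20            // illustrative threshold
| order by FailedAttempts desc
```

A query like this could equally drive an analytics rule, firing an alert whenever the threshold is crossed.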

Data Lake Tier

Built for scalable, cost-effective storage and long-term data retention, the Data Lake Tier provides:

  • Low-cost storage for raw, high-volume, lower-fidelity data
  • Support for Kusto queries, Spark notebooks, and scheduled jobs
  • Best suited for investigations, threat hunting, and data enrichment workflows

Integrated Management in Defender Portal

The Defender portal now includes unified management for:

  • Table and data connector configuration
  • Lake Explorer for navigating stored data
  • Role-Based Access Control (RBAC) for secure access

  2. What is the new Table Management experience in Defender and how does it work?

When you onboard to Sentinel data lake, you’ll see Tables under the Configuration section.

One important new benefit of the data lake is that all data you continue to send to the Analytics tier is automatically (and at no additional cost) mirrored to the data lake. So the new tiering approach comprises essentially three options:

Analytics Tier (Hot Storage)

In the analytics tier, data is fully accessible for real-time analytics:

  • By default, data in this tier is retained for 90 days
  • You can extend retention for all tables up to 2 years

Total Retention (Mirrored to Data Lake)

All data stored in the analytics tier is automatically mirrored to the data lake for the same retention duration. You can further extend retention in the data lake independently, allowing for:

  • Up to 12 years of total retention
  • Low-cost storage for long-term compliance and historical analysis

This dual-tier approach ensures that even after analytics-tier data expires, it remains accessible in the lake if desired.

Data Lake Tier (Cold Storage)

The data lake tier is designed for cost-efficient long-term storage. Key characteristics include:

  • Data is not available for real-time analytics or threat hunting; to be clear, analytics rules do not function in this tier
  • Access is provided via:
      • KQL jobs for on-demand queries
      • Scheduled KQL or Spark jobs for trend analysis and ingestion into the analytics tier (preferred)
      • Summary rules for periodic aggregation of insights (less preferred)

This tier is ideal for historical investigations, compliance audits, and strategic analytics that don’t require immediate access. Expect slower response times for queries and jobs, a trade-off for the lower price.
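The scheduled-job pattern described above can be sketched in KQL. This is an illustrative query body only (table name, time window, and aggregation are assumptions, not from the article); the schedule itself and the destination analytics-tier table are configured in the Defender portal, not in the query:

```kql
// Example body for a scheduled KQL job over the data lake:
// aggregate a year of sign-in data into daily failure counts,
// producing a small result set suitable for ingestion into the analytics tier
SigninLogs
| where TimeGenerated > ago(365d)
| where ResultType != "0"              // failed sign-ins only
| summarize DailyFailures = count() by UserPrincipalName, bin(TimeGenerated, 1d)
```

Because only the compact summary lands in the analytics tier, you keep the raw volume in low-cost lake storage while still enabling long-horizon trend analysis.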

  3. Looking at the interface, what is Analytics as opposed to Total retention?

In a nutshell, Analytics is how long the table is stored in the analytics tier, while Total is the length of time the data is stored in the lake.

There’s no lake-only retention option for XDR tables because their default retention is 30 days. An XDR table is integrated with the data lake only if its retention or long-term retention is set above 30 days, at which point the data becomes billable. In plain English, XDR tables are not eligible for lake-only storage. As of now, there is no out-of-the-box way to move a table from XDR to the lake without first ingesting it into the analytics tier (and paying ingestion costs). In other words, if you don’t first ingest the data into the analytics tier, it cannot be stored in the lake.

  4. In the context of what I can manage, what is Tier, what is a Table type, and what are the limitations?

Table type has three categories, as you might expect: XDR, Sentinel, and Custom. These are relatively straightforward and should already look familiar.

Tables with Table type XDR and Tier XDR are default XDR tables. These cannot be moved to the data lake and follow the standard 30-day retention policy; they would first need to be ingested into the analytics tier, which changes their Tier to Sentinel. XDR tables are not integrated with the data lake, so there is no lake-only option. If you see only the analytics tier as a retention option heading, it means no lake-only option is available for that table.

  5. What’s the impact on older Sentinel features like Archive (long-term retention) and Summary Rules now that the data lake is here?

The new data lake is essentially designed to replace or streamline those older mechanisms. For example, Archive is no longer the primary method for long-term storage once you have the data lake; instead, you would use the data lake tier for long-term retention. During preview, any existing Archive data stays accessible as before (it isn’t automatically moved), but going forward we suggest using the data lake for cheaper long-term storage.

Similarly, Summary Rules (which were used to periodically summarize data to cheaper tables) are being supplanted by KQL jobs in the data lake. Over time, we feel that scheduled KQL jobs should replace Summary Rules for most scenarios in the lake model.

The July 2025 Microsoft press release for the data lake described a significant expansion of Microsoft Sentinel’s capabilities through the introduction of Sentinel data lake, now rolling out in public preview. In Microsoft’s words: “Security teams cannot defend what they cannot see and analyse. With exploding volumes of security data, organizations are struggling to manage costs while maintaining effective threat coverage. Do-it-yourself security data architectures have perpetuated data silos, which in turn have reduced the effectiveness of AI solutions in security operations. With Sentinel data lake, we are taking a major step to address these challenges.”

Microsoft Sentinel data lake provides a fully managed, cloud-native data lake that is purposefully designed for security, right inside Sentinel. Built on a modern lake architecture and powered by Azure, Sentinel data lake simplifies security data management, eliminates security data silos, and enables cost-effective long-term security data retention, with the ability to run multiple forms of analytics on a single copy of that data. Security teams can now store and manage all of their security data. This takes the market-leading capabilities of Sentinel SIEM and supercharges them even further. Customers can leverage the data lake for retroactive TI matching and hunting over a longer time horizon, track low-and-slow attacks, conduct forensic analysis, build anomaly insights, and meet reporting and compliance needs.

By improving security data management and capability, Sentinel data lake provides the data foundation for many solutions. Let’s look at some of Sentinel data lake’s core features in the next blog on Thursday 11th September.

Further Insights from Quorum Cyber
