In last week’s blog, Introducing the Microsoft Sentinel Data Lake: What You Need to Know, Clive Watson, Quorum Cyber’s Solutions Director and Microsoft MVP, introduced the technology and explained how you can get started. Together with Jon Shectman, Microsoft Principal Program Manager for Security, he’s shared a series of articles, packed with useful advice, on LinkedIn.

In this blog, Clive goes deeper to help you to be more productive from the start and save bags of time as you navigate your way through the data lake.

Tip#1 Support in Preview

My first tip is very simple. Please note that in Preview mode, the data lake is only supported in the same region as your Entra tenant home region. You can check your Entra tenant home region in the Microsoft Entra admin center under your tenant's properties. This must match the Azure region of your Microsoft Sentinel workspace.
This may mean you won’t be able to join the preview. Microsoft is aware of this limitation, and you can always create a demo workspace in the right Azure region for data lake testing.

Tip#2 The Default Workspace

Make use of the Default workspace. Once you’ve onboarded to the Sentinel data lake, you’ll navigate to data lake exploration/KQL queries.

security.microsoft.com –> Microsoft Sentinel –> Configuration –> Data Lake exploration –> KQL Queries

You’re actually in the new default workspace, which is effectively an asset store. This is a data lake-specific workspace (not to be confused with a primary workspace in Microsoft Defender, or the workspaces in Sentinel or Log Analytics).

This workspace is:

  • Fully managed by Microsoft
  • Designed to provide a minimal starting point for data lake operations
  • Automatically populated with a handful of logs so that you can begin running queries.

This “store” or default workspace contains tables populated with data from:

  • The Azure Resource Graph (ARG), a service that enables you to efficiently query and explore your Azure resources at scale across subscriptions and management groups
  • Entra asset information. This is a bit different from the Entra ID information we’re used to seeing in Sentinel, as it’s identity-related asset information.
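As a quick orientation exercise, you can run a simple query against one of these default workspace tables. Here is a minimal sketch; the table and column names below are illustrative assumptions, so check the schema browser on the KQL queries page for the actual tables available in your tenant:

```kql
// Hypothetical example – table and column names are assumptions,
// not confirmed schema. Use the schema browser to find the real ones.
// Counts enabled user accounts per department from Entra asset data.
EntraUsers
| where AccountEnabled == true
| summarize UserCount = count() by Department
| order by UserCount desc
```

Even a small exploratory query like this confirms that your default workspace is populated and that you have the permissions needed to query it.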

Tip#3 Finding your tables

To navigate to your onboarded Sentinel data lake workspace, look in the upper right-hand corner of the screen. You will see either “default” or your other workspace selected (you can only select one at a time). Currently you are unable to JOIN data using KQL between the two workspaces. Microsoft has used check boxes rather than radio buttons, so perhaps this hints at a future change?

Tip#4 Mirrored data

When using mirrored data, always set the Total retention interval to be equal to or greater than the Analytics tier retention; the data is mirrored at no additional cost anyway. In almost all cases, the data lake constitutes the “Total retained” record of your data.


Tip#5 Finding the jobs blade

If you can’t see the jobs blade under Data lake explorer, then you have a permissions problem.

Because the data lake is a different technology from Log Analytics, it has a different permission model.

To be able to create a KQL job you must have one of these Entra roles, or the Jobs tab won’t be visible to you:

  1. Security Operator
  2. Security Administrator
  3. Global Administrator

You should now see the Jobs option (you may have to log off and on again).

Tip#6 Be careful of costs when running jobs

Be careful with this one… There is no direct, supported, out-of-box way to store XDR tables in the data lake without ingesting them into the analytics tier first. So, you have to ingest the logs into Sentinel’s analytics tier before moving them to the data lake tier to lengthen the total retention beyond what is available in Advanced hunting.

Jobs are tasks that execute KQL queries against data stored in the lake. These jobs are incredibly useful. They are designed to promote query results to the analytics tier, enabling deeper analysis, advanced hunting, and integration with other Microsoft services. You can run Jobs once (immediately) or at defined intervals such as daily, weekly, or monthly.

Jobs are useful for tasks like:

  • Running complex queries including joins and unions across multiple tables
  • Creating or appending to tables: results can be written to new or existing tables (with schema matching)
  • Supporting long lookback periods: Up to 12 years of historical data.

You might use jobs for SOC use cases like incident investigation, historical threat hunting or breach analysis, or enrichment with signals from the lake.

However, because jobs move data from the data lake to the analytics tier, they trigger analytics ingress charges. This is one way you could end up with a hefty bill if you don’t plan ahead carefully. So, I advise you to summarize, bin, and filter regularly.
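The “summarize, bin, and filter” advice above can be sketched as a job query. The table, fields, and threshold below are illustrative assumptions; the shape is what matters: aggregate and filter in the lake so that only a small result set is promoted to the analytics tier and incurs ingress charges.

```kql
// Illustrative job query sketch – the threshold and lookback
// are assumptions; tune them to your own environment.
// Aggregates a year of failed sign-ins into daily counts per user,
// keeping only the outliers, so the promoted result set stays small.
SigninLogs
| where TimeGenerated > ago(365d)
| where ResultType != "0"                        // failed sign-ins only
| summarize FailedAttempts = count()
    by UserPrincipalName, bin(TimeGenerated, 1d)
| where FailedAttempts > 25                      // promote outliers only
```

A raw copy of a year of sign-in logs could be enormous; a daily, filtered summary like this is typically a tiny fraction of that, which is exactly the difference between a manageable bill and a hefty one.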

Tip#7 Double-check before you move data

You need to consider five key points before moving data to the data lake:

  1. For data lake integrated tables, you can switch a table’s tier and retention settings at any time. However, bear in mind that the Sentinel data lake is fill-forward: any changes you make to tables apply to future data ingress only. If you want to actually move existing data, you’ll have to re-ingest it with a KQL job, which triggers analytics tier ingress charges.
  2. When you shorten a table’s Total retention, Microsoft waits 30 days before removing the data, so you can revert the change and avoid data loss if you made an error in configuration or change your strategy for that table.
  3. When you increase Total retention, the new retention period applies to all data that was already ingested into the table.
  4. When you change the analytics retention settings of a table with existing data, the change takes effect immediately.
  5. When you change a table’s tier from analytics to data lake, all real-time analytics rules and hunting queries against that table stop working, because the data is no longer in the analytics tier. This is by design: you pay far less for this tier in exchange for giving up real-time analytics on that data.

Tip#8 Finding Search & restore

While you technically can use Search & restore over data lake tables, in most cases you are best served by KQL Jobs. KQL Jobs are a sort of hybrid of Summary Rules (the ability to summarize data sets back into the analytics tier) and Search jobs (the ability to search and restore data). If you need frequent queries to summarize data, use Summary Rules. If you need one-time, long-running asynchronous queries across massive data sets, you might use a Search job.

For most other use cases, KQL Jobs in the data lake are your best bet. Bear in mind that, regardless of what you choose, in the vast majority of cases, you’ll encounter fees for all three.

Tip#9 Billing

There is specific guidance for billing during the data lake preview on the main page for data lake costs: Plan costs and understand pricing and billing – Microsoft Sentinel | Microsoft Learn.

Important: While in preview, once you’ve onboarded to the Microsoft Sentinel data lake, usage is billed through new meters at each meter’s list rate. Pricing from previous meters doesn’t carry over. For more details on pricing, see Microsoft Sentinel pricing. Existing customers who are currently billed for Auxiliary logs ingestion, long-term retention, and search will see charges transition to the new data lake ingestion, data lake storage, and data lake query meters respectively.

In my next blog I explain how to plan your Microsoft Sentinel data lake strategy.

 
