Designing an Azure Logging Pipeline for Control Plane Visibility

A hands-on project focused on designing and validating an Azure logging pipeline that provides real control plane and application-level visibility, with an emphasis on detection value, tradeoffs, and lessons learned.

Category
Cloud Security
Stack
Azure Monitor, Azure Activity Logs, Azure Event Hub, Azure Functions, Splunk, Splunk HEC, ngrok
Date
Jan 27, 2026

The Challenge

I wanted to move past the idea of logging as a checkbox and actually design a pipeline that I could trust. Not just logs flowing somewhere, but logs that could support real detections and survive scrutiny.

The core problem I was trying to solve was visibility. Specifically, how control plane actions and application behavior in Azure actually surface from a defensive perspective once they leave the Azure portal and land in a SIEM.

It is easy to say you are collecting logs. It is much harder to prove that those logs show the right events, at the right time, with the right fields, and that you would notice if something meaningful went wrong.

This project was about answering a simple question honestly. If someone abused my Azure environment, would I actually see it?

My Approach

I approached this as a logging design problem rather than a Splunk or Azure exercise. The tools mattered, but the decisions mattered more.

Instead of pulling in every available log source, I focused on two streams that represent very different attacker behaviors. Azure Activity Logs for control plane actions, and App Service HTTP logs for application-level probing and abuse.

I deliberately avoided managed integrations at first. I wanted to understand the raw data path, what the events looked like before enrichment, and where things could silently break.

Throughout the project, I kept asking whether a detection would still work if I walked away for a week and came back. If the answer was no, something needed to change.

Build Process

I started by enabling Azure Activity Logs at the subscription level and routing them through Diagnostic Settings into an Event Hub. This immediately highlighted how different control plane logs are from application logs, both in structure and in intent.
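To make that routing concrete, here is a rough sketch of the shape of a subscription-level Diagnostic Setting pointed at an Event Hub. The resource IDs, hub name, and category list are placeholders for illustration; the actual setting was configured in Azure rather than generated by this snippet.

```python
# Sketch of a subscription-level Diagnostic Setting that routes Activity Logs
# to an Event Hub. Resource IDs, names, and the category list are placeholders.
import json

activity_log_diagnostic_setting = {
    "name": "activity-to-eventhub",
    "properties": {
        # Authorization rule on the Event Hub namespace that permits sending
        "eventHubAuthorizationRuleId": (
            "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
            "/providers/Microsoft.EventHub/namespaces/<namespace>"
            "/authorizationRules/<send-rule>"
        ),
        "eventHubName": "activity-logs",
        # Activity Log categories to export; Administrative carries most of the
        # control plane actions this project cares about (role assignments,
        # deletions, failed operations).
        "logs": [
            {"category": "Administrative", "enabled": True},
            {"category": "Security", "enabled": True},
            {"category": "Policy", "enabled": True},
        ],
    },
}

if __name__ == "__main__":
    print(json.dumps(activity_log_diagnostic_setting, indent=2))
```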

For application visibility, I configured App Service logging and sent those logs through a separate Event Hub. This separation was intentional. I wanted to be able to reason about control plane activity independently from web traffic noise.
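The App Service side followed the same pattern against a second hub, as sketched below. AppServiceHTTPLogs is the standard category name for HTTP access logs on App Service resources; the other values are placeholders.

```python
# Sketch of the App Service Diagnostic Setting, pointed at a separate Event Hub
# so web traffic stays out of the control plane stream. Values are placeholders.
app_service_diagnostic_setting = {
    "name": "appservice-http-to-eventhub",
    "properties": {
        "eventHubAuthorizationRuleId": "<event-hub-namespace-send-rule-resource-id>",
        "eventHubName": "appservice-http-logs",
        "logs": [
            {"category": "AppServiceHTTPLogs", "enabled": True},
        ],
    },
}
```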

To move data into Splunk, I built an Azure Function that consumes events from Event Hub and forwards them to Splunk using the HTTP Event Collector. I intentionally kept the function simple so failures would be obvious.
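A minimal sketch of that function is below, using the Azure Functions Python v2 programming model. The hub name, app setting names, and sourcetype are placeholders, and the real function also handles the batching issue described further down.

```python
# Minimal Event Hub -> Splunk HEC forwarder, Azure Functions Python v2 model.
# App setting names, the hub name, and the sourcetype are illustrative.
import json
import logging
import os

import azure.functions as func
import requests  # assumed to be present in requirements.txt

app = func.FunctionApp()

@app.event_hub_message_trigger(
    arg_name="event",
    event_hub_name="activity-logs",        # placeholder hub name
    connection="EVENTHUB_CONNECTION",      # app setting holding the namespace connection string
)
def forward_to_splunk(event: func.EventHubEvent) -> None:
    body = json.loads(event.get_body())

    resp = requests.post(
        os.environ["SPLUNK_HEC_URL"],      # e.g. https://<splunk-host>:8088/services/collector/event
        headers={"Authorization": f"Splunk {os.environ['SPLUNK_HEC_TOKEN']}"},
        json={"event": body, "sourcetype": "azure:activity"},
        timeout=10,
    )
    # Fail loudly: a rejected HEC request should fail the invocation and show up
    # in the Function App logs instead of silently dropping events.
    resp.raise_for_status()
    logging.info("Forwarded event to Splunk (HTTP %s)", resp.status_code)
```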

During testing, I used ngrok to expose the function endpoint and inspect raw payloads. This step caught several issues early, including unexpected nesting, missing fields, and assumptions I had made about event structure.
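That kind of payload inspection is easy to reproduce with a throwaway local receiver like the sketch below: run it, expose it with ngrok http 8080, and point a test sender at the public URL to see exactly what arrives. The port and handler are illustrative, not the exact harness from the project.

```python
# Throwaway HTTP receiver for inspecting raw payloads during testing.
# Expose it publicly with: ngrok http 8080
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class DumpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            # Pretty-print JSON payloads so nesting and field names are obvious
            print(json.dumps(json.loads(body), indent=2))
        except ValueError:
            print(body)  # fall back to raw bytes for anything that is not JSON
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), DumpHandler).serve_forever()
```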

One of the biggest adjustments was realizing that not all Azure logs arrive as single events. Some arrive as arrays of records, which meant my function and my searches had to explicitly handle that structure.
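Handling that came down to a small branch before forwarding anything, along the lines of this simplified helper (not the exact code in the function):

```python
import json

def split_records(raw_body: bytes):
    """Yield individual events whether the payload is a single object or a
    batch wrapped in a 'records' array, as Azure diagnostic exports often are."""
    payload = json.loads(raw_body)
    if isinstance(payload, dict) and isinstance(payload.get("records"), list):
        yield from payload["records"]   # diagnostic export batch
    elif isinstance(payload, list):
        yield from payload              # already a plain list of events
    else:
        yield payload                   # single event
```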

Once data was reliably landing in Splunk, I focused on normalization in search rather than perfect ingestion-time parsing. This made it easier to iterate and prevented early schema decisions from blocking progress.
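To give a flavor of what that looked like, the sketch below submits an SPL search through Splunk's REST export endpoint and coalesces a couple of field variants into stable names at search time. The host, credentials, index, and field paths are illustrative placeholders rather than the project's actual searches.

```python
# Submit a search-time normalization query via Splunk's REST API.
# Host, credentials, index, and field names are placeholders.
import requests

SPLUNK = "https://localhost:8089"

NORMALIZE_SPL = """
search index=azure_activity
| spath
| eval operation=coalesce(operationName, 'records{}.operationName')
| eval caller_id=coalesce(caller, 'records{}.caller')
| table _time operation caller_id
"""

resp = requests.post(
    f"{SPLUNK}/services/search/jobs/export",   # streaming export endpoint
    auth=("admin", "changeme"),                # placeholder credentials
    data={"search": NORMALIZE_SPL, "output_mode": "json"},
    verify=False,                              # lab instance with a self-signed cert
    stream=True,
)
for line in resp.iter_lines():
    if line:
        print(line.decode())
```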

From there, I built dashboards that answered very specific questions. Is data flowing? Who is making control plane changes? Which operations are failing? Are there patterns that look like misuse rather than normal administration?

Only after the dashboards made sense did I create alerts. The alerts were not the goal. They were the proof that the pipeline supported real detection logic.

Security Focus

This project reinforced how different offensive actions appear depending on where you are looking. Control plane abuse does not look like malware or exploitation. It looks like legitimate API calls used in the wrong way.

Role assignment changes, repeated failed operations, and resource deletions all showed up clearly in Activity Logs once I stopped treating them as background noise.

On the application side, simulated reconnaissance immediately surfaced as repeated 404 responses across sensitive paths. From an attacker perspective, this is trivial probing. From a defender perspective, it is high-signal behavior if you are actually collecting and reviewing the data.

One important lesson was how easy it is to generate false confidence. Dashboards can look clean while hiding the fact that important fields are missing or inconsistently populated.

By forcing alerts to trigger and reviewing the raw events behind them, I caught several cases where a detection technically fired but told an incomplete story. That feedback loop was one of the most valuable parts of the project.

Results

By the end of the project, I had a working logging pipeline that reliably delivered Azure Activity Logs and App Service logs into Splunk with enough fidelity to support meaningful searches.

I built dashboards that provided immediate answers about pipeline health, high-risk control plane operations, failed administrative actions, and application-level reconnaissance behavior.

I created and successfully triggered alerts for IAM role assignment changes, repeated failed control plane operations, and potential 404 scanning activity.
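The searches behind those alerts were ordinary SPL. The examples below are hedged reconstructions of that logic; index names, sourcetypes, field names, and thresholds are placeholders rather than the exact saved searches.

```python
# Illustrative reconstructions of the three alert searches. Index names,
# field names, operation name casing, and thresholds are placeholders.

ROLE_ASSIGNMENT_CHANGES = """
search index=azure_activity operationName="Microsoft.Authorization/roleAssignments/write"
| stats count by caller, resourceId
"""

REPEATED_FAILED_OPERATIONS = """
search index=azure_activity resultType=Failed
| stats count by caller, operationName
| where count > 5
"""

POSSIBLE_404_SCANNING = """
search index=azure_appservice ScStatus=404
| stats dc(CsUriStem) as distinct_paths, count by CIp
| where distinct_paths > 10
"""

if __name__ == "__main__":
    for name in ("ROLE_ASSIGNMENT_CHANGES", "REPEATED_FAILED_OPERATIONS", "POSSIBLE_404_SCANNING"):
        print(f"--- {name} ---{globals()[name]}")
```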

Along the way, I ran into several assumptions that turned out to be wrong. I assumed logs would arrive as single events. They did not. I assumed field names would be consistent. They were not. I assumed seeing data meant I understood it. That was the biggest mistake.

This project fundamentally changed how I think about logging. I no longer see it as something you enable and forget. I see it as infrastructure that needs to be designed, tested, and questioned just like any other security control.

Want to dig into the code?

This project is fully documented on GitHub, including notes, commits, and future ideas.