“Drift detection” sounds like one of those must-have features in modern infrastructure. It’s baked into tools, marketed heavily, and often treated as a best practice. But do we really need it? In most well-structured environments, drift shouldn’t be something you detect; it should be something you prevent.
Drift Is a Symptom, Not the Problem
Terraform drift happens when your real infrastructure no longer matches your code. That’s not the root issue, though. The root issue is that someone (or something) was able to change infrastructure outside of Terraform. If that’s possible in your system, then drift detection is just a monitoring layer over a deeper control problem.
The Better Approach: Eliminate Drift at the Source
The aim should be to remove the ability to change the environment outside of Terraform. If you design your permissions model properly, drift becomes rare by design.
In practice, that means:
- No direct changes in production environments
- All infrastructure changes go through Terraform
- Terraform runs via controlled pipelines
In this model, Terraform isn’t just a tool; it’s the only path to change. If that’s enforced properly, drift doesn’t happen in normal operation. So do you really need to spend time and resources monitoring for it?
“Break Glass” Still Matters
In reality, though, even if you restrict all manual changes to your environments, there will come a time in support when you need someone to save the day. To enable this, you will need a role with higher privileges that can push fixes through, such as updating a Network Security Group or patching a Virtual Machine.
This is where you need to be strict about both who gets the role and what it can do:
- Limited to what it needs, using fine-grained access.
- Used only in emergencies, not just because it is easier.
- Assigned to trusted engineers (e.g. SREs), so it doesn’t spread.
Following this keeps BAU support running while staying controlled, but it only works if there’s discipline afterward:
Every manual change must be reflected back into Terraform.
If you skip that step, drift becomes permanent and your IaC loses credibility. Even then, I would still question the effort spent on drift detection, since these changes should happen very rarely and should end up reflected back in your code.
When Drift Detection Does Make Sense
However, I understand that even with strong controls, drift detection can still be useful as a safety net for:
- Auditing environments
- Catching misconfigurations
- Detecting accidental or unauthorized changes
- Highlighting gaps in your permission model
But it should be your backup plan, not your primary strategy. It shouldn’t be your first focus, or where most of your design and development effort goes. Below is a simple solution I would use for drift detection.
A Simple, Tool-Agnostic Way to Detect Drift
If you do want drift detection, you don’t need anything complex.
A straightforward approach:
- Run a scheduled pipeline/workflow (e.g. daily)
- Execute `terraform plan` against your environment
- Compare the output to detect unexpected changes
The key detail:
Run the plan against the exact version of code that is currently deployed.
If you don’t, you’ll just detect differences between commits, not actual drift.
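The scheduled check above can be sketched roughly as follows in Python. This is a minimal sketch under stated assumptions, not a definitive implementation: it assumes the pipeline has already checked out the deployed version of the code into a working directory and run `terraform init` there. The function names and the `drift.tfplan` file name are illustrative. The `-detailed-exitcode` flag is what lets a script tell "no changes" apart from "changes present".

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`:
# 0 = succeeded with no changes, 1 = error, 2 = succeeded with changes present.
PLAN_STATUS = {0: "in_sync", 1: "error", 2: "changes_detected"}


def interpret_plan_exit_code(code: int) -> str:
    """Map a plan exit code to a status label; unknown codes count as errors."""
    return PLAN_STATUS.get(code, "error")


def run_drift_plan(workdir: str, plan_file: str = "drift.tfplan") -> str:
    """Run a plan in `workdir` and return one of the PLAN_STATUS labels.

    Assumption: `workdir` holds the exact version of the code that is
    currently deployed, and `terraform init` has already been run there.
    """
    result = subprocess.run(
        [
            "terraform", "plan",
            "-detailed-exitcode",
            "-input=false",       # never prompt inside a pipeline
            "-no-color",
            f"-out={plan_file}",  # saved so it can be inspected later
        ],
        cwd=workdir,
        check=False,  # exit code 2 is a valid outcome, not a failure
    )
    return interpret_plan_exit_code(result.returncode)
```

A scheduled job could call `run_drift_plan(".")` and raise an alert whenever the result is anything other than `"in_sync"`.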
To add logic to the detection, it helps to convert the Terraform plan output to JSON, which I have written about before in Terraform plan output to JSON.
From there, you can:
- Inspect the planned changes
- Detect additions, updates, or deletions
- Trigger alerts if anything unexpected appears
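As a rough illustration of the steps above, the sketch below groups planned changes by action. It assumes the plan JSON was produced with `terraform show -json` on a saved plan file and relies on the `resource_changes` array, where each entry carries an `address` and a `change.actions` list. The function names and the `plan.json` file name in the usage note are illustrative, not part of any tool.

```python
def classify_actions(actions):
    """Reduce a `change.actions` list to one label, or None when nothing changes."""
    present = set(actions)
    if {"create", "delete"} <= present:
        return "replace"  # a delete and a create together mean a replacement
    for kind in ("create", "update", "delete"):
        if kind in present:
            return kind
    return None  # "no-op" and "read" do not modify infrastructure


def summarize_plan(plan: dict) -> dict:
    """Group resource addresses from the plan JSON by the kind of change."""
    summary = {"create": [], "update": [], "delete": [], "replace": []}
    for resource in plan.get("resource_changes", []):
        kind = classify_actions(resource.get("change", {}).get("actions", []))
        if kind is not None:
            summary[kind].append(resource.get("address"))
    return summary


def has_drift(summary: dict) -> bool:
    """True when the plan would change at least one resource."""
    return any(summary.values())
```

In a pipeline, you could load the file with `json.load(open("plan.json"))`, pass the result to `summarize_plan`, and trigger an alert when `has_drift` returns `True`.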
Final Thought
If you rely heavily on drift detection, it’s worth asking why. In most cases, the better investment is:
Stronger permissions, tighter control, and a single path for change.
Drift detection is useful, but it’s not a substitute for good system design.
Taking It Further: An Agentic Approach to Drift
If you want to push this further, this is where things get interesting, and where there is potential for AI.
Instead of just detecting drift, you can interpret it.
Using an MCP-style integration (for example, something like the Azure and Terraform Model Context Protocol servers), you can introduce an agent that sits on top of your Terraform pipeline and adds reasoning to the process.
Conceptually, the flow looks like this:
- Scheduled pipeline runs `terraform plan`
- Output is converted to JSON
- Data is passed into an MCP-enabled agent
- The agent:
  - Reviews the changes
  - Classifies them (expected vs unexpected)
  - Correlates with recent deployments or incidents
  - Provides a human-readable report
  - Suggests remediation actions
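The hand-off step in that flow, where data is passed to the agent, could look something like the sketch below. It is deliberately not tied to any MCP client API: it only shows one possible way to reduce the plan JSON to a compact payload. The function names, the payload shape, and the `recent_deployments` input are all assumptions made for illustration. It sends attribute names rather than values, so secrets in the plan are less likely to leak into a prompt.

```python
import json


def changed_attributes(before, after):
    """List top-level attribute names whose values differ between two states."""
    before = before or {}  # `before` is null for resources being created
    after = after or {}    # `after` is null for resources being deleted
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k) != after.get(k))


def build_agent_payload(plan: dict, recent_deployments: list) -> str:
    """Build a JSON string describing real changes, for an agent to review."""
    changes = []
    for resource in plan.get("resource_changes", []):
        change = resource.get("change", {})
        actions = change.get("actions", [])
        if actions in (["no-op"], ["read"]):
            continue  # nothing would be modified, so nothing to explain
        changes.append({
            "address": resource.get("address"),
            "type": resource.get("type"),
            "actions": actions,
            "changed_attributes": changed_attributes(
                change.get("before"), change.get("after")
            ),
        })
    payload = {"changes": changes, "recent_deployments": recent_deployments}
    return json.dumps(payload, indent=2, sort_keys=True)
```

The resulting string can be attached to whatever prompt or tool call your agent setup expects. The correlation with deployments or incidents is left to the agent, and the list passed in here is just context for it.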
Instead of a binary “drift detected” alert, you get something closer to:
- “This change appears to be a manual scaling adjustment on a production resource.”
- “This likely originated from a break-glass action during an incident.”
- “No corresponding Terraform change exists in the current codebase.”
That’s a completely different level of signal, and it lets you move from detection to action.
Once you trust that reasoning layer, you can go a step further: the agent doesn’t just report drift, it can propose fixes.
For example:
- Generate the required Terraform changes to match reality
- Or suggest reverting the infrastructure back to code
In more advanced setups, it could even:
- Automatically raise a pull request with the proposed changes
- Tag the relevant team
- Include a summary of why the change is needed
Now drift detection becomes:
Drift → Analysis → Recommendation → Resolution
This is what takes drift detection a step further and makes it more purposeful and useful to teams.