Aug 2023 • Technical Deep Dive
AWS Cost Optimization Drift Detection
Deep technical implementation details and lessons learned.
Technical Implementation
Technical details coming soon...
Frequently Asked Questions
How accurate is the drift detection?
Very accurate for resources that should be in Terraform. The key is maintaining good filters for dynamically created resources. We maintained a whitelist of resource patterns that were expected to exist outside Terraform - like CloudFormation stack resources or service-linked roles.
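To make the filtering idea from the accuracy question concrete, here is a minimal sketch of a pattern-based whitelist. The pattern list, the `cfn-` naming prefix, and the function names are illustrative assumptions, not the filters we actually ran; the only fixed fact is that AWS service-linked roles are named `AWSServiceRoleFor...`.

```python
from fnmatch import fnmatchcase

# Hypothetical whitelist: (resource type pattern, resource id pattern).
# Anything matching is expected to exist outside Terraform.
EXPECTED_OUTSIDE_TERRAFORM = [
    ("aws_iam_role", "AWSServiceRoleFor*"),  # service-linked roles
    ("*", "cfn-*"),                          # assumed prefix for CloudFormation-created resources
]


def is_expected(resource_type, resource_id, patterns=EXPECTED_OUTSIDE_TERRAFORM):
    """Return True if the resource matches any whitelist pattern."""
    return any(
        fnmatchcase(resource_type, type_pattern) and fnmatchcase(resource_id, id_pattern)
        for type_pattern, id_pattern in patterns
    )


def filter_drift(unmanaged):
    """Keep only unmanaged resources that are NOT covered by the whitelist."""
    return [r for r in unmanaged if not is_expected(r["type"], r["id"])]
```

Keeping the patterns as plain data makes it easy to review changes to the whitelist in a pull request, which matters because every new pattern widens what the system will ignore.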
What about production resources? Isn't auto-deletion dangerous?
Production resources should all be in Terraform - that's the point. The system had multiple safety checks: production account resources got longer grace periods, databases triggered backups before deletion, and anything tagged as 'production' required manual approval. We never had a false positive deletion impact production.
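The safety checks described for production resources can be sketched as a single decision function. This is an illustration under stated assumptions: the grace period lengths, the tag keys (`environment`, `shared-resource`), the list of database resource types, and the order of the checks are all hypothetical. It also folds in the "shared resource" tag from the cross-team question, since that tag blocked auto-deletion as well.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    WAIT = "wait"                              # still inside the grace period
    NOTIFY_OWNERS = "notify_owners"            # shared resource: never auto-delete
    MANUAL_APPROVAL = "manual_approval"        # tagged production
    BACKUP_THEN_DELETE = "backup_then_delete"  # databases get a backup first
    DELETE = "delete"


# Assumed values, chosen only for illustration.
GRACE_PERIOD = timedelta(days=7)
PRODUCTION_GRACE_PERIOD = timedelta(days=30)
DATABASE_TYPES = {"aws_db_instance", "aws_rds_cluster", "aws_dynamodb_table"}


@dataclass
class DriftedResource:
    resource_type: str
    resource_id: str
    first_seen: datetime  # when drift detection first reported it
    in_production_account: bool = False
    tags: dict = field(default_factory=dict)


def decide(resource, now):
    """Pick the safest applicable action for a resource found outside Terraform.

    `now` and `resource.first_seen` must both be naive or both be timezone-aware.
    """
    if resource.tags.get("shared-resource") == "true":
        return Action.NOTIFY_OWNERS
    if resource.tags.get("environment") == "production":
        return Action.MANUAL_APPROVAL
    grace = PRODUCTION_GRACE_PERIOD if resource.in_production_account else GRACE_PERIOD
    if now - resource.first_seen < grace:
        return Action.WAIT
    if resource.resource_type in DATABASE_TYPES:
        return Action.BACKUP_THEN_DELETE
    return Action.DELETE
```

The checks run from most to least protective, so a resource that qualifies for several rules always gets the most conservative outcome.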
How do you handle resources that can't be imported to Terraform?
Some AWS resources don't support Terraform import or have complex dependencies. We maintained an exclusion list with justifications. These resources were still tracked for visibility but excluded from auto-deletion. The goal was pragmatic infrastructure management, not perfection.
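For the exclusion list of resources that can't be imported, a rough sketch of the idea is a mapping from resource ID to a written justification, with drifted resources split into auto-deletion candidates and tracked-only entries. The entry shown and the function names are hypothetical placeholders.

```python
# Hypothetical exclusion list: resource id -> why it stays outside Terraform.
EXCLUSIONS = {
    "example-legacy-resource": "no Terraform import support",
}


def validate_exclusions(exclusions):
    """Reject entries without a justification, so every exclusion stays explainable."""
    missing = [rid for rid, reason in exclusions.items() if not reason.strip()]
    if missing:
        raise ValueError(f"exclusions without justification: {missing}")


def partition(drifted_ids, exclusions=EXCLUSIONS):
    """Split drifted ids into (auto-deletion candidates, tracked-only entries)."""
    candidates, tracked_only = [], []
    for rid in drifted_ids:
        if rid in exclusions:
            # Still reported for visibility, together with its justification.
            tracked_only.append((rid, exclusions[rid]))
        else:
            candidates.append(rid)
    return candidates, tracked_only
```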
What's the Slack integration architecture?
Built with Python and AWS Lambda. Driftctl results triggered a Lambda function that processed the drift report, enriched it with cost and usage data, then posted formatted messages to Slack via webhook. Simple, serverless, and reliable.
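The Slack integration flow (drift report in, enriched message out) might look roughly like the sketch below. It is not the production function. It assumes the driftctl JSON report carries an `unmanaged` list of objects with `id` and `type` fields, that the report and a cost lookup arrive in the Lambda event under the hypothetical keys `drift_report` and `monthly_cost_usd`, and that the webhook URL lives in a `SLACK_WEBHOOK_URL` environment variable. How cost data was actually gathered is not shown here.

```python
import json
import os
import urllib.request


def build_slack_message(report, monthly_cost_usd):
    """Turn a drift report into a Slack incoming-webhook payload."""
    unmanaged = report.get("unmanaged") or []  # tolerate a missing or null list
    lines = [f"*Drift report:* {len(unmanaged)} resource(s) not in Terraform"]
    for res in unmanaged:
        cost = monthly_cost_usd.get(res["id"])
        cost_text = f"~${cost:.2f}/month" if cost is not None else "cost unknown"
        lines.append(f"• `{res['type']}` `{res['id']}` ({cost_text})")
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def lambda_handler(event, context):
    """Lambda entry point. The event shape is an assumption for this sketch."""
    payload = build_slack_message(
        event["drift_report"],
        event.get("monthly_cost_usd", {}),
    )
    status = post_to_slack(os.environ["SLACK_WEBHOOK_URL"], payload)
    return {"statusCode": status}
```

Separating message construction from the HTTP call keeps the formatting logic testable without network access, which is most of what changes over time.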
Can this work with other IaC tools besides Terraform?
Driftctl specifically works with Terraform, but the concept applies to any IaC tool. The key is having a source of truth for what should exist. We've seen similar approaches with CloudFormation drift detection and even custom scripts comparing configuration databases to cloud reality.
How do you handle cross-team resources?
Resources used by multiple teams required special handling. We implemented a "shared resource" tag that prevented auto-deletion and instead notified all stakeholder teams. This prevented one team from accidentally deleting resources others depended on.
Lessons That Apply Beyond AWS
This project taught me that technical solutions need organizational adoption to create real impact. The best automation is worthless if people don't engage with it. By making the solution social and visible, we achieved adoption rates that no amount of training could have delivered.
I also learned that infrastructure governance doesn't have to be bureaucratic. When you make the right thing to do the easy thing to do, people generally do the right thing. The Slack integration removed friction from resource cleanup, so cleanup actually happened.
The executive engagement aspect surprised me. I expected pushback about "micromanagement," but developers actually appreciated the attention to infrastructure efficiency. It showed that the company cared about doing things right, not just shipping features.
The Ongoing Impact
The system is still running, still saving money, still catching drift. But its impact goes beyond cost savings. It fundamentally changed how the organization thinks about cloud resources. New developers learn about resource lifecycle management from day one. Projects budget for cleanup time. Teams discuss resource efficiency in sprint planning. What started as a cost-saving tool became a cultural transformation.
The patterns we established influenced other areas too. The "make it visible, make it social" approach was applied to security findings, performance issues, and code quality metrics. The success of automated governance with human oversight became a template for other operational improvements.
Why This Matters
In the end, this project proved that some of the highest-impact work happens in the unglamorous spaces. Nobody gets excited about resource cleanup. There's no cutting-edge technology here. But the business impact was immediate and substantial.
It also showed that the best solutions often combine technical automation with human psychology. The drift detection was the technical foundation, but the Slack integration and executive engagement were what made it actually work.
You know what's satisfying? Logging into the AWS console now and seeing a clean resource list. No mysteries, no "what's that?" moments, no forgotten experiments burning money. Just the resources we need, tracked and managed properly.
That's the mark of good infrastructure work - when waste becomes visible, it becomes manageable. And when it becomes manageable, it usually disappears. We proved that with the right tools and approach, you can transform infrastructure chaos into infrastructure discipline. One Slack message at a time.