Over-provisioned EC2 instances are one of the most predictable findings in an AWS cost audit. The pattern is the same in almost every account: instances that were sized for a peak workload that never materialized, or that were right-sized for production but then copy-pasted into dev, staging, and QA environments without adjustment.
The result: you’re paying for CPU and RAM that sits idle. The fix is right-sizing: moving instances to a smaller type that matches actual utilization. Done carefully, it’s low-risk and delivers immediate, recurring savings.
Here’s how to find the candidates, quantify the opportunity, and execute safely.
Where to look first
AWS Compute Optimizer is the right tool. It analyzes your EC2 CloudWatch metrics over the past 14 days (or up to 93 days if you enable enhanced infrastructure metrics) and makes recommendations based on actual CPU, network, and disk I/O utilization, plus memory when the CloudWatch agent is reporting it.
To access it:
- Go to AWS Compute Optimizer in the console (you may need to opt in the first time)
- Select EC2 instances in the left nav
- Filter to your region and look at the Recommendation column
Optimizer classifies each instance as: Over-provisioned, Under-provisioned, Optimized, or Insufficient data (if CloudWatch metrics are missing).
Focus on Over-provisioned first. Optimizer will show you the current instance type, the recommended instance type, and the estimated monthly savings.
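If you’d rather pull the same list programmatically, here is a minimal sketch using boto3’s `compute-optimizer` client and its `get_ec2_instance_recommendations` call. The helper names (`normalize_finding`, `over_provisioned`, `fetch_recommendations`) are my own, and the finding string is normalized defensively because I’ve seen it written both as `Overprovisioned` and `OVER_PROVISIONED`. Treat it as a starting point, not a finished tool.

```python
def normalize_finding(finding):
    """Fold 'Overprovisioned' and 'OVER_PROVISIONED' into one comparable form."""
    return finding.replace("_", "").lower()


def over_provisioned(recommendations):
    """Return one summary row per over-provisioned instance.

    `recommendations` is the `instanceRecommendations` list from a
    GetEC2InstanceRecommendations response.
    """
    rows = []
    for rec in recommendations:
        if normalize_finding(rec.get("finding", "")) != "overprovisioned":
            continue
        options = rec.get("recommendationOptions") or []
        if not options:
            continue
        # Rank 1 is the top-ranked option in the response.
        best = min(options, key=lambda o: o.get("rank", 99))
        savings = (best.get("savingsOpportunity") or {}).get("estimatedMonthlySavings") or {}
        rows.append({
            "arn": rec["instanceArn"],
            "current": rec["currentInstanceType"],
            "recommended": best["instanceType"],
            "monthly_savings": savings.get("value", 0.0),
        })
    return rows


def fetch_recommendations(region="us-east-1"):
    """Page through the API. Needs boto3, credentials, and an opted-in account."""
    import boto3  # imported lazily so the parsing code above runs without it

    client = boto3.client("compute-optimizer", region_name=region)
    recs, token = [], None
    while True:
        kwargs = {"nextToken": token} if token else {}
        page = client.get_ec2_instance_recommendations(**kwargs)
        recs.extend(page["instanceRecommendations"])
        token = page.get("nextToken")
        if not token:
            return recs
```

Calling `over_provisioned(fetch_recommendations())` gives you the same current type, recommended type, and estimated monthly savings that the console shows, in a form you can export or sort.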
Cost Explorer right-sizing recommendations are a second source. Less granular than Optimizer, but faster to get to if you’re doing a first pass.
The patterns I find most often
Dev and staging environments running production-size instances. This is the most common. Someone provisioned a t3.xlarge in production because the spec called for it. They cloned the environment for dev — same size, same cost. Dev gets 1/10th the traffic and could run on a t3.small.
A company with 10 engineers and three environments (prod, staging, dev) might have $3,000/month tied up in dev and staging instances that could be cut 60–70% without any impact on anyone’s work.
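A quick way to surface this pattern is to scan instances by environment tag. The sketch below assumes your instances carry an `Environment` tag (the key and the accepted values are assumptions; use whatever your tagging scheme uses) and that `instances` is the flattened `Instances` list from boto3’s `describe_instances`.

```python
# Tag values treated as non-production. Hypothetical; adjust to your tagging scheme.
NON_PROD = {"dev", "development", "staging", "stage", "qa", "test"}


def is_large(instance_type):
    """True for xlarge and bigger sizes (xlarge, 2xlarge, ..., metal)."""
    _, _, size = instance_type.partition(".")
    return size.endswith("xlarge") or size.startswith("metal")


def oversized_non_prod(instances, tag_key="Environment"):
    """Return (instance_id, environment, type) for large non-prod instances."""
    flagged = []
    for inst in instances:
        tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
        env = tags.get(tag_key, "").lower()
        if env in NON_PROD and is_large(inst["InstanceType"]):
            flagged.append((inst["InstanceId"], env, inst["InstanceType"]))
    return flagged
```

Anything this returns is a candidate for review, not an automatic downsize. Untagged instances are skipped, which is itself worth a second look.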
Legacy instance families. t2, m4, c4: older generations that are more expensive per vCPU than their current equivalents. AWS releases new instance families on a roughly 18-month cycle, and older families aren’t automatically retired; they just become poor value. How much you save depends on the family: at us-east-1 on-demand list prices at the time of writing, moving from m4.large to m6i.large is roughly a 4% reduction, and c4.large to c6i.large is closer to 15%, in both cases with a performance improvement.
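Flagging older families is easy to automate once you have a list of instance types. This is a minimal sketch; the `SUCCESSOR` mapping is my own illustrative shortlist, not an official AWS table, and you should extend it to match the families in your account.

```python
# Older family -> a current-generation equivalent. Illustrative, not exhaustive.
SUCCESSOR = {
    "t2": "t3",
    "m4": "m6i",
    "c4": "c6i",
    "r4": "r6i",
}


def suggest_upgrade(instance_type):
    """Return a same-size type in the newer family, or None if not mapped.

    The result is only a suggestion: not every size exists in every family
    (m4.10xlarge has no m6i.10xlarge), so verify before acting on it.
    """
    family, _, size = instance_type.partition(".")
    successor = SUCCESSOR.get(family)
    return f"{successor}.{size}" if successor else None
```

One caution before acting on the output: the older families in this mapping run on Xen and the newer ones on Nitro, so the AMI needs ENA and NVMe drivers installed before the instance will boot on the new type.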
Burst instances running sustained workloads. t3 and t4g instances use CPU credits for bursting. They’re cheap at baseline but expensive under sustained load: once the credit balance is exhausted, an instance in unlimited mode is billed for surplus credits. The baseline depends on size (20% for t3.small and t3.medium, 30% for t3.large, 40% for t3.xlarge). If CloudWatch shows sustained CPU well above that baseline, a fixed-performance instance (m6i, c6i) is usually cheaper and more predictable.
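To put a number on it, you can estimate the surplus-credit charge from average CPU. The sketch below assumes steady load with no banked credits, an instance in unlimited mode, and a surplus rate of $0.05 per vCPU-hour (the Linux T3 rate as I understand it; T4g and Windows differ, so pass your own). The baseline and vCPU tables cover the T3/T4g sizes.

```python
# Baseline CPU utilization per vCPU (percent) and vCPU counts for T3/T4g sizes.
BASELINE_PCT = {"nano": 5, "micro": 10, "small": 20, "medium": 20,
                "large": 30, "xlarge": 40, "2xlarge": 40}
VCPUS = {"nano": 2, "micro": 2, "small": 2, "medium": 2,
         "large": 2, "xlarge": 4, "2xlarge": 8}


def size_of(instance_type):
    """'t3.large' -> 'large'."""
    return instance_type.partition(".")[2]


def above_baseline(instance_type, avg_cpu_pct):
    """True if sustained average CPU exceeds the size's baseline."""
    return avg_cpu_pct > BASELINE_PCT[size_of(instance_type)]


def monthly_surplus_charge(instance_type, avg_cpu_pct,
                           rate_per_vcpu_hour=0.05, hours=730):
    """Estimated unlimited-mode surplus charge for a month of steady load."""
    size = size_of(instance_type)
    excess = max(0.0, avg_cpu_pct - BASELINE_PCT[size]) / 100
    # Excess fraction applies to every vCPU for every hour.
    return excess * VCPUS[size] * hours * rate_per_vcpu_hour
```

Add the result to the burstable instance’s monthly price and compare it with the fixed-performance alternative. If the total is higher, the burstable instance has stopped being the cheap option.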
Seasonal workloads at peak sizing all year. A batch job that runs heavily in Q4 but barely at all from January through September. The instance is sized for Q4. The other 9 months are waste.
How to right-size safely
The risk in right-sizing is taking down a workload that turns out to need the current size. Here’s the process that keeps that risk low:
Step 1: Verify the metrics are reliable. Compute Optimizer needs CloudWatch agent metrics for memory utilization, because EC2 doesn’t publish memory to CloudWatch by default. If you haven’t configured the CloudWatch agent, Optimizer only sees CPU, network, and disk I/O. That’s useful but incomplete. For a complete picture, install the CloudWatch agent on key instances before acting on recommendations.
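You can check which instances are actually reporting memory before trusting a recommendation. This sketch assumes the CloudWatch agent’s Linux defaults: the `CWAgent` namespace, the `mem_used_percent` metric, and an `InstanceId` dimension (which the agent only adds if its config appends it). Windows hosts and custom agent configs use different names. The function takes a CloudWatch client as an argument, for example `boto3.client("cloudwatch")`.

```python
def instances_missing_memory(instance_ids, cloudwatch):
    """Return the instance IDs with no recent memory metric in CloudWatch."""
    missing = []
    for instance_id in instance_ids:
        resp = cloudwatch.list_metrics(
            Namespace="CWAgent",              # agent default namespace
            MetricName="mem_used_percent",    # Linux agent default metric name
            Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
            RecentlyActive="PT3H",            # only metrics with data in the last 3 hours
        )
        if not resp.get("Metrics"):
            missing.append(instance_id)
    return missing
```

Instances on the returned list are the ones where a Compute Optimizer recommendation is being made without memory data.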
Step 2: Check instance size constraints. Some workloads have memory requirements that aren’t obvious from CPU metrics. Check with the team that owns the service: what’s the JVM heap configured to? What does the application’s config say about memory? Don’t rely on utilization metrics alone for memory-bound workloads.
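For JVM services, a rough pre-check is to compare the configured heap against the memory of the target type. This is a sketch under stated assumptions: the 25% allowance for non-heap memory (metaspace, threads, the OS) is a hypothetical rule of thumb, not a JVM guarantee, and the memory table only covers a few t3 sizes.

```python
# Memory in GiB for a few t3 sizes; extend for the types you use.
INSTANCE_MEMORY_GIB = {"t3.small": 2, "t3.medium": 4, "t3.large": 8, "t3.xlarge": 16}


def parse_xmx_gib(flag):
    """Parse a JVM flag like '-Xmx6g' or '-Xmx512m' into GiB."""
    if not flag.startswith("-Xmx"):
        raise ValueError(f"not an -Xmx flag: {flag!r}")
    value = flag[len("-Xmx"):].lower()
    units = {"k": 1 / 1024**2, "m": 1 / 1024, "g": 1.0}
    if value[-1] in units:
        return float(value[:-1]) * units[value[-1]]
    return float(value) / 1024**3  # no suffix means bytes


def heap_fits(flag, target_type, overhead_fraction=0.25):
    """True if heap plus the assumed non-heap overhead fits in the target's RAM."""
    needed = parse_xmx_gib(flag) * (1 + overhead_fraction)
    return needed <= INSTANCE_MEMORY_GIB[target_type]
```

For example, `heap_fits("-Xmx6g", "t3.medium")` is false: a 6 GiB heap cannot run on a 4 GiB instance no matter what the CPU graphs say.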
Step 3: Change instance type in a test environment first. For services you’re not confident about, change the type in dev or staging first. Confirm the application behaves correctly. Then move to production.
Step 4: Use EC2 Stop/Start for the type change. Changing the instance type requires stopping the instance. For EBS-backed instances, this is non-destructive, though an auto-assigned public IP changes on restart unless you use an Elastic IP. Data on instance store (ephemeral) volumes is lost on stop, so check first.
Step 5: Stage production changes with observability. For critical production instances, do the change in a maintenance window, watch your metrics for 30 minutes after restart, and have a documented rollback plan (the old instance type is one stop/start away).
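The stop, change, start sequence from Steps 4 and 5 can be sketched with boto3 as follows. `resize_instance` is a hypothetical helper name; the EC2 calls and waiters it uses are standard boto3. It only checks the root device type, so it will not warn you about attached instance store data volumes or a changing public IP.

```python
def resize_instance(ec2, instance_id, new_type):
    """Stop the instance, change its type, and start it again.

    `ec2` is a boto3 EC2 client. Returns the previous instance type so the
    caller can roll back by calling this function again with that value.
    """
    desc = ec2.describe_instances(InstanceIds=[instance_id])
    inst = desc["Reservations"][0]["Instances"][0]
    if inst["RootDeviceType"] != "ebs":
        raise RuntimeError("root volume is not EBS-backed; do not stop this instance")

    old_type = inst["InstanceType"]
    if old_type == new_type:
        return old_type  # nothing to do

    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": new_type},
    )

    ec2.start_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return old_type
```

Keep the returned value in your change ticket. Rolling back is `resize_instance(ec2, instance_id, old_type)`, which is the "one stop/start away" plan in code form.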
The math
A t3.xlarge on-demand in us-east-1 costs ~$121/month at the Linux list price ($0.1664/hour over 730 hours). A t3.medium costs ~$30/month ($0.0416/hour). If you have five dev/staging instances that could drop from xlarge to medium, that’s ~$456/month in savings, or roughly $5,500/year, with no production impact.
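The arithmetic is simple enough to rerun with your own prices. In this sketch the hourly rates are assumptions (us-east-1 Linux on-demand list prices as I understand them at the time of writing), so substitute the rates from your own bill or the current pricing page. The 730-hour month is the usual billing approximation.

```python
HOURS_PER_MONTH = 730  # common approximation of hours in a month


def monthly_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """On-demand cost of one always-on instance for a month."""
    return hourly_rate * hours


def rightsizing_savings(current_rate, target_rate, count):
    """Return (monthly, yearly) savings for `count` instances changing type."""
    per_month = (monthly_cost(current_rate) - monthly_cost(target_rate)) * count
    return per_month, per_month * 12


# Assumed hourly rates: t3.xlarge and t3.medium, Linux, us-east-1.
monthly, yearly = rightsizing_savings(0.1664, 0.0416, count=5)
print(f"${monthly:,.0f}/month, ${yearly:,.0f}/year")
```

The same function works for any pair of types, which makes it easy to put a dollar figure next to each candidate on your list.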
At scale, right-sizing typically delivers 15–30% of the total compute bill back. For an account spending $20K/month on EC2, that’s $3,000–$6,000/month.
The savings are immediate and recurring. Unlike Savings Plans and Reserved Instances (which require purchasing commitments), right-sizing reduces your underlying on-demand cost, which then stacks with any commitments you have in place.
What to do with the Optimizer recommendations
When Compute Optimizer flags an instance as over-provisioned, it shows you the recommended type and an estimated savings percentage. Don’t take the recommendations blindly — use them as a starting list for investigation.
Work the list by value: sort by estimated monthly savings, descending, and work from the top. Instances with $5/month in potential savings are less interesting than ones with $200/month. Spend your investigation time proportionally.
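Sorting by value is a one-liner once the candidates are in a list. This sketch assumes each candidate is a dict with a `monthly_savings` field (the field name and the $25 cutoff are my own choices for illustration) and adds a running total so you can see how much of the opportunity sits in the first few items.

```python
def prioritize(candidates, min_monthly_savings=25.0):
    """Drop low-value items, sort the rest by savings, and add a running total."""
    worth_it = [c for c in candidates if c["monthly_savings"] >= min_monthly_savings]
    ranked = sorted(worth_it, key=lambda c: c["monthly_savings"], reverse=True)

    running = 0.0
    result = []
    for c in ranked:
        running += c["monthly_savings"]
        # Copy so the caller's dicts are not modified.
        result.append({**c, "cumulative_savings": running})
    return result
```

In most accounts the cumulative column flattens out quickly, which tells you where to stop investigating.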
For Insufficient data instances, the first step is getting CloudWatch metrics flowing. Until you have utilization data, you can’t make a confident sizing decision.
When to bring in help
Right-sizing sounds simple, but doing it correctly across a large account requires:
- Correlating CloudWatch metrics with application behavior
- Understanding which instances are owned by which teams and services
- Coordinating change windows with engineering without disrupting sprint cycles
- Capturing the savings in a way that shows up in the next billing period
If your account has more than 50 EC2 instances, a structured right-sizing pass is usually worth doing as a dedicated engagement. I run them as part of my cost audit — identify the candidates, quantify the savings, and give you a prioritized list with effort estimates for each change.
Contact me or email nick@coldsmokeconsulting.com.
Nick Allevato is an AWS Certified Solutions Architect Professional with 20 years of infrastructure experience. He runs Cold Smoke Consulting, an independent AWS consulting practice.