📊 New: Amazon CloudWatch Agent Now Supports Detailed EBS Performance Metrics (June 2025)

#aws #ebs #ec2 #cloudwatch

Good news for developers, SREs, and cloud engineers: the Amazon CloudWatch agent now collects detailed performance statistics for EBS volumes attached to EC2 instances and EKS nodes.

This means you can finally monitor and troubleshoot your EBS storage like a pro, with visibility into the NVMe-level statistics that EBS exposes on Nitro-based instances, such as:

🔁 IOPS (read/write operations)
📦 Throughput (bytes read/written)
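⏱️ Total read/write time (the basis for average latency)
🎯 Queue depth
🚦 Time spent exceeding the volume's or the instance's IOPS/throughput limits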

Let’s break it down with a typical scenario.

🔧 Use Case: App is Slow, But CPU & RAM Look Fine?

You’re running a production web app on EC2 with a gp3 EBS volume.
The app gets sluggish during peak hours, but CloudWatch shows:

CPU: fine
Memory: fine
Network: fine

Now, thanks to the new update, you can collect EBS disk-level metrics and discover the real problem.

🧪 Step-by-Step Example

Step 1: Enable EBS Metrics in CloudWatch Agent

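Add a diskio section with the new EBS measurements to your amazon-cloudwatch-agent.json config (measurement names as documented for the agent at the time of writing; check the CloudWatch agent docs for your version):

{
  "metrics": {
    "metrics_collected": {
      "diskio": {
        "resources": ["*"],
        "measurement": [
          "ebs_total_read_ops", "ebs_total_write_ops",
          "ebs_total_read_bytes", "ebs_total_write_bytes",
          "ebs_total_read_time", "ebs_total_write_time",
          "ebs_volume_queue_length",
          "ebs_volume_performance_exceeded_iops",
          "ebs_volume_performance_exceeded_tp",
          "ebs_ec2_instance_performance_exceeded_iops",
          "ebs_ec2_instance_performance_exceeded_tp"
        ],
        "metrics_collection_interval": 60
      }
    }
  }
}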
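If you already have an agent config (say, with mem or disk sections), you may prefer to merge the diskio block in rather than hand-edit the file. Below is a minimal Python sketch, assuming a local JSON config file. merge_diskio and update_config_file are hypothetical helper names, and the ebs_* list is a subset of the EBS NVMe measurements as documented at the time of writing, so verify the names against the agent docs before relying on it.

```python
import json
from pathlib import Path

# Assumed subset of the ebs_* diskio measurements; verify names in the agent docs.
EBS_MEASUREMENTS = [
    "ebs_total_read_ops", "ebs_total_write_ops",
    "ebs_total_read_bytes", "ebs_total_write_bytes",
    "ebs_total_read_time", "ebs_total_write_time",
    "ebs_volume_queue_length",
]


def merge_diskio(config: dict, interval: int = 60) -> dict:
    """Add the EBS measurements to config's diskio section, leaving other sections alone."""
    collected = config.setdefault("metrics", {}).setdefault("metrics_collected", {})
    diskio = collected.setdefault("diskio", {})
    diskio.setdefault("resources", ["*"])
    measurements = diskio.setdefault("measurement", [])
    for name in EBS_MEASUREMENTS:
        if name not in measurements:  # avoid duplicates on repeated runs
            measurements.append(name)
    # Keep an existing interval if one is already set.
    diskio.setdefault("metrics_collection_interval", interval)
    return config


def update_config_file(path: str) -> None:
    """Read the JSON config at path (or start empty), merge, and write it back."""
    p = Path(path)
    config = json.loads(p.read_text()) if p.exists() else {}
    p.write_text(json.dumps(merge_diskio(config), indent=2) + "\n")
```

Running merge_diskio twice is safe: existing sections and any measurements already listed are left as they are.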

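Then load the config and restart the agent. This command is for EC2; on EKS the agent usually runs as a DaemonSet (for example via the Amazon CloudWatch Observability add-on), so the config goes there instead:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config -m ec2 -c file:/path/to/amazon-cloudwatch-agent.json -s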

Step 2: View in CloudWatch

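The agent publishes these as custom metrics in the CWAgent namespace. The ones to watch:

Queue length → how many I/O operations are waiting on the volume
Total read/write time → time spent on I/O; divided by the op count, it gives average latency
Read/write ops → IOPS
Read/write bytes → throughput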
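To see how raw counters turn into the familiar iostat-style figures, here is a minimal sketch that derives IOPS, throughput, and average latency from two cumulative counter samples (the same way iostat works from /proc/diskstats). The DiskSample shape is hypothetical, and it assumes the time counters are in microseconds; adjust the conversion if your data uses another unit.

```python
from dataclasses import dataclass


@dataclass
class DiskSample:
    """Cumulative I/O counters at one point in time (hypothetical shape)."""
    timestamp_s: float
    read_ops: int
    write_ops: int
    read_bytes: int
    write_bytes: int
    read_time_us: int   # total time spent on reads, assumed microseconds
    write_time_us: int  # total time spent on writes, assumed microseconds


def io_stats(prev: DiskSample, cur: DiskSample) -> dict:
    """Derive rates and average latency between two cumulative samples."""
    interval = cur.timestamp_s - prev.timestamp_s
    if interval <= 0:
        raise ValueError("samples must be in chronological order")
    ops = (cur.read_ops - prev.read_ops) + (cur.write_ops - prev.write_ops)
    nbytes = (cur.read_bytes - prev.read_bytes) + (cur.write_bytes - prev.write_bytes)
    busy_us = (cur.read_time_us - prev.read_time_us) + (cur.write_time_us - prev.write_time_us)
    return {
        "iops": ops / interval,
        "throughput_mib_s": nbytes / interval / (1024 * 1024),
        # Average time per operation ("await" in iostat terms), in milliseconds.
        "avg_latency_ms": (busy_us / ops) / 1000 if ops else 0.0,
    }
```

For example, 120,000 operations over 60 seconds, 6,000 MiB transferred, and 120 s of total I/O time work out to 2,000 IOPS, 100 MiB/s, and 1 ms average latency.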

Step 3: Analyze & Act

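During peak load:

Queue length = 22 (far above its normal level)
Average latency = 120 ms (delays users notice)
Write throughput drops sharply

🧠 Root cause: EBS is the bottleneck. Before you pay for more, check the new "performance exceeded" statistics. If the volume's IOPS or throughput limit was exceeded, raise the gp3 volume's provisioned IOPS or throughput (gp3 lets you set both independently of volume size), or switch to io2 for higher limits. If the instance's EBS limit was exceeded, a faster volume won't help; you need an instance type with more EBS bandwidth.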
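That triage can be written down as a small decision helper. This is a minimal sketch: the four arguments stand for the volume-level and instance-level "performance exceeded" values for the same period (any value above zero is treated as a hit), and the function name and hint wording are illustrative, not part of any AWS tool.

```python
def triage(vol_iops_exceeded: float, vol_tp_exceeded: float,
           inst_iops_exceeded: float, inst_tp_exceeded: float) -> list[str]:
    """Map 'performance exceeded' values for one period to next-step hints."""
    hints: list[str] = []
    # Instance-level limits first: changing the volume can't fix these.
    if inst_iops_exceeded > 0 or inst_tp_exceeded > 0:
        hints.append("instance EBS limit exceeded: move to an instance type with more EBS bandwidth")
    if vol_iops_exceeded > 0:
        hints.append("volume IOPS limit exceeded: raise gp3 provisioned IOPS or move to io2")
    if vol_tp_exceeded > 0:
        hints.append("volume throughput limit exceeded: raise gp3 provisioned throughput")
    if not hints:
        hints.append("no EBS limit exceeded: look at the application or filesystem instead")
    return hints
```

Checking the instance limits first matters because a queue that builds up behind the instance's EBS bandwidth looks the same on the volume as one caused by the volume's own limits.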

✅ Why This Matters

Benefit	Impact
Granular storage insights	Understand app latency at disk level
Per-minute metrics	Catch slowdowns before users report them
Automation ready	Build alarms & dashboards
Works with EC2 + EKS	Great for both VMs & containers
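As for automation, here is a minimal sketch of a queue-length alarm built for boto3's put_metric_alarm. Several parts are assumptions: the metric name diskio_ebs_volume_queue_length (the agent's diskio_ prefix plus the measurement name), the threshold of 10, and the dimensions, which depend on your agent config, so copy the exact set from the CloudWatch console. queue_alarm_params and create_alarm are hypothetical helpers.

```python
def queue_alarm_params(alarm_name: str, dimensions: dict[str, str],
                       threshold: float = 10.0) -> dict:
    """Build put_metric_alarm arguments for a high EBS queue-length alarm.

    The metric name, dimensions, and threshold are assumptions; match them
    to what your agent actually publishes in the CWAgent namespace.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "CWAgent",
        "MetricName": "diskio_ebs_volume_queue_length",
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())],
        "Statistic": "Average",
        "Period": 60,             # matches a 60 s collection interval
        "EvaluationPeriods": 5,   # five consecutive minutes above threshold
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


def create_alarm(params: dict) -> None:
    """Create or update the alarm (needs boto3 and AWS credentials)."""
    import boto3  # third-party dependency, imported only when called

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

For example, create_alarm(queue_alarm_params("ebs-queue-high", {"name": "nvme1n1"})), where the "name" dimension is illustrative. Add AlarmActions with an SNS topic ARN to the params if you want notifications.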