AWS EBS snapshots stuck at 0%

19.10.2016 12:51 quick-tips aws

A common task when using AWS EC2 is taking a snapshot of an EBS (Elastic Block Store) volume. If you don’t know what an EBS volume is, you can think of it as a networked disk that you can attach to and detach from instances. AWS also makes it easy to snapshot them (back them up to S3), so you can restore them in the future.

One thing you can’t do is resize a volume. So a common task when you start running out of disk space is:

Unmount the device (SSH into the instance): umount /dev/xvdb1.
Detach the EBS volume from the instance: aws ec2 detach-volume --volume-id vol-123.
Wait until its state is “available”: aws ec2 wait volume-available --volume-ids vol-123.
Snapshot it: aws ec2 create-snapshot --volume-id vol-123.
Wait until the snapshot is “completed”: aws ec2 wait snapshot-completed --snapshot-ids snap-123.
Create a new volume based on that snapshot with a larger size: aws ec2 create-volume --size 50 --snapshot-id snap-123 --availability-zone eu-west-1a --volume-type gp2.
Wait until the volume is “available”: aws ec2 wait volume-available --volume-ids vol-321.
Attach it to the instance: aws ec2 attach-volume --volume-id vol-321 --instance-id i-123 --device /dev/sdb.
Wait until the volume is “in-use”: aws ec2 wait volume-in-use --volume-ids vol-321.
Use parted to expand the partition while it is still unmounted (SSH into the instance): parted /dev/xvdb unit s rm 1 mkpart primary 2048s 100%.
Check volume integrity (SSH into the instance): e2fsck -f /dev/xvdb1.
Expand the filesystem (SSH into the instance): resize2fs /dev/xvdb1.
Mount it (SSH into the instance): mount /dev/xvdb1 /mymountpoint.
Delete the old volume: aws ec2 delete-volume --volume-id vol-123.
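
The AWS-side steps above (everything that does not need SSH) can be chained in one shell function. This is only a sketch: the function name resize_via_snapshot and its argument order are made up for illustration, the IDs are the placeholders used in this post, and the old volume is deliberately not deleted so you can check the new one first.

```shell
# Sketch only: covers detach, snapshot, create and attach.
# umount, parted, e2fsck, resize2fs and mount still run on the instance.
# resize_via_snapshot is a hypothetical helper name, not an AWS command.
resize_via_snapshot() {
  local old_vol="$1" instance="$2" size_gib="$3" az="$4" device="$5"
  local snap new_vol

  aws ec2 detach-volume --volume-id "$old_vol" >/dev/null || return 1
  aws ec2 wait volume-available --volume-ids "$old_vol" || return 1

  # --query with --output text extracts just the ID from the JSON response.
  snap=$(aws ec2 create-snapshot --volume-id "$old_vol" \
           --query SnapshotId --output text) || return 1
  aws ec2 wait snapshot-completed --snapshot-ids "$snap" || return 1

  new_vol=$(aws ec2 create-volume --size "$size_gib" --snapshot-id "$snap" \
              --availability-zone "$az" --volume-type gp2 \
              --query VolumeId --output text) || return 1
  aws ec2 wait volume-available --volume-ids "$new_vol" || return 1

  aws ec2 attach-volume --volume-id "$new_vol" --instance-id "$instance" \
    --device "$device" >/dev/null || return 1
  aws ec2 wait volume-in-use --volume-ids "$new_vol" || return 1

  # Print the new volume ID so the caller can capture it.
  echo "$new_vol"
}

# Example call (placeholders from this post):
#   new_vol=$(resize_via_snapshot vol-123 i-123 50 eu-west-1a /dev/sdb)
```

Keep in mind that the wait commands give up after a fixed number of polls, so a snapshot that stays pending for hours will most likely make the function return an error rather than block forever.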

Voilà, a larger disk in, erhm, fourteen simple steps and an undetermined waiting time.

Now imagine you start your snapshot and, while it’s running (usually as soon as you press the button), you realize you forgot something, so you abort it. The snapshot disappears from your AWS Console, you make your changes and you start a new snapshot. If two hours go by and all you’re seeing is a snapshot stuck at 0%, just keep reading.

What is really going on

When you start a snapshot and then abort it, AWS doesn’t actually abort the snapshotting process: it keeps running on their servers but disappears from your console. The second snapshot you issue is then queued behind this “ghost” snapshot, which has supposedly been deleted. Fun. Queuing snapshots is something you can also do yourself without aborting anything; in fact, you can have up to 5 of them pending for each volume. The other tidbit is that snapshots are incremental. If a previous snapshot of the EBS volume exists, the new one takes only a fraction of the time to create and costs a fraction of the money to store, since only the diff is generated and kept.
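
To see what is queued for a volume, you can ask the API for its pending snapshots. A minimal sketch, assuming the placeholder volume ID vol-123; pending_snapshots is a made-up helper name, and whether the “ghost” snapshot shows up in this listing is not something this post has verified.

```shell
# List pending snapshots of one volume: ID, start time and progress.
# pending_snapshots is a hypothetical helper name, not an AWS command.
pending_snapshots() {
  aws ec2 describe-snapshots \
    --owner-ids self \
    --filters "Name=volume-id,Values=$1" "Name=status,Values=pending" \
    --query 'Snapshots[].[SnapshotId,StartTime,Progress]' \
    --output text
}

# Example call (placeholder ID from this post):
#   pending_snapshots vol-123
```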

Adding all of this together, you can expect your stuck-at-0% snapshot to go from 0 to 100% very quickly once it starts: as soon as the “ghost” one finishes, your new one begins, and the diff from the previous one is quick to calculate and store. Oh, the joys of the cloud.
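
If you would rather watch that jump happen than keep refreshing the console, a small polling loop works. This is a sketch, not a polished tool: watch_snapshot is a made-up helper name, snap-123 is a placeholder, and the 30-second default interval is an arbitrary choice.

```shell
# Print the state and progress of one snapshot until it completes or fails.
# watch_snapshot is a hypothetical helper name, not an AWS command.
watch_snapshot() {
  local snap_id="$1" interval="${2:-30}" info

  while :; do
    # Text output of [State,Progress] is one tab-separated line, e.g. "pending  0%".
    info=$(aws ec2 describe-snapshots --snapshot-ids "$snap_id" \
             --query 'Snapshots[0].[State,Progress]' --output text) || return 1
    echo "$(date -u +%H:%M:%S) $info"

    case "$info" in
      completed*) return 0 ;;
      error*) return 1 ;;
    esac
    sleep "$interval"
  done
}

# Example call (placeholder ID from this post):
#   watch_snapshot snap-123
```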