Daily Backups

Amazon’s guide to AWS Backup is here: https://docs.aws.amazon.com/aws-backup/latest/devguide/whatisbackup.html

Our Backup plan is called xyze-prod-backup and can be viewed by clicking on ‘Manage Backup plans’ from the AWS Backup dashboard.
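The plan can also be inspected via the API rather than the console; a minimal boto3 sketch:

import boto3

backup = boto3.client('backup')

#Find our plan among all backup plans in the account
for plan in backup.list_backup_plans()['BackupPlansList']:
    if plan['BackupPlanName'] == 'xyze-prod-backup':
        print(plan['BackupPlanId'], plan['BackupPlanName'])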

I’ve set up a backup rule called ‘DailyBackups’ whose backups are kept for 7 days in EBS, as the most recent backups are the ones most likely to be needed to restore a server. I’ve also written a boto3 script that uses the AWS API to copy weekly backups to S3 Glacier ‘Deep Archive’ (Amazon’s tape-class storage). Keeping the weekly backups in Deep Archive for 12 months will save about $140 per month on our AWS bill, and they can still be restored in under 12 hours if needed. The bucket lifecycle rules are set to transition objects from standard storage to Deep Archive immediately, keep them for a year and then delete them.
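For reference, the equivalent lifecycle rule set through the API would look roughly like this; a sketch only, as the real rule was configured in the console and the rule ID here is made up:

import boto3

s3 = boto3.client('s3')

#Transition everything in the bucket to Deep Archive immediately and delete after a year
s3.put_bucket_lifecycle_configuration(
    Bucket='xyze-backups',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'weekly-ami-archive',  #made-up rule name
                'Status': 'Enabled',
                'Filter': {'Prefix': ''},
                'Transitions': [{'Days': 0, 'StorageClass': 'DEEP_ARCHIVE'}],
                'Expiration': {'Days': 365}
            }
        ]
    }
)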

The backups are based on snapshots that are also stored as AMIs, which makes for easy restores instead of messing about with snapshots and volumes as before.

What to back up (aka Resource assignments) hasn’t changed and uses the ‘Daily’ tag. To add an instance to the backup plan, all we do is tag it with ‘Daily’ under the Backup key.
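For example, via boto3 (the instance ID is a placeholder):

import boto3

ec2 = boto3.client('ec2')

#Tag an instance so the DailyBackups rule picks it up
ec2.create_tags(
    Resources=['i-0123456789abcdef0'],
    Tags=[{'Key': 'Backup', 'Value': 'Daily'}]
)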

My boto3 script requires a recent Python, so the easiest thing is to run it in a virtual environment, set up as shown below. Initially I added the script to Lambda, running once a week, which meant we didn’t have to run an additional backup server as before. Unfortunately Lambda only lets a function run for 15 minutes at a time, which wasn’t enough for the whole script: I’d introduced a wait between each copy because Amazon caps how many gigabytes can be copied concurrently. I may modify the script, but for now I’ve created a new instance which, like the previous one, is called backup.cloud.xyze.

The script runs once a week as a cronjob under the ubuntu user. Apparently lifecycle transitions are queued before midnight UTC, so 10pm was originally chosen so the brief stay in standard storage would cost very little; that failed once, so the time was put back to 7pm. The instance powers down after the backup completes and the notification email has gone out; the minute’s sleep in the crontab gives Postfix time to send it. I’ve also created a small Lambda function called boot-backup-instance, with a cron schedule in EventBridge that starts backup.cloud.xyze at 6.45pm every Friday, which should give it enough time to initialise before the backup starts via the cronjob below:

# m h  dom mon dow   command
0 19 * * 5 cd /home/ubuntu/newscripts && source env/bin/activate && python3 ./copy-ami.py 2>&1 | /usr/bin/mailx -s "Weekly backup to glacier. Please keep in public-support-internal" [email protected] && /usr/bin/sleep 60 && /usr/bin/sudo /usr/sbin/poweroff
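For reference, boot-backup-instance itself is only a few lines; a minimal sketch (the instance ID is a placeholder, not the real one):

import boto3

#Started by the EventBridge schedule at 6.45pm every Friday
def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    #Instance ID of backup.cloud.xyze (placeholder)
    ec2.start_instances(InstanceIds=['i-0123456789abcdef0'])
    return {'statusCode': 200}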

At first the script failed because ‘source’ is a bash builtin that doesn’t exist in dash (Ubuntu’s default /bin/sh, which cron uses), so I reconfigured /bin/sh with ‘sudo dpkg-reconfigure dash’ (which is preferable to the alternatives according to the internet). It then failed again because the backup user needed the additional IAM permissions detailed here:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebsapi-permissions.html
I was originally testing under my own permissions, and the Lambda version ran under an internal role, which is why this hadn’t come up before. The backup user has EC2 and S3 full access too.

sudo apt-get install python3-venv
python3 -m venv env
source env/bin/activate
pip3 install boto3
export AWS_PROFILE=xyze

To run the copy-ami.py script below we also need our AWS credentials available, hence the AWS_PROFILE export above.

#!/usr/bin/env python3
#James Holland September 2021

import boto3
import datetime

#Uncomment for Lambda
#def lambda_handler(event, context):

# A counter was added for readability when cron sends email
count = 0
s3 = boto3.client('s3')
client = boto3.client('ec2')

#Get yesterday's date as today's backups might not have been done yet (AMI creation dates are UTC, so compare in UTC)
date_filter = (datetime.datetime.utcnow() - datetime.timedelta(days=1))
#Upper bound of the window; its offset is always one day less than the filter above. Only needs changing if archiving legacy backups, as was done initially
date_filter_minus = (datetime.datetime.utcnow() - datetime.timedelta(days=0))

#Documented here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.describe_images
response = client.describe_images(
    Filters=[
        {
            'Name': 'tag:Backup',
            'Values': [
                'Daily',
            ]
        },
    ],
)

for ami in response['Images']:

    ami_creation_date_str = ami['CreationDate']
    ami_creation_date = datetime.datetime.strptime(ami_creation_date_str, "%Y-%m-%dT%H:%M:%S.%fZ")
    ami_image_id = ami['ImageId']
    ami_image_name = ami['Name']
    ami_image_id_bin = ami_image_id + ".bin"
    #Use the Name tag, falling back to the AMI name if the tag is missing
    name = next((tag['Value'] for tag in ami.get('Tags', []) if tag['Key'] == 'Name'), ami_image_name)
    if date_filter < ami_creation_date < date_filter_minus:
        #Print the result to standard output so cron can include it in the email
        count = count + 1
        print(count, ami_image_id_bin, ami_creation_date, name)

        #Documented here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.Client.create_store_image_task
        copyami = client.create_store_image_task(
            ImageId=ami_image_id,
            Bucket='xyze-backups',
            S3ObjectTags=[
                {
                    'Key': 'Name',
                    'Value': name
                },
                {
                    'Key': 'Backup-Date',
                    'Value': ami_creation_date_str
                },
            ],
            DryRun=False
        )

        #A waiter was added because doing them all at once exceeded the limit imposed by Amazon
        #Now the bucket is polled every 30 seconds to see if the backup file is there before doing the next one
        #I've included an hour's worth of checking because sometimes the copying pauses for minutes at a time
        #Documented here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html?highlight=waiter#S3.Waiter.ObjectExists
        waiter = s3.get_waiter('object_exists')
        waiter.wait(
            Bucket='xyze-backups',
            Key=ami_image_id_bin,
            WaiterConfig={
                'Delay': 30,
                'MaxAttempts': 120
            }
        )

Restore from backup

Hopefully the backup you need will be from the last 7 days. In that case, all you need to do is go to the AMI images on the EC2 dashboard in the AWS console, sort by date, choose the image you want and launch it as a new instance. You can check which instance family to use from the current instance.
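If you’d rather script the launch, the same thing in boto3 (AMI ID and instance type are placeholders):

import boto3

ec2 = boto3.client('ec2')

#Launch a new instance from a chosen daily backup AMI
ec2.run_instances(
    ImageId='ami-0123456789abcdef0',
    InstanceType='t3.medium',
    MinCount=1,
    MaxCount=1
)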

Restoring from cold storage is a bit more involved, particularly as this method is new and Amazon hasn’t developed the tooling much yet. Let’s say you want to restore a previous copy of our backup box. This xargs kludge searches through all the backups and greps the results to your terminal, as shown:

aws s3 ls s3://xyze-backups | awk '{print $4}' | xargs -n1 --verbose -I {} aws s3api get-object-tagging --bucket xyze-backups --key {} | grep -B5 backup.cloud.xyze

aws s3api get-object-tagging --bucket xyze-backups --key ami-00323c135ac547f5b.bin
aws s3api get-object-tagging --bucket xyze-backups --key ami-006955f1847ecac96.bin
aws s3api get-object-tagging --bucket xyze-backups --key ami-018110dd5440eeb60.bin
aws s3api get-object-tagging --bucket xyze-backups --key ami-01c721b7baa2f09d9.bin
aws s3api get-object-tagging --bucket xyze-backups --key ami-037206e2c482b8c07.bin
aws s3api get-object-tagging --bucket xyze-backups --key ami-0423091a04eed102e.bin
aws s3api get-object-tagging --bucket xyze-backups --key ami-054d2d2434c6acd87.bin
           "Key": "Backup-Date",
           "Value": "2021-09-01T07:07:54.000Z"
       },
       {
           "Key": "Name",
           "Value": "backup.cloud.xyze"
aws s3api get-object-tagging --bucket xyze-backups --key ami-05ce0d7981dfa3afb.bin
aws s3api get-object-tagging --bucket xyze-backups --key ami-0615d33a9898f5a41.bin
aws s3api get-object-tagging --bucket xyze-backups --key ami-06eed532a0d19e117.bin
aws s3api get-object-tagging --bucket xyze-backups --key ami-07f0a939f8d408d3f.bin

Choose the date you’re after; the AMI key appears in the echoed command just above the grepped tags. In our example we then use the AWS CLI to restore the AMI image, again via the s3api. I’ve set the restored object to expire after 3 days, which is when I’m next on shift; a restored object is charged at the standard storage rate, so don’t set it any longer than necessary. If you click on the object in the S3 dashboard you will see a ‘Restoration in progress’ dialog. The restoration is normally complete within 12 hours but in practice can be quicker. You can check on its progress using ‘s3api head-object’ as shown.

When it’s complete you can run ‘create-restore-image-task’ and the object will soon appear in the AMI images on the EC2 dashboard; as before, you can now launch a new instance from the backup.

aws s3api restore-object --bucket xyze-backups --key ami-054d2d2434c6acd87.bin --restore-request '{"Days":3,"GlacierJobParameters":{"Tier":"Standard"}}'

aws s3api head-object --bucket xyze-backups --key ami-054d2d2434c6acd87.bin

aws ec2 create-restore-image-task --bucket xyze-backups --name ami-backup.cloud.xyze --object-key ami-054d2d2434c6acd87.bin
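If you’d rather script the whole restore, the three commands above map onto boto3 calls; a sketch using the key and AMI name from the example (the ten-minute polling interval is arbitrary):

import boto3
import time

s3 = boto3.client('s3')
ec2 = boto3.client('ec2')

key = 'ami-054d2d2434c6acd87.bin'

#Thaw the object out of Deep Archive for 3 days at the Standard tier
s3.restore_object(
    Bucket='xyze-backups',
    Key=key,
    RestoreRequest={'Days': 3, 'GlacierJobParameters': {'Tier': 'Standard'}}
)

#Poll until the Restore header reports the request has finished
while True:
    restore = s3.head_object(Bucket='xyze-backups', Key=key).get('Restore', '')
    if 'ongoing-request="false"' in restore:
        break
    time.sleep(600)

#Register the thawed object as an AMI again
ec2.create_restore_image_task(
    Bucket='xyze-backups',
    ObjectKey=key,
    Name='ami-backup.cloud.xyze'
)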
