20. November 2016 18:02
by Aaron Medacco

Using S3 Lifecycles to Save Storage Costs w/ AWS


The company you work for just moved to the AWS (Amazon Web Services) cloud, and you've been put in charge of migrating the existing infrastructure. An important part of the business is storing customer-related documents, photos, and other static content for 5 years, and that content cannot be lost under any circumstances. Because of this constraint, you chose Amazon's S3 service after hearing about its advertised 99.999999999% object durability, and everything is running smoothly. However, 6 months after the migration, management is complaining that the bill from Amazon Web Services keeps getting higher each month. You log in and check the billing dashboard to find that S3 is the culprit, its cost climbing with each additional object stored. You know some of the migrated objects are over 5 years old and delete those to free up space, but you suspect the cost will still be too high. What are you supposed to do? How can you further reduce cost in this scenario? You can't, give up.

Just kidding...

You will likely need to leverage the additional storage classes offered within the S3 service. AWS lets you shave down your bill by being clever about how you store your data in S3. There are 4 storage classes available: S3 Standard, S3 Standard - Infrequent Access, S3 - Reduced Redundancy, and Amazon Glacier (for cold storage, get it?). By default, Amazon stores your objects using S3 Standard, which is the most expensive option in terms of storage alone. S3 Standard boasts high durability (you won't lose the objects) and high throughput (access is fast). S3 Standard - Infrequent Access is very similar to S3 Standard, but is designed for, you guessed it, less frequent access. You pay less to store objects in this storage class, but you pay more for each access request. S3 - Reduced Redundancy is another way to save money over S3 Standard. Your objects are less durable (99.99% instead of 99.999999999%) because Amazon does not replicate them as extensively as it does for S3 Standard, but you save on cost. Amazon Glacier is the cold storage option designed for data archives and information that will hardly ever be accessed. At the time of writing, AWS charges $0.007 per gigabyte stored per month on Amazon Glacier. Data stored in Glacier is available but not readily accessible: you will wait hours to retrieve an object, but the savings are substantial if you don't plan to do it very often.
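Behind the console labels, the S3 API refers to these classes by identifiers such as `STANDARD` and `GLACIER`, and those identifiers are what lifecycle rules use. Here's a minimal Python sketch pairing each identifier with the name used in this post and doing the Glacier arithmetic with the $0.007 per GB-month figure quoted above. It covers storage only (request and retrieval fees are ignored), and prices vary by region and change over time, so treat the number as an assumption.

```python
# Storage class identifiers used by the S3 API, mapped to the names in this post.
STORAGE_CLASSES = {
    "STANDARD": "S3 Standard",
    "STANDARD_IA": "S3 Standard - Infrequent Access",
    "REDUCED_REDUNDANCY": "S3 - Reduced Redundancy",
    "GLACIER": "Amazon Glacier",
}

# Price quoted in this post (USD per GB per month); an assumption, not a live price.
GLACIER_PRICE_PER_GB_MONTH = 0.007


def glacier_monthly_cost(gigabytes: float) -> float:
    """Monthly storage cost in USD for data kept in Glacier (storage fees only)."""
    return gigabytes * GLACIER_PRICE_PER_GB_MONTH


# Example: 1 TB (1024 GB) archived in Glacier.
print(f"${glacier_monthly_cost(1024):.2f} per month")  # prints $7.17 per month
```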


Since you know you can get rid of files once they are over 5 years old, you can use S3's lifecycle feature to automate actions on your stored objects. In this case, we'll create a lifecycle rule to delete objects in our bucket 5 years after they are uploaded. Then, we'll refine the rule to squeeze out additional cost savings by taking advantage of the different storage classes.

1) Navigate to the bucket containing the objects you want deleted after a set time. Make sure the "Properties" tab in the top right is selected.


2) Expand the "Lifecycle" section and click the "Add Rule" button.

You'll see an interface for configuring a lifecycle rule on your objects. AWS allows you to set rules on an entire bucket of objects or only on those with a given prefix. Remember, S3's structure is flat and not like a file system, even though the UI may convince you otherwise: "folders" are really just object name prefixes. Think of it as you would an index on a database table, where the full object name, prefix included, is the key S3 uses to look up the object. I will select the whole bucket, but most real-world scenarios will require certain actions to be taken on specific prefixes, and you'll need to define multiple lifecycle rules for additional granularity.
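To make the "folders are just prefixes" point concrete, here's a small Python sketch of how a prefix-scoped rule picks its objects: a plain string prefix match over a flat list of keys. The helper name and the keys are hypothetical examples. Note the last case: without a trailing slash, a prefix can match more than the "folder" you had in mind.

```python
def keys_matched_by_rule(keys, prefix=""):
    """Return the keys a lifecycle rule scoped to `prefix` would apply to.

    S3 keys live in a flat namespace, so matching is a simple string prefix
    test. An empty prefix matches every key in the bucket.
    """
    return [key for key in keys if key.startswith(prefix)]


# Hypothetical object keys in one bucket.
keys = [
    "customers/1001/contract.pdf",
    "customers/1002/photo.jpg",
    "customers-archive/0999/contract.pdf",
    "static/logo.png",
]

print(keys_matched_by_rule(keys))                # all four keys
print(keys_matched_by_rule(keys, "customers/"))  # only the two under "customers/"
print(keys_matched_by_rule(keys, "customers"))   # also catches "customers-archive/..."
```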


3) Select what group of objects you want to apply the rule to and click the "Configure Rule >" button.


Here you can specify what actions S3 should take on the objects you've selected. In our hypothetical, we know we can delete customer files after 5 years, so check the "Permanently Delete" action and specify 5 years' worth of days (1825). Amazon will automatically get rid of the files when appropriate, and we no longer need to keep coming back to the console to remove them manually.
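For reference, here's a sketch of the same "Permanently Delete after 1825 days" rule written as the lifecycle configuration document the S3 API accepts, built as a plain Python dict. The rule ID is a hypothetical name, and the empty prefix applies the rule to the whole bucket. Keep in mind that 5 × 365 ignores leap days, so 1825 days is a day or two short of five calendar years; if the retention obligation is strict, rounding up may be the safer choice.

```python
YEARS_TO_RETAIN = 5
EXPIRATION_DAYS = YEARS_TO_RETAIN * 365  # 1825; ignores leap days

expiration_only_config = {
    "Rules": [
        {
            "ID": "delete-after-5-years",  # hypothetical rule name
            "Filter": {"Prefix": ""},       # empty prefix = every object in the bucket
            "Status": "Enabled",
            "Expiration": {"Days": EXPIRATION_DAYS},
        }
    ]
}

print(expiration_only_config["Rules"][0]["Expiration"])  # prints {'Days': 1825}
```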

Great! We've automated deletion, but we haven't really impacted the bill yet since we're still using S3 Standard for every object. This is where we can get creative and leverage the different S3 storage classes to save our business money. Suppose you also discover that your company only accesses these files regularly for 1 month after they are first stored, and that a representative very rarely needs to pull a file that is 6 or more months old. The 5-year policy comes from a legal obligation your company has to retain customer records.


Okay. We'll use the knowledge that Standard - Infrequent Access is less costly for objects that receive fewer requests, and specify a transition to Standard - Infrequent Access 1 month after the object's initial storage date. Furthermore, we'll configure a second transition to move objects from Standard - Infrequent Access to Amazon Glacier 6 months after that same date. We'll now be paying less than a penny per gigabyte per month for the majority of our stored objects.
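Here's a sketch of the complete rule from this walkthrough in the API's format, approximating "1 month" as 30 days and "6 months" as 180 days (both assumptions). Lifecycle `Days` values count from the object's creation date, so the Glacier transition happens about six months after upload, not six months after the move to Standard - Infrequent Access. The helper `days_per_class` (a hypothetical name) adds up how long one object spends in each class, which backs up the "majority of our stored objects" claim: assuming a steady upload rate, about 90% of stored data sits in Glacier at any given time.

```python
IA_DAYS = 30            # "1 month", approximated
GLACIER_DAYS = 180      # "6 months", approximated
EXPIRATION_DAYS = 1825  # 5 years, ignoring leap days

rule = {
    "ID": "customer-docs-tiering",  # hypothetical rule name
    "Filter": {"Prefix": ""},
    "Status": "Enabled",
    "Transitions": [
        {"Days": IA_DAYS, "StorageClass": "STANDARD_IA"},
        {"Days": GLACIER_DAYS, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": EXPIRATION_DAYS},
}


def days_per_class(rule):
    """Days a single object spends in each storage class under `rule`.

    Objects start in STANDARD; each transition's Days counts from creation.
    """
    result = {}
    start, current = 0, "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda tr: tr["Days"]):
        result[current] = transition["Days"] - start
        start, current = transition["Days"], transition["StorageClass"]
    result[current] = rule["Expiration"]["Days"] - start
    return result


split = days_per_class(rule)
print(split)  # prints {'STANDARD': 30, 'STANDARD_IA': 150, 'GLACIER': 1645}
print(f"{split['GLACIER'] / EXPIRATION_DAYS:.0%} of each object's life in Glacier")
```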

4) Configure the rules that make sense for your scenario and click the "Review" button.

Review your changes to make sure they are correct. You should give your lifecycle rule a name, especially if you are going to be creating more than one.
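If you manage rules as code, the review can include a few automated sanity checks. Below is a hypothetical helper (`review_rule` is not part of any AWS tool) that flags some obvious mistakes in a rule written in the API's format. Only the 30-day minimum age before a Standard - Infrequent Access transition reflects an actual S3 constraint; the other checks simply encode the intent of this walkthrough, and S3 enforces more rules than these.

```python
def review_rule(rule):
    """Return a list of human-readable problems found in one lifecycle rule."""
    problems = []
    if not rule.get("ID"):
        problems.append("rule has no name (ID)")

    # Map each target storage class to the day it is reached.
    by_class = {t["StorageClass"]: t["Days"] for t in rule.get("Transitions", [])}
    ia, glacier = by_class.get("STANDARD_IA"), by_class.get("GLACIER")

    if ia is not None and ia < 30:
        problems.append("Standard-IA transition must be at least 30 days after creation")
    if ia is not None and glacier is not None and glacier <= ia:
        problems.append("Glacier transition should come after the Standard-IA transition")

    expire = rule.get("Expiration", {}).get("Days")
    if expire is not None and by_class and expire <= max(by_class.values()):
        problems.append("objects expire before their last transition")
    return problems


good = {
    "ID": "customer-docs-tiering",  # hypothetical rule name
    "Filter": {"Prefix": ""},
    "Status": "Enabled",
    "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 180, "StorageClass": "GLACIER"},
    ],
    "Expiration": {"Days": 1825},
}
bad = {  # unnamed, transitions too early, expires too soon
    "Filter": {"Prefix": ""},
    "Status": "Enabled",
    "Transitions": [{"Days": 7, "StorageClass": "STANDARD_IA"}],
    "Expiration": {"Days": 5},
}

print(review_rule(good))  # prints []
for problem in review_rule(bad):
    print("-", problem)
```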

5) Click the "Create and Activate Rule" button.
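The console isn't the only way to do this. Here's a sketch of creating and activating the same rule with boto3's `put_bucket_lifecycle_configuration`, assuming boto3 is installed, AWS credentials are configured, and the bucket exists; the bucket and rule names are hypothetical. Setting `Status` to `Enabled` is the code equivalent of "Create and Activate". Be aware that this call replaces the bucket's entire lifecycle configuration, so include every rule you want to keep.

```python
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "customer-docs-tiering",  # hypothetical rule name
            "Filter": {"Prefix": ""},       # whole bucket
            "Status": "Enabled",            # "Create and Activate"
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 1825},
        }
    ]
}


def create_and_activate_rule(bucket_name: str) -> None:
    """Apply LIFECYCLE_CONFIGURATION to the bucket, replacing any existing rules."""
    import boto3  # imported lazily so the configuration can be inspected without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=LIFECYCLE_CONFIGURATION,
    )


# Usage (needs credentials and a real bucket):
# create_and_activate_rule("my-customer-documents")  # hypothetical bucket name
```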


That's it! You've successfully added a lifecycle rule to your S3 storage to optimize costs. Obviously, this was a simple hypothetical and you'll need to tailor your lifecycle rules to suit the requirements of your situation, but this is one way you can optimize your S3 usage.

You will be hailed as a hero, management will kneel to your greatness, and next month will yield a reduced Amazon Web Services bill.


Copyright © 2016-2017 Aaron Medacco