
AWS VPC Basics for Dummies

4. December 2016 15:42 by Aaron Medacco | 1 Comments

AWS VPC (Virtual Private Cloud), one of the core offerings of Amazon Web Services, is a crucial service that every professional operating on AWS needs to be familiar with. It allows you to gather, connect, and protect the resources you provision on AWS. With VPC, you can configure and secure your own virtual private network(s) within the AWS cloud. As an administrator, you should put security at the top of your priority list, especially when others are trusting you with the technology that powers their business. And if you do not know how to secure resources using VPC, you have no business administering cloud infrastructure for anyone using Amazon Web Services. The following is a high-level outline (using a simple web application architecture) for those who are new to the cloud or unfamiliar with the service; it is by no means a comprehensive look at everything AWS VPC has to offer.

VPC 1

VPCs, the virtual networks you create using the service, are region-specific and can include resources across all availability zones within the same region. You can think of a VPC as an apartment unit in an apartment building. All of your belongings (your virtual instances, databases, storage) are separated from those of other tenants (other AWS customers) living in the complex (the AWS cloud). When you first create a VPC, you will be asked to provide a CIDR block to define a collection of available private IP addresses for resources within your virtual network.

In this example, we'll use 192.168.0.0/16.

Later, we'll need to partition this collection of IP addresses into groups for the subnets we'll create.
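If you'd rather script this step than click through the console, here's a minimal sketch using boto3 (Python); the region name is just an assumption for the example.

```python
import boto3

# Assumption: us-east-1 is used here only as an example region.
ec2 = boto3.client("ec2", region_name="us-east-1")

# Create the VPC with the CIDR block from the example above.
vpc_id = ec2.create_vpc(CidrBlock="192.168.0.0/16")["Vpc"]["VpcId"]
print("Created VPC:", vpc_id)
```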

However, since we are hosting a web application that needs to be public, we'll first need a way to expose resources to the internet. AWS VPC allows you to do this via Internet Gateways. This is pretty self-explanatory in the web console: you simply create an internet gateway and attach it to your VPC. You should know that any traffic flowing between the internet and your resources passes through the Internet Gateway.
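Continuing the boto3 sketch from above, creating and attaching the Internet Gateway might look roughly like this (vpc_id comes from the previous snippet):

```python
# Create an Internet Gateway and attach it to our VPC.
igw_id = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw_id, VpcId=vpc_id)
```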

Moving down a layer, the next step is to define our subnets. What are those?

A subnet is just a piece of a network (your VPC). It is a logical grouping of connected resources. In the apartment analogy, it's like a bedroom. AWS VPC allows you to select whether you want your subnets to be private or public. The difference between whether a subnet is public or private really just comes down to whether the subnet has a route to an internet gateway or not. Routes are defined in route tables and each subnet needs a route table.

What is a route table, you ask? Route tables define how to connect to other resources on the network (VPC). They are like maps, giving the resources directions for how to get somewhere in the network. By default, your subnets will have a route in their route table that allows them to reach all other resources within the same VPC. However, if you do not provide a route to the Internet Gateway attached to the VPC, your subnet is considered private. If you do provide that route, your subnet is considered public. Simple.

Below is a diagram of the VPC explained so far. Notice that the IP ranges of the subnets (192.168.0.0/24 and 192.168.1.0/24) are taken from the pool of IP addresses for the VPC (192.168.0.0/16). They must be different since no two resources can have the same private IP address in the virtual network. The public and private subnets exist in different availability zones to illustrate that your network can and should (especially, as you seek high availability) span multiple zones in whichever region you provision your resources.

 VPC Diagram 1
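For those following along in code, here's a rough boto3 continuation that creates the two subnets from the diagram and a route table that makes one of them public; the availability zone names are assumptions.

```python
# Carve two /24 subnets out of the VPC's 192.168.0.0/16 block.
# The availability zone names below are assumptions for the example.
public_subnet_id = ec2.create_subnet(
    VpcId=vpc_id, CidrBlock="192.168.0.0/24", AvailabilityZone="us-east-1a"
)["Subnet"]["SubnetId"]
private_subnet_id = ec2.create_subnet(
    VpcId=vpc_id, CidrBlock="192.168.1.0/24", AvailabilityZone="us-east-1b"
)["Subnet"]["SubnetId"]

# A route to the Internet Gateway (0.0.0.0/0) is what makes a subnet "public".
# The private subnet keeps only the default local route to the VPC.
public_rt_id = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(
    RouteTableId=public_rt_id, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw_id
)
ec2.associate_route_table(RouteTableId=public_rt_id, SubnetId=public_subnet_id)
```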

So how do we secure the network we've established so far?

NACLs (Network Access Control Lists) are one of the tools Amazon Web Services provides for protecting resources from unwanted traffic. They are firewalls that act at the subnet level. NACLs are stateless, which means that return traffic is not implied. If communication is allowed into the subnet, it doesn't necessarily mean that the response communication back to the sender is allowed. Thus, you can define rules to allow or deny traffic that is either inbound or outbound. These rules exist in a list that is ordered by rule number. When traffic attempts to go in or out of the subnet, the NACL evaluates the rules in order and acts on the first match, whether it is allow or deny. Rules further down are not evaluated once a match is found. If no match is found, traffic is denied (by the * fallback rule). The following is what a NACL might look like in the AWS management console.

NACL Example

In this example, inbound traffic is evaluated to see if it is attempting to use SSH on port 22 from anywhere (0.0.0.0/0). If this is true, the traffic is denied. Then, the same traffic is checked to see if RDP on port 3389 is being attempted. If so, it is denied. If neither is true, the traffic is allowed in because of rule 200, which allows all traffic on all ports from anywhere. Notice there are gaps between rule numbers. It's good practice to leave some space between rule numbers (100, 150, 200) so you can come back later and place rules in between those already existing. That is why you don't see rules numbered 1, 2, 3, and so on; if you needed to insert a rule between two others, you would have to renumber many rules to achieve the correct configuration.
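If you wanted to script the example rules above instead of using the console, the boto3 calls would look roughly like the following; the NACL ID is a placeholder, and remember this is only the illustration from the screenshot, not a recommended configuration.

```python
nacl_id = "acl-0123456789abcdef0"  # placeholder: the NACL protecting your subnet

# Rule 100: deny inbound SSH (TCP port 22) from anywhere.
ec2.create_network_acl_entry(
    NetworkAclId=nacl_id, RuleNumber=100, Protocol="6", RuleAction="deny",
    Egress=False, CidrBlock="0.0.0.0/0", PortRange={"From": 22, "To": 22},
)

# Rule 150: deny inbound RDP (TCP port 3389) from anywhere.
ec2.create_network_acl_entry(
    NetworkAclId=nacl_id, RuleNumber=150, Protocol="6", RuleAction="deny",
    Egress=False, CidrBlock="0.0.0.0/0", PortRange={"From": 3389, "To": 3389},
)

# Rule 200: allow all other inbound traffic on all ports from anywhere.
ec2.create_network_acl_entry(
    NetworkAclId=nacl_id, RuleNumber=200, Protocol="-1", RuleAction="allow",
    Egress=False, CidrBlock="0.0.0.0/0",
)
```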

This is just an example NACL. You'd likely want to be able to SSH or RDP into your EC2 instances from a remote location, like your office network or your home, so you wouldn't use this configuration, which would obviously prevent that.

VPC Diagram 2

Now we can add resources to our protected subnets. We'll have one web server instance in our public subnet to handle web requests coming from the internet, and one database server our web application depends on in the private subnet. The infrastructure for a real-world web application would likely be more sophisticated than this, accounting for things such as high availability at both the database and web tier. You'd also see an elastic load balancer combined with an auto scaling group to distribute traffic across multiple web servers so no particular resource handling requests gets overwhelmed.
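As a rough idea of how an instance ends up in a particular subnet, here's a continuation of the boto3 sketch; the AMI ID and instance type are placeholders, not recommendations.

```python
# Launch the web server into the public subnet.
# The AMI ID is a placeholder; substitute one from your own account/region.
web_server_id = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    SubnetId=public_subnet_id,
)["Instances"][0]["InstanceId"]
```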

Great, so how do we secure our resources once we are within the subnet? That is where security groups come in.

Security groups are the resource-level firewalls guarding against unwanted traffic. Similar to NACLs, you define rules for the traffic you want to be allowed in. By default, when a security group is created, it has no inbound rules, so all inbound traffic is denied. This helps you implement the best practice of configuring only the least amount of access necessary. Unlike NACLs, security groups are stateful, so when communication is allowed in, the response communication is allowed out. When you create an inbound rule, you provide the type, protocol, port range, and source of the traffic that should be allowed in. For example, if our web server instance was running Windows Server 2016, I'd create a rule for RDP, protocol TCP (6), using port 3389, where the source is my office IP address. When the security group is updated, I'll be able to administer my web server remotely using RDP. You can also attach more than one security group to a resource. This is useful when you want to combine multiple access configurations. For instance, if you provisioned four EC2 instances running in a subnet and wanted to have RDP access to all of them from your office, but also wanted FTP access from your home to two of the instances, you could configure one security group for the RDP access and another for the FTP access, and attach both security groups to the instances requiring both your home FTP and office RDP access.
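A rough boto3 equivalent of the RDP rule described above might look like this; the group name and office IP address are placeholders.

```python
# Create a security group in the VPC and allow RDP (TCP 3389) only from the office.
sg_id = ec2.create_security_group(
    GroupName="web-server-sg",                 # placeholder name
    Description="Allow RDP from the office",
    VpcId=vpc_id,
)["GroupId"]

ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 3389,
        "ToPort": 3389,
        "IpRanges": [{"CidrIp": "203.0.113.10/32"}],  # placeholder: your office IP
    }],
)
```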

VPC Diagram 3

Those are really the bare essentials for configuring a VPC in Amazon Web Services. If you want another layer of security, there is nothing stopping you from using a host-based firewall like Windows Firewall on your virtual instances. Additionally, you're going to want to create an Elastic IP for the instances you want publicly accessible, which in this case would be our EC2 instance acting as a web server. Otherwise, you'll find that the public IP address of your instance can change, which is definitely not going to be okay for your DNS if you are running a web application.
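Allocating an Elastic IP and pointing it at the web server can also be scripted; a short sketch, assuming the instance ID from the earlier snippet:

```python
# Allocate an Elastic IP and associate it with the web server instance.
allocation = ec2.allocate_address(Domain="vpc")
ec2.associate_address(
    AllocationId=allocation["AllocationId"], InstanceId=web_server_id
)
print("Stable public IP:", allocation["PublicIp"])
```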

AWS VPC also allows you to configure dedicated connectivity from your on-premises environment to the AWS cloud using Direct Connect, set up VPN connections from your network to your cloud network, create NAT Gateways and DHCP Option Sets, and set up VPC Peering to allow connectivity between VPCs. These are great tools, and knowing what they are and when you would use them is important, but they aren't necessary for all use cases.

A final recommendation is to tag and name your VPC resources. Name your security groups, name your NACLs, name your route tables, name everything. As your infrastructure grows on Amazon Web Services, you will be driven insane if everything has names like "acl-1ad75f7c" that provide no context about their purpose. Therefore, it's a good idea to name everything from the start so things stay organized and other users of the AWS account (particularly those who weren't there when you set things up) will have a clue when they need to make changes.
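Naming can be done in bulk from the SDK as well; for example, continuing the sketch (the tag value is just illustrative):

```python
# Give every resource created above a human-readable Name tag.
ec2.create_tags(
    Resources=[vpc_id, igw_id, public_subnet_id, private_subnet_id, public_rt_id, sg_id],
    Tags=[{"Key": "Name", "Value": "demo-web-app"}],  # illustrative value
)
```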

For even more detail on VPCs, I'd recommend the Pluralsight course, AWS VPC Operations by Nigel Poulton, which you can find here.

Cheers!


Using S3 Lifecycles to Save Storage Costs w/ AWS

20. November 2016 18:02 by Aaron Medacco | 0 Comments

The company you work for just moved to the AWS (Amazon Web Services) cloud, and you've been put in charge of migrating the existing infrastructure. An important element of the business is storing customer-related documents, photos, and other static content that cannot be lost under any circumstances and must be retained for 5 years. Because of this constraint, you chose to use Amazon's S3 service after hearing about its 99.999999999% object durability guarantee, and everything is running smoothly. However, 6 months after the migration, management is complaining that the bill from Amazon Web Services keeps getting higher each month. You log in and check the billing dashboard to find the S3 service cost is the culprit, climbing higher with each additional object stored. You know some of the objects migrated are from over 5 years ago and delete those to free up space, but suspect the cost will still be too high. What are you supposed to do? How can you further reduce cost in this scenario? You can't, give up.

Just kidding...

You will likely need to leverage the additional storage class offerings within the S3 service. AWS allows you to shave your bill down by being clever with how you store your data in S3. There are 4 storage classes available: S3 Standard, S3 Standard - Infrequent Access, S3 - Reduced Redundancy, and Amazon Glacier (for cold storage, get it?). By default, Amazon stores your objects using S3 Standard, which is the most expensive option in terms of just storing files. S3 Standard boasts high durability (you won't lose the objects) and high throughput (access is fast). S3 Standard - Infrequent Access is very similar to S3 Standard, but is designed for, you guessed it, less frequent access than S3 Standard. You pay less to store the objects using this storage class, but you pay more for each access request. S3 - Reduced Redundancy is another way to save money over S3 Standard. Your objects are less durable (99.99% instead of 99.999999999%) because Amazon will not replicate them the same way they do for S3 Standard, but you will save on cost. Amazon Glacier is the cold storage option designed for data archives and information that will hardly ever be accessed. Currently, AWS charges $0.007 per gigabyte stored per month on Amazon Glacier. Data stored in Glacier is available but not readily accessible. You will wait hours to retrieve an object, but the savings are substantial if you don't plan to do it very often.
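For reference, the storage class is chosen per object at upload time; a small boto3 (Python) sketch, where the bucket and key names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Store an object using Standard - Infrequent Access instead of the default S3 Standard.
s3.put_object(
    Bucket="example-customer-documents",  # placeholder bucket name
    Key="customers/12345/contract.pdf",   # placeholder object key
    Body=b"...file contents...",
    StorageClass="STANDARD_IA",
)
```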

S3 1

Since you know you can get rid of files once they are over 5 years old, you can use the lifecycle feature of S3 to automate actions to your stored objects. In this case, we'll create a lifecycle rule to delete objects in our bucket 5 years after they are put. Then, we'll further optimize the rule to squeeze additional cost savings by taking advantage of the different storage options.

1) Navigate to the bucket where the objects you want to delete after a set time live. Make sure the "Properties" tab in the top right is selected.

step 1

2) Expand the "Lifecycle" section and click the "Add Rule" button.

You'll see the following interface for configuring a lifecycle rule on your objects. AWS allows you to set rules on an entire bucket of objects or only on those with a given prefix. Remember, S3's structure is flat, not like a file system, even though the UI may convince you otherwise. "Folders" are really just object name prefixes. Think of a prefix like an index for a database table: prefixes allow S3 to access objects quickly using the prefixed name as a key. I will select the whole bucket, but most real-world scenarios will require certain actions be taken on specific prefixes, and you'll need to define multiple lifecycle rules for additional granularity.

step 2

3) Select what group of objects you want to apply the rule to and click the "Configure Rule >" button.

 step 3

Here you can specify what actions S3 should take on the objects you've selected. In our hypothetical, we know we can delete customer files after 5 years, so check the "Permanently Delete" action and specify 5 years' worth of days (1825). Amazon will automatically get rid of the files when appropriate, and we no longer need to keep coming back to the console to remove them manually.

Great! We've automated deletion but we haven't really impacted the bill yet since we're still using S3 Standard for each object. This is where we can be creative and leverage the different options of S3 to save our business money. Suppose you also discover your company only accesses these files regularly for 1 month after initially storing, and that it's very rare a representative would need to pull a file that is 6 or more months old. The 5 year policy comes from a legal obligation your company has to retain customer records.

step 4

Okay. We'll use the knowledge that Standard - Infrequent Access is less costly for objects receiving fewer requests, and specify a transition to Standard - Infrequent Access 1 month after an object's initial storage date. Furthermore, we'll configure a transition to move objects from Standard - Infrequent Access to Amazon Glacier after 6 months. We'll now be paying less than a penny per gigabyte per month for the majority of our stored objects.

4) Configure the rules that make sense for your scenario and click the "Review" button.

Review your changes to make sure they are correct. You should give your lifecycle rule a name, especially if you are going to be creating more than one.

5) Click the "Create and Activate Rule" button.

step 6

That's it! You've successfully added a lifecycle rule to your S3 storage to optimize costs. Obviously, this was a simple hypothetical and you'll need to tailor your lifecycle rules to suit the requirements of your situation, but this is one way you can optimize your S3 usage.
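If you'd rather define the rule in code than click through the console, the same configuration can be applied with boto3; a minimal sketch, assuming a placeholder bucket name:

```python
import boto3

s3 = boto3.client("s3")

# Transition objects to Standard - IA after 30 days, to Glacier after 180 days,
# and permanently delete them after 5 years (1825 days).
s3.put_bucket_lifecycle_configuration(
    Bucket="example-customer-documents",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire",
            "Filter": {"Prefix": ""},  # empty prefix = the whole bucket
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 1825},
        }]
    },
)
```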

You will be hailed as a hero, management will kneel to your greatness, and next month will yield a reduced Amazon Web Services bill.

Cheers!

Copyright © 2016-2017 Aaron Medacco