
Ensuring AWS Resources in Your Account are Tagged w/ Names Using Config

25. March 2017 12:23 by Aaron Medacco | 0 Comments

If you're like me, you want everything in your Amazon Web Services account to be organized and well kept. Whether it be EC2 instances, VPCs, RDS instances, or security groups, I want context around the resources in my AWS environment so I know what I'm working with. Tagging accomplishes this by allowing you to ascribe attributes that you define to everything in your environment. The most common tag is simply "Name", which at a minimum, usually provides some insight into whether the instance is a web server, test instance, database server, cache, etc. 

Note: Don't name your instance Foo.
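For reference, applying a Name tag doesn't require the console. Here's a minimal sketch using the AWS SDK for Node.js; the instance ID and tag value are placeholders:

var AWS = require("aws-sdk");
var ec2 = new AWS.EC2();

var params = {
    Resources: ["i-0123456789abcdef0"], // resource(s) to tag
    Tags: [{ Key: "Name", Value: "web-server-01" }]
};

ec2.createTags(params, function(err, data) {
    if (err) console.log(err, err.stack);
    else console.log("Name tag applied.");
});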

This becomes difficult when you have more than one person managing an account. From a practical standpoint, it's unreasonable to mandate that every single component of everything you build in AWS have a name. For instance, if someone didn't tag a NACL attached to one of your subnets, it's probably not a big deal. In a utopia, everything would be named, but enforcing that would get in the way of getting things done. That being said, I don't think it's unreasonable to expect that infrastructure pieces such as EC2 instances, RDS instances, VPCs, EBS volumes, ACM certificates, ELBs, etc. always be tagged with a name.


The AWS Config service helps you ensure that practices such as tagging (among more important configurations like security and compliance) are maintained in your organization's AWS account. Simply specify what resources you want Config to record, select a predefined or new SNS topic to publish to, and create a rule defining what you want Config to keep tabs on. I'll assume you've gone through the initial Config setup process of defining which resources you want recorded, the S3 bucket to store history and snapshots in, and the SNS topic you want Config to publish to.

Adding a Config rule to monitor tagging (a scripted alternative follows these steps):

  1. In your management console, navigate to the Config service.
  2. Click "Rules" in the sidebar.
  3. We're going to use an AWS managed rule, so browse the managed rules until you find "required-tags" and select it.
  4. Edit or accept the default name and description.
  5. Select "Resources" for the Scope of changes.
  6. Choose which AWS resources in your environment you want Config to monitor.
    By default, it's a large group, so you might want to customize this part unless you want to get notifications all day. 
  7. Under the rule parameters, change the value of "tag1Key" to "Name".
  8. Leave everything else, and click "Save".
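
If you'd rather script the rule than click through the console, the same managed rule can be created with the SDK. A minimal sketch using the AWS SDK for Node.js; the description and resource types are examples you'd adjust for your environment:

var AWS = require("aws-sdk");
var config = new AWS.ConfigService();

var params = {
    ConfigRule: {
        ConfigRuleName: "required-tags",
        Description: "Checks that selected resources are tagged with a Name.",
        Source: {
            Owner: "AWS", // use an AWS managed rule
            SourceIdentifier: "REQUIRED_TAGS"
        },
        InputParameters: JSON.stringify({ tag1Key: "Name" }),
        Scope: {
            ComplianceResourceTypes: [ // narrow this to taste
                "AWS::EC2::Instance",
                "AWS::EC2::Volume",
                "AWS::RDS::DBInstance"
            ]
        }
    }
};

config.putConfigRule(params, function(err, data) {
    if (err) console.log(err, err.stack);
    else console.log("Rule created.");
});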

That's it. Regardless of whether your environment adheres to this rule already, you'll likely receive several notifications right away: Config needs to evaluate the rule against the resources you selected and mark each as COMPLIANT or NON_COMPLIANT. From this point, you can fix any NON_COMPLIANT resources and know that Config is tracking your environment going forward. The pricing for Config can be found here.
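
If you'd rather pull the offenders programmatically than read notifications, a sketch like this (again assuming the Node.js SDK and the rule name from above) lists the NON_COMPLIANT resources:

var AWS = require("aws-sdk");
var config = new AWS.ConfigService();

var params = {
    ConfigRuleName: "required-tags",
    ComplianceTypes: ["NON_COMPLIANT"]
};

config.getComplianceDetailsByConfigRule(params, function(err, data) {
    if (err) return console.log(err, err.stack);
    data.EvaluationResults.forEach(function(result) {
        var q = result.EvaluationResultIdentifier.EvaluationResultQualifier;
        console.log(q.ResourceType + " " + q.ResourceId + " is missing a Name tag.");
    });
});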

Cheers!


Enforcing HTTPS For ASP.NET Applications Hosted on EC2 Behind an AWS Elastic Load Balancer

23. March 2017 00:18 by Aaron Medacco | 3 Comments

Note: This post addresses issues with infinite redirects when attempting to enforce HTTPS using an Application Load Balancer. Those using a Classic Load Balancer should not have this issue if they are passing HTTPS traffic to instances on port 443 and HTTP traffic to instances on port 80. Application Load Balancers only allow one forwarding port to instances, which can cause the issues outlined in this post.

If you've ever managed ASP.NET web applications hosted on EC2 that are load balanced by an Application Load Balancer, you may have run into issues forcing users to communicate over SSL. Having SSL termination occur on the ELB is common practice and removes some work from the instances themselves. In fact, this is a primary use for the AWS Certificate Manager. Find out how to get an SSL certificate for your ELB in my earlier post on provisioning an SSL certificate using AWS Certificate Manager.


The traditional method for enforcing requests to occur with HTTPS instead of HTTP with an ASP.NET web application is to use the URL Rewrite Module within IIS. A quick Google search for how to do this will result in numerous examples using web.config entries that look like or are very similar to the following:

<rule name="HTTP to HTTPS redirect" stopProcessing="true">
  <match url="(.*)" />
    <conditions>
      <add input="{HTTPS}" pattern="off" ignoreCase="true" />
    </conditions>
  <action type="Redirect" redirectType="Permanent" url="https://{HTTP_HOST}/{R:1}" />
</rule>

This can result in a series of infinite redirects when you try to visit a webpage. This happens because SSL is being terminated on the Application Load Balancer. That means that when requests come in to the load balancer over HTTPS, they are sent to the EC2 instances over HTTP. Looking at our rewrite rule configuration, these requests will then be redirected to use HTTPS, which will come in via the load balancer, ad infinitum. Thus, regardless of what protocol users visit your website with, they will never land on a page due to the cycle.

However, this doesn't mean we can't use the URL Rewrite Module for IIS to enforce SSL usage of our site. I'm going to give some credit to Ross Pace at Stack Overflow for answering his own question regarding this very issue. In his answer, you'll find the following rule which accomplishes what we're after:

<rule name="Force Https" stopProcessing="true">
   <match url="healthcheck.html" negate="true" />
   <conditions>
       <add input="{HTTP_X_FORWARDED_PROTO}" pattern="https" negate="true" />
   </conditions>
   <action type="Redirect" url="https://{HTTP_HOST}{REQUEST_URI}" redirectType="Permanent" />
</rule>

This gets around our issue. But why?

You can see the rule does not take effect (negate="true") when the {HTTP_X_FORWARDED_PROTO} header matches the pattern "https". This header carries the protocol the client used to reach the Application Load Balancer. Therefore, if the user connected to the load balancer over HTTPS, there is no need to redirect. Any other value means we need to perform the redirect. Simple.
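
To make the header logic concrete, here's the same check sketched as a bare Node.js HTTP handler rather than ASP.NET; this is only an illustration of what the rewrite rule does, not part of the IIS setup:

var http = require("http");

http.createServer(function(req, res) {
    // The load balancer sets X-Forwarded-Proto to the protocol the client used.
    if (req.headers["x-forwarded-proto"] !== "https") {
        // The client reached the load balancer over HTTP; redirect permanently.
        res.writeHead(301, { "Location": "https://" + req.headers["host"] + req.url });
        return res.end();
    }
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("Served over HTTPS.");
}).listen(80);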

You'll also notice that this rule is ignored if the URL matches "healthcheck.html". As explained in his response, this is required to prevent the ELB health checks from failing because of the URL Rewrite Module. You can substitute the URL value with whichever page you use for your own ELB health check.

Hopefully this saves some .NET developers from walking outside with a pistol. :)

Cheers!


Moving Load From Your Master Database: Elasticache or RDS Read Replica?

21. March 2017 00:28 by Aaron Medacco | 0 Comments

You have a database managed through the AWS RDS service. Your application's a massive success, and your database's workload is now heavy enough that users are experiencing long response times. You decide that instead of scaling vertically by upgrading the instance type of your RDS database, you'd like to explore implementing Elasticache or an RDS read replica into your architecture to remove some of the load from your master database.

But which one should you choose?

Like always, it depends.

RDS read replicas and Elasticache nodes both enhance the performance of your application by handling requests for data instead of the master database. However, which one you choose will depend on your application's requirements. Before I dive too deep into how these requirements will shape your decision, let's talk about what we are comparing first.


Note: For those already familiar with Elasticache and RDS features, feel free to skip down.

Elasticache Clusters

Elasticache is a managed service provided by AWS that allows you to provision in-memory data stores (caches) that let your applications fetch information with blazing speed. When you use Elasticache, you create a cluster of nodes, which are blocks of network-attached RAM. This cluster can then sit between your application tier and your data tier. When requests that require data come in, they are sent to the cluster first. If the information exists in the cache and the cluster can service the request, it is returned to the requester without any database engine work or disk reads. If the information does not exist in the cache, it must be fetched from the database as usual. Obviously, there is no contest in performance between fetching data from memory vs. disk, which is why the Elasticache service can really give your applications some wheels while also reducing the load on your database.
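
To illustrate the pattern, here's a minimal cache-aside sketch in Node.js; the cluster endpoint and the queryDatabase helper are hypothetical placeholders:

var redis = require("redis");
// Endpoint of your Elasticache cluster node (placeholder value).
var client = redis.createClient({ host: "my-cluster.abc123.use1.cache.amazonaws.com", port: 6379 });

function getProduct(productId, queryDatabase, callback) {
    var key = "product:" + productId;
    client.get(key, function(err, cached) {
        if (!err && cached) {
            return callback(null, JSON.parse(cached)); // cache hit: no database work
        }
        queryDatabase(productId, function(err, row) {   // cache miss: go to the database
            if (err) return callback(err);
            client.setex(key, 300, JSON.stringify(row)); // keep it for 5 minutes
            callback(null, row);
        });
    });
}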

RDS Read Replicas

RDS read replicas offer an alternative to vertical database scaling by allowing you to use additional database instances to serve read-only requests. Read replicas are essentially copies of your master database where direct write operations are prohibited; changes are applied through asynchronous replication from the master after the master completes a write. This means that read replicas may return slightly stale data when serving requests, but will eventually catch up as write propagations from the master complete. Read replicas have additional benefits as well. Since they can be promoted to serve as the master database should the master fail, you can increase your data's availability. The database engine you choose also determines available features. For instance, databases using the MySQL engine can take advantage of custom read replica indexes which apply only to the replicas. Want covering indexes for expensive queries without forcing your master database to maintain additional writes? With this feature, you can.
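
Provisioning a replica is a single API call. A minimal sketch with the AWS SDK for Node.js; both instance identifiers are placeholders:

var AWS = require("aws-sdk");
var rds = new AWS.RDS();

var params = {
    DBInstanceIdentifier: "mydb-read-replica-1", // name for the new replica
    SourceDBInstanceIdentifier: "mydb"           // your existing master instance
};

rds.createDBInstanceReadReplica(params, function(err, data) {
    if (err) console.log(err, err.stack);
    else console.log("Replica creation initiated.");
});

Once the replica is available, you point read-only queries at its endpoint instead of the master's.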

Great. So which is better?

Clearly, both of these services can reduce your master database's workload, while also boosting performance. In order to choose which service makes most sense for your situation, you'll need to answer these kinds of questions:

Can my application tolerate stale data? How stale? 5 minutes? 5 hours?

If you want to store your data in Elasticache nodes for long periods of time and you need nearly current data, read replicas are likely the better option. Read replicas lag behind the master database slightly, but only by seconds. On the other hand, if you can tolerate staler data, Elasticache will outperform read replicas (when the data exists in the cache) while also preventing requests from hitting the master database.

Are the queries generated by my application static? That is, are the same queries run over and over all day? Are they dynamically constructed, using items like GETDATE()?

If the queries being run by your application are ever-changing, your in-memory cache won't be very useful since it will have to continue to update its data store to serve requests it otherwise cannot satisfy. Remember, the cache can only act on queries it recognizes. Not only does this not prevent calls to the database, but it can actually degrade performance because you are effectively maintaining a useless middle man between your application and data tiers. However, Elasticache clusters will shine over read replicas in cases where queries do not change and request the same data sets over and over.

How much data is my application asking for?

The volume of data your Elasticache clusters can store will be limited by the amount of memory you allocate to the nodes in your cluster. If your queries return huge result sets, your cache is going to be occupied very quickly. This can create a scenario where queries constantly compete for the available memory by overwriting what's existing. This won't be very helpful, especially if you aren't willing to dedicate (and pay for) additional memory. Alternatively, read replicas will be able to serve the request regardless of the returned data size, won't incur the penalty of a wasted middle man trip, and will still prevent the request from hitting the master.

What would I do?

My approach would be to consider how effectively you'll be able to use an in-memory cache. If, given all you know about your application, you decide the cache will be able to catch a large portion of the read requests coming in, go with the Elasticache cluster. If you're right, you should notice a difference right away. If you're wrong, you can always fall back to implementing read replicas. I've avoided pricing in this post, which is obviously important when making architectural decisions. Elasticache pricing and RDS pricing are available on Amazon Web Services' site. Readers will need to do their own analysis of the kinds of instances and node types they'd provision and how the costs compare.

If anyone knows of some additional considerations I should include, leave a comment or reach out to me via e-mail: acmedacco@gmail.com. Shout out to the last guy who caught my miscalculation of pricing in the Infinite Lambda Loop post.

Cheers!


Renaming Your SQL Server Databases on AWS RDS

19. March 2017 12:57 by Aaron Medacco | 0 Comments

Last night, I was converting the data source for this blog from local files to databases managed in AWS RDS. Somewhere along the way, I decided I wanted to rename one of the databases, but found I couldn't just right-click and rename it through SQL Server Management Studio. Running the sp_renamedb system stored procedure or the ALTER DATABASE command didn't work either.


After doing some investigation, it looks like RDS went several years without allowing you to do this. And while you still can't use any of these methods to do a database rename, they have since added a way with a custom stored procedure:

EXEC rdsadmin.dbo.rds_modify_db_name N'<OldName>', N'<NewName>'
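
For example, renaming a hypothetical database "BlogData" to "BlogContent" would look like this:

EXEC rdsadmin.dbo.rds_modify_db_name N'BlogData', N'BlogContent'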

There are some additional steps required if you are using Multi-AZ with Mirroring Deployment. You can find Amazon's documentation for this here.

Typically, I feel the AWS documentation is pretty clear and thorough, but I think a lot of users will end up on this page, scroll down to "Limits for Microsoft SQL Server DB Instances", read:

You can't rename databases on a DB instance in a SQL Server Multi-AZ with Mirroring deployment.

and conclude that they are, in fact, screwed, when this isn't the case. It just took a little more digging than usual.

Cheers!


What Happens When Your Lambda Functions Execute in an Infinite Loop?

13. March 2017 01:01 by Aaron Medacco | 1 Comment

Amazon Web Services' serverless compute product, Lambda, is an event-driven service that executes code without requiring the customer to manage servers. It can be triggered in response to a variety of different events or manually triggered via the AWS CLI, Shell tools, SDKs, and web console. Lambda functions can do just about anything within your AWS environment, given the appropriately configured IAM role.

So what happens if you architect (or misarchitect) a solution where your Lambda functions execute more than you anticipated? What if your Lambda functions are written so that they call themselves or other Lambda functions where, given a limitless infrastructure to facilitate, execution should logically never stop?

While the AWS platform is not actually limitless, you certainly can rack up a massive amount of compute if you're not careful. Given the pricing model of Lambda, that massive compute will become a massive bill that punching the developer who wrote the functions won't pay for.

Note: There are cases where recursive Lambda invocations are intentional and do solve problems. This post addresses the case where infinite looping is unintentional.


For AWS accounts that have never requested a limit increase, the maximum number of concurrent executions across all your Lambda functions in a region is restricted to 100. Apparently, Amazon has thought of this scenario and implemented a safety limit to prevent customers from coding themselves into a real problem.

"The default limit is a safety limit that protects you from costs due to potential runaway or recursive functions during initial development and testing."

You can find Amazon's documentation on Concurrent Executions with Lambda here.
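
If you want to confirm the limit on your own account, the SDK exposes it; a quick sketch using the AWS SDK for Node.js:

var AWS = require("aws-sdk");
var lambda = new AWS.Lambda();

lambda.getAccountSettings(function(err, data) {
    if (err) console.log(err, err.stack);
    else console.log("Concurrent executions limit: " + data.AccountLimit.ConcurrentExecutions);
});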

However, I decided to try it out myself anyway. I did this by writing a small Lambda function which invoked itself:

var AWS = require("aws-sdk");

exports.handler = (event, context, callback) => {
    var lambda = new AWS.Lambda();
    var params = {
        FunctionName: "Infinite_Lambda_Loop", // this function's own name
        InvocationType: "Event" // asynchronous invocation; don't wait for the result
    };
    // Invoke this same function again, continuing the loop.
    lambda.invoke(params, function(err, data){
        if (err) {
            console.log(err, err.stack);
        }
        else {
            console.log(data);
        }
    });
};

I used a timeout of 3 seconds and the minimum value for memory (128 MB) in all of my trials. I ran each trial in a separate region to get a clean graphing dashboard from Lambda. Therefore, I am assuming that Lambda performs similarly across all regions.

Here are my Lambda statistics as seen in my dashboard after running my function in Ohio after 60 seconds:

Lambda Invocations

In Northern California for 120 seconds (separate trial):

Lambda Invocations 2

In Oregon for 300 seconds (separate trial):

Lambda Invocations 3

I found that the only way to end the madness was to delete the function. If anybody else knows a different way, leave a comment.

After validating the values in my graphs with those found in CloudWatch, my data is as follows:

Region         Seconds   Invocations   Duration (ms)   Memory / Execution (MB)
Ohio           60        573           41,731          128
N. California  120       1,523         68,907          128
Oregon         300       3,021         190,985         128

Curiously, invocations and duration did not grow exactly linearly with time, but from this data we can estimate that about 3,000 executions take place during 5 minutes of a very simple Lambda function invoking itself forever. Again, this assumes that Lambda performance is uniform across regions. If we calculate our estimated cost for this, we get:

3,000 * $0.0000002 / request = $0.0006 Lambda cost for requests.

Price per 100ms with 128 MB = $0.000000208, so 1,910 * $0.000000208 = $0.00039728 Lambda cost for duration.

$0.0006 + $0.00039728 = $0.00099728 cost for my infinite loop executing for 5 minutes.

$0.00099728 * 288 ~ $0.29 cost for my infinite loop executing for 1 day.

$0.29 * 7 ~ $2.01 cost for my infinite loop executing for 1 week.

$0.29 * 30 ~ $8.62 cost for my infinite loop executing for 1 month.
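
If you want to rerun this arithmetic with your own measurements, here's the calculation above as a quick Node.js sketch; the prices are Lambda's at the time of writing:

var requests = 3000;             // invocations per 5 minutes (estimated)
var durationMs = 190985;         // total execution duration per 5 minutes (ms)
var requestPrice = 0.0000002;    // $ per request
var durationPrice = 0.000000208; // $ per 100 ms at 128 MB

// Cost for 5 minutes of looping: request charges plus billed duration.
var per5Minutes = (requests * requestPrice) + Math.ceil(durationMs / 100) * durationPrice;

console.log("5 minutes: $" + per5Minutes.toFixed(8));           // ~$0.00099728
console.log("1 day:     $" + (per5Minutes * 288).toFixed(2));   // ~$0.29
console.log("1 month:   $" + (per5Minutes * 288 * 30).toFixed(2)); // ~$8.62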

These costs do not reflect any savings gained by Lambda free tier discounts. Information about reducing costs per the free tier can be found here.

Therefore, I think it's safe to say you won't kill the budget with an accidental infinite loop (assuming your concurrency limit is the default 100).

Keep in mind I've made assumptions in my analysis of this data. Additionally, the pricing of AWS products changes frequently, and cost estimation for this scenario was done using Lambda pricing at the time of writing.

Cheers!

Copyright © 2016-2017 Aaron Medacco