AWS re:Invent 2017 - Day 3 Experience

5. December 2017 21:38 by Aaron Medacco | 0 Comments

The following is my Day 3 re:Invent 2017 experience. Missed Day 2? Check it out here.

Up and out early at 8:30 (holy ****!) and famished. Decided to check out the buffet at the Bellagio. Maybe it was just me, but I was expecting a bit more from it. Most things in Las Vegas are extravagant and contribute to an overall spectacle, but the Bellagio buffet made me think of Golden Corral a bit. Maybe it gets better after breakfast, I don't know.

AWS re:Invent 2017

AWS re:Invent 2017

The plate of an uncultured white bread American.

Food was pretty good, but I didn't grab anything hard to mess up. Second trip back, grabbed some watermelon that tasted like apple crisp and ice cream. Not sure what that was about. Maybe the staff used the same knife for desserts and fruit slicing. From what I could tell, half the patrons were re:Invent attendees, either wearing their badge or the hoodie.

Walked back to my room to watch the keynote by Andy Jassy, but only caught the last bit of it. After some difficulty getting the live stream to work on my laptop, I watched him announce the machine learning and Internet of Things services. Those aren't really in my wheelhouse (yet?), but they seemed interesting nonetheless. Succumbed to a food coma afterwards for a short nap.

Headed over to the Venetian to go back to the Expo for a new hoodie and for my next breakout session. AWS was holding the merchandise hostage if you didn't fill out evaluations for breakout sessions, so I couldn't get the hoodie until after I came back post-session. Good to know for next year. Back where the session halls were, got in the Reserved line for the Optimizing EC2 for Fun and Profit #bigsavings #newfeatures (CMP202) session. Talked with a gentleman while in line about the new announcements, specifically the S3 Select and Glacier Select features. I wasn't clear what the difference was between S3 Select and Athena and neither was he. I'll have to go try it out for myself.
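For anyone else wanting to poke at S3 Select, the gist is pointing a SQL expression at a single object rather than at a table over a prefix (which is roughly the difference from Athena). A minimal sketch of the parameters for boto3's `select_object_content`; the bucket, key, and query are made up, and since the feature was just announced in preview, details may shift:

```python
def build_select_params(bucket, key, expression):
    """Build the keyword arguments for s3.select_object_content.

    Unlike Athena, S3 Select scans a single object, so the query
    targets one CSV file rather than a table over an S3 prefix.
    """
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Treat the first CSV row as a header so columns can be
        # referenced by name in the expression.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }

# Hypothetical bucket and key for illustration.
params = build_select_params(
    "my-bucket", "sales/2017.csv",
    "SELECT s.region, s.total FROM S3Object s WHERE s.total > '100'",
)

# Running it requires boto3 and valid AWS credentials:
# import boto3
# s3 = boto3.client("s3")
# response = s3.select_object_content(**params)
```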

AWS re:Invent 2017

Awaiting new feature announcements.

AWS re:Invent 2017

Great speaker as always. AWS always has good speakers.

AWS re:Invent 2017

More talk about Reserved and Spot Instances.

The best thing about this session was the announcement of new features. The first was a genuinely helpful addition to the Cost Explorer in the management console that gives instance recommendations based on your account's historical usage. Having a tool that does the cost analysis and recommendations is great; it means I don't have to. I pulled up the SDK and AWS CLI references while he was demonstrating it, but couldn't find any methods for pulling those recommendations from Lambda or a batch script. I figured it'd be useful to automate a monthly email with that month's instance billing recommendations. Ended up talking to the speaker afterwards, who said it's not available yet but will be in the months to come. Nice!
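To show what the formatting half of that monthly-email automation might look like, here's a sketch that renders a plain-text email body. Note the recommendations payload is entirely hypothetical; no such API existed at the time, so every field name here is invented for illustration:

```python
def format_recommendation_email(recommendations):
    """Render a plain-text summary of EC2 instance recommendations.

    `recommendations` mimics a hypothetical Cost Explorer response.
    The real API wasn't available yet when this was written, so the
    field names (InstanceType, Action, Savings) are invented.
    """
    lines = ["This month's EC2 instance recommendations:", ""]
    for rec in recommendations:
        lines.append(
            "- {InstanceType}: {Action} "
            "(est. monthly savings ${Savings:.2f})".format(**rec)
        )
    return "\n".join(lines)

# Made-up sample data standing in for an API response.
sample = [
    {"InstanceType": "m4.large", "Action": "buy 2 Reserved Instances",
     "Savings": 84.50},
    {"InstanceType": "t2.micro", "Action": "downsize to t2.nano",
     "Savings": 3.80},
]
print(format_recommendation_email(sample))
```

A scheduled Lambda could build this body and hand it to SES once the recommendations become retrievable programmatically.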

The second announcement covered Spot Instances and the ability to hibernate instances when they're about to be terminated. Hibernation was described as working the same way as when you "open and close your laptop". If a Spot Instance set to hibernate gets interrupted, whether because another customer bids higher or AWS moves the capacity back to the On-Demand pool, it saves its state to EBS. When you receive the instance back, it picks up where it left off instead of completely re-initializing before resuming whatever work you wanted done.
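As described in the session, opting in comes down to a single interruption-behavior setting on the Spot request. A sketch of the parameters, assuming boto3's `request_spot_instances`; the AMI id and bid price are placeholders:

```python
def build_spot_request(ami_id, instance_type, max_price):
    """Spot request parameters asking EC2 to hibernate (save RAM to
    the EBS root volume) on interruption instead of terminating."""
    return {
        "SpotPrice": max_price,
        "InstanceCount": 1,
        # 'hibernate' is the behavior described in the session;
        # 'terminate' is the default, and 'stop' is also available.
        "InstanceInterruptionBehavior": "hibernate",
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
        },
    }

# Placeholder AMI id; submitting the request needs boto3 and credentials:
# import boto3
# ec2 = boto3.client("ec2")
# ec2.request_spot_instances(
#     **build_spot_request("ami-12345678", "m4.large", "0.05"))
```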

T2 Unlimited was also covered, which essentially frees you from worrying about having enough credits for burst capacity on your T2 series of EC2 instances. The rest of the session covered cost optimization techniques that have been belabored to death: use Reserved Instances, use Spot Instances, choose the correct instance type for your workload, periodically check that you actually need the capacity you've provisioned, take advantage of serverless where an always-on machine isn't necessary, and other tips of the "don't be an idiot" variety. Again, I may be biased, since most of this information seems elementary. I think next year I need to stick to the 400-level sessions to get the most value. That said, the presentation was excellent as always; I won't knock it just because I knew the material coming in.
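Switching an existing T2 between standard and unlimited credits is a one-call change. A sketch of the parameters for boto3's `modify_instance_credit_specification`; the instance ids are placeholders:

```python
def build_credit_spec(instance_ids, mode="unlimited"):
    """Parameters for ec2.modify_instance_credit_specification, which
    toggles T2 instances between 'standard' and 'unlimited' credits."""
    return {
        "InstanceCreditSpecifications": [
            {"InstanceId": iid, "CpuCredits": mode}
            for iid in instance_ids
        ]
    }

# Placeholder instance ids; applying it needs boto3 and credentials:
# import boto3
# ec2 = boto3.client("ec2")
# ec2.modify_instance_credit_specification(
#     **build_credit_spec(["i-0abc1234def567890"]))
```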

Found the shuttle a short walk from the hall, and decided to be lazy (smart) for once. Got back to the Bellagio for some poker before dinner, and came out plus $105. During all the walks back from the Aria to the Bellagio, I kept eyeballing the Gordon Ramsay Burger across the street at the Planet Hollywood, so I stopped in for dinner. 

AWS re:Invent 2017

Pretty flashy for a burger place.

AWS re:Invent 2017

I ate it all...No, I didn't. But wanted to try out the dogs and the burgers.

For a burger and hot dog place, I'd give it a 7/10. It would probably be a bit higher if they had dill pickles / relish and, honestly, better service. You can imagine this was pretty messy to eat, especially the hot dog, so I asked one of the girls up front where the bathroom was so I could wash my hands. The one across the hall was out of order (go figure), so I had to walk through part of the casino to the one next to P.F. Chang's. I think the tables next to me thought I'd walked out without paying. Heard them say "There he is." when I returned. Really? Do I look like a criminal? Yeah, I came to Vegas for a full week to rip off a burger joint.


New Pluralsight Course: Getting Started with AWS Athena

24. August 2017 20:32 by Aaron Medacco | 0 Comments

After a few months of developing and recording content, my first Pluralsight course, Getting Started with AWS Athena, is live and published. A lot of work went into this, especially since I'd never recorded video content of professional quality, so I'm relieved to finally cross the finish line. I never realized how much can go into producing an online course, which has given me a newfound respect for my fellow authors.

AWS Athena Get Started

Besides learning how to produce quality video content, I underestimated how much more I would learn about AWS and Athena. There's certainly a difference between knowing enough to solve your problem with AWS, and knowing enough to teach others how to solve theirs with it. 

For those interested in checking it out, you can find the course here. You'll need to have an active Pluralsight subscription, otherwise you can start a free trial. If you work in technology, the value of a subscription is pretty crazy given the amount of content available.

The course is separated into 7 modules:

  1. Exploring AWS Athena

    Sets the stage for the course. I speak to the value proposition of Athena, why you would want to use it, its features, supported data formats, limitations, and pricing model. If you're someone who's unfamiliar with Athena, this module's designed to give you a primer.

  2. Establishing Access to Your Data

    Athena's not very useful if you can't invoke it. In this module, I show you how to upload data to S3 and configure a user account with Athena access in IAM. Many will find this to be review, especially those practiced with Amazon Web Services, but it's a prerequisite to getting your hands dirty in Athena.

  3. Understanding AWS Athena Internals

    You never want to be in a place where things seem like magic. Here I address the technologies that operate underneath Athena, namely Apache Hive and the Presto SQL engine. If you've never used these tools to query data before, knowing what they are and how they fit within Athena is important. The only real barrier to entry for using Athena is the ability to write SQL, so I imagine a lot of users with no experience with big data technologies will be trying it out and this module gives a small crash course to help offset that.

  4. Creating Databases & Tables to Define Your Schema

    We start getting our hands dirty in this module. We talk about what databases and tables are within Athena's catalog and how they compare to those of relational databases. This one's pretty hands-on heavy as I demonstrate how to construct tables correctly using either the console or third-party tools over a JDBC connection.

  5. Retrieving Information by Querying Tables with SQL

    Here we finally get to start eating our cake. I cover the Presto SQL engine briefly and show how easy it is to query S3 data using ANSI SQL. Athena's built to be easy to get started with, so by the end of this module, most will feel comfortable enough to start using Athena on their own datasets.

  6. Optimizing Cost and Performance Using Best Practices

    My favorite module of the course, tailored to those who want to get more performance and keep more of their money. I review what methods you can employ to improve query times and reduce the amount of data scanned. The 3 primary ways of doing this involve compression, columnar formats, and table partitioning. In a lot of cases, it's not as simple as "Just compress your data and win." or "Columnar formats are faster so just use that." and I talk about what factors are important when deciding on an optimization strategy for Athena workloads. I also demonstrate how you would transform data into a columnar format using Amazon Elastic MapReduce for those who may have never done it before.

  7. AWS Athena vs. Other Solutions

    Finally, I thought it would be interesting to discuss how Athena stacks up against other data services within the Amazon cloud. Knowing when to use each service is vital for anyone responsible for proposing solutions on AWS, so I felt some high-level, admittedly apples-to-oranges, comparisons would help steer viewers in the right direction.
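To give a flavor of the hands-on modules, here's a sketch of defining a table and submitting a query through boto3's Athena client. The `sales` table, its columns, and the bucket paths are made up for illustration:

```python
# Hypothetical table over CSV files in S3; Athena DDL follows Hive syntax.
CREATE_TABLE = """
CREATE EXTERNAL TABLE IF NOT EXISTS sales (
  region  string,
  total   double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/sales/'
"""

QUERY = "SELECT region, SUM(total) AS revenue FROM sales GROUP BY region"

def build_query_request(sql, output_location):
    """Arguments for athena.start_query_execution; Athena writes the
    results of every query to the given S3 output location."""
    return {
        "QueryString": sql,
        "ResultConfiguration": {"OutputLocation": output_location},
    }

# Submitting the queries needs boto3 and credentials:
# import boto3
# athena = boto3.client("athena")
# athena.start_query_execution(
#     **build_query_request(CREATE_TABLE, "s3://my-bucket/results/"))
# athena.start_query_execution(
#     **build_query_request(QUERY, "s3://my-bucket/results/"))
```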

Right, so go watch the course! And leave feedback! I want to keep improving the quality of the content I create, and comments are extremely helpful.


Use Bzip2 Compression w/ AWS Athena

4. August 2017 01:44 by Aaron Medacco | 0 Comments

For those using Amazon Athena to query their S3 data, it's no secret that you can save money and boost performance by using compression and columnar data formats whenever possible. Amazon's documentation doesn't yet list Bzip2 as a supported compression format for Athena; however, it's absolutely supported.

This was confirmed by Jeff Barr's post on optimizing performance with Athena. There you can see that Bzip2 is a splittable compression format, which allows Athena to take advantage of multiple readers. Compression formats that are not splittable don't have this benefit, so it stands to reason you should use Bzip2 if you aren't using a columnar format such as ORC or Parquet.

Amazon Athena

In this post, I'll show how easy it is to compress your data into this format. Once that's done, just upload the data to S3, define a schema in the Athena catalog that points to the location of the compressed files, and query away.

For Windows users:

  1. Download 7-zip here. Once installed, you'll be able to invoke 7-zip from File Explorer by right-clicking on files > 7-Zip > Add to archive....
  2. Select "bzip2" as the Archive format with "BZip2" as the Compression method: 

    Compress To Bzip2
  3. Click OK. 

For Linux users:

  1. Open a terminal and change your working directory to the one containing the files you want to compress.
  2. Invoke the following command: 
    bzip2 file.csv
    If you want to compress multiple files, you can list them out:
    bzip2 file.csv file2.csv file3.csv
    For more information on other bzip2 options, check this out.
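If you'd rather script the compression, Python's standard `bz2` module produces the same format the `bzip2` command does. A small self-contained sketch (it works in a throwaway temp directory and, unlike the command, keeps the original file):

```python
import bz2
import tempfile
from pathlib import Path

def compress_to_bz2(path):
    """Compress a file with Bzip2, writing `<name>.bz2` alongside it,
    similar to the output of the bzip2 command."""
    src = Path(path)
    dest = src.with_name(src.name + ".bz2")
    dest.write_bytes(bz2.compress(src.read_bytes()))
    return dest

# Round-trip check on a throwaway CSV:
workdir = Path(tempfile.mkdtemp())
csv = workdir / "file.csv"
csv.write_text("id,name\n1,alpha\n2,beta\n")
out = compress_to_bz2(csv)
print(out.name)  # file.csv.bz2
```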

Those of you interested in using the recommended columnar storage formats should check out the AWS documentation, which shows how you can spin up an EMR cluster to convert data to Parquet.


Copyright © 2016-2017 Aaron Medacco