21. January 2017 21:12
by Aaron Medacco

Automatically Converting Your Text Files to Speech MP3s w/ AWS Polly


AWS Polly is a new service announced at AWS re:Invent 2016 that converts text to speech almost instantly. If you have this kind of business requirement, you probably don't want to convert files manually, one at a time. So I've created a solution that automatically converts text files (.txt) you upload to an S3 bucket into audio files (.mp3) with the same name in a separate bucket. It's a definite time-saver if you're converting thousands of files a day.

[Image: Lambda Polly Diagram]

Let's get started.

Creating an IAM policy for access permissions:

  1. Navigate to IAM in your management console.
  2. Select "Policies" in the sidebar.
  3. Click "Create Policy".
  4. Select "Create Your Own Policy".
  5. Enter an appropriate policy name and description.
  6. Paste the following JSON into the policy document:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "polly:SynthesizeSpeech"
                ],
                "Resource": [
                    "*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject"
                ],
                "Resource": [
                    "Your Source Bucket ARN/*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject"
                ],
                "Resource": [
                    "Your Destination Bucket ARN/*"
                ]
            }
        ]
    }
  7. Substitute "Your Source Bucket ARN" with the ARN for the S3 bucket you will be uploading text files to. Make sure you add "/*" after the bucket ARN. For instance, if your bucket ARN was "arn:aws:s3:::clownshoes", you would use "arn:aws:s3:::clownshoes/*".
  8. Substitute "Your Destination Bucket ARN" with the ARN for the S3 bucket where you want the generated speech files to be placed. Make sure you add "/*" here as well. For instance, if your bucket ARN was "arn:aws:s3:::clownshoes", you would use "arn:aws:s3:::clownshoes/*".
  9. Click "Create Policy".
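
The substitutions in steps 7 and 8 can also be scripted. The following is a minimal sketch, not part of the original setup: `buildPollyPolicy` and `bucketObjectsArn` are hypothetical helper names, and the bucket names `my-text-bucket` and `my-audio-bucket` are placeholders. It only builds the JSON document; you would still paste the result into the console.

```javascript
// Hypothetical helper: turns a bucket name into the "all objects" ARN used in the policy.
function bucketObjectsArn(bucketName) {
    // S3 bucket ARNs look like arn:aws:s3:::<bucket>; "/*" covers every object inside it.
    return "arn:aws:s3:::" + bucketName + "/*";
}

// Hypothetical helper: builds the same three-statement policy shown above.
function buildPollyPolicy(sourceBucket, destinationBucket) {
    return {
        Version: "2012-10-17",
        Statement: [
            { Effect: "Allow", Action: ["polly:SynthesizeSpeech"], Resource: ["*"] },
            { Effect: "Allow", Action: ["s3:GetObject"], Resource: [bucketObjectsArn(sourceBucket)] },
            { Effect: "Allow", Action: ["s3:PutObject"], Resource: [bucketObjectsArn(destinationBucket)] }
        ]
    };
}

// Placeholder bucket names; substitute your own.
console.log(JSON.stringify(buildPollyPolicy("my-text-bucket", "my-audio-bucket"), null, 4));
```

Running this with Node.js prints the policy JSON, ready to paste into the policy document box.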

Creating the IAM role for the Lambda function:

  1. Select "Roles" in the sidebar.
  2. Click "Create New Role".
  3. Enter an appropriate role name and click "Next Step".
  4. Select "AWS Lambda" within the AWS Service Roles.
  5. Change the filter to "Customer Managed", check the box of the policy you just created, and click "Next Step".
  6. Click "Create Role".
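
For reference, choosing "AWS Lambda" as the service role gives the role a trust policy that lets the Lambda service assume it. The sketch below shows that document as a JavaScript object; the function name `lambdaTrustPolicy` is only illustrative, and the console generates the equivalent for you.

```javascript
// Illustrative only: the trust relationship a Lambda execution role needs.
function lambdaTrustPolicy() {
    return {
        Version: "2012-10-17",
        Statement: [
            {
                Effect: "Allow",
                // The Lambda service is the principal allowed to assume this role.
                Principal: { Service: "lambda.amazonaws.com" },
                Action: "sts:AssumeRole"
            }
        ]
    };
}

console.log(JSON.stringify(lambdaTrustPolicy(), null, 4));
```

One thing to be aware of: the policy created earlier does not include CloudWatch Logs permissions, so the function's `console.log` output will likely not be recorded unless you also attach something like the AWS managed policy `AWSLambdaBasicExecutionRole` to the role.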

Creating the Lambda function:

  1. Navigate to Lambda in your management console.
  2. Click "Create a Lambda function".
  3. Select the "Blank Function" blueprint.
  4. Under "Configure triggers", click the grey box and select "S3".
  5. For the Bucket, select the source bucket you'll be uploading text files to.
  6. Select "Put" for the Event type.
  7. Check the box to "Enable trigger" and click "Next".
  8. Click "Next".
  9. Enter an appropriate function name and description. Select Node.js for the runtime.
  10. Under "Lambda function code", select "Edit code inline" for the Code entry type and paste the following code in the box:
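    var AWS = require("aws-sdk");
    
    exports.handler = (event, context, callback) => {
        var s3 = new AWS.S3();
        var polly = new AWS.Polly();
        var destinationBucket = "Destination Bucket Name";
        var record = event.Records[0];
        // S3 event keys arrive URL-encoded, with spaces sent as "+", so decode before use.
        var objectKey = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
        // The trigger fires on every Put, so skip anything that is not a .txt file.
        if (!/\.txt$/i.test(objectKey)) {
            return callback(null, "Skipped non-text object: " + objectKey);
        }
        var params = {
            Bucket: record.s3.bucket.name,
            Key: objectKey
        };
        s3.getObject(params, function(err, data) {
            if (err) {
                console.log(err, err.stack);
                return callback(err);
            }
            // Swap only the trailing extension so ".txt" elsewhere in the key is left alone.
            var objectNameMp3 = objectKey.replace(/\.txt$/i, ".mp3");
            var pollyParams = {
                OutputFormat: "mp3",
                SampleRate: "8000",
                Text: data.Body.toString('utf-8'),
                TextType: "text",
                VoiceId: "Joanna"
            };
            polly.synthesizeSpeech(pollyParams, function(err, data) {
                if (err) {
                    console.log(err, err.stack);
                    return callback(err);
                }
                var uploadParam = { Bucket: destinationBucket, Key: objectNameMp3, Body: data.AudioStream, ContentType: "audio/mpeg", StorageClass: "STANDARD" };
                s3.upload(uploadParam, function(err, data) {
                    if (err) {
                        console.log(err, err.stack);
                        return callback(err);
                    }
                    console.log("Speech file upload successful.");
                    // Signal completion so Lambda records the invocation as a success.
                    callback(null, objectNameMp3);
                });
            });
        });
    };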
  11. Substitute "Destination Bucket Name" with the name of the bucket you want the audio files to be placed in.
  12. Leave Handler as "index.handler".
  13. Choose to use an existing role and select the IAM role you created earlier.
  14. Leave the other default values and click "Next".
  15. Click "Create function".
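
To see what the function receives, here is a trimmed-down sketch of an S3 Put event and the key handling around it. The event keeps only the fields the handler reads (real events carry many more), the bucket and file names are placeholders, and `sourceFromEvent` and `toMp3Key` are hypothetical helper names used for illustration. It relies on S3 delivering object keys URL-encoded in event notifications, with spaces as "+".

```javascript
// Trimmed sample of an S3 event; only the fields used by the handler are included.
var sampleEvent = {
    Records: [
        {
            eventName: "ObjectCreated:Put",
            s3: {
                bucket: { name: "my-text-bucket" },        // placeholder bucket name
                object: { key: "notes/chapter+one.txt" }    // "+" stands for a space
            }
        }
    ]
};

// Hypothetical helper: pulls the bucket and the decoded object key from the first record.
function sourceFromEvent(event) {
    var record = event.Records[0];
    return {
        Bucket: record.s3.bucket.name,
        Key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " "))
    };
}

// Hypothetical helper: swaps a trailing ".txt" for ".mp3" and leaves other keys unchanged.
function toMp3Key(key) {
    return key.replace(/\.txt$/i, ".mp3");
}

var source = sourceFromEvent(sampleEvent);
console.log(source.Bucket, "|", source.Key, "|", toMp3Key(source.Key));
```

An event shaped like `sampleEvent` can also be pasted as JSON into the Lambda console's test event to exercise the function without uploading a file, provided the referenced object actually exists in your source bucket.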

Let's test it out!

We'll create a text file with some text.

[Image: Text File Contents]

Then upload the file to our source bucket. Navigate to the destination bucket in S3, and you should find an .mp3 file with the same name as the text file.

Side Note: While playing with AWS Polly, I was reminded of the accreditation courses found within the APN portal, which use similar-sounding voices for their audio. Now I'm curious whether Amazon "dogfooded" Polly when creating their partner training. If you weren't interested in recording audio for video courses yourself, you could stick to PowerPoint for the visuals, write the script, and then use Polly to convert the text into audio that plays during the slides.

Cheers!

Copyright © 2016-2017 Aaron Medacco