
Building AWS Lambda with Python, S3 and serverless#

The cloud-native revolution made the microservice the new building block, and your best friends are now containers, AWS, GCE, OpenShift, Kubernetes, you name it. But suddenly micro was not granular enough, and people started talking about serverless functions!


When I decided to step into the serverless world, I chose AWS Lambda as my instrument of choice. As an experimental subject, I picked one of my existing projects: a script that tracks new documentation releases for Nokia IP/SDN products (which I used to aggregate at nokdoc.github.io, now closed).

Given that not many posts go deeper than onboarding the simplest of functions, I decided to write down the key pieces I needed to uncover to push real code to Lambda.

Buckle up, our agenda is fascinating:

  • testing the basic Lambda onboarding process powered by the Serverless framework
  • accessing files in AWS S3 from within our Lambda with the boto3 package and a custom AWS IAM role
  • packaging non-standard Python modules for our Lambda
  • exploring ways to provision shared code for Lambdas
  • and using path variables to branch out the code in Lambda

Init#

What I am going to lambdify is an existing Python 3 script called nokdoc-sentinel, which has the following Lambda-related properties:

  • uses a non-standard Python package -- requests
  • reads/writes a file.

I specifically emphasize these non-standard packages and the reliance on persistence, since these aspects are not covered in 99% of Lambda-related posts; this post fills that gap.

AWS Lambda is a compute service that lets you run code without provisioning or managing servers. AWS Lambda executes your code only when needed and scales automatically, from a few requests per day to thousands per second.

You have multiple options when choosing a tool to configure and deploy an AWS Lambda.

While it might be good to get a taste of the manual Lambda configuration process through the AWS Console, I decided to go the "everything as code" way and use the Serverless framework to define, configure and deploy my first Lambda.

The Serverless Framework helps you develop and deploy your AWS Lambda functions, along with the AWS infrastructure resources they require. It's a CLI that offers structure, automation and best practices out-of-the-box, allowing you to focus on building sophisticated, event-driven, serverless architectures, comprised of Functions and Events.

Serverless installation and configuration#

First things first: install the framework and configure your AWS credentials. I already had credentials configured for the AWS CLI and thus skipped that part; if that is not the case for you, the docs are comprehensive and should have you covered.

Creating a Service template#

Once serverless is installed, start by creating an aws-python3 service:

A service is like a project. It's where you define your AWS Lambda Functions, the events that trigger them and any AWS infrastructure resources they require, all in a file called serverless.yml.

serverless create --template aws-python3 --name nokdoc-sentinel

Two files will be created:

  • handler.py -- a module with Lambda function boilerplate code
  • serverless.yml -- a service definition file

Making a Lambda instance out of the template#

I renamed the handler.py module to sentinel.py, changed the enclosed function's name and deleted redundant code from the template. For starters, I kept a portion of the sample code just to test that deploying to AWS via serverless actually works.

# sentinel.py
import json

def check(event, context):
    body = {
        "message": "Sentinel is on watch!",
    }

    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response
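
Since the handler is a plain Python function, it can be sanity-checked locally before any deployment; a quick sketch (this early version ignores its arguments, so empty placeholders are fine):

# local smoke test of the handler (illustrative, not part of the deployment)
from sentinel import check

print(check(event={}, context=None))
# {'statusCode': 200, 'body': '{"message": "Sentinel is on watch!"}'}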

The thing to remember is that you also must make the appropriate changes in serverless.yml once you have renamed the module and the function:

functions:
  # the function name as known to serverless
  check:
    # `handler: sentinel.check` reads as
    # "the `check` function in the `sentinel` module"
    handler: sentinel.check

Deploying and Testing AWS Lambda#

Before adding some actual load to the Lambda function, let's test that the deployment works. To trigger Lambda execution I added an HTTP GET event with the test path in the serverless.yml file. So a call to https://some-aws-hostname.com/test should trigger our Lambda function to execute.

functions:
  check:
    handler: sentinel.check
    # add http GET trigger event
    events:
      - http:
          path: test
          method: get

Read all about the events supported by the Serverless framework in the official docs.
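
With the default lambda-proxy integration, the handler receives the HTTP request as a plain dict and must answer with a statusCode/body dict, which is exactly what the sample function above returns. A quick sketch of peeking into that event (only a few of the standard proxy-event keys are shown):

import json

def check(event, context):
    # a peek into the API Gateway proxy event delivered by the `http` trigger
    print(event['httpMethod'])                 # "GET"
    print(event['path'])                       # "/test"
    print(event.get('queryStringParameters'))  # dict of query params or None

    # the proxy integration expects a statusCode/body answer back
    return {"statusCode": 200, "body": json.dumps({"message": "ok"})}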

And so we come to the first test deployment with the following assets:

$ tree -L 1
.
|-- sentinel.py
`-- serverless.yml

Let's go and deploy:

$ serverless deploy
Serverless: Packaging service...
Serverless: Creating Stack...
Serverless: Checking Stack create progress...
.....
Serverless: Stack create finished...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service .zip file to S3 (0.33 MB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
..............................
Serverless: Stack update finished...
Service Information
service: nokdoc-sentinel
stage: dev
region: us-east-1
api keys:
  None
endpoints:
  GET - https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/dev/test
functions:
  check: nokdoc-sentinel-dev-check

Note the endpoint URL at the bottom of the output; using this API endpoint we can check if our Lambda is working:

curl https://xxxxxxxx.execute-api.us-east-1.amazonaws.com/dev/test
{"message": "Sentinel is on watch!"}

Exploring Serverless artifacts#

Serverless deployed the Lambda using some default parameters (region: us-east-1, stage: dev, IAM role) and did some serious heavy lifting to get our code to AWS. In particular, it:

  • archived the project files into a zip archive and uploaded it to AWS S3
  • created a CloudFormation template that defines all the steps needed to onboard the Lambda and set up an API Gateway to respond to GET requests

Key artifacts that were created by serverless in AWS can be browsed with the AWS CLI:

# exploring deployed Lambda
$ aws --region us-east-1 lambda list-functions
{
    "Functions": [
        {
            "FunctionName": "nokdoc-sentinel-dev-check",
            "FunctionArn": "arn:aws:lambda:us-east-1:446595173912:function:nokdoc-sentinel-dev-check",
            "Runtime": "python3.6",
            "Role": "arn:aws:iam::446595173912:role/nokdoc-sentinel-dev-us-east-1-lambdaRole",
            "Handler": "sentinel.check",
            "CodeSize": 1395199,
            "Description": "",
            "Timeout": 6,
            "MemorySize": 1024,
            "LastModified": "2017-07-17T19:06:59.405+0000",
            "CodeSha256": "QrFOl8eBL8HipGRCkN/P7wsxkn8/LDIMCAQLxAVmFfI=",
            "Version": "$LATEST",
            "TracingConfig": {
                "Mode": "PassThrough"
            }
        }
    ]
}


# exploring S3 artifacts
$ aws s3 ls | grep sentinel
2017-07-17 22:05:13 nokdoc-sentinel-dev-serverlessdeploymentbucket-moviajl407hw

$ aws s3 ls nokdoc-sentinel-dev-serverlessdeploymentbucket-moviajl407hw/serverless/nokdoc-sentinel/dev/1500318307598-2017-07-17T19:05:07.598Z/
2017-07-17 22:05:40       3578 compiled-cloudformation-template.json
2017-07-17 22:05:41    395199 nokdoc-sentinel.zip

# exploring CloudFormation stack
$ aws --region us-east-1 cloudformation list-stacks
{
    "StackSummaries": [
        {
            "StackId": "arn:aws:cloudformation:us-east-1:446595173912:stack/nokdoc-sentinel-dev/da010710-6b22-11e7-aa95-500c20fef6d1",
            "StackName": "nokdoc-sentinel-dev",
            "TemplateDescription": "The AWS CloudFormation template for this Serverless application",
            "CreationTime": "2017-07-17T19:05:08.875Z",
            "LastUpdatedTime": "2017-07-17T19:05:45.283Z",
            "StackStatus": "UPDATE_COMPLETE"
        }
    ]
}

Are you interested in what is inside the nokdoc-sentinel.zip archive? I downloaded and unpacked it:

$ ls -la ~/Downloads/nokdoc-sentinel/
total 16
drwx------@  6 romandodin  staff   204 Jul 18 09:51 .
drwx------+ 54 romandodin  staff  1836 Jul 18 09:51 ..
drwxr-xr-x@  3 romandodin  staff   102 Jul 18 09:51 .vscode
-rw-r--r--@  1 romandodin  staff   208 Jan  1  1980 sentinel.py
-rw-r--r--@  1 romandodin  staff  3720 Jan  1  1980 watcher.py

Here are the project files plus a .vscode dir that my text editor created for its settings. Having .vscode in the deployment package indicates that, by default, serverless zips everything in the project's dir. You can take control of this process with include/exclude statements.

Accessing AWS S3 from within a Lambda#

It is natural that AWS assumes Lambdas will be used in close cooperation with the rest of the AWS family, and for file storage AWS S3 is the one-stop shop.

Sorting out permissions#

Before digging into S3 interaction, you have to sort out what permissions your Lambda has. When serverless deployed our Lambda with a lot of defaults, it also handed it a default IAM role:

aws --region us-east-1 lambda list-functions | grep Role
            # role name is nokdoc-sentinel-dev-us-east-1-lambdaRole
            "Role": "arn:aws:iam::446595173912:role/nokdoc-sentinel-dev-us-east-1-lambdaRole",

To be able to interact with the AWS S3 object model, this role should have access to S3. Let's investigate:

aws iam get-role-policy --role-name nokdoc-sentinel-dev-us-east-1-lambdaRole --policy-name dev-nokdoc-sentinel-lambda
{
    "RoleName": "nokdoc-sentinel-dev-us-east-1-lambdaRole",
    "PolicyName": "dev-nokdoc-sentinel-lambda",
    "PolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "logs:CreateLogStream"
                ],
                "Resource": [
                    "arn:aws:logs:us-east-1:446595173912:log-group:/aws/lambda/nokdoc-sentinel-dev-check:*"
                ],
                "Effect": "Allow"
            },
            {
                "Action": [
                    "logs:PutLogEvents"
                ],
                "Resource": [
                    "arn:aws:logs:us-east-1:446595173912:log-group:/aws/lambda/nokdoc-sentinel-dev-check:*:*"
                ],
                "Effect": "Allow"
            }
        ]
    }
}

As you see, S3 access is not part of the default permissions, so we must grant it to our Lambda. Instead of manually adding permissions to the existing role, we can re-deploy the Lambda with an updated serverless.yml file. In this edition I specified the region, set an existing S3 bucket as the deployment bucket and included an IAM role configuration allowing full access to the S3 objects of that bucket:

# serverless.yml
provider:
  name: aws
  runtime: python3.6
  stage: dev
  region: eu-central-1
  # deploy Lambda function files to the bucket `rdodin`
  deploymentBucket: rdodin
  # IAM Role configuration to allow all-access for S3 objects of bucket `rdodin`
  iamRoleStatements:
    - Effect: "Allow"
      Action: "s3:*"
      Resource: "arn:aws:s3:::rdodin/*"

Now the re-deployment will create another Lambda (since the region has changed), deploy the code to the existing bucket rdodin and apply a policy that allows S3 interaction.

# checking the inline policy of the IAM Role bound to Lambda
aws lambda get-function --function-name nokdoc-sentinel-dev-check | grep Role
        "Role": "arn:aws:iam::446595173912:role/nokdoc-sentinel-dev-eu-central-1-lambdaRole",

aws iam list-role-policies --role-name nokdoc-sentinel-dev-eu-central-1-lambdaRole
{
    "PolicyNames": [
        "dev-nokdoc-sentinel-lambda"
    ]
}

aws iam get-role-policy --role-name nokdoc-sentinel-dev-eu-central-1-lambdaRole --policy-name dev-nokdoc-sentinel-lambda
{
    "RoleName": "nokdoc-sentinel-dev-eu-central-1-lambdaRole",
    "PolicyName": "dev-nokdoc-sentinel-lambda",
    "PolicyDocument": {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "logs:CreateLogStream"
                ],
                "Resource": [
                    "arn:aws:logs:eu-central-1:446595173912:log-group:/aws/lambda/nokdoc-sentinel-dev-check:*"
                ],
                "Effect": "Allow"
            },
            {
                "Action": [
                    "logs:PutLogEvents"
                ],
                "Resource": [
                    "arn:aws:logs:eu-central-1:446595173912:log-group:/aws/lambda/nokdoc-sentinel-dev-check:*:*"
                ],
                "Effect": "Allow"
            },
            {
                "Action": "s3:*",
                "Resource": "arn:aws:s3:::rdodin/*",
                "Effect": "Allow"
            }
        ]
    }
}

Now that the S3 permissions are in place, we are free to read and modify the files in the bucket.

Using Boto3 to read/write files in AWS S3#

AWS provides us with the boto3 package as a Python API for AWS services. Moreover, this package comes pre-installed on the system that runs the Lambdas, so you do not need to package it yourself.
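
In its simplest use, listing the bucket contents from Python takes only a few lines; a minimal sketch (the bucket name and prefix are the ones used throughout this post). Note that listing is a bucket-level operation, so the IAM statement has to cover the bucket ARN itself (arn:aws:s3:::rdodin) in addition to the object-level arn:aws:s3:::rdodin/* shown above.

import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('rdodin')

# iterate over the objects stored under the deployment prefix
# (requires s3:ListBucket on the bucket ARN itself)
for obj in bucket.objects.filter(Prefix='serverless/nokdoc-sentinel/'):
    print(obj.key, obj.size)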

I put a file (releases_current.json) that my script expects to read into the directory created by the serverless deployment:

$ aws s3 ls rdodin/serverless/nokdoc-sentinel/
                           PRE dev/
2017-07-22 15:57:07       3424 releases_current.json

Let's see if we can access it from within the Lambda using boto3 and its documentation:

# sentinel.py
import json
import boto3


def check(event, context):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('rdodin')
    # reading a file in S3 bucket
    original_f = bucket.Object(
        'serverless/nokdoc-sentinel/releases_current.json').get()['Body'].read()[:50]
    # writing a new file and reading its content back
    new_f = bucket.put_object(
        Key='serverless/nokdoc-sentinel/newfile.txt', Body='Hello AWS').get()['Body'].read()

    body = {
        "message": "Sentinel loaded a file {} and created a new file {}"
        .format(original_f, new_f),
    }

    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response

Re-deploy and check:

$ curl https://xxxxx.execute-api.eu-central-1.amazonaws.com/dev/test
{"message": "Sentinel loaded a file b'{\"nuage-vsp\": [\"4.0.R8\", \"4.0.R7\", \"4.0.R6.2\", \"4.' and created a new file b'Hello AWS'"}

So far, so good. We are now capable of reading and writing files stored in AWS S3.
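
For the actual sentinel logic the same calls extend naturally into a read-modify-write cycle on the JSON file; here is a minimal sketch of such a flow (the key is the one used above, the update step is just a placeholder):

import json
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('rdodin')
key = 'serverless/nokdoc-sentinel/releases_current.json'

# read and parse the stored JSON
releases = json.loads(bucket.Object(key).get()['Body'].read())

# ... compare `releases` against freshly fetched data and update it here ...

# write the updated structure back to the same S3 object
bucket.Object(key).put(Body=json.dumps(releases))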

Adding python packages to Lambda#

So far we have been lucky to use only packages that are either standard (json) or come pre-installed in the Lambda environment (boto3). But what if we need other packages, be it our own or ones from PyPI?

Well, in that case you need to ship these packages along with your function's code as a single deployment package. As the official guide says, you need to copy the packages to the root directory of your function and zip everything into a single archive.

The drawbacks of this recommendation are that:

  • your project dir will be cluttered with all these packages sitting in the root
  • you will have to .gitignore these package directories to keep them out of your repository

I like the solution proposed in the "Building Python 3 Apps On The Serverless Framework" post: install your packages into a dedicated directory inside your project dir and extend the Python import path to include that directory.

# install the `requests` package into a `vendored` dir at the project's root
pip install -t vendored/ requests

# `requests` and its dependencies are there
$ ls vendored/
certifi                     chardet                     idna                        requests                    urllib3
certifi-2017.4.17.dist-info chardet-3.0.4.dist-info     idna-2.5.dist-info          requests-2.18.1.dist-info   urllib3-1.21.1.dist-info

Now modify your code to add the vendored directory to the Python import path:

import os
import sys

import boto3

# extend the import path with the `vendored` dir shipped in the deployment package
here = os.path.dirname(os.path.realpath(__file__))
sys.path.append(os.path.join(here, "vendored"))
# now it is possible to import a non-standard package
import requests

def check(event, context):
    # function body omitted
    pass

Note that if a package contains native binary code, it must be compiled for the Amazon Linux environment that AWS uses to run Lambdas.

Shared code for Lambdas#

Even though a Lambda is often thought of as an independent function, a real application that you might want to move to Lambda will quite likely depend on some common code. Refer to the "Writing Shared Code" section of the above-mentioned blog post to see how it's done.
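
In its simplest form the idea boils down to keeping common helpers in a separate package next to the handlers so that they travel inside the same deployment zip; a hypothetical sketch (the libs package and helper name are made up for illustration and are not taken from that post):

# libs/s3utils.py -- a shared helper that several handlers could reuse (hypothetical)
import json
import boto3

def load_json(bucket_name, key):
    """Read and parse a JSON object stored in S3."""
    obj = boto3.resource('s3').Bucket(bucket_name).Object(key)
    return json.loads(obj.get()['Body'].read())

Since serverless zips the whole project dir by default, the libs package is deployed alongside the handler, which can then import it with from libs.s3utils import load_json (add an empty __init__.py if you prefer a regular package).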

Handling arguments in Lambdas#

Another common practice in a classic utility script is to have arguments (argparse) that branch out the code and make the app's logic feature-rich. In Lambdas, of course, no CLI is exposed, so to substitute for the arguments you can go two ways:

  • create several functions for your project and bind different API endpoints to each of them
  • use a single function and add a variable part to the API endpoint

I will show how to handle the latter option. First, create a variable parameter for your API endpoint in the serverless.yml:

    events:
      - http:
          # `{command}` is a variable part here
          path: go/{command}
          method: get

Now your Lambda can branch out like this, using the part you place at the end of your API endpoint as an argument:

def check(event, context):
    # the variable that we referenced as {command} in serverless.yml
    # is accessible via the `command` key of the event['pathParameters'] dict
    command = event['pathParameters']['command']

    # a default answer in case no known branch matches
    body = {
        "message": "Unknown command `{}`".format(command)
    }

    if 'branch1' in command:
        body = {
            "message": "Argument `A` execution block"
        }

    if 'branch2' in command:
        body = {
            "message": "Argument `B` execution block"
        }

    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }

    return response

Now any arbitrary text added after the go/ path will be evaluated in your Lambda, allowing you to conditionally execute parts of your code.
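
If the number of commands grows, the if-chain above can be swapped for a dispatch table; a hypothetical sketch of the same handler (note it uses exact-match lookup instead of substring checks):

import json

def run_branch1():
    return {"message": "Argument `A` execution block"}

def run_branch2():
    return {"message": "Argument `B` execution block"}

# exact-match lookup table for the {command} path parameter
COMMANDS = {
    "branch1": run_branch1,
    "branch2": run_branch2,
}

def check(event, context):
    command = event['pathParameters']['command']
    handler = COMMANDS.get(command)
    if handler is None:
        return {"statusCode": 404,
                "body": json.dumps({"message": "Unknown command `{}`".format(command)})}
    return {"statusCode": 200, "body": json.dumps(handler())}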

Summary#

With the concepts explained above, I successfully transferred the nokdoc-sentinel script from a standalone cron-triggered module to AWS Lambda. You can check out the project's code and its serverless.yml file in the github repo.

Links#

  1. Benny Bauer -- Python in The Serverless Era PyCon 2017
  2. Ryan S. Brown -- Building Python 3 Apps On The Serverless Framework
  3. Serverless Framework AWS Guide
  4. AWS Lambda Developers Guide
  5. AWS CLI Command Reference
  6. Keeping secrets out of Git with Serverless
