Two years ago I shared my experience building an AWS Lambda function for a Python project of my own. And a few days ago I stumbled upon a nice open-source CLI tool that I immediately wanted to transform into a web service.

Naturally, a simple, single-purpose tool is a perfect candidate for function-as-a-service (FaaS), and since I had past experience with AWS Lambda, this time I decided to meet its Google sibling - Google Cloud Functions.

In this post we’ll discover how to take a Python package with 3rd-party dependencies, make a GCP Function out of it, and deploy it without a single click in the UI - all without leaving the IDE.

[Project’s source code]

The Python tool I considered a natural fit for a Cloud Function is pycatj by David Barroso, which he released just recently.

This tool maps a JSON/YAML file to a Python dictionary, highlighting the keys you need to access the nested data:

$ cat tests/data/test_1.json
{
    "somekey": "somevalue",
    "somenumber": 123,
    "a_dict": {
        "asd": "123",
        "qwe": [1, 2, 3],
        "nested_dict": {
            "das": 31,
            "qwe": "asd"
        }
    }
}

$ pycatj --root my_var tests/data/test_1.json
my_var["somekey"] = "somevalue"
my_var["somenumber"] = 123
my_var["a_dict"]["asd"] = "123"
my_var["a_dict"]["qwe"][0] = 1
my_var["a_dict"]["qwe"][1] = 2
my_var["a_dict"]["qwe"][2] = 3
my_var["a_dict"]["nested_dict"]["das"] = 31
my_var["a_dict"]["nested_dict"]["qwe"] = "asd"

I felt that a single-page web service doing these transformations with pycatj might be helpful to somebody sometime.
Probably the easiest way would be to rewrite the same code in JavaScript and create a static page with no backend at all, but does it spark joy? Not even a bit.

So as a starting point I decided to create a serverless function that relies on pycatj and will later be triggered by a Web frontend with an HTTP request carrying the content for pycatj-ifying.

In a nutshell, the function should behave something like this:

curl -X POST https://api-endpoint.com -d '{"data":{"somekey":"value1"}}'
# returns
my_var["somekey"] = "value1"

To add some sugar to the mix I will leverage the serverless framework to do the heavy lifting in a code-first way. The plan is set, let's see it through to completion.

Agenda

The creation and deployment of the service decomposes as follows:

  1. Google Cloud Platform
    1. Create a GCP account (if needed) and acquire the API credentials
    2. Create a project in GCP that will host the Function, and enable the APIs serverless needs to create the Function and its artifacts
  2. Function creation and testing
    1. Create the code in conformance with the GCP Function handlers/events rules
    2. Manage code dependencies
  3. Function deployment
    1. Leverage the serverless framework to deploy the function to GCP
  4. Add a frontend (in another blog post) that will use the serverless function.

1 Google Cloud Platform

Following the agenda, ensure that you have a working GCP account (the trial gives you $300, and GCP Functions are perpetually FREE within sane usage thresholds). Make sure that you have a billing account created; it is set up, for example, when you opt in to the free trial program. Without a linked billing account the Functions won’t work.

Once you have your account set up, either continue with the default project or create a new one. In either case you need to enable the APIs that the serverless framework leverages during the function deployment process. Go through this guide carefully to learn how to enable the right APIs.

API credentials

Do not forget to download your API credentials, as nothing can be done without them. This guide’s section explains it all.
The commands in the rest of this post assume that the credentials are stored in the ~/.gcloud directory.

2 Function creation

Since we are living on the edge, we will rely on the serverless framework to create & deploy our function - the very same framework I leveraged for the AWS Lambda creation, so why not try it for GCP Functions?

The notable benefit of the serverless framework is that it lets you define your Function deployment as code, thus making it repeatable, versionable and fast.

But nothing comes cheap; for all these perks you need to pay, and the serverless toll is being a JavaScript package =|. Don’t know about you, but no glove - no love is the principle I try to stick to with JS. So why not quarantine it in a docker container jail and keep your machine npm-free?

docker pull amaysim/serverless:1.45.1

2.1 Serverless service template

The way I start my serverless journey is by telling serverless to generate a service template in the programming language of my choice. Later we can tune bits and pieces of that service, but when you start from the ground up, it's easier to have scaffolding to work on.

# Create service with `google-python` template in the folder ~/projects/pycatj-web
docker run --rm \
 -v ~/projects/pycatj-web:/opt/app \
 amaysim/serverless:1.45.1 \
 serverless create --template google-python --path pycatj-serverless

The result of the serverless create --template <template> command is a directory with boilerplate code for our function and a few serverless artifacts.

# artifacts created by `serverless create --template`
$ tree pycatj-serverless/
pycatj-serverless/
├── main.py
├── package.json
└── serverless.yml

We need to take a closer look at the generated serverless.yml template file, where some adjustments are required:

  1. the project name should match the name of the project you created in GCP
  2. the path to your GCP credentials JSON file should be valid

Given that the project in my GCP is called pycatj and my credentials file is ~/.gcloud/pycatj-d6af60eda976.json, the provider section of the serverless.yml file looks like this:

# serverless.yml file
# with project name and credentials file specified
provider:
  name: google
  stage: dev
  runtime: python37
  region: us-central1
  project: pycatj
  credentials: ~/.gcloud/pycatj-d6af60eda976.json

As for the main.py generated by the framework, it's simple boilerplate code that replies with text to an incoming HTTP request wrapped in a Flask object.

# main.py
def http(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    return f'Hello World!'

Let's test that our modifications work so far by trying to deploy the template service.

2.2 Testing function deployment

Before we start pushing our function and its artifacts to GCP, we need to tell serverless how to talk to the cloud provider. To do that, we need to install the serverless-google-cloudfunctions plugin that is referenced in the serverless.yml file.

Install the Google Cloud Functions plugin with the npm install command, run against the directory with the generated serverless service files:

docker run --rm \
  -v ~/.gcloud/:/root/.gcloud \
  -v ~/projects/pycatj-web/pycatj-serverless:/opt/app \
  amaysim/serverless:1.45.1 npm install

Note that here I mount my GCP credentials stored in the ~/.gcloud dir to the container's /root/.gcloud dir, where the serverless container will find them, since that path is referenced in the serverless.yml file.
Secondly, I bind-mount my project’s directory ~/projects/pycatj-web/pycatj-serverless to the /opt/app dir inside the container, which is that container's WORKDIR.

Now we have a green flag to try out the deployment process with serverless deploy:

docker run --rm \
  -v ~/.gcloud/:/root/.gcloud \
  -v ~/projects/pycatj-web/pycatj-serverless:/opt/app \
  amaysim/serverless:1.45.1 serverless deploy

If the deployment fails with the Error Not Found, make sure that you don’t have stale failed deployments by going to Cloud Console -> Deployment Manager and deleting all deployments created by serverless.

Upon a successful deployment you will have a Cloud Function deployed and reachable by the service URL:

Deployed functions
first
  https://us-central1-pycatj.cloudfunctions.net/http

curl-ing that API endpoint returns a simple “Hello World!”, as coded in our boilerplate main.py function:

# main.py
def http(request):
    return f'Hello World!'
curl -s https://us-central1-pycatj.cloudfunctions.net/http
Hello World!

You can also verify the resources created by this deployment by visiting the Deployment Manager in the GCP console, as well as by navigating to the Functions page and examining the deployed function and its properties:


2.3 Writing a Function

That was the template function we just deployed, with an HTTP event acting as the trigger.

Let's see how the actual Python function is coupled to the service definition inside the serverless file. How about giving our function a different name, by first changing the functions section of the serverless.yml file:

# changing the function name and handler to `pycatjify`
functions:
  pycatjify:
    handler: pycatjify
    events:
      - http: path

Since we changed the function and the handler name to pycatjify we should do the same to our function inside the main.py file:

def pycatjify(request):
    return f"We are going to give pycatj its own place on the web!"

Deploying this function gives us a new API endpoint aligned with the new function name specified in serverless.yml:

Deployed functions
pycatjify
  https://us-central1-pycatj.cloudfunctions.net/pycatjify

# testing
$ curl https://us-central1-pycatj.cloudfunctions.net/pycatjify
We are going to give pycatj its own place on the web!

2.3.1 Managing code dependencies

Up until now we have played with boilerplate code, with a few names changed to give our function a bit of an identity. We have reached the stage where it's time to onboard the pycatj package and make our function benefit from it.

Since Functions are executed in sandboxes on the cloud platform, we must somehow tell it what dependencies we want these sandboxes to have when running our code. In the AWS Lambda example we packaged the 3rd-party libraries along with the function (aka vendoring).

In GCP's case the vendoring approach is also possible and is done the same way, but it is also possible to ship a pip requirements.txt file along with your main.py that specifies your function's dependencies, as Pythonistas are used to.

Read more on GCP python dependency management

Unfortunately, the pip version that GCP currently uses does not support PEP 517, so it was not possible to specify -e git+https://github.com/dbarrosop/pycatj.git#egg=pycatj in a requirements file; thus I continued with the good old vendoring technique:

# executed in ~/projects/pycatj-web/pycatj-serverless
pip3 install -t ./vendored git+https://github.com/dbarrosop/pycatj.git

This installs the pycatj package and its dependencies into a vendored directory, which will be treated as a Function artifact and pushed to GCP along with main.py on the next serverless deploy.
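For main.py to actually import pycatj from that directory, the vendored path has to be put on the module search path before the import. The exact lines in the project may differ; this is the common pattern (the `vendored` name matches the pip -t target above):

```python
import os
import sys

# Prepend the `vendored` directory (sitting next to main.py) to the module
# search path so that `import pycatj` resolves to the vendored copy.
VENDORED = os.path.join(os.path.dirname(os.path.abspath(__file__)), "vendored")
if VENDORED not in sys.path:
    sys.path.insert(0, VENDORED)
```

Placing this snippet at the very top of main.py keeps all subsequent imports unchanged.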

2.3.2 Events

Every function is triggered by an event supported by the cloud provider. When serverless is used, the event type is specified for each function in the serverless.yml file:

# pycatjify function is triggered by an event of type `http`
# note that the key `path` is irrelevant to serverless
functions:
  pycatjify:
    handler: pycatjify
    events:
      - http: path

With this configuration we expect our function to execute once an HTTP request hits the function API endpoint.

2.3.3 Writing a function

Yes, a thousand words later we are finally at the milestone where we write actual Python code for the function. The template we generated earlier gives us a good starting point - a function body with a single Flask request argument:

def pycatjify(request):
    return f"We are going to give pycatj its own place on the web!"

The logic of our serverless function that we are coding here is:

  1. parse the incoming HTTP request, extracting the contents of the JSON passed along with it
  2. transform the received data with the pycatj package and send back the response

With a few additions to access the pycatj package in the vendored directory and to be able to test the function locally, the resulting main.py behaves as follows:
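The full listing lives in the project's source code; below is a minimal self-contained sketch of the same behavior. To keep it runnable on its own, the pycatj transformation is reimplemented inline in `_flatten` instead of importing the vendored package, and the demo payload values are hypothetical:

```python
import json

# Demo payload used when the incoming request carries no body (illustrative values)
DEFAULT_DATA = {"data": "test_value", "somenumber": 123}
DEFAULT_ROOT = "my_dict"


def _flatten(data, root):
    """Emit pycatj-style access lines for nested dicts and lists."""
    lines = []
    if isinstance(data, dict):
        for key, value in data.items():
            lines += _flatten(value, f'{root}["{key}"]')
    elif isinstance(data, list):
        for idx, value in enumerate(data):
            lines += _flatten(value, f"{root}[{idx}]")
    else:
        lines.append(f"{root} = {json.dumps(data)}")
    return lines


def pycatjify(request):
    """HTTP Cloud Function entry point (request is a flask.Request)."""
    body = request.get_json(silent=True) or {}
    root = body.get("root", DEFAULT_ROOT)
    # `pycatj_data` carries the payload; otherwise the whole body is the data,
    # and an empty body falls back to the hardcoded demo data
    data = body.get("pycatj_data", body) or DEFAULT_DATA
    return json.dumps({"data": "\n".join(_flatten(data, root))})


if __name__ == "__main__":
    # Local smoke test with a stub standing in for flask.Request
    class _StubRequest:
        def __init__(self, body):
            self._body = body

        def get_json(self, silent=False):
            return self._body

    print(pycatjify(_StubRequest({"root": "my_var", "pycatj_data": {"somekey": "somevalue"}})))
```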

This code has some extra additions beyond the simple two-step logic I mentioned before. I stuffed in a default data value that is used when the incoming request has no body; this dummy data serves demonstration purposes only.
To let me test the function code locally I added the if __name__ == "__main__": condition, and lastly I wrote some print calls for trivial logging. Speaking of which…

2.4 Logging

Logging is a blessing! Having a chance to see what happens with your function in a cloud platform sandbox is definitely a plus. With GCP, logging can be done in simple and advanced modes. Simple logging captures everything a function prints to stdout/stderr -> a simple print() call suffices. In the more advanced mode you would leverage the GCP Logging API.
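In simple mode, printing one JSON object per line is a handy middle ground: the entries still go to stdout, yet stay machine-parseable in the log viewer. The helper below is my own sketch of this convention, not a GCP API:

```python
import json


def log(severity: str, message: str, **fields) -> None:
    """Simple-mode logging: anything the function prints to stdout lands in
    the Function's logs; one JSON object per line keeps entries parseable."""
    print(json.dumps({"severity": severity, "message": message, **fields}))


# Example entry emitted during a request (hypothetical field values)
log("INFO", "transforming request", root="my_dict")
```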

The logs can be viewed in the Web UI Logging interface, as well as with the gcloud CLI tool.

3 Function deployment

We already tried the deployment process with the boilerplate code, just to make sure that the serverless framework works. Now that our pycatj package and its dependencies are stored in the vendored folder and the function body is filled with the actual code, let's repeat the deployment and see what we get:

docker run --rm \
  -v ~/.gcloud/:/root/.gcloud \
  -v ~/projects/pycatj-web/pycatj-serverless:/opt/app \
  amaysim/serverless:1.45.1 serverless deploy

All goes well, and serverless successfully updates our function to include the vendored artifacts as well as the new code in main.py. Under the hood, the deployment process packaged the Function's code into a directory, zipped it and uploaded it to the deployment bucket.
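The zip-and-upload step can be illustrated with a short sketch. This is my illustration of the idea (hypothetical paths, not serverless's actual code): every file under the service directory, vendored dependencies included, ends up in one archive with paths kept relative to the service root:

```python
import pathlib
import zipfile


def package_function(src_dir: str, out_zip: str) -> None:
    """Zip every file under src_dir (main.py, vendored/, ...) into out_zip,
    preserving paths relative to the source directory - roughly the artifact
    that gets uploaded to the deployment bucket."""
    src = pathlib.Path(src_dir)
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(src))
```

The actual upload is then just a PUT of that archive to a GCS bucket created by the Deployment Manager.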

As demonstrated above, the serverless framework lets a user express the deployment in code, making the process extremely easy and fast.

4 Usage examples

Time to give our Function a roll by bombing it with HTTP requests. In this section I will show how to use the pycatjify service from the CLI; in a subsequent post we will write a simple Web UI on top of the API our function provides.

4.1 Empty GET request

curl -s https://us-central1-pycatj.cloudfunctions.net/pycatjify | jq -r .data

# returns
my_dict["data"] = "test_value"
my_dict["somenumber"] = 123
my_dict["a_dict"]["asd"] = "123"
my_dict["a_dict"]["qwe"][0] = 1
my_dict["a_dict"]["qwe"][1] = 2
my_dict["a_dict"]["qwe"][2] = 3
my_dict["a_dict"]["nested_dict"]["das"] = 31
my_dict["a_dict"]["nested_dict"]["qwe"] = "asd"

With an empty GET request the function delivers a demo of its capabilities by transforming a hardcoded demo JSON. The transformed string comes back in a JSON object, accessible by the data key.

4.2 POST with a root and pycatj_data specified

Getting a demo response back is of little use; to make real use of the pycatjify service a user can specify the root value and pass the original JSON data in a POST request body using the pycatj_data key:

curl -sX POST https://us-central1-pycatj.cloudfunctions.net/pycatjify \
  -H "Content-Type: application/json" \
  -d '{"root":"POST","pycatj_data":{"somekey":"somevalue","a_dict":{"qwe":[1,2],"nested_dict":{"das":31}}}}' \
  | jq -r .data

# returns
POST["somekey"] = "somevalue"
POST["a_dict"]["qwe"][0] = 1
POST["a_dict"]["qwe"][1] = 2
POST["a_dict"]["nested_dict"]["das"] = 31

4.3 POST without root, with pycatj_data

It is also allowed to omit the root key; in that case a default root value is applied:

curl -sX POST https://us-central1-pycatj.cloudfunctions.net/pycatjify \
  -H "Content-Type: application/json" \
  -d '{"pycatj_data":{"somekey":"somevalue","a_dict":{"qwe":[1,2],"nested_dict":{"das":31}}}}' \
  | jq -r .data

# returns
my_dict["somekey"] = "somevalue"
my_dict["a_dict"]["qwe"][0] = 1
my_dict["a_dict"]["qwe"][1] = 2
my_dict["a_dict"]["nested_dict"]["das"] = 31

4.4 POST with json file as a body

My personal favorite is dumping a JSON file into a request. In that case a lengthy curl is not needed, and you can specify a path to a file with the @ char.
This example leverages the logic embedded in the function that treats the whole body of an incoming request as the data for the pycatj transformation.

$ cat test/test1.json
{
    "somekey": "localfile",
    "a_dict": {
        "asd": "123",
        "qwe": [
            1,
            2
        ],
        "nested_dict": {
            "das": 31,
            "qwe": "asd"
        }
    }
}

curl -sX POST https://us-central1-pycatj.cloudfunctions.net/pycatjify \
  -H "Content-Type: application/json" \
  -d "@./test/test1.json" \
  | jq -r .data

# returns
my_dict["somekey"] = "localfile"
my_dict["a_dict"]["asd"] = "123"
my_dict["a_dict"]["qwe"][0] = 1
my_dict["a_dict"]["qwe"][1] = 2
my_dict["a_dict"]["nested_dict"]["das"] = 31
my_dict["a_dict"]["nested_dict"]["qwe"] = "asd"

What’s next?

Having the pycatj functionality within reach of an HTTP call makes it possible to create a simple one-page web frontend that receives the user's input and renders the result via the pycatj-web service we deployed in this post.

I will write another post covering the learning curve I had to climb to create a modern Material UI frontend that leverages the serverless function.

[Project’s source code]