What would happen if you designed the backend of your web application or mobile apps in such a way that they did not require a monolithic application server? Could we provide everything from static assets to RESTful operations and messaging using microservices implemented using the extensive suite of tools provided by Amazon Web Services (AWS)?
Update: I’m writing a technical book on this topic called ‘Serverless’ (Dec 2015)
Update #2: Hacker News discussion here
The Quest Begins
Fred George, a former colleague and mentor at ThoughtWorks, is widely credited as one of the originators of the concept of microservices. At this year’s speakerconf, Fred once again described his ideas for systems displaying emergent behavior based on collaborative microservices communicating via pub/sub messaging. I left the conference eager to try such a system for my own purposes.
My initial exploration of Apache Kafka (the messaging server that Fred himself uses) was fruitless. Given constraints of free time and lack of practical devops experience, I wanted instead something simple to deploy and manage. As surely many others are currently doing, I shifted my attention to the functionality provided by AWS, especially given recent progress on the Lambda and API Gateway products.
My testbed for experimentation would be DueProps, the web application that I’ve kept on life support since the demise of its startup venture a few years ago. DueProps exists as a monolithic Rails application hosted on Heroku. I touch the codebase as rarely as possible, since it’s brittle and I’ve forgotten how a lot of it works.
What’s more, DueProps is relatively expensive to maintain, especially in light of how little traffic it supports. If I could migrate the entire backend to AWS and pay based solely on usage, the hosting costs might drop significantly. It would also be a considerably smaller mental leap of faith to invest in driving some new traffic, since a key side-benefit of serverless design is auto-scaling; I might be able to survive unexpected spikes in demand without the headaches of wondering whether DueProps’ aging backend would fall over or get unbearably expensive.
But how to begin?
Proof of Concept: Slack Integration
After mulling it over for a couple of weeks, and absent too much inspiration or desire for rewriting core DueProps functionality, I decided to try adding a new feature. One of the most popular user requests of the last year has been for us to integrate with Slack.
Slack has a one-button integration which works via OAuth. So I decided that I would try to implement the callback as an AWS Lambda function behind an AWS API Gateway method.
The plan summarized:
Add an integration tab panel to the admin settings for a DueProps network.
Add a section for Slack integration, featuring the Slack OAuth button.
Register an API Gateway endpoint to use as the Slack OAuth callback.
Back the API endpoint with a Lambda function. On successful integration, save the settings in an AWS DynamoDB table.
Redirect back to the DueProps network settings page.
Note that to follow along with the rest of this blog post, it will definitely help to understand the fundamentals of OAuth.
Slack Integration Setup
First step was to register the application with Slack. This involved creating a new Slack network for DueProps, with myself as its only user. Luckily Slack has a free tier, so no big deal. Once signed in to Slack using my new account, I headed over to their app registry.
Next I opened up the source code for DueProps. The network admin settings page would need a new tab for Integrations. It wasn’t too hard to add, along with the markup provided for the Slack button itself. The template code is out of scope for this blog post, so I won’t recreate it here. It’s garbage anyway.
While adding the button, I noted that the state variable is copied back to the requestor in the callback. Since you can put whatever value you want in there, I plan to use it as a container for the slug of the DueProps network being integrated.
Creating a Lambda function with a blueprint
Eager to start, I headed over to my AWS console and created a new Lambda function using their blueprints functionality.
I picked the microservice-http-endpoint blueprint. It starts you off with NodeJS code accessing DynamoDB behind an API endpoint. This speeds up the configuration at the cost of not exactly knowing everything that is going on, especially if it’s your first time.
Lambda functions exist in a single big namespace, so I put a little more thought than usual into naming. I decided to use a dp prefix and err on the side of being more descriptive than not.
Next up, what to do with the boilerplate code provided?
It provides a bunch of RESTful operations based on conventions, but it was entirely unclear to me what those conventions were, nor could I find any suitable documentation about them.
So after some head-scratching about what to do next, I decided to just delete all of it and add some console.log statements to dump the contents of the context and event parameters.
Before being able to save, I had to specify a couple more properties, located under the web-based code editor.
The Handler field obviously references the function exported in the code, so I left the default in place. Then I clicked through the help text link to learn more about Lambda execution roles.
The lambda_basic_execution role would work for now. I think I had created it the first time that I took interest in the basics of how Lambda works, but I can’t remember.
I clicked Next to configure the API endpoints that would invoke the new Lambda function.
I won’t bore you with detailed steps here, as I experimented with a variety of settings and IAM permissions before settling on a configuration that I felt would serve my purposes.
With these settings done, I could head back over to the DueProps app settings in Slack and set the URL as where Slack should redirect the browser on successful OAuth activity by the admin user.
Speaking of which, I should check what data will be passed to the callback. That information, along with instructions on how to turn an access token into actual settings, is featured in the API docs for the Slack Button.
Impatiently, I go ahead and hack changes to the DueProps admin settings to present the Slack integration button to the network admin.
Clicking the button takes me to an OAuth page that prompts for a channel where DueProps can post activity.
I pick #general and hit the submit button. Slack hits the callback and I excitedly jump back over to the AWS console to see what happened.
Looks like the invocation happened. I click on the link that says View logs in CloudWatch and promptly note that no data was passed to the Lambda function. The event parameter is completely empty. Hmm.
Okay, I think to myself, there’s got to be some sort of configuration for the API endpoint, along the lines of how nowadays it is best practice to explicitly whitelist which request params your web app passes along to controller logic.
So I poke around the AWS Console for my API Endpoint, and bingo! There’s a cool looking diagram showing the flow of the method execution. Two boxes on top have arrows leading to the Lambda function, then arrows lead back in the other direction along two boxes on the bottom, back to the caller.
I find the UI for these configurations to be fairly unintuitive, so I guess I better consult the docs for specifying methods. Much googling and head-scratching ensues, as I trial-and-error for the next 30 minutes or so. In particular, most everything I research about Request Model programming ends up being a dead end.
So I take a step back. What I care about from the Slack callback are the code and state request parameters. I’ll be using the code parameter in conjunction with the Slack API to get a webhook URL. And the state parameter should have my network slug, so I know which network to associate with the new settings.
Eventually I end up with a configuration that looks like this in the Method Execution settings:
But just specifying that we care about these parameters is not enough. All that does is make them available to the mapping step, and that part turned out to be quite a bitch to figure out. After much experimentation and googling, I found the guide to the mapping syntax.
The secret was to define a Mapping Template associated with application/json content type, even though the GET request from Slack (being an OAuth callback) should not be associated with any particular content type.
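For reference, the mapping template I settled on looked roughly like this (a reconstruction, not the verbatim original); it plucks the code and state query parameters out of the request and makes them top-level keys of the event object passed to Lambda:

```json
{
  "code": "$input.params('code')",
  "state": "$input.params('state')"
}
```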
Confident that the desired request parameters would now be passed on to the Lambda function via the event object, I decided to try the OAuth flow in the browser. No request variables showed up in the Lambda. No matter what I tried, nothing worked. Always the damned empty object.
So I burned at least another hour going over the syntax of the mapping template and consulting every StackOverflow post and AWS Forum post available on the subject. Nothing helped, until on the verge of giving up, I realized that unlike Lambda functions, it’s not enough to save API Gateway configurations. You have to Deploy them too!
Such a stupid waste of time. What really got me is that Lambda functions just need to be saved, not deployed.
Anyway, now that the parameters were actually being passed on as intended, I was finally able to move to the next step in the process. I leveraged the NodeJS https and querystring packages to hit the Slack API with the OAuth code (as documented here).
Working with NodeJS was relatively painless, but figuring out the particulars of the integration was not.
First of all, even though what we get back from the Slack API is formatted as JSON, it is not marked as such by its content-type, so the boilerplate’s conditional JSON check was broken code. Then there was the question of what to do with the data returned by oauth.access. At the very least I would need to persist it, and I opted to use the same microservice to store it in DynamoDB.
DynamoDB for NoSQL storage
My first destination was the development guide for working with DynamoDB items. I was hoping that it would be something akin to working with Firebase and, well… the jury is still out on that.
The guide didn’t exactly tell me what to do, in fact it had entirely too much information to be useful to me at the moment. I did go ahead and create a DynamoDB table manually using the console.
Further poking around the web, there doesn’t seem to be clear documentation of how to use the DynamoDB calls specifically together with NodeJS’s aws-sdk library. Employing some trial and error, I was able to deduce the answer from the Rest API documentation for DynamoDB.
The putItem function takes a parameter object, along with a callback to invoke when the operation is complete.
The parameter object must have a TableName key at its top level. It should also have an Item key, containing the data that you want to put in the database. Additionally, that Item object must contain a primary key corresponding to what you declared it to be when you set up the table via the DynamoDB console (or API). I add this key to the item object on line 10 of the gist.
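Put together, the parameter object ends up looking something like this sketch (the table and attribute names are mine, for illustration; DynamoDB’s low-level API wraps every value in a type descriptor such as S for string):

```javascript
// Build the parameter object for dynamodb.putItem. Table and attribute
// names here are illustrative, not necessarily what DueProps uses.
function buildPutParams(slug, settings) {
  return {
    TableName: 'dp-slack-settings',
    Item: {
      network_slug: { S: slug },  // primary key declared at table creation
      access_token: { S: settings.access_token },
      webhook_url:  { S: settings.incoming_webhook.url }
    }
  };
}

// In the Lambda handler it would be used roughly like:
//   dynamodb.putItem(buildPutParams(slug, data), context.done);
```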
I also went ahead and used context.done as the callback, since it has the typical NodeJS style (err, result) signature. The downside is that the web-based code editor does not recognize it as an exit point and puts up an obnoxious warning on the page.
Thanks to this warning, I wasted some time second-guessing whether it was indeed true that my function might not exit in an orderly fashion. Maybe the AWS code editor is smarter than I am? Meh, probably not in this case.
Thinking about error conditions made me realize that at the very least, my function should have some awareness of Slack’s API semantics so that it can respond appropriately to edge cases. But not wanting to get sidetracked from my experimentation, I decide to charge ahead with a happy path implementation. I really wanted to see if I could complete the whole feature by redirecting back to the DueProps network settings page.
Tribulations encountered trying to redirect the browser
Here is where my tribulations began. Much thrashing ensued before I chanced on this article that made me really sad.
You see, as of this writing, the API Gateway interprets any successful result from Lambda as a 200. As far as anyone can tell, that condition is not subject to change. Meaning that if you want to return a different status code, such as 302 (in order to accomplish a redirect) or even a 201 (to indicate that a new resource was created), then the Lambda function needs to fail. The code underlying the success/failure step calls toString() on whatever is passed to the fail function, and the resulting string is evaluated against regex criteria in the Method Execution settings of your API method definition.
As a workaround, I experimented with treating a successful outcome as a failure, in order to pass data out. However, when I tried to define a Header Mapping (Location, in my case) it was near impossible to figure out what is supposed to go in the Mapping Value field in order to access the data. No matter what I tried, the console complained about [Invalid mapping expression specified].
What did work is to pass a simple string to the header, which in the case of a 302 means I can send it to a static location. But that is of little use in an OAuth redirect process, since I want to send the user back to their own network settings, which include a network slug in the URL. Anyway, the hostname would need to be dynamic if I ever wanted to test this outside of production.
On a more basic level: I don’t want to mark my Lambda execution as a failure if it wasn’t. It’s the principle!
Indeed, some more digging confirmed that accessing data dynamically from that header mapping value field is simply not supported. According to this AWS forum post, it is currently impossible to access the data returned from the execution of the Lambda. This forum post even more clearly explains that although it is not possible, the AWS team is looking into adding it to a future release. So at least there is some hope on the horizon.
Also while on the subject, I want to mention that this useful general explanation of API Gateway + Lambda helped quite a bit also.
Diversionary tactic, a new function
Discouraged, but not entirely put off, I decided to try my luck with an Ajax endpoint for getting the data I stored on DynamoDB. I started with the blueprint again, but quickly trimmed it down to just a few lines.
It’s worth mentioning that with simple services like this, it’s really easy to use the Lambda console’s built-in test functionality. Click on the Actions drop-down and select Configure test event.
The sample event template persists, so after you make any changes you can Save and test to make sure it still works. (I admit that the whole automated testing aspect of this scheme gives me significant pause.)
Back to the Rails codebase. A sprinkling of jQuery code added control over whether to show the Slack button or the settings stored in DynamoDB.
Old-school definition list abuse, ftw! If the data was available in DynamoDB, then my view would show it instead of the Slack button. Like this:
Satisfied that I was starting to get a hang of this stuff, but sad about the bigger failure, I decided to call it a night.
The impossibility of redirecting using data returned from the Lambda function had dashed my hopes for using the API Gateway + Lambda cocktail as an OAuth callback handler, not without involvement of the browser and Ajax requests anyway. More depressing, the limitation also seriously undermined my aspirations for serverless design. Redirects are vital functionality, no matter what.
A new morning, a new solution
It’s a poor-man’s redirect, and a hack, but it should work for my purposes. I jumped out of bed and put some coffee on, determined to prove my workaround in a matter of minutes.
First, I removed the 302 mapping, and focused my attention on the default 200 response. I would need to specify a Content-Type header, since the API Gateway defaults to serving up application/json.
First stop, tell the gateway that you’ll be supplying a ‘Content-Type’ header on the Method Response tab.
Then flip over to the Integration Response screen and add a Header Mapping specifying that the value of the Content-Type header should be ‘text/html’. This will get the browser to interpret the data it gets as HTML. Next expand the Mapping Templates section, and add a mapping template for the text/html type. No need for an actual template, just leave the default Output passthrough set. This tells the gateway to send through whatever was returned by the Lambda.
Speaking of which, this is what the relevant part of the Lambda function code looks like.
For the initial test, I hand assembled a URL with the host hardcoded to http://localhost:3000 to test it in development. I remember thinking I’d need to somehow get a hostname from the context. But eager to see it working before getting sidetracked, I went ahead and saved the Lambda change, this time remembering to deploy the new gateway settings.
I had to also manually delete the DynamoDB settings, so that my UI would show me the button. With no small measure of excitement, I reloaded the page and there was my Slack button. I clicked it and got the familiar OAuth confirmation page from Slack. I confirmed, and the browser cranked away for a second and… It works! There was my integration settings page with fresh new data from Slack indicating that we were now connected.
Granted it is a hack, but it is fun and there is a certain elegance to it if I do say so myself. And anyway, I know I’m rationalizing, but most of the time we’re not going to need to redirect the result of an API call to a browser page. Consider this exercise as simply a proof of concept. Most of what we’re going to do on the backend will be accomplished using XHR requests and JSON payloads, not HTML.
Using Stage Variables to set our hostname
I still had some vital cleanup to do. The hostname was hardcoded to localhost for testing. That’s no good. So I changed the Lambda code to accept a hostname variable in the event object.
Now I navigated over to the stage definition. If you’ve been following along you probably have one stage, the first one created as part of the Lambda blueprint process. My existing one was called prod, so I added another called dev.
Then, within the Stage Variables tab, I set a ‘hostname’ variable to http://localhost:3000. I did the same for the prod stage, but set that value to https://dueprops.com, my production URL.
Now that the stage variable is set, I needed to get it passed to the Lambda function whenever a request happens. That is accomplished at the Integration Request settings of the flow, specifically in the Mapping Template for application/json. Remember that is where the query parameters are mapped to the event that is passed to Lambda. I added a mapping for hostname.
The API Gateway on-screen documentation for stage variables says to access them via the $context object in your mapping template, but that is either plain wrong or outdated. According to the API Gateway mapping template reference, there’s actually a $stageVariables object available specifically for that purpose.
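With that in mind, the amended mapping template (reconstructed, not verbatim) gains one line:

```json
{
  "code": "$input.params('code')",
  "state": "$input.params('state')",
  "hostname": "$stageVariables.hostname"
}
```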
Wanting to verify it, I clicked on the API Gateway’s built-in test function to verify that the mapping worked.
I was pretty confident in my configuration, but the value of the hostname was missing.
Then I realize that, duh, the built-in test function is stage-agnostic; it must not have access to stage variables. We’re going to have to test this in vivo. But then I realize that the DueProps App settings in Slack have a hardcoded URL for the OAuth callback, and that is pointing at my prod stage endpoint. For testing, I want to send the callback to my dev stage, so I go over to my app settings and change the callback accordingly.
As with most OAuth implementations, Slack gives you the ability to pass the callback URI in the request, as long as it matches at least part of the URI specified in the app settings. I make a mental note to look into that in the future, so I don’t have to do this manual switching back and forth.
Now that the callback is pointing to my dev stage, I do push the Slack button. It works perfectly, redirecting back to localhost as expected.
Concluding thoughts and future topics
This little adventure provided a dose of drama and excitement for a couple of days over the Thanksgiving break and the fodder for this blog post. Maybe it’s just me, but I think this stuff presages a future where monolithic application server components go away completely.
In future blog posts, I plan to tackle some other important topics related to this serverless future.
Map a custom domain name to API Gateway and tackle whatever issues may arise with SSL certificates. Speaking of which…
Security! For testing purposes I set everything to be wide open, but what happens when I start locking things down? Will complexity start to overwhelm things? How does the API Gateway’s built-in support for API keys work?
Automated testing! I don’t think there’s much in the way of obstacles to testing NodeJS function code locally. Just need to think of a way to tie it into deployment. Integration testing this stuff is probably going to be challenging.
Deployment! Nobody (including me) wants to be coding in the AWS Console for Lambda. The alternative is to have some sort of scheme where the function code is versioned locally and deployed using the Lambda API. During my research I did encounter some tooling, but it looked somewhat primitive.