One of the new features we added at SAM is the ability to embed Stories into a range of templates. We call these outputs SAM Publishing. They are ideal for placing curated social content in web pages and hosting on big-screen TVs at events, with many other use cases to come. These embeds are delivered and hosted as a SAM service for all of our users, many of whom are among the largest publishers in the world and receive large volumes of traffic each day. As a result, SAM needed to scale to handle the load of all of our users and, by extension, all of their page views.

It was crucial that we test our ability to scale with load. We want our clients to trust us, and to earn that trust we have to provide a service that is available at all times. This article is an overview of the process we followed to get each instance of our web servers to handle hundreds of millions of requests per month.

At SAM our core app technologies are Node.js and MongoDB, hosted primarily on AWS. Instead of listing all of our technologies, the following diagram provides a good overview of our infrastructure.

Throughout our journey of scaling SAM we worked with a company called RunWithIT, which has a proprietary set of technologies for spinning up multiple AWS instances that run load tests and gather the results into graphs. Our team worked closely with Dean Bittner (founder of RunWithIT), whose specialty is “bringing down servers to identify bottlenecks and achieve scale.” We’d highly recommend that anyone who wants to release with confidence reach out to the RunWithIT team.

We also used a combo of Async Queue and hyperquest to run our baseline test of 50 continuous concurrent users. Once we felt we had solved the current bottleneck, we’d get Dean to run the full load against our test servers.

Our initial alpha launch target was 500 million requests per month per instance.
This would safely let us handle any of our alpha clients’ monthly load, even if all of it went through a single SAM embed story. It seemed easy: bundle the CSS and JS files in a Grunt build and ship it, right? Well, the initial results failed miserably.

Our initial bundled JS file was 234 KB. As soon as we hit a basic load of 20 concurrent users, Amazon Elastic Beanstalk (EB) started spinning up new servers. Surely a server could handle more than 20 users! We soon realized that the default EB scaling trigger was based on network traffic. The solution: reduce the amount of data sent per request. To do this we offloaded the vendor (non-SAM) JS files to CDNs. This reduced our bundled JS file to 4 KB and also offered quicker delivery of our vendor dependencies to clients outside North America. A substantial gain.

At this point we had a 10x improvement in results. Or at least we thought we did, because the acceptable results only lasted for the first 60 seconds of load. We noticed that MongoDB wait times grew over time, to the point where response times were over 30 seconds. This was not acceptable. We thought Mongo should be able to perform better, but we set that aside and added a caching mechanism. Since the content of a story is fairly static (except for live-event-style stories), we could cache requests for around 10 seconds and use Pusher for real-time updates. We implemented a basic in-memory cache that expires content after a specified time.

This worked, but the results were the same: 30-second response times under load. This pointed to a new bottleneck, though: the web server itself. Our test results began showing a consistent up-and-down pattern. We needed to dig deeper. The problem was that our web servers were running out of TCP connections. For a former .NET developer and non-sysadmin, it was time to learn how to scale a Linux box to handle more requests. Terms like ulimit were new concepts.
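
The basic in-memory cache with time-based expiry mentioned above could look roughly like the following. This is a minimal sketch, not SAM's actual code: the names `TtlCache`, `getStory` and `loadStory` are hypothetical, and the 10-second lifetime is taken from the "around 10s" figure in the text.

```javascript
// Minimal in-memory cache whose entries expire after a fixed time-to-live.
// `now` is injectable so expiry can be exercised without real waiting.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // drop stale content so it is reloaded
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Hypothetical read-through helper: serve from cache, otherwise call
// `loadStory` (for example a MongoDB query) and remember the result.
async function getStory(cache, storyId, loadStory) {
  const cached = cache.get(storyId);
  if (cached !== undefined) return cached;
  const story = await loadStory(storyId);
  cache.set(storyId, story);
  return story;
}
```

With something like `const cache = new TtlCache(10 * 1000)`, repeated requests for the same story within ten seconds would be answered from memory, so the database sees at most one query per story per window on each server (ignoring concurrent misses, which this sketch does not deduplicate).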
A base AWS instance allows 1024 TCP requests per 60 seconds. Once all the TCP connections were used up, the application would wait for an available one, resulting in the spikes in the graph. Inside AWS, the servers showed 100% CPU, which was a bit misleading because they were simply waiting, not running at capacity. At 1024 requests per minute, each server could only handle about 1.5 million embed views per day (1024 × 60 × 24 = 1,474,560). FAIL. At the scale of page views our users get, that would be very expensive. We were surprised that Amazon’s default instances only allowed that many requests. The upside is that it lets you fail quickly and understand what you are doing.

For SAM to scale, we needed to modify the following servers in AWS:
- Node.js application servers – this is a good starting point for learning how to increase system resources on a Linux box. Within Elastic Beanstalk, you can use ebextensions to customize the EB containers.
- MongoDB – MongoDB provides a good set of production notes, including their recommended production configuration settings. The whole article is a good read.
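
Before and after changing settings like these, it can help to check which limit the Node.js process itself actually sees. The following is a minimal sketch under two assumptions: the host is Linux with a readable `/proc/self/limits`, and the limit of interest is the open-file limit (which sockets also count against). The function names are hypothetical.

```javascript
const fs = require('fs');

// Extracts the soft and hard "Max open files" limits from the text of
// /proc/<pid>/limits. Returns null if the row is not present.
function parseMaxOpenFiles(limitsText) {
  const match = /^Max open files\s+(\S+)\s+(\S+)/m.exec(limitsText);
  if (!match) return null;
  const toNumber = (value) => (value === 'unlimited' ? Infinity : Number(value));
  return { soft: toNumber(match[1]), hard: toNumber(match[2]) };
}

// Reads the limits that apply to the current Node.js process.
// Returns null on platforms where /proc/self/limits is unavailable.
function currentOpenFileLimits() {
  try {
    return parseMaxOpenFiles(fs.readFileSync('/proc/self/limits', 'utf8'));
  } catch (err) {
    return null;
  }
}
```

Logging `currentOpenFileLimits()` at application startup is one way to confirm that a change made through ebextensions actually reached the running process, rather than only the shell used to inspect the box.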