Case XX: From thousands to tens of millions for 10k
Disclaimer: In showcasing our experience, we avoid mentioning any brands or names, in line with non-disclosure agreements where relevant.
Let’s talk about microservices today. Our team of software developers receives lots of requests to build another YouTube, Uber, or Twitter. Big agencies gladly take such orders and deliver them in 200–400 hours for a relatively small price. If the client is lucky, they receive a high-quality product that can be shown to investors and potential users. It’s great if the client understands that this product is not yet ready for public use and can only be called a proof of concept, a prototype, or an MVP for a narrow target audience. However, such a product is unlikely to work as a mature MVP that can cope with unpredictable load surges, for example when someone posts a link to it on a blog, in a journal, or on a marketplace. Nor can it guarantee stable resilience to software, hardware, or human errors.
Fault tolerance (and, as a result, higher uptime), scalability, and the automation of scaling all require additional investment. These factors are necessary for the product to operate well, and without them a public release is not recommended. Each of them demands a serious, detailed review when choosing an architectural solution and setting the budget.
When building software or a platform on a service-oriented architecture (SOA), or on microservices specifically, it’s important to remember that every step up in uptime has consequences for cost and effort. Let’s take a look at the most common levels:
- 98% uptime = about 175 hours of downtime per year, or about 15 hours per month, or about 29 minutes per day.
This is acceptable if you have a small budget and no need for continuous operation. The platform can be updated manually, and you can save on maintaining a fully-fledged development environment. This level of service is usually offered by small agencies and freelance developers.
- 99% uptime = about 88 hours of downtime per year, or about 7 hours per month, or about 14 minutes per day.
- 99.8% uptime = about 18 hours of downtime per year, or less than 2 hours per month, or about 3 minutes per day.
This is usually a sufficient level of uptime, at which the DevOps share of the budget stays reasonable compared to the overall budget.
- 99.95% uptime = about 4 hours of downtime per year, or about 22 minutes per month, or less than a minute per day.
From this mark on, the cost of DevOps, automation, and redundancy predictably grows significantly with every further increase in availability. Any further improvement makes the budget for managing availability comparable to the budget for developing the product’s functionality, and it may even exceed it.
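The downtime figures in the list follow from one line of arithmetic: allowed downtime = (1 − uptime) × period. A minimal sketch, assuming a non-leap 365-day year and an average month of one twelfth of a year (the list rounds these results):

```python
HOURS_PER_YEAR = 365 * 24              # assumes a non-leap, 365-day year
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # assumes an "average" month of 730 hours
MINUTES_PER_DAY = 24 * 60


def downtime(uptime_percent: float) -> dict[str, float]:
    """Return the downtime budget implied by an uptime percentage."""
    down_fraction = 1 - uptime_percent / 100
    return {
        "hours_per_year": down_fraction * HOURS_PER_YEAR,
        "hours_per_month": down_fraction * HOURS_PER_MONTH,
        "minutes_per_day": down_fraction * MINUTES_PER_DAY,
    }


for level in (98, 99, 99.8, 99.95):
    d = downtime(level)
    print(
        f"{level}%: {d['hours_per_year']:.1f} h/year, "
        f"{d['hours_per_month']:.1f} h/month, "
        f"{d['minutes_per_day']:.1f} min/day"
    )
```

For 98% this gives 175.2 hours per year; each extra "nine" shrinks the downtime budget tenfold, which is why the cost curve steepens so quickly.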
Performance and Scalability
We once worked on an entertainment website built on the WordPress CMS. Like any other typical WordPress site, it had a fairly good admin panel, a module system, and… it certainly wasn’t adapted to scaling or high traffic loads.
Notwithstanding this, the client managed the website’s content successfully. Thanks to viral topics, visits grew by leaps and bounds, page views reached tremendous numbers, and the trend kept pointing upward.
In the early stages, the client’s income wasn’t growing very fast, and since the initial budget was small, we started with simple, cheap solutions. We integrated modules for caching data and ready-made pages, and configured Memcached and Varnish. However, these quick and effective measures, which took us only a few days to implement, merely postponed the more complex problems we later had to solve to keep the system running smoothly.
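Page caching of this kind follows the cache-aside pattern: look up a rendered page by key and render it only on a miss or after it expires. A minimal sketch, using an in-process dictionary as a stand-in for Memcached (in practice a separate service; Varnish, by contrast, caches whole HTTP responses in front of the application). `PageCache` and `render_post` are hypothetical names:

```python
import time
from typing import Callable


class PageCache:
    """In-process TTL cache: a hypothetical stand-in for an external store like Memcached."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, str]] = {}

    def get_or_render(self, key: str, render: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: serve the stored page, skip the expensive render
        html = render()  # miss or expired: render once and store with a new deadline
        self._store[key] = (now + self.ttl, html)
        return html


renders = 0


def render_post() -> str:
    """Hypothetical expensive render, e.g. PHP templates plus database queries."""
    global renders
    renders += 1
    return "<html>post 42</html>"


cache = PageCache(ttl_seconds=60)
for _ in range(1000):
    cache.get_or_render("/post/42", render_post)
print(renders)
```

A thousand requests for the same post trigger a single render within the TTL window. The catch is that every cache miss, every expiry, and every uncached page still lands on the same monolith, so caching raises the ceiling without removing it.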
As visits grew, the budget required for performance improvements grew as expected. Unfortunately, things did not go as smoothly as we wanted. Sudden traffic spikes sometimes overloaded the website. The lost visits steadily reduced income and, as a result, raised the concern of the whole project team.
To avoid financial losses, we needed a solution that was effective but affordable. Microservices were not a very popular software architecture back then, but we considered them the only suitable option for the problem we had.
We separated the highly loaded frontend from the rest of the WordPress components. Operations such as visitor counting and A/B testing were also moved into a separate microservice. We placed Redis between the database and the frontend, and Redis became the frontend’s source of data for each post.
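A rough sketch of that split, under loud assumptions: an in-memory `KeyValueStore` stands in for Redis, `publish_post` is a hypothetical hook that copies a post into the store when it is saved in the admin panel, and hash-based bucketing is just one common way to assign A/B variants. The case does not describe how the original services were implemented.

```python
import hashlib
from typing import Optional


class KeyValueStore:
    """In-memory stand-in for Redis: plain values plus integer counters."""

    def __init__(self) -> None:
        self._data: dict[str, object] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[object]:
        return self._data.get(key)

    def incr(self, key: str) -> int:
        # Like Redis INCR, a missing key is treated as 0 before incrementing.
        value = int(self._data.get(key, 0)) + 1
        self._data[key] = value
        return value


store = KeyValueStore()


def publish_post(post_id: int, html: str) -> None:
    """Hypothetical admin-side hook: copy the saved post into the store."""
    store.set(f"post:{post_id}", html)


def serve_post(post_id: int) -> Optional[object]:
    """Frontend read path: reads only the store, never the CMS database."""
    return store.get(f"post:{post_id}")


def count_visit(post_id: int) -> int:
    """Counting service: visit writes go to the store, not the CMS database."""
    return store.incr(f"views:{post_id}")


def ab_variant(visitor_id: str, variants: tuple[str, ...] = ("A", "B")) -> str:
    """Deterministic A/B bucket: the same visitor always gets the same variant."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return variants[digest[0] % len(variants)]


publish_post(42, "<article>post 42</article>")
page = serve_post(42)
views = count_visit(42)
variant = ab_variant("visitor-123")
```

Because the read path and the counters never touch the CMS database, a traffic spike loads the key-value store rather than WordPress, while editors keep working in the familiar admin panel.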
This decision radically raised the limit on visits, from 5–10 per second to more than 100, which fueled the subsequent growth in advertising income. These changes allowed us to reach a maximum of 360,000 visitors per hour, 8,000,000 per day, and 170,000,000 per month, all on a single server with a mid-range configuration.
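As a quick consistency check on these figures, the hourly maximum corresponds to the 100 visits per second limit, and the daily figure sits just below what that rate would allow if it were sustained around the clock:

```latex
\begin{aligned}
100\ \tfrac{\text{visits}}{\text{s}} \times 3600\ \tfrac{\text{s}}{\text{h}} &= 360{,}000\ \tfrac{\text{visits}}{\text{h}} \\
360{,}000\ \tfrac{\text{visits}}{\text{h}} \times 24\ \tfrac{\text{h}}{\text{day}} &= 8{,}640{,}000\ \tfrac{\text{visits}}{\text{day}} \ge 8{,}000{,}000\ \tfrac{\text{visits}}{\text{day}}
\end{aligned}
```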
This case showed us that it’s always better to prepare for unpredictable loads in advance. The budget may not always allow you to fully protect your solution from outages, but it will certainly allow you to at least consider building the platform on SOA from the start, rather than migrating from a monolith to microservices when you’re already close to disaster.