Vertical and Horizontal Sharding
Sharding is often misunderstood. It is a technique used by late-stage startups and by companies that work with big data. The problem it addresses is that once the data grows too large, the website becomes unresponsive and the databases start crashing. The first reaction is often to reboot the servers, but that hurts performance: a restart empties the caches, and they have to warm up again against a cold database. No single database can directly handle the load of a giant company, especially a social network, once it has reached meaningful traffic. Hence sharding is essential.
If sharding is not in place and the company hits a huge traffic spike, it could kill the business. Experienced software developers therefore identify this pattern early and make sure their product will scale. For example, if a company serves x users in the US market and then opens up the EU market for another y users without proper precautions in place, the company can fail.
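Vertical sharding, as the term is used in this post, is cache based. (In standard database terminology, vertical partitioning means splitting tables or columns across separate databases; here the term refers to the caching tier that sits in front of the database.) Most of the hot data resides in memory, so requests for data hit the cache first. Facebook is the well-known example of this design pattern: large fleets of memcached servers sit in front of its databases and serve data such as the friend graph. Cache entries have timeouts, and once an entry expires it is fetched from the database again, ideally when user load is low. In an installation of that size, the vast majority of reads are served from the cache alone.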
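The read path described above (check the cache, fall back to the database, store the result with a timeout) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TTLCache` and `FriendStore` are hypothetical names, a plain dictionary stands in for both memcached and the database, and an expired entry is refreshed on the next read rather than during a low-load window.

```python
import time


class TTLCache:
    """In-process stand-in for a cache server such as memcached (hypothetical)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Entry timed out: drop it so the caller reloads from the database.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)


class FriendStore:
    """Reads friend lists through the cache, falling back to the database."""

    def __init__(self, database, cache):
        self.database = database    # dict standing in for the real database
        self.cache = cache
        self.db_reads = 0           # counts how often the database is hit

    def friends_of(self, user_id):
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached           # cache hit: the database is not touched
        self.db_reads += 1          # cache miss: read from the database
        friends = self.database.get(user_id, [])
        self.cache.set(user_id, friends)
        return friends
```

With a long enough timeout, repeated calls to `friends_of` for the same user reach the database only once per expiry period, which is the effect the cache tier is there to provide.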
At a company like Facebook, infrastructure code of this kind is usually owned by experienced engineers, and junior engineers are generally kept away from it. Twitter's early history shows what happens when this layer lags behind growth: the site went down often enough that its "fail whale" error page became famous, and the outages only receded once its scaling problems were worked out. Both companies have publicly described heavy use of memcached, which is open source.
Horizontal sharding is database driven. No single database can hold all the data (users, friends, photos, and so on), so the data is partitioned by user key. A certain range of users goes into database A, another range goes into database B, and so on. This way the data is always backed by a database, with the vertical sharding layer in front. Products like DynamoDB and MongoDB try to solve this problem with built-in partitioning. However, adopting one of them does not by itself prove that a product will hold up under traffic the likes of Facebook's or Twitter's. This is a trap, because once the company reaches meaningful traffic it will need to either adopt techniques proven at that scale, implement the sharding itself, or fail.
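As a rough illustration of partitioning by user key, the sketch below routes a numeric user ID to a database by range. The class name `RangeShardRouter`, the boundary values, and the shard names are all hypothetical, and a real deployment would also have to deal with rebalancing and uneven ranges, which this sketch ignores.

```python
from bisect import bisect_right


class RangeShardRouter:
    """Maps a numeric user ID to a shard name by ID range (hypothetical)."""

    def __init__(self, boundaries, shard_names):
        # boundaries[i] is the first user ID that belongs to shard_names[i + 1],
        # so N boundaries describe N + 1 shards.
        if len(shard_names) != len(boundaries) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted in ascending order")
        self.boundaries = list(boundaries)
        self.shard_names = list(shard_names)

    def shard_for(self, user_id):
        # bisect_right counts how many boundaries are <= user_id,
        # which is the index of the shard that owns this ID.
        return self.shard_names[bisect_right(self.boundaries, user_id)]


# Example layout: IDs below 1000 go to database A, 1000-1999 to B, the rest to C.
router = RangeShardRouter([1000, 2000], ["database_a", "database_b", "database_c"])
```

Every query for a given user then goes to `router.shard_for(user_id)`, so no single database has to hold all users.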
In the next post on vertical and horizontal sharding, we will discuss some of the proven algorithms that make vertical and horizontal sharding work together. Build it once, and never look back. Until then, think about the platform you are building. Will you ever have a large-scale problem like that of social networks such as Facebook or Twitter? If not, thank God this is not your concern.