Load Balancing

Load balancers are a vital part of system design: they make sure requests are distributed evenly across servers and applications.

Assuming we have one server, we expect all requests to go to that one server. With 2 servers, how do we make sure the load is even? How about 3? What if the servers aren’t built equally and serve different throughputs of data? Load balancers solve this problem: at their simplest, they distribute a load across the different registered services. On top of that, load balancers can do much more, such as detecting dead servers and routing requests away from them (sketched below), requesting new servers be brought up when system capacity is reached, and providing additional security by shielding servers from the external web.
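
To make the routing-away-from-dead-servers idea concrete, here is a minimal sketch in Python. The Server class, its is_healthy() probe, and the handle() method are hypothetical names invented for illustration, not any particular library's API; a real balancer would also track health with periodic background probes rather than a check on every request.

```python
import random

class Server:
    """Hypothetical backend server stub used only for illustration."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def is_healthy(self):
        return self.healthy

    def handle(self, request):
        return f"{self.name} handled {request}"

class LoadBalancer:
    """Route each request to a randomly chosen healthy server."""
    def __init__(self, servers):
        self.servers = list(servers)

    def route(self, request):
        # Skip servers that fail their health check: this is the
        # "detect dead servers and route away from them" behavior.
        candidates = [s for s in self.servers if s.is_healthy()]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        return random.choice(candidates).handle(request)

lb = LoadBalancer([Server("A"), Server("B", healthy=False), Server("C")])
print(lb.route("GET /"))  # only ever lands on A or C; B is dead
```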

From the client's point of view, a load balancer looks like one service. After making a request, the response is generally returned from the load balancer as well. This keeps the system completely encapsulated and simplifies the model of external dependencies and services.

Load balancers can also provide analytics on current network traffic and trends, helping predict what future requests will look like for the whole system.

How to Load Balance

There are several algorithms for actually balancing requests (a sketch of a few of them follows the list):

  1. Random - as the name implies, this simply sends each request to a random server. There are potential issues with this, such as an unbalanced distribution; however, it is the simplest to implement.
  2. Least Connected - this method assigns connections to the server with the fewest open connections. This fixes the distribution issue of random; however, it does not take individual server compute resources into account.
  3. Round Robin - round robin goes around all possible servers and assigns requests one by one, as a circular queue would: A->B->C->... By itself this strategy will solve most balancing problems for most organizations.
  4. Weighted Round Robin - an evolution of round robin. Requests are instead assigned based on each server's capacity relative to the other servers in the pool; for example, a server A that is 2x more performant than server B receives twice the requests. “Performant” is left open-ended because performance can mean different things in different circumstances.
  5. Least Bandwidth - this sends requests to the server with the lowest current traffic, measured in data served per second. This is good for getting requests to servers more quickly when payloads are large.
  6. Least Response Time - this strategy tries to serve requests to the end user as quickly as possible by directing each request to the server with the lowest measured response time. As a server holds more open connections its response time generally increases, so this tends to steer traffic away from busy servers.
  7. IP Hash / Session / Cookie - instead of balancing by compute resources, sometimes other factors are considered. In this case user sessions may be distributed among many servers, and the load balancer can guarantee that a given user's requests go to server A and another user's go to server B every time.
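
As a rough sketch of a few of these strategies, the snippet below implements round robin, weighted round robin, least connected, and IP hash selection over a hypothetical pool of three servers. The server names, weights, and connection counts are assumptions for illustration; a real balancer would track them from live traffic.

```python
import hashlib
import itertools

servers = ["A", "B", "C"]  # hypothetical server names

# Round Robin: cycle through servers one by one (A -> B -> C -> A -> ...).
rr = itertools.cycle(servers)
def round_robin():
    return next(rr)

# Weighted Round Robin: A is assumed 2x more capable, so it appears
# twice per cycle and receives twice the requests of B or C.
weights = {"A": 2, "B": 1, "C": 1}
wrr = itertools.cycle([s for s, w in weights.items() for _ in range(w)])
def weighted_round_robin():
    return next(wrr)

# Least Connected: pick the server with the fewest open connections.
open_connections = {"A": 0, "B": 0, "C": 0}
def least_connected():
    server = min(open_connections, key=open_connections.get)
    open_connections[server] += 1  # caller decrements when the connection closes
    return server

# IP Hash: the same client IP always maps to the same server (sticky sessions).
def ip_hash(client_ip):
    digest = hashlib.md5(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

print([round_robin() for _ in range(4)])                  # ['A', 'B', 'C', 'A']
print([weighted_round_robin() for _ in range(4)])         # ['A', 'A', 'B', 'C']
print(ip_hash("203.0.113.7") == ip_hash("203.0.113.7"))   # True: sticky
```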

While some of the above strategies are most definitely better than others (looking at you, random), actual usage in practice depends a lot on problem context. Certain problems are best served by certain algorithms.

A load balancer, as the single entry point described above, is itself a single point of failure. To rectify this, having a redundant load balancer or a load balancer cluster available makes the system fault tolerant: when one load balancer goes down, another takes over, keeping the system continually healthy.
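
One common pattern for this is an active-passive pair with a heartbeat; the sketch below assumes hypothetical balancer names and an arbitrary 3-second timeout. The standby promotes itself when the active balancer stops heartbeating.

```python
import time

class RedundantBalancer:
    """Sketch of active-passive failover between two load balancers.
    The names and the 3-second timeout are illustrative assumptions."""

    HEARTBEAT_TIMEOUT = 3.0  # seconds without a heartbeat before failing over

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby
        self.last_heartbeat = time.monotonic()

    def record_heartbeat(self):
        # Called whenever the active balancer reports that it is alive.
        self.last_heartbeat = time.monotonic()

    def current_balancer(self):
        # Promote the standby if the active balancer has gone quiet.
        if time.monotonic() - self.last_heartbeat > self.HEARTBEAT_TIMEOUT:
            self.active, self.standby = self.standby, self.active
            self.last_heartbeat = time.monotonic()
        return self.active

pair = RedundantBalancer(active="lb-1", standby="lb-2")
print(pair.current_balancer())  # "lb-1" while heartbeats keep arriving
```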
