Info: This post is part of a series titled “The myths, magic and marketing of developing reliable, scalable and secure software”.
The simplifying assumptions about clusters are that “many hands make light work” and that the mere presence of many copies yields greater reliability. The devilish detail is coordination and synchronization.
A cluster is typically a process group, and building a process group well is very difficult. Even some of the best minds in the software industry have done it poorly.
While a well-designed solution can guard against application failure, system failure and site failure, cluster technologies do have limitations. Cluster technologies depend on compatible applications and services to operate properly. The software must respond appropriately when failure occurs.
Cited from “Cluster Architecture Essentials” on Microsoft TechNet
In other words, using a cluster doesn’t provide anything on its own. You still have to design and build applications in special ways to receive the benefit of the underlying cluster.
Here’s an example that I think shows quite clearly how a cluster can fail:
In a 4-node SQL Server cluster:
- Database Server 1 may handle accounts that begin with A-F.
- Database Server 2 may handle accounts that begin with G-M.
- Database Server 3 may handle accounts that begin with N-S.
- Database Server 4 may handle accounts that begin with T-Z.
Cited from “Cluster Architecture Essentials” on Microsoft TechNet
So, when Database Server 3 fails, you lose access to every account that begins with N-S. Not good!
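To make that failure concrete, here is a minimal Python sketch of the routing implied by the four ranges quoted above. The names `PARTITIONS`, `server_for` and `lookup` are hypothetical and chosen only for illustration; this is not how SQL Server routes requests, just a model of range partitioning with no redundancy.

```python
# Hypothetical routing table mirroring the four account ranges quoted above.
PARTITIONS = {
    "Database Server 1": ("A", "F"),
    "Database Server 2": ("G", "M"),
    "Database Server 3": ("N", "S"),
    "Database Server 4": ("T", "Z"),
}


def server_for(account: str) -> str:
    """Return the one server that owns this account's first letter."""
    if not account:
        raise ValueError("account name must not be empty")
    first = account[0].upper()
    for server, (low, high) in PARTITIONS.items():
        if low <= first <= high:
            return server
    raise ValueError(f"no partition covers account {account!r}")


def lookup(account: str, down: frozenset = frozenset()) -> str:
    """Route a request; fail if the owning server is in the `down` set."""
    server = server_for(account)
    if server in down:
        # No other node holds this range, so there is nowhere to fail over to.
        raise RuntimeError(f"{server} is down; account {account!r} is unavailable")
    return server
```

With `down=frozenset({"Database Server 3"})`, a lookup for an account such as "Adams" still succeeds, while a lookup for "Nelson" raises an error. Adding nodes this way adds capacity, but each range still has exactly one owner.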
Some readers might feel that a better architecture would keep a complete copy of all data on each of the four database servers. That can fail in very bad ways as well. For example, depending on how the replication scheme is implemented, all four servers could lock up completely when any single machine goes down. And then there’s the fact that you would need four times as much storage capacity.
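As one illustration of how full replication can go wrong, the sketch below models a deliberately naive scheme in which a write commits only if every replica is reachable. The `Replica` class and `write_all` function are hypothetical, and a real system might block or time out rather than raise an error, but the effect on availability is similar: one failed machine stops writes everywhere.

```python
class Replica:
    """One database server holding a full copy of the data (hypothetical model)."""

    def __init__(self, name: str):
        self.name = name
        self.up = True
        self.data = {}


def write_all(replicas, key, value):
    """Naive synchronous replication: commit only if every replica is reachable."""
    unreachable = [r.name for r in replicas if not r.up]
    if unreachable:
        # A single outage rejects the write for the whole cluster.
        raise RuntimeError(f"write rejected, unreachable: {unreachable}")
    for replica in replicas:
        replica.data[key] = value


replicas = [Replica(f"Database Server {n}") for n in range(1, 5)]
write_all(replicas, "Nelson", 100)   # succeeds; all four servers store the row

replicas[2].up = False               # Database Server 3 goes down
try:
    write_all(replicas, "Adams", 250)
    write_blocked = False
except RuntimeError:
    write_blocked = True             # no server accepted the write
```

Note that after the first write every one of the four servers holds the same row, which is where the fourfold storage cost comes from.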
There are, in fact, a number of ways to use a cluster architecture to achieve higher availability, reliability, fault tolerance and scalability. For the sake of staying on topic, I’m not going to go into all of those techniques here. I will say, however, that the appropriate techniques are determined by your specific needs and by the specific properties that you need to build into your application.
If you’re interested in discussing cluster architectures and how to use them effectively, I invite you to either contact me directly or post a comment to that effect. I would be happy to speak with you individually.
The point I want to make is that simply adopting a clustered architecture doesn’t mean your system is, or will continue to be, reliable, scalable and highly available. In Microsoft’s own words, “Cluster technologies depend on compatible applications and services to operate properly.” This implies that you need someone who really knows what they’re doing to build a solid system. Scalability from a cluster is not turn-key.
I invite everyone to submit comments and questions.
The next post in this series will cover caching. So keep an eye out.