Info: This is one post in a series of posts titled “The myths, magic and marketing of developing reliable, scalable and secure software”
The simplifying assumptions are that “many hands make light work” and that the sheer presence of many copies yields greater reliability. The devilish detail is coordination and synchronization.
Creating a process group, which is what a cluster typically is, is a very, very difficult task to do well. Even some of the biggest and best minds in the software industry have done this poorly.
While a well-designed solution can guard against application failure, system failure and site failure, cluster technologies do have limitations. Cluster technologies depend on compatible applications and services to operate properly. The software must respond appropriately when failure occurs.
Cited from “Cluster Architecture Essentials” on Microsoft TechNet
In other words, using a cluster doesn’t provide anything on its own. You still have to design and build applications in special ways to receive the benefit of the underlying cluster.
Here’s an example which I think shows quite clearly how exactly the cluster can fail:
In a 4-node SQL Server cluster:
- Database Server 1 may handle accounts that begin with A-F.
- Database Server 2 may handle accounts that begin with G-M.
- Database Server 3 may handle accounts that begin with N-S.
- Database Server 4 may handle accounts that begin with T-Z.
Cited from “Cluster Architecture Essentials” on Microsoft TechNet
So, when Database Server 3 fails, you lose all accounts that begin with N-S. Not Good!
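To make the failure mode concrete, here is a minimal sketch of that kind of range partitioning. The server names, the `PARTITIONS` table and both functions are hypothetical, invented purely for illustration; this is not how SQL Server routes requests.

```python
# Hypothetical routing table mirroring the A-F / G-M / N-S / T-Z split above.
PARTITIONS = {
    "db1": ("A", "F"),
    "db2": ("G", "M"),
    "db3": ("N", "S"),
    "db4": ("T", "Z"),
}


def server_for(account):
    """Return the name of the server that owns this account's first letter."""
    first = account[0].upper()
    for server, (low, high) in PARTITIONS.items():
        if low <= first <= high:
            return server
    raise ValueError(f"no partition covers account {account!r}")


def lookup(account, down):
    """Route a request, failing if the owning server is in the `down` set."""
    server = server_for(account)
    if server in down:
        # There is no second copy of this range, so the request cannot be served.
        raise ConnectionError(f"{server} is down; {account!r} is unreachable")
    return server
```

With `down={"db3"}`, a lookup for "Adams" still succeeds, but a lookup for "Smith" raises an error. The cluster as a whole is up, yet a quarter of the alphabet is gone.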
Some readers might feel that a better architecture would be to keep complete copies of all data on all four database servers. This can fail in very bad ways as well. For example, depending on how the replication scheme was implemented, you could end up with all 4 servers completely locked up in the event that any single machine goes down. And then there’s the fact that you would need 4x as much storage capacity.
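As one illustration of how full replication can lock up, consider a scheme in which a write only commits once every replica has accepted it. The sketch below is a toy model under that assumption; `Replica` and `write_all` are hypothetical names, and real replication schemes vary widely.

```python
class Replica:
    """A toy in-memory database node (hypothetical, for illustration only)."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value


def write_all(replicas, key, value):
    """Commit only if every replica is reachable; otherwise reject the write."""
    # Check availability first so a rejected write leaves no replica updated.
    down = [r.name for r in replicas if not r.up]
    if down:
        raise RuntimeError(f"write rejected, replicas down: {down}")
    for replica in replicas:
        replica.write(key, value)
```

Under this rule, taking any one of the four replicas offline causes every write to be rejected, even though three complete copies of the data are still healthy.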
There are, in fact, a number of ways to use a cluster architecture to achieve higher availability, reliability, fault-tolerance and scalability. For the sake of staying on topic, I’m not going to go into what all of those techniques are. I will say, however, that the appropriate techniques are determined by your specific needs and the specific properties that you need to build into your application.
If you’re interested in discussing cluster architectures and how to use them effectively, I invite you to either contact me directly or post a comment to that effect. I would be happy to speak with you individually.
The point I want to make is that simply integrating a clustered architecture doesn’t mean your system is or will continue to be reliable, scalable and highly available. And in Microsoft’s own words, “Cluster technologies depend on compatible applications and services to operate properly.” This implies that you need someone who really knows what they’re doing to build a solid system. Scalability from a cluster is not turn-key.
I invite everyone to submit comments and questions.
The next post in this series will cover caching. So keep an eye out.
Update: The newest installment in this series covers the topic of clustering.
Three thoughts have encouraged me to contribute my perspective on developing reliable, scalable and secure software.
Thought #1
When I was in graduate school one of my professors told me “If someone comes to you claiming to know how to build reliable, scalable and secure software, run away.”
Thought #2
I’ve observed during my 10+ years in the software industry that there are any number of products on the market that claim to be platforms which magically allow software applications to scale.
Thought #3
A good friend of mine is taking a class teaching distributed computing concepts. He showed me his first assignment and asked “Is this course going to get harder? This stuff seems so easy.”
Having taken the same course myself (and having gotten an A), I know exactly why my friend saw it that way, even though the course was actually quite a tough one. The techniques which make software reliable, scalable and secure seem simple on the surface. But when you take those ideas apart and look deeper to understand the foundations on which those techniques are based, things get very complicated very quickly.
My 10+ years of experience with software has led me to this thesis:
Scalability, reliability and security are indispensable, elusive and, in fact, extremely difficult to achieve. They cannot be purchased in shrink-wrapped packaging and there are no one-size-fits-all or “magic bullet” techniques that can be applied to every problem to provide these properties. Each system must be looked at carefully and in the context of the other applications with which it interacts, the hardware and networks on which it runs and the use cases to which it will be subjected. Only after this careful analysis is underway can you begin to understand which of the many available techniques are appropriate for the system at hand.
So, in a series of upcoming posts I’ll be covering a number of techniques commonly discussed in the industry in simple terms as tools for achieving reliability, scalability or security. I’ll present the typical over-simplification and outline in detail some of the complexity that makes it so difficult to achieve something as essential and commonly discussed as building robust software.
Final Thought
I encourage anyone and everyone to read the upcoming posts in this series and to do so with their critical-thinking hat on. I also encourage all of my readers to share their experience by joining the discussion on each post. I’m always interested in refining the techniques I have at my disposal. Likewise, I always delight in a story about a system or situation that went horribly wrong. So please, join the discussion.
If you have any techniques that you specifically would like me to cover, please let me know either by commenting on this post or by sending me an email at blog@evanscode.com.
I look forward to hearing from all of you.
And now, onto the topics:
A colleague of mine made a good point today about a way in which implementing a continuous integration process varies according to the technology stack you have in place. The difference is small but fundamental. It serves, however, to demonstrate the importance of following generally accepted ideas on how to develop an effective automated testing process. You can read more on these ideas in our previous post on effective automated software testing.
Two basic pillars of an automated testing process are unit tests and continuous integration tests.
To quote Wikipedia, unit testing is “a method of testing that verifies the individual units of source code are working properly.”
To quote Wikipedia, integration testing is “the phase of software testing in which individual software modules are combined and tested as a group. It follows unit testing and precedes system testing.”
To quote Martin Fowler:
Continuous Integration is a software development practice where members of a team integrate their work frequently, usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly.
Cited from: Martin Fowler’s page on continuous integration
The point that my colleague highlighted today is that compiled languages have a nice, basic check which is pre-runtime compilation. It’s easy to see that something is wrong when code doesn’t compile.
Stacks based on interpreted languages, however, perform this basic syntax check at runtime, when the code is interpreted. So you need to ensure that at least some code in every code file in your codebase is executed in order to achieve the same completeness that you get with a compiled language like Java or C#.
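For Python specifically, one way to approximate that compile step is to parse every source file as part of the build. This is a minimal sketch rather than a complete solution: `find_syntax_errors` is a hypothetical helper, it assumes UTF-8 source files, and it catches only syntax errors, not problems such as misspelled names that surface later at runtime.

```python
from pathlib import Path


def find_syntax_errors(root):
    """Return {path: message} for every .py file under root that fails to parse."""
    errors = {}
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            # compile() parses and byte-compiles the source without running it.
            compile(source, str(path), "exec")
        except SyntaxError as exc:
            errors[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return errors
```

A continuous integration job could call this on the project root and fail the build if the returned dictionary is not empty.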
Again, this isn’t a major difference between interpreted and compiled languages. But it does serve as a reminder of the value of code coverage in developing thorough testing procedures.
For more information on implementing thorough testing procedures, read our previous post on effective automated software testing.
Have you encountered more significant differences in developing effective automated testing procedures using different software stacks? Let us know! We’re always looking for new ways to improve our own testing processes.
Happy testing!
Good automated testing procedures (for unit and integration testing) use two metrics to gauge the overall effectiveness. The first metric is the pass rate observed when running the tests. If only 20% of your tests pass, you know there’s a problem. But a 100% pass rate doesn’t mean that you’re in the clear either.
The second metric that’s needed to ensure effective testing is something called “code coverage”. A code coverage value tells you how much of your code is actually being tested. More precisely, it tells you which lines of code were executed during a run of the test suite and which weren’t. Pretty cool, eh?
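To give a feel for how a coverage tool can know which lines ran, here is a rough sketch built on Python’s standard `sys.settrace` hook. `lines_executed` and `sign` are hypothetical names made up for this example. Real coverage tools are far more thorough, so treat this as an illustration of the idea only.

```python
import sys


def lines_executed(func, *args):
    """Run func(*args) and return the line offsets inside func that executed."""
    executed = set()
    code = func.__code__

    def tracer(frame, event, arg):
        # Record "line" events that belong to the function under test.
        if frame.f_code is code and event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always switch tracing back off
    return executed


def sign(n):              # offset 0
    if n < 0:             # offset 1
        return "neg"      # offset 2
    return "non-neg"      # offset 3
```

Calling `lines_executed(sign, 5)` reports offsets 1 and 3, which shows that the line at offset 2 never ran. A test suite that only passes positive numbers would leave that line untested.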
So, a 100% pass rate is nice. But it’s a poor indicator if your code coverage is only at 40%. This would indicate that 40% of your code does exactly what you want it to do (or at least what you’ve written tests for). But, 60% of your code isn’t even tested. So, your application could be 60% broken and you wouldn’t even know it.
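Here is a small made-up example of a perfect pass rate hiding a bug. The function and test names are hypothetical, and the bug is deliberate.

```python
def apply_discount(price, is_member):
    """Members are supposed to get 10% off; everyone else pays full price."""
    if is_member:
        # Deliberate bug: this adds 10% instead of subtracting it.
        return price * 1.10
    return price


def test_non_member_pays_full_price():
    # The only test in the suite. It never enters the is_member branch.
    assert apply_discount(100, is_member=False) == 100
```

The suite has a 100% pass rate, yet every member is being overcharged. A coverage report would flag the member branch as never executed, which is exactly the hint that a test is missing.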
The third factor that comes into play is the quality of your tests themselves. Even if you have 100% code coverage and a 100% pass rate, there may still be use cases or user stories that you haven’t written tests for.
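A tiny hypothetical example shows how that can happen. Every line of `average` below is executed by the one test, and the test passes.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def test_average():
    # Executes every line of average(), so line coverage is 100%.
    assert average([2, 4, 6]) == 4
```

Coverage is 100% and the pass rate is 100%, but calling `average([])` raises a `ZeroDivisionError`. Nobody wrote a test for the empty-list case, and no metric will point that out for you.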
As you can see, it’s a real exercise in rigor to write an effective, comprehensive test suite. In the last analysis though, it’s the only way to prove, not just be able to say, that your software works properly.
Tools are available to enable you to test effectively in all of the more common languages used today: PHP, Python, Ruby, Java, .NET, Perl and even JavaScript. A quick Google search will find you what you need to get started.
So, if the quality of your software matters, and it’s important to you to find bugs before your customers do (as it should always be and certainly is for us) then we encourage you to start thinking about the quality of your testing process.
We encourage you to tell us about your own experiences with automated software testing. Tell us what tools worked well for you and which didn’t. Tell us what was most valuable about having good tests. Tell us about how responsive your developers were to fixing broken tests. And any other information on what made your testing processes succeed and/or fail is very welcome.
Happy testing.