This article, contributed by STP guest editor Rex Black, is reprinted from “Beautiful Testing: Leading Professionals Reveal How They Improve Software” (O’Reilly, October 2009). The book, edited by Tim Riley and Adam Goucher, is a compilation of essays from 27 world-renowned testing and development experts based on their experiences in the field (all author royalties from sales of the book go to the Nothing But Nets anti-malaria campaign). Here, Black shares his insights into design and implementation of tests that pay off for stakeholders, both immediately and long-term.
When we describe something as beautiful, we mean that it has qualities that give great pleasure or satisfaction. In this article, I write primarily about the latter, not the former. Yes, testing should provide pleasure. However, testing that pleases the tester may prove a superficial form of beauty, because it does not satisfy the stakeholders.
We are all familiar with superficial forms of beauty that fade. In the entertainment world, we find some actors and actresses beautiful for a time, but in the long run many of them fade. However, some actors and actresses remain beautiful to watch throughout their careers. This is not due solely to physical beauty, but rather because we find beauty in their ability to satisfy us as viewers of their performances. Their performances tell us something that is deeply true, that is moving, and that inspires us to change the way we see the world we live in.
Similarly, in testing, although some approaches to testing might strike us as beautiful at first, in the long run those approaches to testing that satisfy our needs are those which are truly beautiful. Beautiful testing tells us something that is true. Beautiful testing moves us. Beautiful testing inspires a change in our beliefs about and actions towards the projects we work on—and the organizations we work within.
In this chapter, I address the beauty of testing that does not fade. What is it about testing, done really well, that provides long-term satisfaction to the stakeholders? Who are the stakeholders and what do they want from testing? What are the external and internal forms of beauty in testing that truly satisfy? How can testers and test managers build testing organizations that provide this satisfaction over the long term? How can we, as test professionals, create elegance, effectiveness, efficiency, and even delight for ourselves and our stakeholders in our work?
For Whom Do We Test?
Let’s start by identifying these people whom we want to satisfy. I’ll use the broad term “test stakeholders” for these people. This broad term goes beyond test participants, though it includes them. It goes beyond project participants, though it includes them. It even goes beyond the members of the organization for which we are testing, though it includes them, too. Everyone with a stake—an interest—in the testing we do and the quality of the final deliverable is ultimately a test stakeholder.
We can divide the list of test stakeholders into external and internal stakeholders. We can choose any number of boundaries between internal and external, but let’s take an obvious one: the internal stakeholders are those doing, leading, or managing the test work. The external stakeholders are all other applicable stakeholders.
So, who are these stakeholders? The answer varies from one project, product, and organization to the next. However, here are some typical answers, starting from the most immediately obvious stakeholders (the ones we work with daily) to the ones perhaps less obvious but no less important (the ones who ultimately are satisfied that testing accomplished what it must):
- Fellow testers. The people doing the testing work.
- Test leads and managers. The people who plan, direct, measure, and manage the testing work and its results.
- Developers, development leads, and development managers. The people who implement the system. They receive our test results, and often must respond to our findings when they indicate a need for changes and improvements.
- Database and system architects. The people who design the products. They also receive our test results and often must respond to our findings when they indicate a need for changes and improvements.
- Marketing and business analysts. The people who determine the features—and their quality attributes—that must be present in the system as designed and implemented.
- Project managers. The people responsible for bringing the project to a satisfactory conclusion. They must achieve a proper balance between quality, schedule, feature, and budget priorities.
- Technical support and help desk staff. The people who must support the users, customers, and sponsors who eventually receive, pay for, and benefit from the features and quality of the final deliverable.
- Sales managers, engineers, and staff. The people who find the customers, determine how to employ our systems to satisfy their needs, and manage the profitable delivery of our systems.
- Executives, officers, ministers, and/or directors. The people who run the organization, either on a daily basis or as an oversight body. These roles—and the needs of those in these roles—tend to vary depending on whether we consider a public organization (e.g., a government agency), a non-profit organization (e.g., a charity), a publicly-held organization (e.g., a listed corporation), or a privately-held organization (e.g., a partnership or sole proprietorship).
- Company shareholders. For publicly- or privately-held companies, the people who own the company.
- Elected officials and voters. For public agencies, the people who pass laws and make decisions that affect the organization, and those who elected them.
- Regulators and law enforcement. The people who ensure compliance by the organization, its people, and its systems with applicable laws and regulations.
- Users. The people who use the system directly or who receive its results, reports, data, etc. For companies that use systems internally, such as utilities or insurance companies, their customers are indirect users of their systems.
- Vendors. The people who might provide components incorporated into our systems or who might be users of our systems.
- Customers and sponsors. The people who pay for the development, acquisition, purchase, and/or installation.
- Public and society. The people who live in the communities where the system exists or is used.
This list is not exhaustive and does not apply to all projects.
I should mention another important point. Every stakeholder listed above—and perhaps others on your project—has an interest in your testing. Most of these stakeholders typically want to see your testing and the project succeed. However, not all necessarily have such a motivation.
Some stakeholders are neutral. For example, regulators and law enforcement typically care more about ensuring that you, the project team, and the organization follow the rules. If failure to follow the rules results in negative consequences, their attitude is likely to be, in the words of the 1970s US television program, “If you can’t do the time, don’t do the crime.” In some cases, failure to adhere to the rules might well constitute a crime, so know your obligations. Running afoul of regulators or law enforcement is not a beautiful experience.
And, while blessedly infrequent, some stakeholders are inimical. In my rare encounters with such stakeholders, I have called them anti-stakeholders. For example, some projects to replace legacy systems require the involvement of the very people who continue to support and maintain these legacy systems. These people might feel that the legacy system works just fine, thanks very much. Since the organizational dictate requires that they participate in the project, they do, but passive aggression is the order of the day. These anti-stakeholders hope that the project as a whole fails, and have no problem with your test work contributing to that failure. Contributing to project failure is not a beautiful experience.
Take the first step towards beautiful testing and determine who your test stakeholders are. If you do not know who the stakeholders are, you might achieve beautiful testing according to some, but others will not find it beautiful. In our consulting work, RBCS assessors see many examples of neglected stakeholders who are unhappy with the test team’s work. Our clients who have thought carefully about testing stakeholders stand a much better chance of testing beautifully. Clients that neglect neutral stakeholders and anti-stakeholders can have a very ugly experience indeed.
Each stakeholder has a set of objectives and expectations related to testing. They want these carried out effectively, efficiently, and elegantly. What does that mean?
Effectiveness means satisfying these objectives and expectations. Unfortunately, the objectives and expectations are not always clearly defined or articulated. So, to achieve effectiveness, testers must work with the stakeholder groups to determine their objectives and expectations. We often see a wide range of objectives and expectations held by stakeholders for testers. Sometimes stakeholders have unrealistic objectives and expectations. You must know what people expect from you, and resolve any unrealistic expectations, to achieve beautiful testing.
Efficiency means satisfying objectives and expectations in a way that maximizes the value received for the resources invested. Different stakeholders have different views on invested resources, which might not include money. For example, a business executive will often consider a corporate jet an efficient way to travel, because it maximizes her productive time and convenience. A vacationing family will often choose out-of-the-way airports and circuitous routings, because it maximizes the money available to spend on the vacation itself. You must find a way to maximize value—as defined by your stakeholders—within your resource constraints to achieve beautiful testing.
Elegance means achieving effectiveness and efficiency in a graceful, well-executed fashion. You and your work should impress the stakeholders as fitting well with the overall project. You should never appear surprised—or worse yet, dumbfounded—by circumstances that stakeholders consider foreseeable. Elegant testers exhibit what Ernest Hemingway called “grace under pressure,” and there’s certainly plenty of pressure involved in testing. You and your work should resonate as professional, experienced, and competent. To achieve beautiful testing, you cannot simply create a superficial appearance of elegance—that is a con man’s job. Rather you prove yourself elegant over time in results, behavior, and demeanor.
As I mentioned, the perspectives on effectiveness, efficiency, and elegance can vary considerably according to the stakeholder. They can also vary considerably by group and by organization. To illustrate that, consider the following examples for two of the stakeholder groups I mentioned in the earlier section.
For some clients, we have found that testers tend to gauge their effectiveness in terms of finding bugs. The more severe the bug, the happier the tester, even if these severe bugs are highly unlikely in real-world usage and not related to important usage scenarios. The more bugs the testers find, the more efficient the testers consider themselves. Such testers consider it elegant to construct a particularly devilish—sometimes even tortured—test case that causes a crash, abnormal application termination, computer lock-up, data loss, or some similarly spectacular failure. Test leads and managers, if they encourage such a bug-focused culture, tend to make this perspective even more prevalent. At the extreme end of the scale, some test managers even pay bonuses or rate testers in their yearly performance evaluations based on the number of severe bugs found.
Development managers and project managers generally do not appreciate such a one-dimensional outlook. They do not consider bug-obsessed testing beautiful at all, but rather antagonistic, disruptive, and obstructive. To them, effectiveness means that testers focus their efforts on important areas and typical workflows, and find whatever bugs exist there. Efficiency means covering critical and typical scenarios and finding important bugs early in the project. Elegance means clear reporting of results, based on functional areas and key quality risks, not on obscure corner cases.
Conflict results from these divergent perspectives. This conflict generally reaches its most intense stage during test execution. During test execution, testing is on the critical path for release. Each bug found and each test case failed represents a possible delay to the project. Tempers can become short and patience limited. So, conflict can reduce team cohesion and efficiency. The product often goes into production late, or with more bugs than necessary, or both. Further, a residue of bitterness and resentment begins to build between the test team and others on the project. Often, organizations choose to dissolve or reorganize such test teams after a while.
This situation is not very beautiful, is it? What if we could establish a consensus with our fellow stakeholders about what constituted effective, efficient, and elegant testing before we reached such a sorry, often irretrievable, state? Assuming we can achieve the objectives we set, to a level of capability that is possible, then we could achieve widespread satisfaction with our testing work. Ah, satisfied stakeholders: now that is beautiful!
Take the second step towards beautiful testing and determine what objectives and expectations your test stakeholders have. If you do not know your stakeholders’ objectives and expectations, only by luck will you achieve beautiful testing, and usually only for a few of the stakeholders. When my associates and I assess test teams, we see many examples of unfulfilled objectives and expectations, leading to a lower-than-necessary degree of satisfaction in the test team’s work. Our clients who have identified stakeholder objectives and expectations often test beautifully.
What Beauty Is External?
Consider a world-class distance athlete such as an Olympic marathon runner or an Ironman triathlete. Such athletes have a rugged external beauty, a form-fits-function appearance. They are lean. They have extremely well-toned and well-defined—but usually not bulky—muscles. During their competitions, they show a determined face, and they bear the pain of the long event with grace. We can measure their effectiveness, efficiency, and elegance by their final times, their race standings, and their sportsmanlike behavior—win or lose.
A good test team also displays an external beauty, similar to a long-distance athlete. After all, testing is much more like a marathon than like a sprint!
Suppose that, by working with your testing stakeholders, you identify a number of objectives for testing. One includes a typical objective, that of finding bugs, especially important bugs. How might you determine your externally-visible effectiveness and efficiency for this objective? Consider the following questions:
- What percentage of the bugs delivered to us do we find?
- Do we find a higher percentage of the important bugs?
- What is our cost per bug found and fixed during testing compared to the cost of a failure in production?
For each of these questions, devise a metric. Start with the percentage of bugs that you find. You can measure this with the defect detection percentage, shown in Equation 1. If your testing is the last quality assurance activity prior to user acceptance test and then deployment, you can simplify the metric as shown in Equation 2. Typically, there is a characteristic period of time in which most of the bugs that will be found in production have been found, so you can calculate the defect detection percentage after that period of time has passed since deployment.
Equation 1. Defect detection percentage (DDP)

DDP = bugs detected / (bugs detected + bugs not detected)

Equation 2. Defect detection percentage (DDP) for the last level of testing prior to UAT and deployment

DDP = test bugs / (test bugs + production bugs)
Based on our assessments of (and feedback from) clients around the world, an independent test team’s defect detection percentage for a system test or system integration test averages around 85%. However, significant variation exists on this metric. For systems developed for internal use, you should target a higher number, closer to 95%, since the users are typically less varied and the set of use cases and supported environments smaller. For systems developed for a mass market, wide variations in users, their skill levels, their usage of the system, and the environments in which they will use it make achieving a high defect detection percentage much harder. That said, for mission-critical or safety-critical systems, you will need to achieve a very high defect detection percentage.
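To make the metric concrete, here is a minimal Python sketch of the defect detection percentage for the last test level before UAT and deployment; the bug counts are invented purely for illustration:

```python
def defect_detection_percentage(test_bugs: int, production_bugs: int) -> float:
    """DDP for the last test level before UAT and deployment: bugs found
    in testing as a share of all bugs eventually found anywhere."""
    total = test_bugs + production_bugs
    if total == 0:
        raise ValueError("no bugs recorded; DDP is undefined")
    return 100.0 * test_bugs / total

# Hypothetical counts: 340 bugs found in system test, 60 found in
# production during the characteristic post-deployment period.
print(f"DDP = {defect_detection_percentage(340, 60):.0f}%")  # prints "DDP = 85%"
```

Note that the production-bug count only stabilizes after the characteristic post-deployment period mentioned above, so the metric should be calculated after that period has passed.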
With a measure of our bug finding effectiveness in hand, devise a metric to check your focus. Does your test team find a higher percentage of the important bugs? You can check this by using the defect detection percentage metric again. First, calculate the defect detection percentage for all bugs. Then, calculate the defect detection percentage for the critical bugs only, however you define “critical bugs” in your organization. The relationship shown in Equation 3 should hold.
Equation 3. Bug finding focus

DDP (all bugs) < DDP (critical bugs)
Generally, our clients that practice successful risk-based testing achieve a satisfactory defect detection percentage for critical bugs, and their defect detection percentages adhere to Equation 3. If you need to adopt risk-based testing in your organization, see my book Pragmatic Software Testing or some of my articles, podcasts, and videos on risk-based testing in the RBCS Library at www.rbcs-us.com. In any case, you should try to achieve a defect detection percentage for critical bugs that consistently comes close to 100%. You should carefully analyze any critical bugs that do escape to production to see how you can improve your testing and catch such bugs in the future.
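Assuming your bug data can be split by severity, the bug-finding focus check reduces to comparing two DDP values; the counts below are hypothetical:

```python
def ddp(found_in_test: int, found_in_production: int) -> float:
    """Defect detection percentage over a given set of bugs."""
    return 100.0 * found_in_test / (found_in_test + found_in_production)

# Illustrative counts, split by severity classification.
ddp_all = ddp(found_in_test=340, found_in_production=60)     # all bugs: 85%
ddp_critical = ddp(found_in_test=49, found_in_production=1)  # critical only: 98%

# A risk-focused test team should detect critical bugs at a higher
# rate than bugs overall.
assert ddp_critical > ddp_all
print(f"DDP(all) = {ddp_all:.0f}%, DDP(critical) = {ddp_critical:.0f}%")
```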
Finally, not only should we find a sizeable percentage of the bugs, and not only should we find more of the critical bugs than of the less critical bugs, but we should also find bugs more cheaply than the alternative: customers and users finding bugs in production. The recognized technique for measuring the cost of failures is called “cost of quality.” You can find a complete description of this technique in my book Critical Testing Processes, or in my article “Testing ROI: What IT Managers Should Know,” which you can find in the RBCS Library at www.rbcs-us.com.
Using cost of quality, you can identify three main costs associated with testing and quality:
- Cost of detection: the testing costs which we would incur even if we found no bugs. For example, performing a quality risk analysis, setting up the test environment, and creating test data are activities that incur costs of detection.
- Cost of internal failure: the testing and development costs which we incur purely because we find bugs. For example, filing bug reports, fixing bugs, confirmation testing bug fixes, and regression testing changed builds are activities that incur costs of internal failure.
- Cost of external failure: the support, testing, development, and other costs which we incur because we don’t deliver 100% bug-free, perfect products. For example, much of the costs for technical support or help desk organizations and sustaining engineering teams are costs of external failure.
So, we can identify the average costs of a bug in testing and in production, as shown in Equation 4 and Equation 5. Typically, the average cost of a test bug is well below the average cost of a production bug, often by a factor of two, five, ten, or more. Equation 6 shows a calculation of the return on the testing investment based on these figures. The logic behind Equation 6 is that each bug found by testing gives the organization the opportunity to save money, specifically the difference between the cost of a test bug and the cost of a production bug. The cost of the investment in testing is the cost of detection (since the cost of internal failure is not an investment).
Equation 4. Average cost of a test bug (ACTB)

ACTB = (cost of detection + cost of internal failure) / test bugs

Equation 5. Average cost of a production bug (ACPB)

ACPB = cost of external failure / production bugs

Equation 6. Calculating the testing return on investment (Test ROI)

Test ROI = ((ACPB – ACTB) × test bugs) / cost of detection
In RBCS assessments and projects, my associates and I have found return on the testing investment to range from a respectable low around 25% all the way up to over 3500%. Generally, as the cost of external failure goes up relative to the cost of internal failure, the return on the testing investment also goes up. In other words, the more expensive it is for your organization to deal with bugs in production, the more it should invest in testing.
In terms of setting a target metric for your return on investment, be careful. Sometimes, optimizing the return on the testing investment (your bug finding efficiency) can reduce your defect detection percentage, your bug finding focus, or both (your bug finding effectiveness). During assessments, if the test team has a positive return on the testing investment, we recommend only those efficiency changes unlikely to reduce effectiveness. I’ll discuss an example of such an improvement in the next section.
Now you can take the third step towards beautiful testing: establishing metrics for effectiveness and efficiency and goals for those metrics. In this section, I used Victor Basili’s goal-question-metric approach to do so. You’ve already understood the objectives and expectations of your stakeholders, so those are the goals. Now, what questions would you need to answer to know whether your testing achieved those goals? Finally, what metrics could demonstrate the extent to which you achieved those goals? Now you have a way of measuring your testing in terms of what satisfies your stakeholders. How beautiful is that?
You’re not quite done yet, though. You still have to consider the elegance element of beauty. Establish an ethic of elegance, in terms of graceful work, a service-oriented outlook towards your stakeholders, and a focus on what really matters to the organization. Years ago, someone coined the term egoless programming. Similarly, beautiful testing is egoless testing.
In our assessments, RBCS consultants have seen shining examples of test teams that know their stakeholders, know the objectives and expectations those stakeholders have, and know how to achieve and measure success for those objectives and expectations. These clients almost always test beautifully.
What Beauty Is Internal?
There is one more element to beautiful testing we need to consider: internal beauty. Let’s return to the metaphor of a test team as an Olympic marathon runner or an Ironman triathlete. Underneath the surface, their internal organs all serve the purpose of athletic performance. Muscles are trained for hour after hour of endurance. The digestive system enables the conversion of carbohydrates to fuel and protein to muscle, and distributes water into the body to maintain healthy hydration. So, we can measure effectiveness, efficiency, and elegance by calories burned, body fat percentages, and long-term health.
A good test team also displays a similar internal beauty. Since testing is like a marathon, we need a test team that can go the distance on project after project, often under trying circumstances.
Suppose you have determined that your team spends a sizeable percentage of its time doing regression testing manually. Even if the defect detection metric indicates that you don’t miss many bugs, manual regression testing is tedious, expensive, error-prone, slow, and morale-sapping. So you could decide to use automation to reduce the manual effort while continuing to maintain a low level of regression risk in delivered products. How might you determine your internally visible effectiveness and efficiency for this objective? Consider the following questions:
- What percentage of regression tests have we automated?
- What percentage of regression-related quality risks do we cover?
- How much more quickly can we run our automated regression tests?
For each of these questions, devise a metric. Start with the percentage of regression tests automated, as shown in Equation 7. This metric typically cannot—and should not—reach 100%, since some tests require human judgment or interaction during test execution. Many of our clients do achieve regression test automation as high as 90%. You’ll need to do some careful analysis to determine your target.
Equation 7. Regression test automation percentage (RTA)

RTA = automated regression tests / (manual regression tests + automated regression tests)
Test automation should preserve or lower the level of regression risk. So, you should measure the percentage of regression-related quality risks covered, as shown in Equation 8. To calculate this, you need the ability to establish the relationship between your tests—both automated and manual—and the underlying regression risks. (If you’re not familiar with this idea of traceability between the tests and the test basis—i.e., what the tests are based on—you can find a description of it in my book Managing the Testing Process.) Many test management tools include the ability to establish test traceability and to measure coverage. As you proceed to automate more and more tests, the regression risk coverage metric should at least stay constant, or, better yet, it should increase.
Equation 8. Regression risk coverage (RRC)

RRC = regression risks covered / regression risks identified
Automated regression tests should make regression test execution quicker, too. You should measure the acceleration of regression test execution, as shown in Equation 9. Note that the duration figures for both manual and automated regression testing should include the time required to run all the regression tests, even if you don’t typically run all the regression tests due to time constraints. You want to measure the time savings realized to achieve the same level of regression risk. If some portion of your regression tests remains manual, you should include the time required to run them in the automated regression test duration figure, to keep this metric accurate.
Equation 9. Acceleration of regression testing (ART)

ART = (manual regression test duration – automated regression test duration) / manual regression test duration
For example, suppose it takes 20 days to run all the regression tests manually. You can now run them overnight with analysis of the results complete on the second day. You have achieved 90% regression test acceleration. That’s quite a gain in efficiency.
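All three automation metrics are simple ratios, so they can be sketched together in a few lines of Python. The acceleration figures reproduce the 20-day example above; the test and risk counts for RTA and RRC are invented for illustration:

```python
def rta(automated: int, manual: int) -> float:
    """Regression test automation percentage: share of regression
    tests that are automated."""
    return 100.0 * automated / (automated + manual)

def rrc(risks_covered: int, risks_identified: int) -> float:
    """Regression risk coverage percentage."""
    return 100.0 * risks_covered / risks_identified

def art(manual_days: float, automated_days: float) -> float:
    """Acceleration of regression testing: time saved as a share of
    the original manual duration. Any remaining manual tests should
    be included in the automated duration figure."""
    return 100.0 * (manual_days - automated_days) / manual_days

print(f"RTA = {rta(automated=450, manual=50):.0f}%")            # prints "RTA = 90%"
print(f"RRC = {rrc(risks_covered=95, risks_identified=100):.0f}%")  # prints "RRC = 95%"
print(f"ART = {art(20, 2):.0f}%")  # 20 days down to 2: prints "ART = 90%"
```

Tracking all three ratios together guards against the trap of automating aggressively (high RTA, high ART) while quietly letting risk coverage (RRC) slip.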
Notice that this acceleration not only makes testing more efficient, it allows us to tolerate a higher rate of change without any increase in regression risk. This benefit is critical for teams implementing Agile methodologies. Without good regression test automation, Agile methodologies tend to result in a significant increase in regression risk, and ultimately in regression bugs found in production.
In addition, if we automate carefully, the costs of detection and internal failure mentioned in the previous section should go down. Thus, you can use regression test automation to improve your efficiency without reducing effectiveness. Isn’t that beautiful?
You can take the fourth step towards beautiful testing. You can set objectives and expectations for your testing from an internal point of view. You can establish metrics for effectiveness and efficiency in meeting these objectives, and establish goals for those metrics. Now you have a way of measuring your testing in terms of what allows you to do your job better, quicker, cheaper, and smarter. How beautiful is that?
Don’t stop there, though. Again, consider the elegance element of beauty, and add to it the element of delight. You and your fellow testers should adopt leading-edge techniques that make your test team an example of testing best practices. Beautiful testing means working in a test team that practices—and advances—the state of the art in testing. Beautiful testing raises the standard for all testers. Beautiful testers share what they have learned about testing in articles, books, and training courses, to the delight and enlightenment of their colleagues in the testing community.
In assessments, we sometimes see test teams that know their stakeholders, know the objectives and expectations of those stakeholders, have objectives and expectations to improve their internal processes, and know how to achieve and measure success for all those objectives and expectations. They inculcate an ethic of smart work, elegant work, and delightful work into their testing. They advance the field of testing, and generously share those advances with others. These clients test beautifully, every day.
Testing has many stakeholders. Beautiful testing satisfies those stakeholders. The tester knows the stakeholders and their objectives and expectations for testing. The tester works with the stakeholders to ensure realistic objectives and expectations, and defines metrics to measure effectiveness and efficiency. They take special care with neutral stakeholders and anti-stakeholders.
The tester knows how internal test processes support effectiveness and efficiency, too, and takes steps to improve those over time. Through a concerted focus on delivering ever-improving test services, and continuously improving their testing practices, the tester works effectively, efficiently, and elegantly. Not only is the tester delighted in their own work, but the other stakeholders are delighted, too. Such testers do beautiful work.
In this chapter, I’ve given you some ideas on objectives and metrics for those objectives that will help you make your testing more beautiful. You’ll need to take those ideas further, as these objectives and metrics are just a starting point. I recommend a thorough assessment to kick off your journey toward beautiful testing, considering these four steps:
- Know your stakeholders
- Know their objectives and expectations for testing
- Establish metrics and targets for stakeholder objectives and expectations (external beauty)
- Establish metrics and targets for testing objectives and expectations (internal beauty)
Once you have a framework in place for achieving beautiful testing, start working towards that. While it won’t happen overnight, you’ll be pleasantly surprised at how quickly these four steps can improve your testing.
About the Author
Rex Black – President and Principal Consultant of RBCS, Inc
With a quarter-century of software and systems engineering experience, Rex specializes in working with clients to ensure complete satisfaction and positive ROI. He is a prolific author, and his popular book, Managing the Testing Process, has sold over 25,000 copies around the world, including Japanese, Chinese, and Indian releases. Rex has also written three other books on testing – Critical Testing Processes, Foundations of Software Testing, and Pragmatic Software Testing – which have also sold thousands of copies, including Hebrew, Indian, Japanese and Russian editions. In addition, he has written numerous articles and papers and has presented at hundreds of conferences and workshops around the world. Rex is the immediate past president of the International Software Testing Qualifications Board and the American Software Testing Qualifications Board.