I won’t bombard you with gazillions of testing metrics in this article. Instead, I would like to share my thoughts on what to consider when you want to measure testing. I often see testing metrics misused from one project to another, and wanted to share my take.
First things first. Before we can measure testing, let’s create a shared understanding of the two keywords involved: (1) Testing, (2) Measurement.
To me, software testing is a rich and open-ended intellectual activity. We test our software to reveal risks in our software. A risk is a problem that might happen. A problem is anything that threatens the value of our software. Put differently, a problem in our software is a deviation between what is perceived and what is desired in our software. So, the purpose of testing is to gain information about the risks in our software. Testing, then, is a search for information (Cem Kaner). Testing is about information, not automation. We search for this information to enable other people (e.g. product owners) to make business decisions (e.g. shipping decisions) based on the information provided.
As such, testing cannot be encapsulated into discrete procedural units called test cases (James Bach). It cannot be encoded into test cases, because a test case is just one particular instance of a test. A test case is not the test in the same way a recipe is not cooking. The number of recipes (test cases) you have doesn’t tell us anything about your cooking (testing) skill (Aaron Hockley).
Similarly, an itinerary is not a trip, sheet music is not a musical performance, and a file of PowerPoint slides is not a conference talk. All of the former are artifacts; they are explicit representations. The latter are human performances (Michael Bolton). Therefore, testing is an activity. It’s a performance, not an act of artifact creation. The artifacts (test cases) may be produced before, during, or after the act of testing (James Bach). Therefore, a tester can test without test cases. Want to dive even deeper? Watch this video to learn more about the purpose and goals of software testing.
Don’t agree? If so, stop here and enjoy the rest of your exciting and hopefully informative “testing” day. May the luck to reveal risks in your software be with you.
To gauge the performance or progress of some activity (like testing) we need quantifiable measures called metrics. Metrics enable us to evaluate the success of an activity. Success is defined to be a repeated achievement of operational goals that allows an organization to make progress toward strategic goals.
Software Quality’s Role in Achieving Strategic Goals
Now then, the most important strategic goal of any commercial organization is to generate revenue. One of the ways they can reach this strategic goal is high customer satisfaction, which increases the chance of generating recurring revenue. There are many factors which can contribute to this operational goal of high customer satisfaction, one of which is high software quality. Things get tricky here, since quality is a vague term, and a big reason software development is so challenging. So, let’s define it.
Gerald Weinberg once wisely stated: “Quality is value to some person, at some time, who matters”. This means that quality is inherently subjective. It only exists in relation to people’s perceptions (some person), and these perceptions may vary (some time).
The person in our case is a stakeholder who is materially affected by the implementation of new software or by a change made to existing software. They pay or are willing to pay for our software, and are the people for whom we build the software.
What is built has to reflect the stakeholder’s needs. A need is something that is wanted, important, required, and/or necessary (Ian McCowatt). Hence, quality is value to the stakeholders, and this value is attained by providing a solution to their problems and/or needs.
SIDE NOTE: Satisfying All Stakeholders
A quick side note from James Bach about the subjectivity of quality. Different stakeholders will perceive the same product as having different levels of quality. That’s, by the way, the reason why bugs are not (necessarily) in our software product. Bugs are more about the relationship between the software product and the stakeholders. As such, it’s possible for a bug to be created or resolved just by changing the stakeholders. Therefore, testers must look for different things for different stakeholders.
How can we diversify testing? We must explore the relationship between our software product and its stakeholders. To do so, we must make many social judgments, including judging the importance of testing a specific situation and judging the meaning and importance of potential problems with respect to our stakeholders. Therefore, a total bug identification algorithm would require a complete, unambiguous, and up-to-date specification that is accepted by all stakeholders. Well, when was the last time you saw one of those? I think I know your answer. That’s one reason why testing can’t be encoded in test cases.
In short, the quality of our software product increases when our stakeholders say, “It’s better now, it solves my problem!” or “It’s better now, it enables me to achieve my goals more efficiently!”. Once our stakeholders say that, we can safely say that our software creates value for them, and that our software has a high quality.
How Testing Improves Software Quality
How on earth can you ever measure that!? How can you measure something qualitative (software quality) with something quantitative (metrics)?
Well, hold on. Not so fast, please! Please note that we are talking about testing metrics. Let’s first agree that it is not the job of our testers to figure out what needs to be implemented to satisfy our stakeholders. That is done by our product owners and/or product managers. It’s their job to identify the needs, the problems, the pain points, and anything else that could potentially create value for our stakeholders. These findings are communicated to the development teams, where they hopefully get implemented and tested (at the same time!).
It is at this point that testers start contributing to increasing (or decreasing) software quality through good (or bad) testing. The reason for this is simple. We unfortunately don’t live in an ideal world with zero bugs, flawless development, and perfect software. So, testers reveal problems in the software which are preventing our stakeholders from achieving their goals. If testers don’t reveal the problems, they cannot be solved, since we can’t solve something we aren’t aware of. Each solved problem is one fewer problem in the software, increasing software quality.
So, testing provides a cause for improving software quality by revealing problems. This hopefully leads to developers coding the solution to the problem, and since without a cause there is no effect, testing increases software quality (though indirectly). Period. Please note that I don’t restrict testing to something done only by testers. Anybody who reveals problems eventually increases software quality. My point is that the person responsible for checking in a fix is certainly not the only person who increases software quality. It’s a group effort, the contribution of each individual (e.g. product owner, tester, designer, developer, release engineer), that increases software quality. You may want to read my complete argument to get convinced.
Don’t agree? If so, stop here and enjoy the rest of your exciting and hopefully informative “testing” day. May the luck to reveal problems in your software be with you.
From this perspective, testing can be regarded as an information service that provides information about anything that might threaten the value of our software products. Testers provide the technical information, which feeds into shipping decisions. These shipping decisions are business decisions (Michael Bolton) and are usually made by product managers and people from operations (e.g. ops engineers). Now, one goal of making these shipping decisions is to gather feedback about the software and use it to increase its quality as quickly as possible, and since these shipping decisions are influenced by the technical information provided by testing, testing is again inherently linked to software quality.
The Question That Testing Metrics Should Answer
Now then, when we want to measure testing through metrics, these metrics must enable us to find an answer to the following question.
Do we provide enough information through our testing that enables our organization (e.g. product owners, product managers, developers, operations engineers) to design, implement, and ship high quality software?
The testing metrics we are looking for do not measure software quality directly. They don’t need to, since, as we discussed above, it is the job of the product owners and/or product managers to identify and measure software quality, and how it creates value for our stakeholders.
We need to align our testing metrics with our business goals. This gives our testing metrics a context. The context of our testing metrics is software quality. Knowing this is crucial, since measuring testing for its own sake doesn’t make sense and never has. Every single testing metric must be able to stimulate a discussion that is centered around the question above. It must enable us testers to start an informed discussion, a discussion about what we need to change in our testing process and/or our testing approach to optimize testing. By “optimizing testing” I simply mean finding ways that enable us to reveal risks related to every software quality attribute faster and more frequently. So, let’s refine the question above.
Does this testing metric provide enough information to help us optimize our testing so that our organization is able to increase software quality faster and more frequently?
One more thing. Please do not assume that complex (and complicated) things (e.g. risks) about our software products can be unproblematically summarized with a bunch of numbers (James Bach). Moreover, you shouldn’t assume that you can entirely summarize the good and the bad of your testing activities with these metrics. Rather, use your metrics to kick off a thoughtful discussion. Use them as a basis to create a shared understanding of the risks in our software products, since not everything that counts can be counted, and not everything that can be counted counts (Albert Einstein). Your metrics quantify the “knowns” because you can’t measure “unknowns” directly, but you can get to the “unknowns” indirectly.
The Path to Concrete Actions
Once you measure something through some metrics, you will be asked to figure out what’s going on behind these metrics. These discussions will provide at least a hint of where to find the “unknowns”. The reason for this is that quality is a function of precise thought and reflection. That’s the magic. Techniques that reinforce that discipline invariably increase quality (Michael Feathers). A technique to foster these discussions is holding regular (e.g. weekly, bi-weekly) retrospectives, where you look back and reflect on your metrics as a (testing) team. This will give you insights into whatever you want to measure through your metrics.
These insights should lead to concrete action items, and these action items eventually allow you to optimize your testing. Hence, your metrics must provide actionable insights. In doing so, we also define hypotheses, like “How will these actions impact our testing, and how can we see the impact in our metrics?”. This will give you more data-driven decisions, and fewer opinion-driven, ego-driven, or authority-driven decisions.
Speaking of ego, you should be careful to avoid so-called vanity metrics. Vanity metrics are metrics that make you feel good without telling you anything about your testing. So, vanity metrics are good for the ego, but bad for action (Eric Ries). If it turns out that the metrics you are looking at aren’t useful in helping you to find an answer to the question above, or you can’t map any actions to these metrics, then you should stop looking at them. The key phrase here is: “If it works, keep it. Otherwise, dump it.”
So, in these regular retrospectives (aka post-mortems), decide what you want to start measuring, stop measuring, and continue measuring. For each metric, we ask the following two simple questions in these regular meetings: (1) What did we learn from this metric? (2) What do we have to change to learn more? That’s our simple strategy to improve continuously. Our mantra is “think it, measure it, analyze it, tweak it”. The awful truth is that most people don’t do that, since it’s easy to collect data, but it’s hard to interpret the data – and it’s even harder to make concrete decisions based on the data.
Furthermore, please never forget that there are no facts, only interpretations.
Look for the Vital Signs: Setting Up KPIs
Don’t just think in terms of metrics, think in terms of key performance indicators (KPIs). The reason for that is explained by Kayla Grigg: Not every metric you measure can be “key”, since if all your metrics are special, none of them are. So, it’s up to you to select the metrics that are most closely aligned with your critical testing objectives. This subgroup of metrics then becomes your KPIs. One easy way to think of KPIs is to think of them as vital signs that center your focus on the things that matter most to keep your daily testing business alive and well.
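The vital-signs idea can be sketched in a few lines of code. This is a hedged illustration, not a prescription: every metric name and threshold below is invented for the example, and your own KPIs should come from your critical testing objectives.

```python
# Sketch: KPIs as a small, flagged subset of your metrics, each with a
# "healthy" bound that turns it into a vital sign. All names and numbers
# here are hypothetical.

metrics = {
    "escaped_defects_per_release": 7,    # candidate KPI
    "mean_time_to_detect_hours": 30.0,   # candidate KPI
    "checks_executed_last_week": 412,    # diagnostic metric, not a KPI
    "flaky_check_rate_percent": 9.5,     # diagnostic metric, not a KPI
}

# Only the vital signs get an alert threshold (upper bound in this sketch).
kpi_thresholds = {
    "escaped_defects_per_release": 5,
    "mean_time_to_detect_hours": 24.0,
}

# A KPI "goes haywire" when it exceeds its healthy bound.
alerts = [name for name, limit in kpi_thresholds.items()
          if metrics[name] > limit]

print(sorted(alerts))
# → ['escaped_defects_per_release', 'mean_time_to_detect_hours']
```

The design choice to keep thresholds only on KPIs mirrors the point above: the KPI raises the alarm, and the remaining metrics are what you then dig into to diagnose the cause.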
Just because your KPIs are your most important metrics doesn’t mean that the rest of your metrics are useless. When one of your KPI vital signs goes haywire, you need to be able to look at your other metrics to properly diagnose the underlying problem. Your KPIs should tell you that there is a problem, while your other metrics help you understand the problem. So, be aware that there is a difference between metrics and KPIs, and please don’t use them interchangeably. My advice (so far) is as follows.
Don’t hide behind your metrics. Challenge the meaningfulness, necessity, and significance of your metrics constantly. Be aware that your metrics are only one piece of the puzzle. Know that they are misleading unless you look further into their details and context. Look beyond your metrics by looking for the truths behind them. Make your metrics actionable. Avoid vanity metrics. Cross-validate your metrics. Remember that good metrics are metrics that help you to make decisions to ultimately optimize your testing. Select KPIs, and select them carefully. Don’t wait for some general purpose artificial intelligence like Skynet to take over the task of interpreting your metrics. Do it yourself and do it now.
Now, stop here, walk through your testing metrics while being honest with yourself, and reflect on what you have read so far.
Welcome back! Well, I am (almost) certain that most of you went through metrics that are either about artifacts (e.g. How many bugs were found by which test cases last week?) or about human performances (e.g. How long does it take a tester to test a requirement on average?). Let’s call these metrics test-related metrics. For a great list of these metrics, see this blog.
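To make the two flavors of test-related metrics concrete, here is a minimal sketch of computing one artifact-centred metric and one performance-centred metric from raw records. The data model, IDs, and numbers are all made up for illustration.

```python
# Hedged sketch: two test-related metrics from hypothetical raw data.
from collections import Counter
from datetime import date

# Artifact-centred input: which test case revealed a bug, and when.
bug_reports = [  # (test_case_id, date_found)
    ("TC-101", date(2023, 5, 2)),
    ("TC-101", date(2023, 5, 3)),
    ("TC-204", date(2023, 5, 4)),
]

# Metric 1: how many bugs were found by which test cases last week?
bugs_per_case = Counter(tc for tc, _ in bug_reports)

# Performance-centred input: hours a tester spent testing each requirement.
hours_per_requirement = {"REQ-1": 3.5, "REQ-2": 6.0, "REQ-3": 2.5}

# Metric 2: how long does it take a tester to test a requirement on average?
avg_hours = sum(hours_per_requirement.values()) / len(hours_per_requirement)

print(bugs_per_case["TC-101"])  # → 2
print(avg_hours)                # → 4.0
```

Note how mechanical this is: the numbers are easy to compute, and the hard part, as argued above, is the retrospective discussion about what they mean and which action to take.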
Capturing the Human Dimension of Testing
These metrics are important; that’s beyond question. We can learn so much from them to optimize our daily testing routines, but you shouldn’t stop there. These metrics are necessary, but they are insufficient. What about the most important and fundamental ingredient of software testing? What about the human testers!? Their thought processes and their social interactions can also help to optimize your testing, thereby increasing software quality. Think of it this way: Behind every single test case there is a test idea that comes from a human tester. So, if your test idea sucks, then your test case sucks.
I am absolutely convinced of this, and here is why.
As stated above “Testing is an activity. It’s a performance, not an act of artifact creation”. This, according to Michael Bolton, means that testing is not so much a thing you do, but rather, a way you think. So, testing can’t be reduced to an act of artifact creation or to a body of mechanical checks. Testing is much more than conservative confirmatory testing, where you evaluate your product by applying algorithmic decision rules to specific observations of your product (Rich Rogers).
In simple terms, testing is far more than just answering the question “Does this assertion pass or fail?”. That’s exactly what you do through confirmatory testing, which is also called checking. That’s what machines do when they execute automated test cases. That’s what (most) testers do during repetitive manual test case execution. Well, testing is obviously more than just that. Testing is more about the process of answering the question “Is there a problem here?”. That’s a subtle wording difference, but it asks for an entirely different mindset.
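The difference between the two questions can be shown with a tiny sketch. The function and values below are hypothetical; the point is that a check applies an algorithmic decision rule to one specific observation, and by construction it cannot ask “Is there a problem here?” about anything outside that rule.

```python
# A confirmatory check: an algorithmic decision rule applied to a
# specific observation. The function under test is a toy example.

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return round(price * (1 - percent / 100), 2)

def check_discount() -> bool:
    """Answers only: does this assertion pass or fail?"""
    return apply_discount(100.0, 10.0) == 90.0

print(check_discount())             # → True: the check passes...
print(apply_discount(100.0, 150.0)) # → -50.0: ...yet a human asking
                                    # "Is there a problem here?" spots
                                    # that a negative price is suspicious.
```

The check happily reports green, while the questioning mindset notices a risk the decision rule was never written to look for. That is the wording difference in action.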
The process of answering this question, this kind of decision making, requires the application of a variety of human activities, such as questioning, studying, modelling, inference, social judgment, and a truly long et cetera. So, good testers do good testing through good test ideas. Now then, what does good testing mean?
James Bach put it this way: Good testing requires good testers, and good testers have the ability to…
- Interpret what they find
- Draw the right conclusions quickly
- Think critically about what they know
- Know they will never know everything
- Keep thinking about what they already know
- Focus on things they don’t know
- Target specific issues without losing focus
- Recognize and manage bias
- Form and test conjectures
- Jump on conjectures (not on conclusions)
- Reflect on their decisions
- Consider alternatives
- Solidify important concepts
- Scrutinize illusions they hold as true
- Analyze someone else’s thinking
- Reason about cause and effect
- Decide about their future learning pathways
- Learn how to learn, et cetera.
This list could go on forever. In short, good testers are cautious, curious and critical. This is what makes testers hard to fool, and that’s important, since your product is always trying to fool you in one way or another.
In other words, good testers are the Peter Pans of the human race, since they never lose their curiosity to find anything that might threaten the value of our software. Considering the extensive list of qualities a good tester should possess, we must not ignore the human issues at risk (Cem Kaner).
Phew! How on earth can you measure that quantitatively!? Are there any human-related metrics out there? Honestly, I simply don’t know. Opinions differ on this question. But is it even the right question to ask? Well, according to Peter Drucker it is, since he once stated: “If you can’t measure it, you can’t improve it”. True! But please note that this statement doesn’t distinguish between direct and indirect measurements. So, even in the cases where we can’t measure skills such as curiosity, critical thinking, and cautiousness directly, we can indirectly try to understand what we can learn about our skills from the test-related metrics. So, qualitative research is the keyword. You do qualitative research when you want to understand something. You do quantitative research to inform that understanding. So, really good quantitative research should not only answer but also prompt qualitative research (Michael Bolton). In other words, your test-related metrics should not only answer but also prompt questions about the presence or absence of your testing skills.
That’s it. Hope that resonates with you.
Distinguished Evangelist at Tricentis
Ingo Philipp, Distinguished Evangelist at Tricentis, champions the methodologies and technologies at the core of the company’s continuous testing solution. In his previous position as a senior product manager, he orchestrated product development and product marketing. Before that, he worked as a theoretical astrophysicist in the field of high-energy particle physics and computational fluid dynamics. He holds a Master of Science degree.