At STPCon Miami last week, I organized a test competition as a double-track session. My friends Andy Tinkham and Scott Barber helped out as judges. Petteri Lyytinen wrote some software to test, and Matt Johnston of uTest provided a reference website to test that had just had a rewrite -- that will be important later. It is also important to note that, while this may be a new idea at commercial test conferences, it is certainly not my idea; James Bach organized a test competition at CAST, the Conference of the Association for Software Testing, in 2011, which to a great extent inspired this format.
Here are the final scores:
A Little Bit About the Rules
To begin the contest, we handed out a single sheet of paper with the rules of engagement. Teams had two and a half hours to test four different websites. The final judged deliverables were the bug reports and a final test status report. At the beginning of the contest the judges announced three major prizes: best bugs submitted, best status report, and best overall score. The program brochure made it clear that best overall score would include interacting with the judges and team coordination. At 5:30, we stopped the clock in order to hold a fifteen-minute retrospective.
The first thing I noticed was the three uTesters in the room. When I asked them to sign up, they were reluctant. "We are in marketing", they explained, "we are here to blog the proceedings, not participate."
In order to understand what was happening, they had to test, and once they were testing ... they were hooked. Granted, testing is just part of the culture at uTest, but if we can get a marketer to test, then perhaps we can get many people to test come crunch time.
The second thing you'll notice is that the uTest scores weren't terrible -- and I think I know why.
The uTest folks might not have known much about software testing, but they were aware of that weakness. So they asked for guidance and talked together to try to figure it out. This gave them points for interacting with the judges (the "product owners") and for teamwork and strategy. Another thing the uTesters did know, which again is probably cultural, is how to write a reproducible bug report under time and information pressure.
The need for guidance was an explicit, planned part of the competition.
We handed the teams software and said "test this", then told them we were product owners.
Our intent was to simulate the majority of the projects we, the judges, had worked on. And, just like the majority of the projects I have worked on, we (the judges) expected certain things from the teams, but they had to ask.
Team uTest was the only team that asked what we wanted to see in a status report.
This turned out to be very, very important.
It All Came Down To One Word
If you look at the scores across the board, team Lemon PoppySeed Chicken (LPC) is in the lead. When we periodically looked up and observed, we saw the team talking to each other; when we asked them which websites they tested, they had specific reasons. LPC was also one of two teams that actually asked the judges which of the websites were more important and valuable to test. The bugs they reported were more important to the decision makers, and Andy was able to reproduce more of their defects than any other team.
It all came down to the status report.
Remember, uTest was the only team that asked what the status report should look like.
Team LPC produced a list of bugs. Personally, I would not call that a status report; it is more like a bug report.
As product owners, our goal was to figure out how close we were to shipping, and, if we had work to do, to use the status report to triage work and assign fixing resources. With the LPC report, we were overwhelmed. Was it good enough? Could we ship? What bugs were critical? We did not know.
This is where team Highlander shined. Highlander delivered a test status report that gave high-level insight to decision makers, which is what we were looking for. This was probably more a result of habit (more about that later) than of insight; team Highlander didn't ask for direction either.
The conditions of the test competition were a little different than your classic project. For example, when we are back home, we generally have some idea of project context. We are in the status meetings and standups, we talk to the developers and decision makers, and we know what matters when it comes time to ship. Thus we develop habits of "just testing" the software.
When it comes time for a test competition, we do not "rise to the occasion"; instead we fall back on our habits and training. This worked for team Highlander; I think it is what tripped up LPC, Red Square, and Team Dragon.
Team Dragon's case was interesting; they found a large number of Android device defects. At least one member of the team tests native Android apps for a living, so it made sense for him to test Android out of habit. If the team had asked, we'd have suggested our main concern was not mobile devices, but browser compatibility for popular browsers on laptops and desktops.
One of the products Dragon tested is generally fit for use on the desktop, but has significant problems under Android. This led to a dichotomy on the final status report, where the testers wrote this literal text: "Due to the good quality of the website we think the website has yellow status."
If I were a product owner, I'm not sure what I would do with that.
The biggest lesson I took away from this is project context: the things that make a team radically successful in one environment might fail in another. The project itself was probably closest to a uTest project, which, combined with their asking for help, may explain why team uTest did so well. Yes, Matt Johnston was on the team, and he did provide me with the software, but it had just had a major site redesign, and the members of the team clearly had not been testing it before. I chose to believe them and let them compete. If they had scored highly, we would have had an awkward conversation -- but it turned out not to be the case.
Finally there were some struggles with the test report, and that makes sense. Most of the teams I work with lately are shipping often; they don't have a huge work-in-progress inventory that would justify a formal report. Instead, the team discusses the known issues at a standup, in person. Still, writing a status report is a valuable skill to have, something James observed at his testing competition in 2011. I am interested in simulating standup meetings in future competitions, but I'm still working on it; we have room to grow.
Out of a possible 35 points, nine points were all that separated the top team from the bottom.
So in one sense, the competition was close.
If you look a little more closely, you'll see that the scores tend to average out overall, but some teams did much better or worse in a given category. This points me not to parity, but to a great variety of expertise. You see this when you dig into each team.
Team LPC did extremely well in execution, but struggled with the test report. Highlander rocked the test report and also collaborated well, but we weren't excited about the bug reports they produced. Red Square found important bugs we cared about, but seemed to struggle with coordination. Team Dragon was the only team to decide on a bug-reporting format up front, but they fell back on their habits of testing -- they did not clarify the mission of the test.
(I do not mean to be overly critical above; every team did well. It is just that by writing about this, there is an opportunity for people who did not attend the event to learn something.)
The next morning, Rich Hand, director of membership for STP, presented a $25 gift certificate to Nicole Rivera and Karl Kell, who formed team LPC. Team Highlander, which included Brian Gerhardt of Liquidnet and Joseph Ours
Keep in mind, the goal of the test competition was to learn things while having fun. I believe that happened, mostly because, except for uTest, teams consisted of people with mixed skills from different companies. There was a sort of cross-pollination that was a pleasure to watch.
Still, there was potential for more. If we had run the event as multiple rounds, we could have held a mid-competition retrospective to discuss what makes a good reproducible bug report, how to get into the mind of the product owner, and so on, and, hopefully, seen teams improve in rounds two and three.
I am seriously considering running a web-based test competition in December 2012. Here's how it would work: a week in advance I would create a chat room or Skype ID and broadcast it, along with the time of the competition. At the exact right time, probably noon Eastern USA, a blog post appears here with rules and how to submit bugs. Three hours later, the test competition closes, and we start judging.
Of course there are timing problems, logistics problems, judging problems, and so on. I believe those are all solvable.
If you'd like to be involved as a judge or volunteer, or to assemble a team (of one to six people, on or off site), drop me a note. We may just do this.
Otherwise, STPCon Spring 2013 will be in San Diego, California in April. You can probably guess the type of session I just proposed ... :-)