Why teams lose faith in their functional test suite

In my opinion, there are three main reasons why a functional automation suite loses its value (or the respect a delivery team should have for it). That loss leads to a variety of problems, but I will save those for a later post.

For me, the reasons are:

1. Non-deterministic tests - run the same test again (without changing any code) and you get different results.

This, in my opinion, is the least attractive aspect, and a true demotivator for a delivery team's belief in its functional test suite. One cause of non-deterministic failures I have seen is tests running on hardware whose performance is itself non-deterministic, for example VMs that share hardware resources and are therefore at the mercy of how the other VMs on the same host are behaving during the test run. Sometimes tests are non-deterministic because they access external systems which are themselves non-deterministic; unless you are testing the robustness of the system under test, stubbing out such external dependencies has worked out well. Another cause is simply bad tests: tests that depend on the time at which they run (unless the functionality demands it), tests with time-outs that were never properly researched, and so on. Sometimes the tools used to drive the tests are non-deterministic (we found a couple of them driving Silverlight tests), which makes it really difficult to believe in the results of the tests you write.
All non-deterministic tests can and should be fixed, but I have seen that the effort actually spent on fixing them is directly proportional to the amount of customer involvement and the value the tests bring to delivery. People also dismiss them as “random failures” and ignore them. I believe that every developer who calls a program “random” loses respect by a notch every time they say so.
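As a purely illustrative sketch (in Python, with hypothetical names like `DiscountService` and `rate_client`, not code from any real suite), this is what stubbing an external dependency and pinning the clock can look like: the production code takes both as injectable collaborators, so the test runs against the same inputs every time, regardless of host, date or network.

```python
from datetime import datetime
from unittest.mock import Mock

# Hypothetical production code: the discount depends on the current date
# and on a live exchange-rate service -- two sources of non-determinism.
class DiscountService:
    def __init__(self, rate_client, clock=datetime.now):
        self.rate_client = rate_client   # external system, injected
        self.clock = clock               # time dependency, injected

    def discounted_price(self, price_usd):
        rate = self.rate_client.usd_to_gbp()   # a network call in production
        discount = 0.10 if self.clock().month == 12 else 0.0
        return round(price_usd * rate * (1 - discount), 2)

def test_discounted_price_is_deterministic():
    # Stub the external service and freeze the clock so the test gives
    # the same answer on every run.
    stub_rates = Mock()
    stub_rates.usd_to_gbp.return_value = 0.80
    frozen_clock = lambda: datetime(2011, 12, 1)

    service = DiscountService(rate_client=stub_rates, clock=frozen_clock)

    assert service.discounted_price(100.0) == 72.0
```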

2. A lot of failures in the functional suite - if 50 out of 100 tests are failing, the team's belief dwindles.

The first question to ask is: how did we end up here? This has a lot to do with the broken windows theory. If all tests pass and the next check-in fails some of them, the team immediately works to fix them. If a regular bunch keep failing and are therefore ignored, any new failure simply joins the pile of ignored tests. Again, this can be fixed: reporting and analysing the failed tests, combined with constant effort, improves the state of the suite and builds a positive cycle. However, as before, the intent to improve depends on how valuable the tests are to delivery and to the customer.
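One way to make that reporting concrete, sketched here with pytest and a hypothetical `quarantine` marker (an illustration of the idea, not a prescription), is to tag the known failures and run them separately. The main run stays green, so any fresh failure stands out instead of blending into the ignored pile.

```python
# conftest.py (hypothetical) -- register a marker so known failures are
# tracked explicitly rather than silently ignored.
import pytest

def pytest_configure(config):
    config.addinivalue_line(
        "markers", "quarantine: known failing test, tracked and under repair"
    )

# In a test module: tag the failure the team already knows about.
@pytest.mark.quarantine
def test_search_by_partial_name():
    ...
```

The main build then runs `pytest -m "not quarantine"` while a separate job runs `pytest -m quarantine`, so the list of known failures stays visible, gets reported on, and shrinks over time.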

3. Time taken to run the suite once - feedback comes in very late.

If a single run covers a lot of check-ins, it becomes really hard to analyse and debug the failures. Coupled with 1 and 2 above, this results in a total loss of faith, as fixing takes a lot of time and frustration levels go up. So even if it might not be a primary reason on its own, combined with the ones above it erodes the delivery team's reliance on automated functional tests to certify a good build. One way to fix the run time is to distribute tests onto a grid infrastructure, where different nodes are each responsible for a subset of the tests, parallelizing the execution and cutting down the overall run time.
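In practice this is usually handled by existing tooling (a Selenium Grid, or a test runner's built-in parallelism), but as a minimal sketch of the idea, assuming a hypothetical `tests/` directory of pytest files, each node could take a deterministic slice of the suite:

```python
# shard_and_run.py (hypothetical): each grid node runs an even slice of
# the test files, so wall-clock time drops roughly with the node count.
import subprocess
import sys
from pathlib import Path

def shard(files, node_index, node_count):
    # Stable split: node k takes every node_count-th file from a sorted list.
    return [f for i, f in enumerate(sorted(files)) if i % node_count == node_index]

if __name__ == "__main__":
    node_index = int(sys.argv[1])   # e.g. 0..3 on a four-node grid
    node_count = int(sys.argv[2])
    my_tests = shard(Path("tests").glob("test_*.py"), node_index, node_count)
    sys.exit(subprocess.call(["pytest", *map(str, my_tests)]))
```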

The factors listed above can all be fixed, with investigation and investment. My experience has been that the intent to spend time and money on fixing them is directly proportional to how much the customer believes in the tests and holds the team accountable for them. Teams can also increase their own accountability by exposing results and analysis. If, for example, a delivery team publishes test results as part of its release notes, or exposes the results to customers in some other way, it creates awareness as well as accountability for its work on the suite. That, for me, is the optimal solution.
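As one hedged example of what publishing results could look like (a sketch only; the file name and format are assumptions, based on the JUnit-style XML that most test runners can emit), a small script can condense the latest run into a line for the release notes:

```python
# summarise_results.py (hypothetical): condense a JUnit-style XML report
# into a one-line summary that can be pasted into release notes.
import xml.etree.ElementTree as ET

def summarise(results_path="results.xml"):
    root = ET.parse(results_path).getroot()
    # Handle both a bare <testsuite> root and a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else list(root)
    total = sum(int(s.get("tests", 0)) for s in suites)
    failed = sum(int(s.get("failures", 0)) + int(s.get("errors", 0)) for s in suites)
    return f"Functional suite: {total - failed}/{total} tests passing."

if __name__ == "__main__":
    print(summarise())
```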

Thanks - to Martin Fowler, Manish Kumar and Chethan V for helping me crystallize my thoughts on this.