
Over the years, there has been much discussion and argument over which of several competing methodologies is the best one for verification. Papers have been written arguing opposite sides of the question, each even including data “proving” its side.

As a case study, let’s take the example of random vs. directed testing. A lot has been published over the last twenty years comparing these two methods. Some papers state that random finds bugs more efficiently than directed; some state the opposite. No conclusion has emerged from all the conflicting data. Why is this? Is there any way to resolve it?

Using the framework of the three laws of verification, it is possible to synthesize a consistent explanation for all the conflicting data. First, to recapitulate:

1st law: The bug rate is high at the beginning of the verification effort and low at the end.

2nd law: Bug hardness is subjective.

3rd law: Bug hardness is independent of when bugs are found during the verification process.

Let’s assume that random and directed testing will each take six months of effort and each will find 90% of the bugs.

Suppose we do random testing first. We spend six months developing a random testbench, running it, and finding and fixing 90% of the bugs. If we now spend six months developing directed tests, the best we can accomplish is to find 90% of the remaining 10%, or 9% of the bugs. But the amount of effort was the same. Conclusion: random is much more efficient than directed testing.

Now suppose we did directed testing first, followed by random. Directed testing would find 90% of the bugs with six months of effort, and random would find only 9% with the same effort. Conclusion: directed testing is more efficient.

Thus, measured efficiency is a function of the order in which verification methodologies are applied. Whichever is done first will look the most efficient. This is a direct result of the first law of verification. Without context on how these methods are applied, there is no way to judge the conclusions presented.
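To make the order dependence concrete, here is a minimal sketch (in Python) of the arithmetic above. The 90% find rate and the assumption that each method catches a fixed fraction of whatever bugs remain when it starts are simplifications taken from the example, not a model of real projects:

```python
# Minimal sketch of the order-dependence argument.
# Assumption: each method, given equal effort, finds 90% of the bugs
# still present when it starts.
FIND_RATE = 0.90

def run_in_order(total_bugs, order):
    """Apply methods in the given order; return how many bugs each finds."""
    remaining = total_bugs
    found = {}
    for method in order:
        caught = FIND_RATE * remaining
        found[method] = caught
        remaining -= caught
    return found

for order in (["random", "directed"], ["directed", "random"]):
    found = run_in_order(100, order)
    print(order, {m: round(n) for m, n in found.items()})
# ['random', 'directed'] {'random': 90, 'directed': 9}
# ['directed', 'random'] {'directed': 90, 'random': 9}
```

Whichever method runs first sees the full population of bugs, so in this toy model it always looks roughly ten times more “efficient” than the one that runs second.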

Another issue is that papers often conclude with a statement like “…and we found a bug that wasn’t found before”, with the implication that this method is better than the previous methods. We can also examine this in the context of the three laws of verification.

The best verification methodology is one that combines as many orthogonal methods as possible. Methods are orthogonal if bugs that are hard to find with one method are easy with the other (second law), or bugs that would be found later with one method are found earlier with the other (third law).

Consequently, orthogonality is the best metric to use when comparing methods. The fact that one method finds a bug that was missed by another method is an indication that the two methods are orthogonal. But, that is not the whole story. We need to take into account the scope of the different methods to determine which is actually better.
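As a rough illustration of that point (this sketch is my own, not part of the original argument), one way to look at two methods is to compare the sets of bugs each finds: little overlap suggests orthogonality, but the size of each set, its scope, still matters:

```python
# Hypothetical sketch: treat each method's discovered bugs as a set.
# Low overlap hints at orthogonality, but scope (set size) matters too.
def overlap_fraction(bugs_a, bugs_b):
    """Fraction of all discovered bugs that both methods found."""
    union = bugs_a | bugs_b
    return len(bugs_a & bugs_b) / len(union) if union else 0.0

method_a = {f"bug{i}" for i in range(100)}   # broad scope: 100 bugs
method_b = {"bug3", "bug57", "bug_extra"}    # found one bug A missed

print(overlap_fraction(method_a, method_b))  # low overlap...
print(len(method_a), len(method_b))          # ...but very different scope
```

The low overlap alone would make the two methods look orthogonal, yet the second method’s scope is tiny, which is exactly why overlap by itself cannot tell you whether one method could replace the other.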

People often infer from statements such as “we found a bug that was missed” that the method would also have found all the bugs that were previously found and, therefore, that it can replace the old method. This is rarely the case. In the worst case, the new method might be capable of finding only that one bug.

To judge two methods fairly, it is necessary to do two experiments. Let’s say we want to evaluate whether to replace method A with method B. First, we do method A followed by method B. Let’s say method A finds 100 bugs and method B finds 10 bugs. Method B looks good because it found bugs that were missed by method A, which, because it is our current method, we believe to be thorough.

Second, we try method B first, followed by method A. If method B finds 20 bugs and method A finds 90 bugs that method B missed, we would say that method A is vastly superior, even though the first experiment showed that method B had promise. If instead method B found 90 bugs and method A found 10 bugs, then you could claim they are equal (assuming equal effort). Only if method B found more than 90 bugs when done first could any claim be made that it is better. This is a very tall order, which is why most verification papers whose main result is that they found bugs missed by other methods generally don’t have much impact.
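A small sketch of this decision rule, using the illustrative bug counts from the two experiments above, might look like the following. The function name and the exact threshold (what method A finds when it is run first) are my own framing of the argument, not a standard metric:

```python
# Sketch of the two-experiment replacement check described above.
# exp1: method A run first, then B; exp2: method B run first, then A.
def replacement_verdict(exp1, exp2):
    """Judge B against A by what each method finds when it is run first."""
    a_first, b_first = exp1["A"], exp2["B"]
    if b_first > a_first:
        return "B may be better than A"
    if b_first == a_first:
        return "A and B look roughly equal (assuming equal effort)"
    return "A remains the better method"

exp1 = {"A": 100, "B": 10}                 # experiment 1 from the example
exp2 = {"B": 20, "A": 90}                  # experiment 2 from the example
print(replacement_verdict(exp1, exp2))     # -> "A remains the better method"

# Only if B, run first, matches or beats what A found when run first
# could B credibly replace A:
print(replacement_verdict(exp1, {"B": 100, "A": 10}))  # -> roughly equal
```

The point of the sketch is that finding a handful of missed bugs in experiment 1 says nothing by itself; the second experiment, with the methods in the opposite order, is what reveals whether the new method has comparable reach.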

So, what have we learned? First, lacking context, claims that one method is more effective, more efficient, or simply better are meaningless. Second, rather than focusing on which methods are best, focus on which methods are most orthogonal. Methodologies often reflect mindset. Orthogonal methodologies force different mindsets so that simple bugs don’t slip through, which is the key to ensuring the highest probability of success.


One Comment

  1. Not sure that “orthogonal verification methods” is the best term to use. Surely two methods are not really orthogonal if they both use dynamic simulation, because then simulation would be the common factor.
    I prefer the expression “Defense in Depth”, which refers to the different lines of defense in war: if the enemy (the bug) gets through the first line of defense, the second line of defense can still catch it.


