Skip navigation

Tag Archives: verification

While researching the recent P!=NP paper and the activity surrounding it, I found that there are claims of P!=NP or the inverse being made all the time. One of the researchers posted a list of rules they use to quickly filter out bogus claims. Among these were some amusing ones (the paper was not created using LaTex), and some quite simple and powerful ones (the proof does not exclude known counterexamples). Deolalikar’s paper managed to pass most of these rules, which is one of the reasons it was taken seriously.

I recently read a blog post about verification with some claims that I find quite dubious. This gave me the idea of posting some “rules” for reading papers/articles/blog posts on verification that I think are useful in judging the validity of these claims. Note also that I have addressed most of these rules in more detail in previous posts.

  1. The most frequent claim I see are those involving the coming  “verification bottleneck”. The claim is invariable that, today verification is 70% of the effort (an oft-quoted, but unproven statistic), and tomorrow, because chips will have twice as many transistors, will be much higher.

    What seems to get lost in these claims is that the same claim could have been made two years ago for the current generation, and the effort for the previous generation was still 70% (the 70% number has been around for quite a long time). In fact, going back to the dawn of the RTL era, when there was a dramatic shift in methodology,  one could have made the same claim for the last ten generations of chips. This is usually ignored and no reason is given for why the 11th generation is somehow profoundly different than the last ten such that we need to buy these new tools (which is, inevitably, the other shoe falling in these articles).

    In short, be skeptical of claims of future bottlenecks that ignore the fact that the jump from this generation to the next is no different from the jump over the previous ten generations.

  2. Be skeptical of claims uses the term “verification bottleneck”. Do a Google search on “verification bottleneck” and you will find all the results are either research papers or vendors selling trying to sell a new product.
  3. Be skeptical of  claims using math. You will sometimes see something like, verifying N gates today requires 2^N effort and tomorrow there will be double the number of gates, means it will take 2^(2N) effort. Therefore, effort is exponentially increasing and so buy our tool which greatly increases capacity. This is basically a variant of rule 1, but, in general, if you see math, be skeptical.
  4. It is common (and almost a condition of acceptance) that verification research papers always conclude with something like “…and a bug was found that was missed by simulation” or something along these lines. Remember, bugs are easy. An algorithm that found new bugs does not mean that this algorithm is better than any existing algorithm.

    The best way to read verification papers is to ignore the fact that bugs may have been found. If you still find value in the paper even if you make the assumption that it found no bugs, then it probably does have some value.

  5. The last rule comes from the premise that bugs are easy, or as I like to call it, the “pick axe and shovel” argument. This is usually in relation to some claim about better results (more bugs) being found on some, usually large-scale, verification effort. The better results are usually used to justify the proposed methodology as being superior to other methodologies.

    The issue here is that the project usually involved a large number of well-coordinated, smart people. The counter-argument to this is, I could take the same smart people and, using nothing more than pick axes and shovels, find lots of bugs. Why does this prove that your methodology is better?

What prompted to write this post was this post by Harry Foster. This is posted on Mentor’s website, so Harry is obviously pitching the company party line. I won’t bother critiquing the post here. I suggest reading Harry’s post after reviewing the rules above. Let me know what you find.

I was listening to a talk by Daniel Kroening, a software verification researcher at Oxford University , who was explaining the certification process for safety-critical software. He mentioned that one of the requirements is that all test cases must be verified on the actual hardware. Now, in the case of avionics software, that means one test flight for every test case. Since test flights are expensive, optimizing the number of test cases required to cover everything  is extremely important.

This is one of these cases where you get a dot and you connect them with other seemingly random dots and see that you have a line. In this case, what I realized is that the difficulty in verification is not really finding bugs (bugs are easy, right?), but in how efficiently we find these bugs. Recently I posted on how constrained random testing is essentially a (hard) optimization problem. I also posted on the best verification methodology being to combine orthogonal methodologies in order to optimize bug finding productivity. The criticality of optimizing safety-critical test cases was another data point  that led me to this  realization.

This is reflected in the fact that many of the most successful verification tools introduced over the last twenty years have succeeded by optimizing verification productivity. As we all know, faster simulation really does very little to improve the quality of a design, but it helps enormously in improving verification productivity. Hardware verification languages are probably the second most important development in verification in the last twenty years. But again, they don’t improve quality, they simply improve productivity in developing verification environments.

This is not to say that there have not been tools that improve quality. Formal proof clearly improves quality when it can be applied, although semi-formal verification, which focuses on bug hunting is more of a productivity increase. In-circuit emulation allows you to find bugs that could not be found in simulation due to being able to run on the real hardware. However, emulation used simply as faster simulation is really just a productivity increase.

Is verification optimization related to the well discussed verification bottleneck (you know, the old saw about verification consuming 70% of the effort)?  Verification became a bottleneck when the  methodology changed from being done predominantly post-silicon to being done predominately pre-silicon. Many people saw the resulting dramatic increase in verification effort as being correlated to increased design size and complexity. If this were true, then verification would consume 98% of the effort today since this switch occurred twenty years ago and there have been many generations of products since then. Since relative verification effort has not changed significantly over the last twenty years, I think it is safe to say that verification effort is constant with increased design size and that differences in relative effort reflect differences in methodology more than anything.

The real question is: will verification optimization become more important in the future as designs become larger and will that result in relative verification effort rising? If there is no change to design methodology, we would expect verification optimization effort to remain constant.  If high-level synthesis allows us to move up the abstraction ladder, this should improve the ability to optimize verification. In short, there does not seem to be a looming crisis in overall verification optimization.

However, if we look at the software side we see that software content on hardware platforms is growing rapidly which is putting enormous pressure on the ability to verify this software. Effectively, we have managed to forestall the hardware verification optimization crisis by moving it to software

When I started Nusym eight years ago, it was my feeling that formal verification would succeed only when it didn’t look like formal verification.  Nusym was founded with the vision of using formal techniques under the covers of a standard simulation environment to extract value from the wealth of information provided by the simulation environment.

Talking about this strategy in a response to Olivier Coudert’s blog post on formal verification, I wrote:

I think this use model will continue to grow, while formal verification, the product, will continue to whither. I predict that in 10-15 years there will be no formal verification products, but most, if not all, verification solutions at that time will incorporate formal technologies under the covers.

At this year’s DAC, however,  it was apparent that this trend is happening now, rather than 10-15 years from now. There were a number of new verification startups and products announced this year, including:

  • Avery Insight – X propagation and DFT
  • Jasper ActiveDesign – design exploration and debug
  • NextOp – assertion synthesis
  • Vennsa – automated debugging

Notice that none of these products claims to be a formal verification product. But look  who created these products: formal verification Ph.D.s. Do you doubt that these products rely heavily on formal verification techniques under the covers?

The only exception to this, ironically, appears to be my new employer, Jasper, whose JasperGold formal verification product has been seen huge growth over the last year.  They have found that  the recipe to success for formal verification is to buffer the complexity of it by using methodology and services. Also, Jasper’s latest product, ActiveDesign,  a tool that derives simulation traces for a specified design target, is not specifically intended to be a verification tool, except of course, it uses the same formal verification technology under the covers as JasperGold.

Now, there are still traditional formal verification products out there. Cadence IFV, Mentor 0-in, and Synopsys Magellan are still around. I may have missed it, but I didn’t see these tools making any a big impact at DAC this year. However, I suppose we must recognize that they still exist and that it may just take 10-15 years for them to die off.

Constrained random testing is the predominant method used to verify chips today. It has two advantages over directed testing. First, randomness allows testing things not explicitly thought of by the verification engineer or designer. Second, automated random testing is more productive than manual directed testing in terms of number of test vectors produced as a function of human effort put in.

There are several variants on the basic concept of random testing. Unconstrained random testing generates random values for all input variables without any consideration of legality. Constrained random testing is used when the legal set of input values is a subset of the entire input space. In the last dozen years, hardware verification languages (HVLs) have enabled the use of constraint-based random testing. Constraint-based random testing uses static constraints and constraint solvers to do constrained random testing.

Constraint-based random testing improves productivity in writing constrained random testbenches because it allows writing tests at a higher level of abstraction. Static constraints specify the legal input space, but not how to randomize values; this is left up to the solver. As designs/protocols have become more complex, constraint-based random testing’s productivity gain has increased. As a result, there is a proliferation of large, complex constraint sets in today’s testbenches.

Increasingly, however, solving these large, complex constraint sets has become a bottleneck in simulation due to the large amount of time spent solving these constraint sets. Consequently, it is common that a lot of time is spent debugging and trying to workaround these solving bottlenecks. In the worst case, it is necessary to discard the static constraints and rewrite the randomizer using imperative code, negating the productivity gains of using static constraints.

The fundamental cause of this problem is the fact that uniformly randomizing across a constrained input space is a NP-hard problem. As we have seen before, NP-hard problems are optimization problems. Uniformity of randomization is the optimization goal. For example, suppose we have a simple constraint: 0 <= X <= 10. If we randomize X a number of times and each time it returns the value 0, this meets the constraint, but is not very uniform. Randomization that produces the value 0 to 9 with equal probability is optimal. Because NP-hard problems have the additional burden of requiring an optimal value, not just a satisfying value, they are generally harder to solve than NP-complete problems.

What are solutions to this problem?

The obvious solution is to build better solver/randomizers. Companies that produce HVLs already spend a large amount of effort trying to improve their solvers. If it were that easy, it would have been done already.

Barring better solvers, the only other choice is to change methodologies. One change is to ditch static constraints and go back to writing imperative code with the ensuing reduction in productivity. Not a pleasant thought to contemplate.

An alternative to this is keep the static constraints, but reduce the complexity from NP-hard to NP-complete. This means dumping uniform distribution for, potentially, very poor, but still legal, randomization. This is feasible considering that the real goal of random testing is to find bugs and improve coverage. We believe the uniform distribution is required to do this. However, if the randomizer knew precisely what values to generate in order to find a bug or hit a coverage point, it could just generate these values, which may or may not be uniform. If it were possible to do this, randomization would go from being an NP-hard problem to being a NP-complete problem. Technologies such as Nusym’s path tracing technology allow this.

One thing I think would help to implement this strategy would be to provide an open solver API in today’s HVLs. This would allow plugging in different solvers for different circumstances and developing custom solvers on a case-by-case basis rather than being forced to have a one-size-fits-all solver.

We have seen in previous posts that design and verification problems fall in the class of computationally complex hard problems. That is, in the worst case, it takes O(2^N) time to optimize/verify a design of size N, where N could represent the number of binary inputs. If we run into this limit, our ability to design large systems is severely limited.

Given these limits, then, how it is possible that we can routinely create designs consisting of millions of lines of code or gates? More importantly, how we can continue to design increasingly larger, more complex designs?

One answer is that we can’t. We can’t find all the bugs in complex designs, but we can hope that there are no critical bugs. We can’t perfectly optimize a design, but we can do a good enough job. Even then, exponential complexity should be a brick wall to increasing complexity, but is not. Somehow, we are able to avoid these limits.

One way to look at this is to consider the way human brains are built and work in creating complex designs. The jumping off point for understanding this is Shannon’s counting argument.

Claude Shannon was one of the most important computer scientists of the twentieth century. He is most famous for his work on information theory. Shannon’s counting argument is one of his less well known results. Shannon was trying to answer the question, what is the minimum number of gates (say two-input NAND gates) needed to be able to implement any function of N binary inputs? The basic answer is that there must exist functions that require an exponential number of gates to implement. The proof goes by counting the number of possible functions of N inputs and counting the number of functions implementable with T gates and showing that T is o(2^N). The argument is as follows:

  • the number of functions of N inputs is 2^(2^N). For example, a truth table for N=4 would have 16 entries. The number of functions, therefore is 2^16=16,384.
  • The number of functions of N inputs that can be implemented with T 2-input gates is (N+T)^(2T). Each gate input can connect to the output of any other gate or an input for a total of (N+T) possible connections and there are 2T gate inputs.
  • To implement any possible function, then, we need (N+T)^2T >= 2^(2^N).
  • Solving for T shows that T >= 2^(N-d) for some small delta d, which implies that T = o(2^N).

What is the significance of this? Today’s chips have as many as 1000 pins. What is the minimum number of gates required on the chip that would allow us to implement any function of a thousand inputs? Is, say, one million gates sufficient? Off the top of your head, you probably would say no. What about a billion? Our preconceived notion is that this should be sufficient. After all, that is right at the limit of today’s most advanced chips and it is inconceivable that we couldn’t implement what we want with that many gates.

Now lets look at what the counting argument has to say about this. The total number of possible functions is 2^(2^1000), which is a vast number, roughly 10^(10^300). A billion 2-input NAND gates would allow us to implement roughly 10^(10^10) different functions with a thousand inputs. This is also a vast number, but is infinitesimal compared to the number of possible functions. The fact is, we are limited to a very small number of functions that can be implemented!

And this picture doesn’t fundamentally change even if we assume we have a trillion gates at our disposal. So, how is it we believe that we can implement whatever functionality we need? To answer this, let’s work backward from the number of gates available and determine the maximum number of inputs for which we can implement any arbitrary function. Let’s choose the most complex system we have at our disposal, the human brain.

The human brain contains on the order of a trillion neurons, each with roughly a thousand connections. Assuming all neurons can be connected to any other neuron (which is not the case), we come up with an upper bound on the number of functions that can be implemented: (10^10)^(1000 * 10^10) = 10^(10^14) = 2^(2^40). In the best case, therefore, the human brain could implement any function of just 40 binary inputs.

Now we can do a thought experiment. What is the largest size truth table that you can mentally make sense of at one time? Two bits is easy. Three bits not so hard. Four bits is difficult to visualize as a truth table, but using visual aids such as Karnaugh maps makes it possible.

Another test would be to look at the placement of arbitrarily placed objects on a plane. Given a glance, how accurately could you replicate the placement of all objects. If there were only one object, I could envision determining placement with an accuracy of 5-6 bits in both X and Y dimensions. More objects with less accuracy. Beyond about seven objects , it is is difficult to even remember exactly how many objects there are. Maybe 10-12 bits accuracy overall is possible.

The bottom line is the human brain has a very low capacity for consciously grasping an arbitrary function. Let’s say its no more that 10 bits in the best case. From the counting argument, we can determine that it requires slightly more than 500 gates to implement any arbitrary function of 10 inputs. With a million gates on a chip, we could put 20000 such functions on the chip.

Now we treat each of these 500 gate blocks as a unit and then consider how to hook these 20000 blocks together to perform some function. This is certainly possible using abstraction. In this case it would probably require two or more levels of abstraction to be able to deal with 20000 blocks.

Abstraction is the essence by which we are able to design very large systems.  The need for abstraction is driven by the limitations on our brain in dealing with large functions imposed by Shannon’s counting argument. And it is the structure imposed by multiple layers of abstraction that makes intractable design and verification problems become tractable.

The computational complexity class, NP-hard, is at the core of a number of problems we encounter on a daily basis, from loading the dishwasher (how do I get all these pots to fit?) to packing a car for a vacation, to putting together a child’s train tracks.

If we look at these things, they have several things in common. First, they each involve a potentially large number of parts (pots, luggage, pieces of track) that need to be put together in some way. Second, we want to meet some objective, such as fitting all the dishes in the dishwasher. Third, there are a large number of constraints that must be met. In the case of loading the dishwasher, no two dishes can be put in the same place. There are N^2 constraints just to specify this, among many others. A fourth characteristic is that we may get close to an optimal solution, but find it difficult and not obvious how to get to a more optimal one (just how are we going to fit that last pot in the dishwasher). Furthermore, getting from a near optimal solution to an optimal one may involve a complete rearrangement of all the pieces.

One way to solve problems like packing a dishwasher is to view it as a truth table. Each dish can be put in one of, say, 100 slots, in, say, one of ten different orientations. This results in 1000 combinations, requiring 10 bits. If there are 40 dishes, 4000 bits are required to represent all possible configurations of dishes in the dishwasher. The resulting truth table is vast. Each entry in the table indicates how much space is left in the dishwasher if dishes are put in according to the configuration of that entry. A negative number indicates an infeasible solution. There will be many invalid configurations which have two or more dishes occupying the same location. We give all of these entries a large negative number.

The resulting table describes a landscape that is mostly flat with hills sparsely scattered throughout. We can also imagine that this landscape is an ocean in which negative values are under water and positive values represent islands in the ocean. The goal is to find the highest island in the ocean. We start in some random location in the ocean and start searching. We may find an island quickly, but it may not be the highest one. Given the vastness of the ocean, it is understandable why it can take a very long time to find a solution.

But, wait a minute. What about polynomial algorithms like sorting? A truth table can be constructed for these also. For example, to sort 256 elements, we can create 8 bit variables for each element to describe the position of that element in the sorted list. The value of each entry would indicate the number of sorted elements for that configuration. The complete table would again be around 4000 bits and have vast numbers of infeasible solutions in which two or more elements occupy the same slot in the list and only one satisfying solution. Yet, we know finding a solution is easy. Why is this?

The ocean corresponding to the sorting problem is highly regular. If we are put down in an arbitrary point in the ocean, we can immediately determine where to go just be examining the current truth table entry (point in the ocean). Knowing the structure, we may be able to determine from this that we need to go, say, northeast for 1000 miles. We may have to do this some number (but polynomial) times before getting to the solution, but is guaranteed to get to the solution. Structure in a problem allows us to eliminate large parts of the search space efficiently.

In contrast, for an NP-hard problem, there is no guarantee of structure. Furthermore, as we are sailing around this ocean, we are doing so in a thick fog such that we can only see what is immediately around us. We could sail right by an island and not even know it. Given this, it is easy to see that it could take an exponential amount of time to find a solution.

But then, how do we account for the fact that, often, NP-hard problems are tractable? The answer to this question is that there usually is some amount of structure in most problems. We can use heuristics to look for certain patterns. If we find these patterns, then this gives guidance similar to the sorting example above. The problem is that different designs have different patterns and there is no one heuristic that works in all cases. Tools that deal with NP-hard problems usually use many heuristics. The trouble is that, the more heuristics there are, the slower the search. At each step, each of the heuristics needs to be invoked until a pattern match is found. In the worst case, no pattern match will be found meaning it will take an exponential time to do the search, but the search will be much slower due to the overhead of invoking the heuristics at each step.

I hope this gives some intuition into NP-hard problems. In future posts I will talk about even harder classes of problem.

Abstraction: the suppression of irrelevant detail

Abstraction is the single most important tool in designing complex systems. There is simply no way to design a million lines of code, whether it be hardware or software, without using multiple levels of abstraction. But, what exactly is abstraction? Most designers know intuitively that, for example, a high-level programming language ,such as C, is a higher level of abstraction than assembly language. Equivalently, in hardware, RTL is a higher level abstraction than gate-level. However, few designers understand the theoretical basis for abstraction. If we believe that the solution to designing ever more complex systems is higher levels of abstraction, then it is important to understand the basic theory of what makes one description of a design more or less abstract than another.

There are four types of abstraction that are used in building hardware/software systems:

  • structural
  • behavioral
  • data
  • temporal

Structural Abstraction

Structure refers to the concrete objects that make up a system and their composition For example, the concrete objects that make up a chip are gates. If we write at the RTL level of abstraction:

    a = b + c;

this is describing an adder, but the details of all the gates and their connections is suppressed because they are not relevant at this level of description. In software, the concrete objects being hidden are the CPU registers, program counter, stack pointer, etc. For example, in a high-level language, a function call looks like:


The equivalent machine-level code will have instructions to push and pop operands and jump to the specified subroutine. The high-level language hides these irrelevant details.

In general, structural abstraction means specifying functions in terms of inputs and outputs only. Structural abstraction is the most fundamental type of abstraction used in design. It is what enables a designer to enter large designs.

Behavioral Abstraction

Abstracting behavior means not specifying what should happen for certain inputs and/or states. Behavioral abstraction can really only be applied to functions that have been structurally abstracted. Structural abstraction means that a function is specified by a table mapping inputs to outputs. Behavioral abstraction means that the table is not completely filled in.

Behavioral abstraction is not used in design, but is extremely useful, in fact, necessary, in verification. Verification engineers instinctively use behavioral abstraction without even realizing it. A verification environment consists of two parts: a generator that generates input stimulus, and a checker, which checks that the output is correct. It is very common for checkers not to be able to check the output for all possible input values. For example, it is common to find code such as:

    if (response_received)
        if (response_data != expected_data)

The checker only specifies the correct behavior if a response is received. It says nothing about the correct behavior if no response is received.

A directed test is an extreme example of behavioral abstraction. Suppose I write the following directed test for an adder:

    a = 2;
    b = 2;
    if (out != 4)

The checker is the last two lines, but it only specifies the output for inputs, a=2, b=2, and says nothing about any other input values.

Data Abstraction

Data abstraction is a mapping from a lower-level type to a higher-level type. The most obvious data abstraction, which is common to both hardware and software, is the mapping of an N-bit vector onto the set of integers. Other data abstractions exist. In hardware, a binary digit is an abstraction of the analog values that exist on a signal. In software, a struct is an abstraction of its individual members.

An interesting fact about data abstraction is that the single most important abstraction, from bit vector to integer, is not actually a valid abstraction. When we treat values as integers, we expect that they obey the rules or arithmetic, however, fixed with bit vectors do not, specifically when operations overflow. To avoid this, a bit width is chosen such that no overflow is possible, or special overflow handling is done.

Temporal Abstraction

This last abstraction type really only applies to hardware. Temporal abstraction means ignoring how long it takes to perform a function. A simple example of this is the zero-delay gate model often used in gate-level simulations. RTL also assumes all combinational operations take zero time.

It is also possible to abstract cycles. For example, a pipelined processor requires several cycles to complete an operation. In verification, it is common to create an unpipelined model of the processor that completes all operations in one cycle. At the end of a sequence of operations, the architecturally visible state of the two models should be the same. This is useful because an unpipelined model is usually much simpler to write than a pipelined one.

The four abstractions described above comprise a basis for the majority of abstractions used in design and verification. That is, any abstraction we are likely to encounter is some combination of the above abstractions. However, we are still left with the question of what is a valid abstraction. I will defer answering this until the next post.

(note: this discussion is based on the paper, “Abstraction Mechanisms for Hardware Verification” by Tom Melham. There is slightly more high-level description of these abstractions in this paper along with a lot of boring technical detail that you probably want to skip.)

Well, there isn’t one. But, understanding the three laws of verification can help you understand how to optimize your current verification methodology.

(See The First Law of Verification, The Second Law, The Third Law)

The first law says that no matter what methodology is used, verification will be the bottleneck in getting the project completed. The third law tells us that different methods may find the same bug at different points in the verification process. The second law tells us that the same bug may be easier or harder to find depending on the methodology used and the person doing the verification. Based on these observations, we can synthesize a methodology that optimizes verification efficiency:

  1. Parallelize the verification effort.
    • have multiple people working independently.
    • have multiple simulations or other automated verification tools running simultaneously.
  2. Minimize overlap between parallel efforts.
    • have different people verify different aspects of the design.
    • orchestrate verification automation tools to minimize duplicated effort.
  3. Use as many different methodologies as possible.
    • get as many different points of view as possible.
    • minimize the possibility that a bug that is hard to find with one methodology will slip through.
  4. Evaluate the efficiency of each methodology used.
    • efficiency = bugs found/effort put in.
    • put more effort into those methodologies with highest efficiency.
  5. Start as soon as possible.
    • the first law says you will find bugs at the beginning no matter what you do.
    • the sooner you start, the sooner you fill finish.

Most projects naturally follow the first two guidelines. The next three guidelines come from insight gained in understanding the three laws of verification. The subjective nature of bug hardness means that using a single methodology, even if parallelized, increases the chances of relatively simple bugs slipping through the verification process. The way to overcome this is to use as many different methodologies as possible. The downside to this is that each methodology has development cost associated with it. Therefore, there is a tradeoff between development effort and the number of different methodologies that can be deployed. Generally, smaller teams will end up using fewer different methodologies due to resource constraints and larger teams will be able to use more.

It is important to evaluate methodologies used in order to improve the overall verification process for future projects. You need to be careful here. Don’t throw away all methodologies except for the one that is most efficient. You will have lost the advantage of using different methodologies. If you do decide to remove a methodology because it is not finding bugs, try to replace it with another unique methodology rather than just putting more effort on fewer methodologies. Putting more effort on fewer methodologies just increases the risks of bugs slipping through.

As an example, here are efficiencies of the different methodologies used on the MCU project:

Method Bugs Found Effort (Man-Months) Bugs/Man-Month
code review 44 2 21.0
static checks+unit testing 31 13 2.4
full chip directed testing 70 38 1.8
regression+perturbation 12 9 1.3
coverage improvement 3 4 0.8
assertions 3 4 0.7
full chip random+algorithmic tests 5 8 0.6
emulation 3 28 0.1
overall 171 106 1.6

Based on this data, I would probably look seriously at dropping emulation in the next project.

But, these numbers can be deceiving. Unlike bug hardness, bug finding efficiency does vary with time. This is a big topic, so I will save it for its own post.

I am a big fan of Fred Brooks’ “The Mythical Man-Month: Essays on Software Engineering”. Brooks was leader of one of the first large software projects. Along the way, he found that a lot of the conventional wisdom about software engineering was wrong, most famously coming up with the idea that adding manpower to a late project makes it later.

I have also found that a lot of the conventional wisdom about the causes of verification difficulties is wrong. So, in designing this blog, I decided to model it after the Mythical Man-Month. The essence of this style is:

  • short, easy-to-read essays on a single topic.
  • timelessness – focus on overarching issues, not on how to solve specific problems with specific programming languages.
  • back it up with real data whenever possible.

In some senses, this blog will attempt to fill in some obvious holes in the Mythical Man-Month. Brooks states that project effort can be roughly apportioned as:

  • 1/3 planning (specification)
  • 1/6 coding
  • 1/2 verification

but then proceeds to talk mostly about planning and coding and very little about verification. I think this blog will cover each of these areas in rough proportion to these numbers, so most of my posts will be on verification, but a fair number will cover specification and some on particular design issues.

Brooks’ work is more than 30 years old, so it is worth re-examining some of his conclusions to see if they still hold up as design complexity has increased with time. One of the areas of contention is the percentage of time spent in verification. Brooks’ claim of verification taking 50% of the effort applied to large software projects. Today, there are claims that hardware verification is taking 70% of the effort. EDA vendors often point to this “growth” as proof that verification is becoming the bottleneck.

But, is this the real story? Software verification is mostly done by the designers. In this kind of environment, verification consumes roughly 50% of the total effort. 20 years ago, Hardware verification was also roughly 50% of the effort because it was mostly the designers doing the verification. The shift to pre-silicon verifcation that came about due to the advent of HDLs and synthesis enabled the separation of verification and design. But, separate verification is not as efficient as having the designer do the verification. So, now verification is 70% of the effort instead of 50%. So, rather than growing from 50% to 70% of more, it was more of a one time jump due to the shift in methodology. But, whether it is 50% or 70%, verification is the largest single piece of the overall design effort.

I wrote an article addressing this subject in more detail titled, “Leveraging Design Insight for Intelligent Verification Methodologies”. You can download it from the Nusym website

My previous post on the second law of verification left open the question of why the bug rate curve starts high and goes low. If it is not due to bug hardness increasing with time, then what is it?

Towards the end of the MCU project, I decided to analyze all bug reports, over 100 total at the time, to try to get some ideas about the hardness of bugs that occurred. The first problem was how to synthesize a useful metric that captures the difficulty of finding bugs. We have seen that hardness is subjective, so rather than trying to measure absolute hardness, I chose to measure the relative hardness of one bug against another. Let’s call this measure “Bug Finding Probability” (BFP).

One way of measuring BFP is to count the number of events required to trigger a particular bug. For example, bug reports often give a description such as: “A remote read followed by a local write to the same address causes the wrong data to be written.” This is describing two independent events that must happen: 1) a remote read, and 2) a local write. Assuming we are verifying a memory, this is usually done by injecting a series of reads and/or writes to exercise various functions. Tests may inject a read with some probability, followed by a write with a different probability. The product of the two probabilities is the BFP for the bug.

Assuming equal probabilities for injecting requests, the BFP is inversely correlated with the number of events required to trigger the bug. In other words, the more events required to trigger the bug, the lower the BFP. When reading these bug reports, it was often as easy as simply counting the number of occurrencess of the word “and” in the bug description to determine the relative hardness of the bug. Here is a plot of bug hardness vs. time along with the bug rate vs. time for the MCU project. Bug hardness in this plot is measured as the number of independent events required to trigger the bug.

bug rate and bug hardness vs. time for the MCU project

bug rate and bug hardness vs. time for the MCU project

My initial impressions of this were that 1) bug hardness is quite “noisy” vs. time, which doesn’t necessarily invalidate the “bug hardness rises with time” theory if you average it out and 2) on average, bug hardness did not rise over time. At the time, I was not really concentrating on understanding this. I casually mentioned to some colleagues at Stanford that I had found this result which contradicted expectation. Kanna Shimizu, another researcher in our group, and Dave Dill, my advisor were the first to arrive at an explanation for this. Dave, being a professor, naturally had to cast the problem into the classical balls and bins analysis used in probability testbooks.

Assume that you have a bin with an unknown number of black and white balls, with some ratio of black to white balls. Randomly draw one ball from the bin, with all balls having equal probability of being chosen. If the chosen ball is black, paint it white and put it back in the bin and continue to pull out balls and paint the black ones white and put them back into the bin. If you then plot the rate at which black balls are selected, you will get curve that starts high and then exponentially decreases to a low value.

The analogy should be obvious. Balls represent points in the possible input space of a design, black balls represent points that hit a bug. Pulling out a ball corresponds to running a test. Painting a ball white corresponds to fixing a bug. Therefore, the rate that black balls are pulled out of the bin corresponds to the bug rate of the design. The bug rate started high and went low exactly as is observed in real designs. But we assumed that all balls were equally probably which corresponds to all bugs having equal hardness!

This was quite a revelation. It meant that the purgatory period was not caused by hard bugs at all, but simply due to the fact that there weren’t many bugs left to be found. We can now state the third law of verification:

third law of verification: The hardness of a bug is independent of when it was found.

Based on data from the MCU, I think it is fair to say that, to a first-order approximation, all bugs are equally hard. Since, in almost all cases, the verification effort finds 90+% of bugs in a design, we can therefore, to a first-order approximation, restate the third law as:

third law of verification (simplified) bugs are easy.