It looks like this was a timely post. Just a few days after my last post, Harry  (theAsicGuy) Gries posted an entry speculating that having thousands of computers available would revolutionize EDA algorithms. I think with an understanding of how computational complexity affects EDA algorithms, he might not be making these claims.

Let’s start by looking at what he said:

One of the key benefits of the cloud is that anyone can have affordable access to 1000 CPUs if they want it. If that is the case, what sorts of new approaches could be implemented by the EDA tools in addressing design challenges? Could we implement place and route on 1000 CPUs and have it finish in an hour on a 100M gate design? Could we partition formal verification problems into smaller problems and solve what was formerly the unsolvable? Could we run lots more simulations to find the one key bug that will kill our chip? The cloud opens up a whole new set of possibilities.

Harry is wondering how parallelism can help EDA algorithms. The goal of parallelism is mainly to allow completing a given faster. Problems can be broken into two classes: those that are embarrassingly parallel and all others.

Embarrassingly parallel problems are easy to parallelize because problem instances are completely independent. In EDA, there are a number of embarrassingly parallel problems:

  • regression testing – you have thousands of tests. They are all independent. You can run them on as many machines as are available to complete the regression as quickly as possible.
  • synthesis – different blocks can be synthesized in parallel.
  • spice – run different corners in parallel.
  • placement – place blocks in parallel.

This is done routinely today. Even small companies have dozens of servers, if not hundreds to do these, and other,  embarrassingly parallel tasks.

But the real question Harry is asking is: can having thousands of CPUs solve the other kind of parallelism. Can a parallel simulator run a test a thousand times faster? Can we increase the capacity of formal verification by a factor of a thousand? Can we reduce place-and-route time down to minutes?

Sorry, Harry, but the answer to these questions is: no. Suffice it to say that, if it could have been done, it would have been done already. It is not from lack of trying that EDA algorithms are not parallelized today. It is not from lack of CPUs either.There generally are no gains running parallelized algorithms on even 10 CPUS, far less 100 or 1000. If any EDA company could design a parallel algorithm that gave 10X speedup running on, say 20 CPUs, they would have a huge advantage over the competition.

In short, if it was easy to get large speedups just by throwing more CPUs at the problem, it would be embarrassing that nobody has done that yet.

One of the projects I worked on, some time ago, was to improve the efficiency of the physical design process. The goal was to reduce the time required to get from RTL ready to tapeout from nine months to six months. We looked at all aspects of the flow to find bottlenecks and streamline the flow. One of the major problems that emerged from this was how little hardware designers understand computational complexity and its impact on the design process.

As an example, one of the issues that the logic designers wanted addressed in the physical design flow was the fact that scan nets were given equal priority during place and route to the functional nets. Scan is used for testing only and is not performance critical, while the functional nets are very performance critical. Place-and-route tools generally don’t have the knowledge to know which nets are critical and which are not. The result is that the more critical nets are less optimal than they could be.

The proposal to fix this was to do the place and route in two steps. First, a netlist would be created with no scan nets. This netlist would be placed. Presumably, the placer would place all registers optimally with respect to the critical functionality nets only. This would guarantee, that when the critical nets were routed, they would be optimal. Second, a full netlist including all scan nets would be created, which would be placed and then routed. However, all registers would be “locked” into place as determined by the first placement run. Since these placements are supposedly optimal, the critical nets should end up be routed more optimally.

I told them that locking the registers in place for the second run was going to over-constrain the placer and would result in far less than optimal results, just based on understanding the computational complexity of place-and-route. The lead logic designer, who had proposed this scheme, argued that the placement was proven to be optimal, so it shouldn’t matter that the register placements were being locked down. There was a lot of discussion on this, but no amount of argument was going to convince this designer that his scheme wouldn’t work.

As an experiment, we decided to do a trial place-and-route on two different netlists, with only minor differences between them. Neither had scan hooked up (comparing a netlist with scan to one without wouldn’t prove anythning). The results were that, in one case all the registers in one floorplan room ended up one corner of the room and, in the other, they all ended up in the diagonally opposite side. When the designer who proposed this scheme saw that, he said, “I didn’t expect that.” I said, “I did.” Needless to say, we abandoned this scheme shortly thereafter.

The fundamental property that this exhibits is non-linearity of response – a small change in the input causes a large change in the output. This property is inherent when using complex algorithms to create and verify designs. I don’t know if today’s place-and-route tools handle the scan net problem, but even if they do, I still encounter many situations in which there is an expectation of proportional response to a design change when reality is that the response is non-linear.

Formal verification, for example, is plagued by this problem. But, it can even occur in simulation. You are probably familiar with the experience of spending a lot of time tuning a constrained random environment to get a certain level of coverage. But, then the designer makes a minor change, and suddenly you get much lower coverage and then have to spend a lot of effort to get back to where you were. This particular problem is one that we are trying to address at Nusym with our intelligent test generation.

The bottom line for this unpredictable behavior is that it is unavoidable. But, I still find the expectation that design can be made predictable common (especially amongst managers). I plan to publish some posts trying to demystify computational complexity (complexity for dummies ;) ). I will try to frame this in a way that even managers can understand, which, hopefully should help lead to better decision making with respect to planning the amount of effort that needs to go into using complex algorithms in design.

There was an interesting article on verification methodology in Chip Design Magazine recently. The author, Carl Ruggiero, works at an IP supplier and, so, doesn’t have any particular agenda to push with respect to verification methodology (unlike most of the articles in this magazine).

He makes the following points:

  • quality of verification is not correlated to quantity of verification.
  • Directed testing and constraint-based random testing can both be equally successful or unsuccessful.
  • quality of verification is not correlated to the language chosen.
  • Good verification planning, execution, and tracking are the keys to producing a high-quality (low bug) design.

The interesting thing about this is that Ruggiero states that these things were surprising to him. If you understand the concepts of Bugs are Easy, none of these things should be surprising. Let’s try to put these statements in the context of the three laws of verification and the orthogonality concept.

We know that putting effort on multiple, orthogonal methods is better than putting all the effort on a single method. This alone can explain why high verification effort  can fail to produce a high quality design compared to using low verification effort.

We know that directed and random testing are orthogonal methods and both are capable of finding a majority of bugs. We also know that either can appear to be efficient or inefficient depending on how they are deployed. Thus, it is not surprising that Ruggierio sees different groups having different levels of success for random and directed testing.

I’ll come back to languages in a minute.

His conclusion that good planning, execution, and tracking is the key to producing a high quality design, however, is counter to the principles of Bus are Easy because it is essentially a statement there is an absolute best methodology. First, I think Ruggiero would take issue with calling good planning a methodology. After all, isn’t good planning essential to any successful endeavor? How could this be a methodology if all methodologies require good planning? Well, it turns out that back in 1996, two researchers from DEC proposed a verification methodology in which the central tenet was to not do any planning (Noack and Kantrowitz, DAC, 1996) (Sorry, can’t find an online version to link to). Their reasoning, which should sound familiar, was that no matter what you did at the beginning of verification, you would find bugs, so why bother spending a lot of time planning up front. We used this methodology successfully on the MCU chip that I worked on at HAL.

So, planning is a methodology, not planning is a methodology and both are, therefore, subject to scrutinization using the laws of verification. The fact that planning was successful does not mean it is the best methodology. Not planning has also proven successful.

Now let’s return to the issue of languages. Ruggiero states that he has seen simple Verilog-based environments produce high-quality designs and complex HVL-based environments produce low-quality designs. He goes on to further say

…a commitment to execute it turned out to be far more important than the tools chosen to implement it…

where , in this context, “tools” refers to languages. This conflation of tools as languages is made more explicit in his concluding paragraph:

…methodology matters far more than tools in delivering working hardware designs. While certain EDA languages can make engineers more productive…

There is a clear assumption in his mind as he equates tools and languages: languages are technology. That is, there is something about advanced languages that makes testbenches written using these languages more likely to find bugs or get higher coverage or whatever. While advanced languages certainly are useful and enhance productivity by including features that you probably would have to create manually, nothing about them is inherently smarter or better with respect to finding bugs. It’s like saying it’s better to design in English than in Chinese. Or that if you have power steering in your car, you are less likely to get lost than if you have manual steering. The languages are technology argument makes no sense, but as Ruggiero’s article shows, this mindset is pervasive.

Ruggiero correctly concludes that languages are not important to final quality, but he misses the more fundamental conclusion: languages are not technology.

I talked previously about comparing methodologies and the that the key to choosing different methodologies is to choose those that are orthogonal. I also talked about how to decide whether a new method is worth doing or not. In this post, I will compare two methodologies, simulation and formal, in more detail using the concepts I developed in these previous posts.

When starting out with simulation, you may write some simple directed tests or the simplest random environment that exercises only the most basic cases. This represents a very over-constrained environment, that is, it tests a very small subset of the legal input space. This is insufficient to find all bugs, so generally you will modify the environment to relax the constraints such that it can test more of the input space. New bugs are found as new parts of the input space are explored.

Generally, the hard part of increasing the input space is refining the checker. For example, it is easy to inject errors on the input side, but extremely difficult to check results – should the packet be discarded? which packet? How many packets? Another example is configuration. The initial environment usually only exercises a single configuration. Each configuration bit adds some amount of work to verify correctness and, in a complex system, there are usually many configuration bits and combinations of configuration bits to be tested.

As a consequence of this, it becomes more time consuming to increase the input space that is being exercised. And the closer it gets to the exact legal input space, the harder it becomes to get there. This is one of the main causes of the verification bottleneck. At the same time, there is no way to short circuit this because not getting to the point of exercising the complete input space means potentially missing easy bugs that could be show stoppers.

This is where assertion-based verification (ABV) comes in. Rather than starting with a very over-constrained environment and then refining it by gradually loosening the constraints, ABV starts with a highly under-constrained environment and, as constraints are added, gradually restricts it toward the exact legal input space.

The problem in reducing the over-approximation inherent in assertions is analogous to expanding the under-approximation in simulation. The input space is restricted by adding more constraints. The closer the input space gets to the legal input space, the harder this becomes. The reasons are similar to simulation, but there is an additional factor with assertions, namely, that is basically impossible to specify high-level behavior using assertions, which means it is not possible to completely verify a design using assertions only. This means that you will never get to point of constraints specifying the exact input space, it will always be under-constrained to at least some extent.

Effort required to constrain simulation and formal to the exact legal input space

Effort required to constrain simulation and formal to the exact legal input space

So, why use assertions at all? The fact that assertions approach the problem from the opposite end of the input space spectrum means that assertions are an orthogonal method of verification. They force the user to view the design in a very different way. This has two benefits. First, bugs that are hard to find using simulation may be easy to find using assertions and/or formal. Second, assertions make it easier to debug because the error is generally caught closer to the source of the bug (in fact, I believe this is the prime advantage of using assertions).

What is the downside of using assertions and formal? First, since it is not possible to completely verify using assertions alone, nobody is going to abandon doing random and directed testing and, generally, this will be done first. As we have seen before, whatever is done first will find the most bugs. This means that assertions and formal generally are relegated to finding a small number of bugs. This is OK, since these methods are orthogonal to random and directed testing, as long as it is not too much effort.

Unfortunately, writing assertions is very time consuming, error prone, and because it is under-constrained, prone to false negatives (bugs) that cause a lot of frustration and wasted effort. Using formal is another large effort on top of writing the assertions in first place.

Bottom line, assertions and formal are orthogonal methods to simulation and, therefore, useful. But, their return-on-investment is low compared to simulation. If you have the budget, time, and manpower, or verification is a critical problem for you, they are probably worth doing, otherwise, they are probably not.

What is formal verification? Today, even if you ask experts, you will not get a clear answer.

Here are a few definitions you will find:

  • “formal verification of is the act of proving or disproving the correctness … using formal methodsmathematics.” <wikipedia>
    • “formal methods are particular kind of mathematically-based techniques for verification …” <wikipedia>
  • “Use of various types of logic and mathematical methods to verify the correctness of IC logic…” <edacafe>
  • “Establishing properties of hardware or software designs using logic, rather than (just) testing…” <nist>

None of these definitions is correct, or even meaningful, for that matter. What they are trying to get at is that formal verification is not simulation. There is somehow mathematics involved, but it is not clear how or why this is significant.

The sad part about this is that these definitions largely come from experts in the formal verification world. The fact is researchers don’t really know how to define formal verification in a way that clearly differentiates it from simulation. Considering that most researchers in the area have a theoretical background and tend to look down their noses at simulation as a verification method, this identity crisis is a real problem for them.

The real goal of formal verification is to prove things correct rather than merely to test them. To understand the origin of the crisis, we first have to understand what it means to prove something. There are two fundamental properties of any proof method: soundness and completeness. A proof system is sound if what it proves is also true. For example, we would not want a proof system that proves 2 +2 = 5. A proof system is complete if everything that is true is proven. Suppose we have a correctly working adder. A complete proof would have had to verify correctness for all possible input values since all possible input values result in the correct answer.

Soundness underlies any verification method. Simulation and formal verification are both sound. When university research labs produced with the first practical formal verification method, BDD-based model checking, the early 1990s , it was easy to differentiate formal verification from simulation. Formal verification was complete and simulation was incomplete. But, very quickly it became apparent that these first industry-ready formal verification tools were still far too capacity limited to be useful on real designs.

Semi-formal verification was proposed as the way forward. What is semi-formal verification? It’s formal verification, except its not complete. The idea was to give up on proving things and just focus on finding more bugs. But, simulation is also incomplete. So, what’s the difference between the two? And, can we come up with a better definition of formal verification (including semi-formal verification) that clearly distinguishes it from simulation in a useful way?

The answer comes, somewhat obviously, from the definition of the word formal. Surprisingly, though, the word formal does not refer to the use of mathematics. Here is a definition from Webster’s:

formal: … b) relating to or involving the outward form, structure, relationships, or arrangement of elements rather than content…

In the context of verification, the word formal refers to looking at the structure of the design rather than its behavior. For example, suppose our adder is composed of gates. A formal verification -based proof system would represent the structure of these gates (type and connections) as symbols and create equations describing the relationships between them. The adder would then be represented by one (very large) equation. Then to prove the adder is correct, you could create the equation:

enormous_equation_representing_structure_of_adder = A+B

Then it is a “simple” matter of doing algebraic manipulation to show that both sides of the equation are the same, similar to how one would prove the algebraic identity, “2A – A = A”. Once you were done, you would have a complete proof because the symbols A and B represent all possible values of inputs.

OK, that is more an explanation of how formal verification works than its essence and it doesn’t explain what incomplete formal verification is. The key point here is the use of symbols to represent structure and symbolic manipulation to prove results. If a proof system is using symbols and manipulating them directly, it is formal in some sense. Incomplete formal verification just means that we did not prove it for all possible values of the symbols. For example, suppose for the equation “2A – A = A”, we could only prove it for A>0. This is incomplete because we did not verify the equation for all values. This is still better than simulation because we did prove it for many more values than if we simply tested a large number of values. This is the rationale for semi-formal verification; that it can cover more cases than simulation, but can be incomplete.

But, just saying semi-formal can potentially cover more cases than simulation is still not a clear differentiation from simulation. There has to be something more. The answer lies in the use of symbols and symbolic manipulation. Even if incomplete, a system that uses symbols has an idea of what has been not been proven in addition to what has been proven. This means that an incomplete proof system can give the user information about what was not proven. Simulation-based proof systems only give information on what was proven.

Therefore, a useful definition of these different proof methods is:

  • simulation: sound, incomplete, not aware of what is not proven.
  • semi-formal verification: sound, incomplete, aware of what is not proven.
  • formal verification: sound, complete.

Over the years, there has been much discussion and argument over which of several competing methodologies is the best one for verification. Competing papers have been written arguing opposite sides of this question, even to the point of each including data “proving” their side.

As a case study, let’s take the example of random vs. directed testing. A lot has been published over the last twenty years comparing these two methods. Some papers state that random finds bugs more efficiently than directed. Same state the opposite. No conclusion has emerged from all the conflicting data. Why is this? Is there any way to resolve this?

Using the framework of the the three laws of verification, it is possible to synthesize a consistent explanation for all the conflicting data. First, to recapitulate:

1st law: the bug rate is high at the beginning of the verification effort, low at the end

2nd law: bug hardness is subjective.

3rd law: bug hardness is independent of when bugs are found during the verification process.

Let’s assume that random and directed testing will each take six months effort and each will find 90% of bugs.

Suppose we do random testing first. We spend six months developing a random testbench, running it and finding and fixing 90% of the bugs. If we now spend six months developing directed tests, the best we can accomplish is to find 90% of the remaining 10%, or 9% of the bugs. But, the amount of effort was the same. Conclusion, random is much more efficient than directed testing.

Now suppose we did directed testing first, followed by random. Now, directed testing would find 90% of the bugs with six months effort and random would find only 9% with the same effort. Conclusion, directed testing is more efficient.

Thus, measured efficiency is a function of the order that verification methodologies are applied. Whichever is done first will look the most efficient. This is a direct result of the first law of verification. Without context on how these methods are applied, there is no way to judge the conclusions presented.

Another issue is that papers often conclude with a statement like “…and we found a bug that wasn’t found before”, with the implication that this method is better than the previous methods. We can also examine this in the context of the three laws of verification.

The best verification methodology is one that combines as many orthogonal methods as possible. Methods are orthogonal if bugs that are hard to find in one method are easy in the other (second law), or bugs that would be found later in one method are found earlier in the other(third law).

Consequently, orthogonality is the best metric to use when comparing methods. The fact that one method finds a bug that was missed by another method is an indication that the two methods are orthogonal. But, that is not the whole story. We need to take into account the scope of the different methods to determine which is actually better.

People often infer from statements such as, “we found a bug that was missed”, that the method would have found all bugs that were previously also and, therefore, this method can replace the old method. This is rarely the case. In the worst case, it could be that this method could only find this one bug.

To judge fairly two methods, it is necessary to do two experiments. Let’s say we want to evaluate whether to replace method A with method B. First, we do method A followed by method B. Let’s say method A finds 100 bugs and method B finds 10 bugs. Method B looks good because it found bugs that were missed by method A, which, because its our current method, we believe to be thorough.

Second, we try method B first followed by method A. If method B finds 20 bugs and method A finds 90 bugs which method B missed, we would say that method A is vastly superior even though the first experiment showed that method B had promise. If instead, method B found 90 bugs and method A found ten bugs, then you could claim they are equal (assuming equal effort). Only if method B found more than 90 bugs when done first could any claim of it being better be made. This is a very tall order, which is why most of the verification papers written with results claiming to have found bugs that were missed generally don’t have much impact.

So, what have we learned? First, lacking context, claims of one method being more effective/efficient or better are meaningless. Second, rather than focusing on which methods are best, focus on which methods are most orthogonal. Methodologies often reflect mindset. Orthogonal methodologies force different mindsets such that  simple bugs don’t slip through, which is the key to ensuring the highest probability of success.

Somehow, the powers that be decided that my post on languages was worthy of a wider audience. We made it to Deep Chip! Deep Chip is John Cooley’s site that is read by every hardware designer on the planet.

Here is the link:

http://deepchip.com/wiretap/090219.html

If anyone wants to comment on this and somehow finds this post (unfortunately, John doesn’t allow “self promotion”) , please feel free to post comments.

Verification is often considered the wicked step-child of design. Most designers want to design because to design is to create. There is not an ounce of creative work that goes into verification. Designers tell verification engineers what tests to write and the verification engineers job is simply to code these tests with little or no input on their part on the design of the tests. It’s not surprising, then, that most engineers want to be designers.

However, this view of verification engineering misses the point. In fact, looked at from a non-engineering point of view, verification can actually be viewed as the more critical task and, more importantly to the verification engineer, the more prestigious task. It all comes down to how one defines the job ‘engineering’ as opposed to, say, ’scientist’, or ‘artist’. From a societal point-of-view, each one of these jobs is equally important. In fact, it is often the case that scientists look down their noses at engineers because engineers simply use knowledge that scientists discover. Engineers point out that most of today’s technological marvels are created by engineers, not scientists. Artists look down on both since neither of them creates anything beautiful.

The relationship between design and verification is analogous to the relationship between engineers and scientists. Thus, it is understandable that designers can look down on verification and, at the same time, verification can rightly hold a position of respect equal or higher than design.

The task of engineering can be viewed as a combination of art and science. Science is defined as the discovery of knowledge. This is usually taken to mean some fundamental knowledge about the universe, but can, in a broader sense, apply to any knowledge. For example, suppose no one has ever designed an 10G Ethernet controller in 65nm technology and you were tasked with doing exactly that. Clearly, it would be necessary to first gain some knowledge of how to do this. Since no one has done this before, you essentially would be performing the role of a scientist while gaining this knowledge. In fact, all through the design process, most of the work will be taken discovering knowledge about how to design such a product. Generalizing broadly, we can say that 99% of engineering is science.

The primary tool at the scientist’s disposal is the scientific method: hypothesize, perform experiments, conclude results. Usually, hypothesizing and concluding require virtually no time, making experimentation the most time consuming part. The most time consuming part of this effort is designing the experiment. Designing is engineering. Consider the Apollo moon program. The actual science conducted in this project was a few hours gathering moon rocks. Engineering the experimental apparatus, which is to say, the moon rocket required thousands of man-years of effort. Thus, again generalizing broadly, we can say that 99% of science is engineering.

Art is the creation of beautiful objects, engineering is the creation of useful objects. Although the output of these acts is very different, the actual process has many similarities. Most people understand that engineering useful objects is a very rule-bound process. The creation of art is also a very rule-bound process. In painting, beauty comes about because of a detailed understanding of color, composition, materials, and a number of other factors. It is possible for you or I to create art without understanding these rules, but the probability of creating something beautiful is as remote as if one randomly soldered together some resistors, capacitors, and transistors and it ended up being something useful. Music, in particular, has intricate rules that often leave little choice in what notes to choose during composition. It is often thought that genius is the ability to break rules and find new and better rules. In fact, the opposite is true. Whether it be Mozart or Einstein, geniuses do sublime work by strictly following the rules. In fact, it is their deeper understanding of the rules that allows them to see better what can be done.

We can define engineering as combining the creativity of art and the discovery of knowledge of science. Design is the artistic side, but which still requires strict discipline in following rules, which are given by specifications and other design rules. At the same time, design requires figuring out how to design something which has never been designed before. There clearly is some up front scientific discovery needed.

Verification is like science. A verification engineer is given a design, about which very little is known. In particular, it is not known whether there are any bugs in the design or what they might be. The goal of the verification process is to discover any bugs. This, more than anything, is science. Thus, if design brings out the artist in an engineer, verification brings out the scientist. Both are noble.

The training of an engineer is highly disciplined and is done with a heavy dose of the scientific method. Why, then, do engineers disdain verification so much? Much of the blame can be laid on the need for instant gratification. At the very beginning, both verification and design require some amount of discovery of knowledge in the form of studying the specification. After this, the first step of design is purely creative, the fun part of design. The first step in verification is pure drudgery, setting up test benches, scripts, and writing endless tests. After the creative part of the design is done, the rest is pure drudgery. The fun part of verification occurs after all the drudgery is complete, and that is when the verification engineer starts discovering bugs in the design.

There is no greater joy than discovering a bug that reveals some fundamental mis-understanding about the design. This new knowledge can have a dramatic effect on the design teams perception of the design. This effect is not unlike that of a scientist discovering something that contradicts some well-established principle of the natural world. The most famous example of this is probably Einstein’s Theory of Relativity which changed how we view fundamental laws of physics. This analogy is more instructive than appears. Newtonian mechanics work most of the time. Relativity corrects those cases in which it fails. But, Relativity does not simply tack additional refinements on to Newtonian mechanics. It provides a different point of view and shows how Newtonian mechanics are a special case of this new point of view. Although many bugs found during verification are simple refinements of something that is correct, bugs are found which show the true principles of how a design works and that the initial design was simply an instance of these principles that happened to work most of the time. It is these bugs that provide the real satisfaction to verification engineers.

If you have ever taped out a chip and gotten it back in the lab, you have inevitably had the experience of finding a bug which makes you say, “how in the world did we miss this?” When the chip taped out, it was running weeks of regression and random tests without finding any bugs. You reviewed everything and believed that the verification is comprehensive such that, if there are any holes, they are not big ones.

But then, when you get the chip back in the lab, it hits a bug within a second of turning the chip on. When you analyze it, you find that is a very simple case, maybe, a certain bus request in conjunction with a particular configuration bit being set is sufficient to trigger the bug. You think, “how could we have missed something so simple?”

The answer is simple also: the Universe has changed.. It is not unlike the day the Universe changed when Einstein discovered the Relativity principle.What changed that day is the same as what changed the day you got your chip back in the lab. You are looking at the problem with a different mindset.

The mindset you started off with came about when you wrote your initial verification test plan. To understand this, consider how a test plan is created. A test plan basically divides the entire space of possible behavior into different sets. For example, consider the creation of a test plan for a simple memory system. First, the set of all tests can be dividided into two basic types: reads and writes. Next, each of these could be sub-divided into cacheable and uncacheable. Next, each subclass can be divided based on address alignment. We also need to consider sequential behavior. Therefore, each class is further subdivided by being followed by a second request, each of which could be sub-divided based, again, on cacheable/uncacheable, etc.

This test plan can be represented by a tree in which each branch represents a division based on a choice such as read or write, cacheable/noncacheable. This tree grows very quickly and it is up to the verification engineer to decide how deep is sufficient to be considered comprehensive. For anything of any complexity, the number of leaves of the tree is far more than can be tested in a reasonable amount of time. Directed and random testing are ways of sampling the leaves of the tree in a uniform way such that there are no major holes.

A Test Plan For a Simple Memory System

A Test Plan For a Simple Memory System

Clearly, bugs can slip through if they occur in one of the leaves that ends up untested. But, if tests are uniformly distributed across leaves, then there should be no major holes. So, for example, if reads don’t work at all, this will be detected because we have sampled on the read side of the tree.

Now suppose we reorder the tree. For example, instead of the order being read/write, cacheable/uncacheable, aligned/unaligned, we reverse the order.

A TestPlan For a Simple Memory System - Reordered

A TestPlan For a Simple Memory System - Reordered

This reordering can cause a uniform distribution of tests to suddenly become non-uniform. Reordering can reveal large holes in the testing. Reordering in this case reveals that the combination of unaligned/uncacheable requests is not tested, something that was not obvious in the original tree.

This is how the Universe changes when we get the chip back in the lab. We find a bug, but we are looking at it with a different ordering of the tree. We are imagining a tree in which the top of the tree is the bus request and next branch is that particular configuration bit and are seeing that whole side of the tree is completely untested. When we think “how did we miss this?”, we are forgetting that back at the beginning, we were looking at a different tree, one in which that configuration bit was near the bottom. Because of this we didn’t notice that, despite the fact that we were sampling the leaf nodes comprehensively, we were never setting this configuration bit in conjunction this particular bus request.

So, to avoid these kinds of bugs in future, you might try, as you near tapeout, to change the Universe by reordering the tree you used to create your testplan.

In the late Sixties, efforts were underway to define the second generation of programming languages. The first generation of languages, FORTRAN, algol, and cobol, had been in user for almost ten years and enough experience had been built up to allow improved languages to be built. At the time, IBM was the dominant player in programming languages and was in a position to define a language and have it become the defacto standard programming language. The language they designed, called PL/1, was basically a superset of the existing mainstream languages of the time. It had features to support scientific computing, business computing, and systems programming. IBM made PL/1 its flagship programming language and they expected it to become the dominant programming language in the industry.

Meanwhile, at Bell Labs, a programming language called C was being developed. The philosophy behind C was completely opposite to that of PL/1. Instead of being a superset of all existing languages, C was a minimal common subset. C made it easy to create libraries that made up for the lack of features found in other languages. The practice of having a standard library associated with a language started with the C language. Essentially, C is not a language for writing programs, it is a language for writing libraries. Applications are built by putting libraries together. In fact, it is not possible to write a Hello World program in C without using a library (stdio). Today, all modern programming languages still follow this basic philosophy.

The whole Open Source movement is enabled by the philosophy behind C. Imagine if Pl/1 had won the language race and the only way to add new standard features was to add them to the language. The PL/1 compiler was controlled by IBM and the language was not designed to be easily compilable. The software industry would not be as nearly as advanced today if PL/1 had won the language wars.

Contrast this with how hardware design languages have evolved. The analogs of FORTRAN, algol, and cobol in hardware are Verilog and VHDL. These were first generation hardware description languages (HDLs) developed about twenty years ago. Shortly after that, specialized verification languages, such as e and Vera, were introduced that added features useful for verification. Then, along came assertion languages such as PSL and SVA. Today, the hardware design world is migrating to a language called SystemVerilog. Can you guess which path it took? Yup, the PL/1 path. SystemVerilog is a superset of Verilog, Vera, and SVA. The initial release version was version 3.1a. That’s right, it took three major revisions to get it right, then someone decided to add some functionality to get version 3.1. Oh, but wait, that still wasn’t enough, we just had to have a few extra features to get to 3.1a.

And it gets worse. In 35 years, there has been exactly one revision of the C language. SystemVerilog is due for a revision next year after only five years. Verilog has undergone four major revisions in twenty years. The proprietary languages, e and Vera, are changed practically every quarter.

The cost of all this is that it is much harder to innovate in hardware design than it is in software design. At Nusym, we spend 90% of our effort (and money) on language support rather than our core differentiating technology. To illustrate this problem, we have spent dozens of person-years developing Verilog and Vera infrastructure for our tool. By contrast, we had one person spend three months to prototype it on C.

Is there a solution to this problem? Well, there are C-based hardware design languages, such as SystemC. But, using C directly as a hardware design language is trying to fit a square peg into a round hole. Developers and users of SystemC do leverage the ability to easily create libraries, but the goal of this language is to raise the level of abstraction, not create a minimal hardware design/verification language.

It would be interesting to think of what a real equivalent of C for hardware design/verification would be. If there is any interest, I can post my ideas of how to do this and others can contribute. I don’t know if anything would come out of this, but it might be interesting to look at.