I recently read a blog post about verification with some claims that I find quite dubious. This gave me the idea of posting some “rules” for reading papers/articles/blog posts on verification that I think are useful in judging the validity of these claims. Note also that I have addressed most of these rules in more detail in previous posts.

- The most frequent claim I see are those involving the coming “verification bottleneck”. The claim is invariable that, today verification is 70% of the effort (an oft-quoted, but unproven statistic), and tomorrow, because chips will have twice as many transistors, will be much higher.
What seems to get lost in these claims is that the same claim could have been made two years ago for the current generation, and the effort for the previous generation was still 70% (the 70% number has been around for quite a long time). In fact, going back to the dawn of the RTL era, when there was a dramatic shift in methodology, one could have made the same claim for the last ten generations of chips. This is usually ignored and no reason is given for why the 11th generation is somehow profoundly different than the last ten such that we need to buy these new tools (which is, inevitably, the other shoe falling in these articles).

In short, be skeptical of claims of future bottlenecks that ignore the fact that the jump from this generation to the next is no different from the jump over the previous ten generations.

- Be skeptical of claims uses the term “verification bottleneck”. Do a Google search on “verification bottleneck” and you will find all the results are either research papers or vendors selling trying to sell a new product.
- Be skeptical of claims using math. You will sometimes see something like, verifying N gates today requires 2^N effort and tomorrow there will be double the number of gates, means it will take 2^(2N) effort. Therefore, effort is exponentially increasing and so buy our tool which greatly increases capacity. This is basically a variant of rule 1, but, in general, if you see math, be skeptical.
- It is common (and almost a condition of acceptance) that verification research papers always conclude with something like “…and a bug was found that was missed by simulation” or something along these lines. Remember, bugs are easy. An algorithm that found new bugs does not mean that this algorithm is better than any existing algorithm.
The best way to read verification papers is to ignore the fact that bugs may have been found. If you still find value in the paper even if you make the assumption that it found no bugs, then it probably does have some value.

- The last rule comes from the premise that bugs are easy, or as I like to call it, the “pick axe and shovel” argument. This is usually in relation to some claim about better results (more bugs) being found on some, usually large-scale, verification effort. The better results are usually used to justify the proposed methodology as being superior to other methodologies.
The issue here is that the project usually involved a large number of well-coordinated, smart people. The counter-argument to this is, I could take the same smart people and, using nothing more than pick axes and shovels, find lots of bugs. Why does this prove that your methodology is better?

What prompted to write this post was this post by Harry Foster. This is posted on Mentor’s website, so Harry is obviously pitching the company party line. I won’t bother critiquing the post here. I suggest reading Harry’s post after reviewing the rules above. Let me know what you find.

]]>

The significance of this is that proving P != NP is a grand challenge problem. You will hear that if it turned out the other way, that is, P = NP, then all known cryptographic systems would become easily breakable. On the other hand, it would mean that many verification and optimization problems would suddenly become tractable. More troubling, P=NP would also would mean that Skynet is just around the corner. Most researchers already believe that P != NP, so there is really no immediate effect from producing a valid proof of P!+NP.

The P!+NP problem has been deemed so important to solve that whoever solves it either way will receive a $1M prize from the Clay Mathematics Institute. Before this happens, though, the proof would need to be generally accepted by the research community as being correct, a process that will probably take months, if not years.

An extraordinary thing happened in response to the release of the proof. A number of experts in the field dropped what they were doing and started poring over the proof in earnest. This is no small feat given that the proof is over 100 pages long. Pretty quickly, a consensus was reached that there were a number of serious flaws, although, as far as I can tell, no smoking gun yet. For his part, Deolalikar has not conceded defeat yet and is working on addressing all issues that were found.

The proof and mathematics involved are beyond my pay grade. But, the most interesting part of this episode for me is the sociological issues brought up. A group of the top experts in the world dropping everything to review a paper in a matter of days is highly unusual. Unprecedented is that fact that the discussion happened entirely out in the open in online forums. This is in contrast to the normal peer review process, which, for journals, can stretch to years, and is often done by only a few anonymous reviewers who generally don’t share their discussion outside the editorial board (or conference program committee as the case may be).

The discussion could be followed by anyone interested. Now, it is not clear that this is a scalable model as is: it is not possible to drop everything at a moments notice for every paper that comes along. This could have the negative effect of, the next time this happens due to the crying wolf effect. On the positive side, the openness of the process allowed the computational complexity to come together and get in sync on this problem. Many of the objections raised were “standard” objections. For example, the proof may prove other things that are known to be false. This is something that researchers routinely check for when reviewing a paper.

Researchers have an, essentially, agreed upon set of “barriers” to a proof of P!=NP. They also point to a number of results (theorems) that any valid proof would need to address. Any proof in the future (assuming Deolalikar’s doesn’t work out) would need to address these barriers. Putting all this discussion in the open, hopefully, means that proposed proofs in the future would be much higher quality ( P!=NP (and some P+NP) “proofs” are produced on a regular basis).

The other thing that is clear from this episode is that anyone who is going to receive the $1M prize needs the blessing of the computation complexity community. The most interesting barrier that I learned about is a theorem that essentially says “proofs of P!=NP are hard because P!+NP”. All of the other barriers also imply that any proof must be hard. No one is going to come up with a short, elegant, easy-to-understand proof for this. This means that a proof is highly unlikely to come from an outsider. Deolalikar is an outsider, but managed to meet enough of the community requirements to be taken seriously. While researchers acknowledge the contributions of outsiders, the nature of this problem makes it highly unlikely to come from an outsider.

In the (likely) event that Deolalikar’s proof ends up being incorrect, years from now, when a proof is produced, I think we will look back at this event and see that it was a step forward because of the openness of the discussion. Here’s hoping that this leads to positive improvements in how scientific review is done. I think we should be grateful to Deolalikar even if his proof doesn’t work out and even if it represents no new insight into the P!=NP problem.

For further reading:

bbc news (an easy to understand article blessed by an expert in the field)

http://rjlipton.wordpress.com/ (where most of the discussion took place)

wiki (gory details. warning: heavy math)

]]>

This is one of these cases where you get a dot and you connect them with other seemingly random dots and see that you have a line. In this case, what I realized is that the difficulty in verification is not really finding bugs (bugs are easy, right?), but in how efficiently we find these bugs. Recently I posted on how constrained random testing is essentially a (hard) optimization problem. I also posted on the best verification methodology being to combine orthogonal methodologies in order to optimize bug finding productivity. The criticality of optimizing safety-critical test cases was another data point that led me to this realization.

This is reflected in the fact that many of the most successful verification tools introduced over the last twenty years have succeeded by optimizing verification productivity. As we all know, faster simulation really does very little to improve the quality of a design, but it helps enormously in improving verification productivity. Hardware verification languages are probably the second most important development in verification in the last twenty years. But again, they don’t improve quality, they simply improve productivity in developing verification environments.

This is not to say that there have not been tools that improve quality. Formal proof clearly improves quality when it can be applied, although semi-formal verification, which focuses on bug hunting is more of a productivity increase. In-circuit emulation allows you to find bugs that could not be found in simulation due to being able to run on the real hardware. However, emulation used simply as faster simulation is really just a productivity increase.

Is verification optimization related to the well discussed verification bottleneck (you know, the old saw about verification consuming 70% of the effort)? Verification became a bottleneck when the methodology changed from being done predominantly post-silicon to being done predominately pre-silicon. Many people saw the resulting dramatic increase in verification effort as being correlated to increased design size and complexity. If this were true, then verification would consume 98% of the effort today since this switch occurred twenty years ago and there have been many generations of products since then. Since relative verification effort has not changed significantly over the last twenty years, I think it is safe to say that verification effort is constant with increased design size and that differences in relative effort reflect differences in methodology more than anything.

The real question is: will verification optimization become more important in the future as designs become larger and will that result in relative verification effort rising? If there is no change to design methodology, we would expect verification optimization effort to remain constant. If high-level synthesis allows us to move up the abstraction ladder, this should improve the ability to optimize verification. In short, there does not seem to be a looming crisis in overall verification optimization.

However, if we look at the software side we see that software content on hardware platforms is growing rapidly which is putting enormous pressure on the ability to verify this software. Effectively, we have managed to forestall the hardware verification optimization crisis by moving it to software

]]>

Talking about this strategy in a response to Olivier Coudert’s blog post on formal verification, I wrote:

I think this use model will continue to grow, while formal verification, the product, will continue to whither. I predict that in 10-15 years there will be no formal verification products, but most, if not all, verification solutions at that time will incorporate formal technologies under the covers.

At this year’s DAC, however, it was apparent that this trend is happening now, rather than 10-15 years from now. There were a number of new verification startups and products announced this year, including:

- Avery Insight – X propagation and DFT
- Jasper ActiveDesign – design exploration and debug
- NextOp – assertion synthesis
- Vennsa – automated debugging

Notice that none of these products claims to be a formal verification product. But look who created these products: formal verification Ph.D.s. Do you doubt that these products rely heavily on formal verification techniques under the covers?

The only exception to this, ironically, appears to be my new employer, Jasper, whose JasperGold formal verification product has been seen huge growth over the last year. They have found that the recipe to success for formal verification is to buffer the complexity of it by using methodology and services. Also, Jasper’s latest product, ActiveDesign, a tool that derives simulation traces for a specified design target, is not specifically intended to be a verification tool, except of course, it uses the same formal verification technology under the covers as JasperGold.

Now, there are still traditional formal verification products out there. Cadence IFV, Mentor 0-in, and Synopsys Magellan are still around. I may have missed it, but I didn’t see these tools making any a big impact at DAC this year. However, I suppose we must recognize that they still exist and that it may just take 10-15 years for them to die off.

]]>

There are several variants on the basic concept of random testing. Unconstrained random testing generates random values for all input variables without any consideration of legality. Constrained random testing is used when the legal set of input values is a subset of the entire input space. In the last dozen years, hardware verification languages (HVLs) have enabled the use of constraint-based random testing. Constraint-based random testing uses static constraints and constraint solvers to do constrained random testing.

Constraint-based random testing improves productivity in writing constrained random testbenches because it allows writing tests at a higher level of abstraction. Static constraints specify the legal input space, but not how to randomize values; this is left up to the solver. As designs/protocols have become more complex, constraint-based random testing’s productivity gain has increased. As a result, there is a proliferation of large, complex constraint sets in today’s testbenches.

Increasingly, however, solving these large, complex constraint sets has become a bottleneck in simulation due to the large amount of time spent solving these constraint sets. Consequently, it is common that a lot of time is spent debugging and trying to workaround these solving bottlenecks. In the worst case, it is necessary to discard the static constraints and rewrite the randomizer using imperative code, negating the productivity gains of using static constraints.

The fundamental cause of this problem is the fact that uniformly randomizing across a constrained input space is a NP-hard problem. As we have seen before, NP-hard problems are optimization problems. Uniformity of randomization is the optimization goal. For example, suppose we have a simple constraint: 0 <= X <= 10. If we randomize X a number of times and each time it returns the value 0, this meets the constraint, but is not very uniform. Randomization that produces the value 0 to 9 with equal probability is optimal. Because NP-hard problems have the additional burden of requiring an optimal value, not just a satisfying value, they are generally harder to solve than NP-complete problems.

What are solutions to this problem?

The obvious solution is to build better solver/randomizers. Companies that produce HVLs already spend a large amount of effort trying to improve their solvers. If it were that easy, it would have been done already.

Barring better solvers, the only other choice is to change methodologies. One change is to ditch static constraints and go back to writing imperative code with the ensuing reduction in productivity. Not a pleasant thought to contemplate.

An alternative to this is keep the static constraints, but reduce the complexity from NP-hard to NP-complete. This means dumping uniform distribution for, potentially, very poor, but still legal, randomization. This is feasible considering that the real goal of random testing is to find bugs and improve coverage. We believe the uniform distribution is required to do this. However, if the randomizer knew precisely what values to generate in order to find a bug or hit a coverage point, it could just generate these values, which may or may not be uniform. If it were possible to do this, randomization would go from being an NP-hard problem to being a NP-complete problem. Technologies such as Nusym’s path tracing technology allow this.

One thing I think would help to implement this strategy would be to provide an open solver API in today’s HVLs. This would allow plugging in different solvers for different circumstances and developing custom solvers on a case-by-case basis rather than being forced to have a one-size-fits-all solver.

]]>

…recently we have seen a steady dwindling in the number of publications that specifically target hardware. The purpose of this workshop is to rekindle the interest in and enthusiasm for hardware verification.

Almost simultaneously, Olivier Coudert wrote a thought provoking post entitled “Has formal verification technology stalled?” He was prompted to ask this question by the fact that the number of formal verification submissions to the Design Automation Conference (DAC) has dwindled to almost nothing. He wonders if this indicates that innovation has stalled.

I think several factors may be at work here.

One factor may be simple economics. The economic downturn has undoubtedly had an impact on the number of formal verification papers written over the last two years. Both Synopsys and Cadence have had their research labs, a staple of DAC papers every year, decimated.

Researchers may be forced to attend fewer conferences than before. If you attended, say, DAC and FMCAD before, you may have been forced to choose to attend only one. As a researcher, I would probably choose the one where I would most likely meet more formal verification researchers, since this is the prime motivation for attending conferences. In bad economic times, this is going to be the focussed conferences such as FMCAD instead of the broader EDA-wide conferences such as DAC and ICCAD (which also has seen dramatically reduced numbers of formal verification papers).

But, a more fundamental trend that I see is a change in the character of formal verification papers published at all conferences.

A formal verification research project generally consists of three parts:

- A formulation of an abstraction of the problem being solved.
- A translation algorithm from the problem domain to a solving problem.
- A solving algorithm.

In the early days of formal verification (late 80s and early 90s), papers would often address all three of these areas. For example, you might see something like “Verification of Cache Coherence Protocols Using BDD-based Model Checking”. The paper would describe an abstraction of a processor cache plus some properties that need to be checked (1), specify a detailed algorithm of how this is translated into a model checking problem (2), and then some novel model checking optimizations. As research progressed, each of these areas became specialized such that papers would often focus on one aspect while only cursorily addressing the others.

In the late 90s, model checking using SAT became the state-of-the-art. There were a number of papers detailing translations into SAT, a lot of papers on SAT algorithms tailored for model checking, but relatively little on problem formulation since SAT-based model checking was largely a drop-in replacement for BDD-based model checking.

However,within a few years, these papers started dwindling. First, translation has essentially become a solved problem. It is not necessarily straightforward, but it is no longer a research problem because today’s powerful SAT solvers can often compensate for weak translation.

Second, papers on SAT optimization, a staple of DAC in the early 2000s, have all but disappeared from formal verification conferences. Does this mean SAT solving technology has stagnated? No, far from it. Because of the advent of SAT competitions, almost all SAT papers now go to dedicated SAT conferences, which are still quite active. A quite large source of EDA formal verification papers has essentially disappeared and gone back to its AI community roots.

What is left then is mostly papers from the first area: problem formulation. We get many papers on how to verify this or that type of circuit using this particular abstraction and properties. Papers talk about the abstractions and properties and then says “and then we translate this into a SAT problem and give it to a SAT solver and here are the results using miniSAT” or something along those lines. SAT solvers are sufficiently robust that there is no need to worry about optimizing them or even investigating the effects of different solving strategies.

The researchers writing these types of papers don’t have the connections to the EDA industry that were prevalent in the early days of formal verification. Therefore, thare are fewer research papers being published in general EDA conferences such as DAC and ICCAD.

At the same time, I notice that the user track at DAC last year had many formal verification papers. Maybe all that research has finally translated into real tools. That would definitely be a sign of maturity.

]]>

if [formal methods] were ready, their use would be widespread

I haven’t spent a lot of time trying to figure out if his proscriptions make sense or not, but one thing stood out to me. He talks about a gap between software development and older engineering disciplines. This is not a new insight. As far back as the 60’s, the “software crisis” was a concern as the first large complex software systems being built started experiencing acute schedule and quality problems. This was attributed to the fact that programming was a new profession and did not have the rigor or level of professionalism of engineering disciplines that had been around for much longer. Some of the criticisms heard were:

- programmers are not required to have any degree, far less an engineering degree.
- programmers are not required to be certified.
- traditional engineering emphasizes using tried and true techniques, while programmers often invent new solutions for every problem.
- traditional engineering often follows a rigorous design process, programming allows hacking.

These explanations are often used as the excuse when software (usually Microsoft software) is found to have obvious and annoying bugs. But is this really the truth? Let’s look at an example of traditional engineering to see if if this holds up.

Bridge building is technology that is thousands of years old. There are still roman bridges built two thousand years ago that are in use today. Bridges are designed by civil engineers who are required to be degreed, certified engineers. Bridge design follows a very rigorous process and is done very conservatively using tried and true principles. Given that humanity has been designing bridges for thousands of years, you would think that we would have gotten it right by now.

You would be wrong.

Even today, bridges are built with design flaws that result in accidents and loss of life. One could argue that, even so, the incidence of design flaws is far less in bridges than in software. But this is not really an apples to apples comparison. The consequences of a bug in, say, a web browser are far less than a design flaw in a bridge. In non-safety critical software, economics is a more important factor in determining the level of quality of software. The fact is, most of the time, getting a product out before the competition does is economically more important than producing a quality product.

However, there are safety critical software systems, such as airplanes, medical therapy machines, spacecraft, etc. It is fair to compare these systems to bridges in terms of catastrophic defect rates. Let’s look at one area in particular, commercial aircraft. All commercial aircraft designed in the last 20 years rely heavily on software and, in fact, would be impossible to fly if massive software failures were to occur. Over the past 20 years, there have been roughly 50 incidents of computer-related malfunctions, but the number of fatal accidents directly attributed to software design faults is maybe two or three. This is about the same rate of fatal bridge accidents attributable to design faults. This seems to indicate that this gap between software design and traditional engineering is not so real.

The basic question seems to boil down to: are bridges complex systems? I define a complex system as one that has bugs in it when shipped. It is clear that bridges still have that characteristic and, therefore, must be considered as complex systems from a design standpoint. The intriguing question is, given that they are complex systems, do they obey the laws of designing complex systems? I believe they do and will illustrate this by comparing two bugs, one a bridge design fault and another a well known software bug.

The London Millennium Footbridge was completed in 2000 as part of the millennium celebration. It was closed two days after it opened due to excessive sway when large numbers of people crossed the bridge. It took two year and millions of pounds to fix. The bridge design used the latest design techniques, including software simulation to verify the design. Sway is a normal characteristic of bridges. However, the designers failed to anticipate how people walking on the bridge would interact with the sway in a way to magnify it. The root cause of this problem is that, while the simulation model was probably sufficiently accurate, the environment, in this case, people walking on the bridge, was not accurate.

This is a very common syndrome in designing complex hardware systems. You simulate the chip thoroughly and then when you power it up in the lab, it doesn’t work in the real environment. I describe an example of this exact scenario in this post.

In conclusion, it does seem that bridges obey the laws of designing complex systems. The bad news is that the catastrophic failure rate of safety-critical software is of roughly the same magnitude as that of bridges. This means that we cannot expect significant improvements in the quality of software over the next thousand years or so. On the plus side, we no longer need to buy the excuse that software development is not as rigorous as “traditional” disciplines such as building bridges.

]]>

Given these limits, then, how it is possible that we can routinely create designs consisting of millions of lines of code or gates? More importantly, how we can continue to design increasingly larger, more complex designs?

One answer is that we can’t. We can’t find all the bugs in complex designs, but we can hope that there are no critical bugs. We can’t perfectly optimize a design, but we can do a good enough job. Even then, exponential complexity should be a brick wall to increasing complexity, but is not. Somehow, we are able to avoid these limits.

One way to look at this is to consider the way human brains are built and work in creating complex designs. The jumping off point for understanding this is Shannon’s counting argument.

Claude Shannon was one of the most important computer scientists of the twentieth century. He is most famous for his work on information theory. Shannon’s counting argument is one of his less well known results. Shannon was trying to answer the question, what is the minimum number of gates (say two-input NAND gates) needed to be able to implement any function of N binary inputs? The basic answer is that there must exist functions that require an exponential number of gates to implement. The proof goes by counting the number of possible functions of N inputs and counting the number of functions implementable with T gates and showing that T is o(2^N). The argument is as follows:

- the number of functions of N inputs is 2^(2^N). For example, a truth table for N=4 would have 16 entries. The number of functions, therefore is 2^16=16,384.
- The number of functions of N inputs that can be implemented with T 2-input gates is (N+T)^(2T). Each gate input can connect to the output of any other gate or an input for a total of (N+T) possible connections and there are 2T gate inputs.
- To implement any possible function, then, we need (N+T)^2T >= 2^(2^N).
- Solving for T shows that T >= 2^(N-d) for some small delta d, which implies that T = o(2^N).

What is the significance of this? Today’s chips have as many as 1000 pins. What is the minimum number of gates required on the chip that would allow us to implement any function of a thousand inputs? Is, say, one million gates sufficient? Off the top of your head, you probably would say no. What about a billion? Our preconceived notion is that this should be sufficient. After all, that is right at the limit of today’s most advanced chips and it is inconceivable that we couldn’t implement what we want with that many gates.

Now lets look at what the counting argument has to say about this. The total number of possible functions is 2^(2^1000), which is a vast number, roughly 10^(10^300). A billion 2-input NAND gates would allow us to implement roughly 10^(10^10) different functions with a thousand inputs. This is also a vast number, but is infinitesimal compared to the number of possible functions. The fact is, we are limited to a very small number of functions that can be implemented!

And this picture doesn’t fundamentally change even if we assume we have a trillion gates at our disposal. So, how is it we believe that we can implement whatever functionality we need? To answer this, let’s work backward from the number of gates available and determine the maximum number of inputs for which we can implement any arbitrary function. Let’s choose the most complex system we have at our disposal, the human brain.

The human brain contains on the order of a trillion neurons, each with roughly a thousand connections. Assuming all neurons can be connected to any other neuron (which is not the case), we come up with an upper bound on the number of functions that can be implemented: (10^10)^(1000 * 10^10) = 10^(10^14) = 2^(2^40). In the best case, therefore, the human brain could implement any function of just 40 binary inputs.

Now we can do a thought experiment. What is the largest size truth table that you can mentally make sense of at one time? Two bits is easy. Three bits not so hard. Four bits is difficult to visualize as a truth table, but using visual aids such as Karnaugh maps makes it possible.

Another test would be to look at the placement of arbitrarily placed objects on a plane. Given a glance, how accurately could you replicate the placement of all objects. If there were only one object, I could envision determining placement with an accuracy of 5-6 bits in both X and Y dimensions. More objects with less accuracy. Beyond about seven objects , it is is difficult to even remember exactly how many objects there are. Maybe 10-12 bits accuracy overall is possible.

The bottom line is the human brain has a very low capacity for consciously grasping an arbitrary function. Let’s say its no more that 10 bits in the best case. From the counting argument, we can determine that it requires slightly more than 500 gates to implement any arbitrary function of 10 inputs. With a million gates on a chip, we could put 20000 such functions on the chip.

Now we treat each of these 500 gate blocks as a unit and then consider how to hook these 20000 blocks together to perform some function. This is certainly possible using abstraction. In this case it would probably require two or more levels of abstraction to be able to deal with 20000 blocks.

Abstraction is the essence by which we are able to design very large systems. The need for abstraction is driven by the limitations on our brain in dealing with large functions imposed by Shannon’s counting argument. And it is the structure imposed by multiple layers of abstraction that makes intractable design and verification problems become tractable.

]]>

The field of Artificial Intelligence (AI) got its start in 1950 with the famous Turing test. It was thought at that time that, by the year 2000, computers would possess human-level intelligence. Well, this didn’t come to pass and human-level intelligence still seems as far off now as it did then.

One metric back then was the game of chess. It was thought that for a machine to beat the best human at chess would require human-type thinking. This metric was met in 1997 when Deep Blue beat Gary Kasparov. However, as the designers of Deep Blue freely admit, Deep Blue won mostly by brute force enumerating all the solutions. it was clear, and had been for a while, that the chess problem was going to be solved by exploiting computers vastly faster speed compared to humans rather than by any form of human-type thinking.

Chess is a particular type of game in which there are two players, with one player moving first, the second making a counter-move, and the players alternating until one player wins or a draw is achieved. The goal is to find a winning strategy. We can formulate this as follows (for player 2 in this case):

for all first moves player 1 makes, there exists a first move that player 2 makes such that, for all second moves player 1 makes, there exists a second move that player 2 makes such that, ... player 2 wins.

The computationally complexity of problems with this structure (alternating “for all” and “there exists” quantifiers) is in the class PSPACE-complete. Recall that the NP-hard class is the set of problems that tend to be hard. The NP-complete class is a subset of the NP-hard class and these tend to be the easiest problems to solve in the NP-hard class. PSPACE-complete problems are the hardest subset of the NP-hard class. In general, there is no easier way to solve them than by enumerating all the solutions until a satisfying one is found. This is why research into chess programs focusses on making search speed as fast as possible and why it takes specialized hardware to beat a human.

But, what has this got to do with verification? At the most abstract level, a verification system consists of two parts: the device-under-test (DUT) and the environment it sits in. The DUT doesn’t do anything unless the environment initiates something by injecting a request. The DUT responds to this in some way. This is analogous to a two player game in which the DUT is one player and the environment is the other. Requests and responses correspond to moves and counter-moves. The question then is: what is analogous to a game in verification?

There are two fundamental categories of properties that all verfication checks fall into: safety properties and liveness properties. A safety property is one that says “nothing bad will happen” and a liveness property says “something good will always happen.” An example of a safety property is that a traffic light should never be green in both directions simultaneously. An example of a liveness property is that, for a given direction, the light should always eventually turn green.

The vast majority of properties that we verify on a day-to-day basis are safety properties. For example, the property, 2+2=4, in an adder is a safety property. It is not possible to prove any type of property using simulation alone. It is possible to demonstrate the existence of safety property violations, but not liveness property violations using simulation alone. For example, suppose we simulated the traffic light controller for a million time steps and the light was always red in one direction. Does this constitute a violation of the liveness property? It is not conclusively a violation because it is possible the light could turn green if we simulated one more time step. At the same time, even if we saw the light turning green, this does not mean that it won’t get stuck red at some point.

It is possible to prove liveness properties using formal verification. How would we formulate the liveness property of the traffic light? One way is to consider it as a game between the cars arriving at the light (the environment) and the traffic light state. Then we could formulate the problem:

for all possible combinations of cars arriving at step 1, there exists a state of the traffic light at step 1 such that for all possible combinations of cars arriving at step 2, there exists a state of the traffic light at step 2 such that .... if the traffic light state is red, then in the next state it will be green.

This has the same structure as a game and so is PSPACE-complete. Since PSPACE-complete problems are solveable to a certain degree, we can prove liveness properties, but generally only on very small designs. The most effective use of this is to verify highly abstracted designs, typically when doing protocol verification.

This deep connection between the formal verification field and AI field goes further than having similar complexity. The core tool used in formal verification today, SAT solving, emerged from research into AI problems such as planning. The cross fertilization between these two fields has yielded dramatic improvements in efficiency in SAT solving over the last 15 years.

]]>

If we look at these things, they have several things in common. First, they each involve a potentially large number of parts (pots, luggage, pieces of track) that need to be put together in some way. Second, we want to meet some objective, such as fitting all the dishes in the dishwasher. Third, there are a large number of constraints that must be met. In the case of loading the dishwasher, no two dishes can be put in the same place. There are N^2 constraints just to specify this, among many others. A fourth characteristic is that we may get close to an optimal solution, but find it difficult and not obvious how to get to a more optimal one (just how are we going to fit that last pot in the dishwasher). Furthermore, getting from a near optimal solution to an optimal one may involve a complete rearrangement of all the pieces.

One way to solve problems like packing a dishwasher is to view it as a truth table. Each dish can be put in one of, say, 100 slots, in, say, one of ten different orientations. This results in 1000 combinations, requiring 10 bits. If there are 40 dishes, 4000 bits are required to represent all possible configurations of dishes in the dishwasher. The resulting truth table is vast. Each entry in the table indicates how much space is left in the dishwasher if dishes are put in according to the configuration of that entry. A negative number indicates an infeasible solution. There will be many invalid configurations which have two or more dishes occupying the same location. We give all of these entries a large negative number.

The resulting table describes a landscape that is mostly flat with hills sparsely scattered throughout. We can also imagine that this landscape is an ocean in which negative values are under water and positive values represent islands in the ocean. The goal is to find the highest island in the ocean. We start in some random location in the ocean and start searching. We may find an island quickly, but it may not be the highest one. Given the vastness of the ocean, it is understandable why it can take a very long time to find a solution.

But, wait a minute. What about polynomial algorithms like sorting? A truth table can be constructed for these also. For example, to sort 256 elements, we can create 8 bit variables for each element to describe the position of that element in the sorted list. The value of each entry would indicate the number of sorted elements for that configuration. The complete table would again be around 4000 bits and have vast numbers of infeasible solutions in which two or more elements occupy the same slot in the list and only one satisfying solution. Yet, we know finding a solution is easy. Why is this?

The ocean corresponding to the sorting problem is highly regular. If we are put down in an arbitrary point in the ocean, we can immediately determine where to go just be examining the current truth table entry (point in the ocean). Knowing the structure, we may be able to determine from this that we need to go, say, northeast for 1000 miles. We may have to do this some number (but polynomial) times before getting to the solution, but is guaranteed to get to the solution. Structure in a problem allows us to eliminate large parts of the search space efficiently.

In contrast, for an NP-hard problem, there is no guarantee of structure. Furthermore, as we are sailing around this ocean, we are doing so in a thick fog such that we can only see what is immediately around us. We could sail right by an island and not even know it. Given this, it is easy to see that it could take an exponential amount of time to find a solution.

But then, how do we account for the fact that, often, NP-hard problems are tractable? The answer to this question is that there usually is some amount of structure in most problems. We can use heuristics to look for certain patterns. If we find these patterns, then this gives guidance similar to the sorting example above. The problem is that different designs have different patterns and there is no one heuristic that works in all cases. Tools that deal with NP-hard problems usually use many heuristics. The trouble is that, the more heuristics there are, the slower the search. At each step, each of the heuristics needs to be invoked until a pattern match is found. In the worst case, no pattern match will be found meaning it will take an exponential time to do the search, but the search will be much slower due to the overhead of invoking the heuristics at each step.

I hope this gives some intuition into NP-hard problems. In future posts I will talk about even harder classes of problem.

]]>