
Tag Archives: complexity

On August 6, Vinay Deolalikar, a researcher at HP Labs in Palo Alto, released a paper with a proposed proof of the open problem P != NP. Somehow, news of this proof made it to the mainstream press, with headlines proclaiming that the problem was solved. The problem was, the computational complexity community not only had not had time to review the proof, but viewed it with great skepticism.

The significance of this is that proving P != NP is a grand challenge problem. You will hear that if it turned out the other way, that is, P = NP, then all known cryptographic systems would become easily breakable. On the other hand, it would also mean that many verification and optimization problems would suddenly become tractable. More troubling, P = NP would also mean that Skynet is just around the corner. Most researchers already believe that P != NP, so there is really no immediate practical effect from producing a valid proof of P != NP.

The P != NP problem has been deemed so important that whoever resolves it either way will receive a $1M prize from the Clay Mathematics Institute. Before that happens, though, the proof would need to be generally accepted by the research community as correct, a process that will probably take months, if not years.

An extraordinary thing happened in response to the release of the proof. A number of experts in the field dropped what they were doing and started poring over the proof in earnest. This is no small feat given that the proof is over 100 pages long. Pretty quickly, a consensus emerged that there were a number of serious flaws, although, as far as I can tell, no single smoking gun. For his part, Deolalikar has not conceded defeat and is working on addressing the issues that were found.

The proof and the mathematics involved are beyond my pay grade. But the most interesting part of this episode for me is the sociological issues it brings up. A group of the top experts in the world dropping everything to review a paper in a matter of days is highly unusual. Unprecedented is the fact that the discussion happened entirely out in the open, in online forums. This is in contrast to the normal peer review process, which, for journals, can stretch to years, and is often done by only a few anonymous reviewers who generally don’t share their discussion outside the editorial board (or conference program committee, as the case may be).

The discussion could be followed by anyone interested. Now, it is not clear that this is a scalable model as is: it is not possible to drop everything at a moment’s notice for every paper that comes along, and a crying-wolf effect could make experts less willing to respond the next time this happens. On the positive side, the openness of the process allowed the computational complexity community to come together and get in sync on this problem. Many of the objections raised were “standard” objections. For example, the proof may prove other things that are known to be false. This is something that researchers routinely check for when reviewing a paper.

Researchers have an essentially agreed-upon set of “barriers” to a proof of P != NP. They also point to a number of results (theorems) that any valid proof would need to address. Any future proof (assuming Deolalikar’s doesn’t work out) would need to address these barriers. Putting all this discussion in the open, hopefully, means that proposed proofs in the future will be of much higher quality (P != NP (and some P = NP) “proofs” are produced on a regular basis).

The other thing that is clear from this episode is that anyone who is going to receive the $1M prize needs the blessing of the computational complexity community. The most interesting barrier that I learned about is a theorem that essentially says “proofs of P != NP are hard because P != NP”. All of the other barriers also imply that any proof must be hard. No one is going to come up with a short, elegant, easy-to-understand proof of this. This means that a proof is highly unlikely to come from an outsider. Deolalikar is an outsider, but managed to meet enough of the community’s requirements to be taken seriously. While researchers acknowledge the contributions of outsiders, the nature of this problem makes it highly unlikely that the eventual proof will come from one.

In the (likely) event that Deolalikar’s proof ends up being incorrect, I think that years from now, when a proof is produced, we will look back at this event and see that it was a step forward because of the openness of the discussion. Here’s hoping that this leads to positive improvements in how scientific review is done. I think we should be grateful to Deolalikar even if his proof doesn’t work out, and even if it yields no new insight into the P != NP problem.

For further reading:

  • BBC News (an easy-to-understand article blessed by an expert in the field)
  • http://rjlipton.wordpress.com/ (where most of the discussion took place)
  • the wiki (gory details; warning: heavy math)


We have seen in previous posts that design and verification problems fall into the class of computationally hard problems. That is, in the worst case, it takes O(2^N) time to optimize or verify a design of size N, where N could represent the number of binary inputs. If we run into this limit, our ability to design large systems is severely constrained.

Given these limits, then, how is it possible that we can routinely create designs consisting of millions of lines of code or gates? More importantly, how can we continue to design increasingly larger, more complex systems?

One answer is that we can’t. We can’t find all the bugs in complex designs, but we can hope that there are no critical bugs. We can’t perfectly optimize a design, but we can do a good enough job. Even so, exponential complexity should be a brick wall to increasing complexity, but it is not. Somehow, we are able to avoid these limits.

One way to look at this is to consider how human brains are built and how they work when creating complex designs. The jumping-off point for understanding this is Shannon’s counting argument.

Claude Shannon was one of the most important computer scientists of the twentieth century. He is most famous for his work on information theory. Shannon’s counting argument is one of his less well known results. Shannon was trying to answer the question: what is the minimum number of gates (say, two-input NAND gates) needed to be able to implement any function of N binary inputs? The basic answer is that there must exist functions that require an exponential number of gates to implement. The proof works by counting the number of possible functions of N inputs, counting the number of functions implementable with T gates, and showing that T must be exponential in N. The argument is as follows (a small numerical check in code appears after the list):

  • The number of functions of N inputs is 2^(2^N). For example, a truth table for N=4 has 2^4 = 16 entries, so the number of functions is 2^16 = 65,536.
  • The number of functions of N inputs that can be implemented with T 2-input gates is at most (N+T)^(2T). Each gate input can connect to the output of any other gate or to a primary input, for a total of (N+T) possible connections, and there are 2T gate inputs.
  • To be able to implement every possible function, then, we need (N+T)^(2T) >= 2^(2^N).
  • Solving for T shows that T >= 2^(N-d), where d is small (on the order of log N); that is, T must grow exponentially in N.
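
To make these numbers concrete, here is a small Python sketch (my own, not part of the original argument) that computes both counts exactly for small N and finds the smallest T for which the counting bound could possibly be satisfied. This is only the counting lower bound; the hardest functions may need more gates than this.

    # Counting-argument check for small N (Python's big integers keep it exact).
    def num_functions(n):
        # number of distinct Boolean functions of n inputs: 2^(2^n)
        return 2 ** (2 ** n)

    def num_circuits(n, t):
        # upper bound on the functions realizable with t 2-input gates:
        # each of the 2t gate inputs connects to one of (n + t) signals
        return (n + t) ** (2 * t)

    def min_gates(n):
        # smallest t for which t gates could, by counting alone, cover
        # every function of n inputs
        t = 1
        while num_circuits(n, t) < num_functions(n):
            t += 1
        return t

    for n in range(1, 8):
        print(n, min_gates(n))

The minimum T roughly doubles every time N increases by one, which is the whole point: no fixed gate budget can keep up with 2^(2^N).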

What is the significance of this? Today’s chips have as many as 1000 pins. What is the minimum number of gates on a chip that would allow us to implement any function of a thousand inputs? Is, say, one million gates sufficient? Off the top of your head, you would probably say no. What about a billion? Our preconceived notion is that this should be sufficient. After all, that is right at the limit of today’s most advanced chips, and it is inconceivable that we couldn’t implement what we want with that many gates.

Now let’s look at what the counting argument has to say about this. The total number of possible functions is 2^(2^1000), which is a vast number, roughly 10^(10^300). A billion 2-input NAND gates would allow us to implement roughly 10^(10^10) different functions of a thousand inputs. This is also a vast number, but it is infinitesimal compared to the number of possible functions. The fact is, we are limited to a very small fraction of the functions we might be asked to implement!
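
These quantities are far too large to compute directly, but they are easy to compare on a log10 scale. A quick sketch of the arithmetic (mine, just to show where the figures above come from):

    # Compare the two magnitudes on a log10 scale.
    from math import log10

    N = 1000        # inputs (pins)
    T = 10 ** 9     # one billion 2-input gates

    # log10(2^(2^N)) = 2^N * log10(2)
    log_num_functions = (2 ** N) * log10(2)

    # log10((N+T)^(2T)) = 2T * log10(N+T)
    log_num_circuits = 2 * T * log10(N + T)

    print("functions of 1000 inputs: 10^%.2e" % log_num_functions)
    print("reachable with 1e9 gates: 10^%.2e" % log_num_circuits)

The first exponent comes out around 3 x 10^300 and the second around 2 x 10^10, which is where the 10^(10^300) and 10^(10^10) figures above come from.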

And this picture doesn’t fundamentally change even if we assume we have a trillion gates at our disposal. So, how is it that we believe we can implement whatever functionality we need? To answer this, let’s work backward from the number of gates available and determine the maximum number of inputs for which we can implement any arbitrary function. Let’s choose the most complex system we have at our disposal: the human brain.

The human brain contains on the order of a hundred billion neurons, each with roughly a thousand connections. Assuming any neuron could connect to any other neuron (which is not the case), we get an upper bound on the number of functions that could be implemented: (10^11)^(1000 * 10^11) = 10^(1.1 * 10^15), which is roughly 2^(2^52). In the best case, therefore, the human brain could implement any arbitrary function of only around 50 binary inputs. The exact number doesn’t matter; what matters is how tiny it is compared to the thousand-input functions we would like to build.
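
The same log-scale arithmetic applied to the brain (a sketch; the neuron and connection counts are rough order-of-magnitude figures, not measurements):

    # For what N does 2^(2^N) match the number of possible brain wirings?
    from math import log10, log2

    neurons = 10 ** 11       # ~100 billion neurons (rough figure)
    connections = 1000       # ~1,000 connections per neuron (rough figure)

    # log10(neurons^(neurons * connections))
    log_wirings = neurons * connections * log10(neurons)

    # solve 2^N * log10(2) = log_wirings for N
    n_bits = log2(log_wirings / log10(2))
    print("wirings ~ 10^%.2e, i.e. any function of ~%.0f inputs" % (log_wirings, n_bits))

Even with these generous assumptions, the bound lands at roughly 50 inputs.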

Now we can do a thought experiment. What is the largest truth table that you can mentally make sense of at one time? Two bits is easy. Three bits, not so hard. Four bits is difficult to visualize as a truth table, but using visual aids such as Karnaugh maps makes it possible.

Another test would be to look at the placement of arbitrarily positioned objects on a plane. Given a glance, how accurately could you replicate the placement of all of the objects? If there were only one object, I could envision determining its placement with an accuracy of 5-6 bits in both the X and Y dimensions; more objects, with less accuracy. Beyond about seven objects, it is difficult even to remember exactly how many objects there are. Maybe 10-12 bits of accuracy overall is possible.

The bottom line is that the human brain has a very low capacity for consciously grasping an arbitrary function. Let’s say it’s no more than 10 bits in the best case. From the counting argument, implementing an arbitrary function of 10 inputs takes on the order of a few hundred gates; call it 500. With a million gates on a chip, we could put roughly 2,000 such functions on the chip.

Now we treat each of these 500-gate blocks as a unit and consider how to hook roughly 2,000 of these blocks together to perform some larger function. This is certainly possible using abstraction; in this case it would probably require two or more levels of abstraction to be able to deal with 2,000 blocks, as the sketch below suggests.
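
To get a rough sense of how many levels that implies, here is a small sketch (my illustration; the fanout values are hypothetical): if each level of the hierarchy groups at most some manageable number of units into a single new unit, repeated grouping tells you how many levels it takes to reduce 2,000 blocks to one top-level design.

    # Levels of hierarchy needed to manage `blocks` units when each level
    # groups at most `fanout` units (hypothetical fanout values).
    from math import ceil

    def levels_needed(blocks, fanout):
        levels = 0
        while blocks > 1:
            blocks = ceil(blocks / fanout)
            levels += 1
        return levels

    for fanout in (10, 20, 50):
        print(fanout, levels_needed(2000, fanout))

With a fanout of a few tens of blocks per level, two to four levels suffice, which matches the intuition above.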

Abstraction is what makes it possible to design very large systems. The need for abstraction is driven by the limitation, implied by Shannon’s counting argument, on our brains’ capacity for dealing with large functions. And it is the structure imposed by multiple layers of abstraction that makes intractable design and verification problems tractable in practice.

Abstraction: the suppression of irrelevant detail

Abstraction is the single most important tool in designing complex systems. There is simply no way to design a million lines of code, whether it be hardware or software, without using multiple levels of abstraction. But what exactly is abstraction? Most designers know intuitively that, for example, a high-level programming language, such as C, is a higher level of abstraction than assembly language. Equivalently, in hardware, RTL is a higher level of abstraction than gate level. However, few designers understand the theoretical basis for abstraction. If we believe that the solution to designing ever more complex systems is higher levels of abstraction, then it is important to understand the basic theory of what makes one description of a design more or less abstract than another.

There are four types of abstraction that are used in building hardware/software systems:

  • structural
  • behavioral
  • data
  • temporal

Structural Abstraction

Structure refers to the concrete objects that make up a system and their composition. For example, the concrete objects that make up a chip are gates. If we write, at the RTL level of abstraction:

    a = b + c;

this is describing an adder, but the details of all the gates and their connections are suppressed because they are not relevant at this level of description. In software, the concrete objects being hidden are the CPU registers, program counter, stack pointer, and so on. For example, in a high-level language, a function call looks like:

    foo(a,b,c);

The equivalent machine-level code will have instructions to push and pop operands and jump to the specified subroutine. The high-level language hides these irrelevant details.

In general, structural abstraction means specifying functions in terms of inputs and outputs only. Structural abstraction is the most fundamental type of abstraction used in design. It is what enables a designer to enter large designs.

Behavioral Abstraction

Abstracting behavior means not specifying what should happen for certain inputs and/or states. Behavioral abstraction can really only be applied to functions that have been structurally abstracted. Structural abstraction means that a function is specified by a table mapping inputs to outputs. Behavioral abstraction means that the table is not completely filled in.

Behavioral abstraction is not used in design, but it is extremely useful, in fact necessary, in verification. Verification engineers instinctively use behavioral abstraction without even realizing it. A verification environment consists of two parts: a generator, which generates input stimulus, and a checker, which checks that the output is correct. It is very common for checkers not to be able to check the output for all possible input values. For example, it is common to find code such as:

    if (response_received)
        if (response_data != expected_data)
           print("ERROR");

The checker only specifies the correct behavior if a response is received. It says nothing about the correct behavior if no response is received.

A directed test is an extreme example of behavioral abstraction. Suppose I write the following directed test for an adder:

    a = 2;
    b = 2;
    dut_adder(out,a,b);
    if (out != 4)
       print("ERROR");

The checker is the last two lines, but it only specifies the output for the inputs a=2, b=2; it says nothing about any other input values.

Data Abstraction

Data abstraction is a mapping from a lower-level type to a higher-level type. The most obvious data abstraction, which is common to both hardware and software, is the mapping of an N-bit vector onto the set of integers. Other data abstractions exist. In hardware, a binary digit is an abstraction of the analog values that exist on a signal. In software, a struct is an abstraction of its individual members.

An interesting fact about data abstraction is that the single most important abstraction, from bit vector to integer, is not actually a valid abstraction. When we treat values as integers, we expect them to obey the rules of arithmetic; however, fixed-width bit vectors do not, specifically when operations overflow. To avoid this, a bit width is chosen such that no overflow is possible, or special overflow handling is done.
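
A minimal illustration of the point (a Python sketch, using an 8-bit vector as the example width):

    # Fixed-width "integers" do not obey ordinary integer arithmetic.
    WIDTH = 8
    MASK = (1 << WIDTH) - 1      # 8-bit vector: values 0..255

    def add8(a, b):
        # hardware-style fixed-width addition: the result wraps around
        return (a + b) & MASK

    a, b = 200, 100
    print(add8(a, b))    # 44  -- what the 8-bit hardware produces
    print(a + b)         # 300 -- what integer arithmetic predicts

    # The two disagree once the sum overflows 8 bits, so reasoning about the
    # design "as integers" is only sound if the width is chosen so that
    # overflow cannot occur, or overflow is handled explicitly.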

Temporal Abstraction

This last abstraction type really only applies to hardware. Temporal abstraction means ignoring how long it takes to perform a function. A simple example of this is the zero-delay gate model often used in gate-level simulations. RTL also assumes all combinational operations take zero time.

It is also possible to abstract cycles. For example, a pipelined processor requires several cycles to complete an operation. In verification, it is common to create an unpipelined model of the processor that completes all operations in one cycle. At the end of a sequence of operations, the architecturally visible state of the two models should be the same. This is useful because an unpipelined model is usually much simpler to write than a pipelined one.
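
As a toy version of this idea, here is a sketch (the accumulator example and names are mine, purely illustrative) of checking a two-stage pipelined model against an unpipelined reference: both see the same sequence of operations, the pipeline is drained, and then the architecturally visible state is compared.

    # Unpipelined reference vs. a 2-stage pipelined model of an accumulator.
    class UnpipelinedAcc:
        # reference model: every operation completes in one cycle
        def __init__(self):
            self.acc = 0
        def step(self, op):
            if op is not None:
                self.acc += op

    class PipelinedAcc:
        # an operation issued in cycle i updates acc in cycle i+1
        def __init__(self):
            self.acc = 0
            self.stage = None
        def step(self, op):
            if self.stage is not None:
                self.acc += self.stage
            self.stage = op

    ref, dut = UnpipelinedAcc(), PipelinedAcc()
    for op in [3, 5, -2, 7]:
        ref.step(op)
        dut.step(op)
    dut.step(None)                # one extra cycle to drain the pipeline
    assert ref.acc == dut.acc     # architectural state matches at the end
    print(ref.acc, dut.acc)       # 13 13

Mid-sequence the two models disagree (the pipelined accumulator lags by a cycle); only the end-of-sequence architectural state is required to match.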

The four abstraction types described above form a basis for the majority of abstractions used in design and verification. That is, any abstraction we are likely to encounter is some combination of the ones above. However, we are still left with the question of what makes an abstraction valid. I will defer answering this until the next post.

(Note: this discussion is based on the paper “Abstraction Mechanisms for Hardware Verification” by Tom Melham. The paper gives a slightly more detailed high-level description of these abstractions, along with a lot of boring technical detail that you will probably want to skip.)

I am a big fan of Fred Brooks’ “The Mythical Man-Month: Essays on Software Engineering”. Brooks was the leader of one of the first large software projects. Along the way, he found that a lot of the conventional wisdom about software engineering was wrong, most famously arriving at the observation that adding manpower to a late project makes it later.

I have also found that a lot of the conventional wisdom about the causes of verification difficulties is wrong. So, in designing this blog, I decided to model it after the Mythical Man-Month. The essence of this style is:

  • short, easy-to-read essays on a single topic.
  • timelessness – focus on overarching issues, not on how to solve specific problems with specific programming languages.
  • back it up with real data whenever possible.

In some senses, this blog will attempt to fill in some obvious holes in the Mythical Man-Month. Brooks states that project effort can be roughly apportioned as:

  • 1/3 planning (specification)
  • 1/6 coding
  • 1/2 verification

but then proceeds to talk mostly about planning and coding and very little about verification. This blog will cover each of these areas in rough proportion to those numbers: most of my posts will be on verification, a fair number will cover specification, and some will address particular design issues.

Brooks’ work is more than 30 years old, so it is worth re-examining some of his conclusions to see if they still hold up now that design complexity has increased. One area of contention is the percentage of time spent in verification. Brooks claimed that verification takes 50% of the effort on large software projects. Today, there are claims that hardware verification takes 70% of the effort. EDA vendors often point to this “growth” as proof that verification is becoming the bottleneck.

But is this the real story? Software verification is mostly done by the designers themselves, and in that kind of environment verification consumes roughly 50% of the total effort. Twenty years ago, hardware verification was also roughly 50% of the effort, because it too was mostly done by the designers. The shift to pre-silicon verification that came with the advent of HDLs and synthesis enabled the separation of verification from design. But separate verification is not as efficient as having the designer do the verification, so verification is now 70% of the effort instead of 50%. Rather than steadily growing from 50% to 70% or more, it was a one-time jump due to the shift in methodology. But whether it is 50% or 70%, verification is the largest single piece of the overall design effort.

I wrote an article addressing this subject in more detail, titled “Leveraging Design Insight for Intelligent Verification Methodologies”. You can download it from the Nusym website.

I ran across an article the other day with a title along the lines of “Complex Behavior from Simple Systems”. This didn’t make sense to me. Aren’t all complex systems built up from simpler systems? Microprocessors are built from AND and OR gates, large software systems are built from simple variables and assignments, bridges are built from steel and rivets. What was so noteworthy about this?

It turns out that, in the realm of continuous systems, the terms “complex behavior” and “simple system” are well defined. It was talking about how chaotic behavior could arise from simple differential equations.

However, hardware and software are examples of discrete systems, where these terms are not as well defined. This made me realize that if I am going to talk about complex systems, I better define what I mean.

In computer science, there are many definitions of complexity. There is computational complexity, which is probably the most familiar and the most relevant to the design of software. There is algorithmic complexity, which may be more relevant to hardware, but is not widely used. None of these, though, really captures the notion of complexity we have in mind when we talk about designing large, complex systems. These metrics also share one problem: they don’t actually tell you whether something is complex or not. They simply return a value, and it is up to you to decide, based on that value, whether the measured system is complex.

The question, then, is: what is a useful definition of a complex system? I am going to use a definition that appears to be somewhat facetious:

A complex system is one that has bugs when it ships.

This doesn’t seem useful, because it appears to say that you cannot classify something as complex until after it ships. But the reality is that all large systems ship with bugs. (If anyone has shipped a large software or hardware system that is bug free, you shouldn’t be reading this blog, you should be writing it.) If you are working on a multi-million-line-of-code or multi-million-gate project, we both know there are going to be bugs when it ships. They may be minor, but they will exist, so my definition, I think, is fairly clear cut.

But this is still a subjective definition, and there doesn’t seem to be any way of avoiding that. In fact, many aspects of engineering complex systems are subjective. I have seen that one of the greatest disconnects causing quality and schedule problems in large projects is the expectation that there are objective metrics when, in fact, there are only subjective judgements. When asked a question like “have you found all the bugs yet?”, it is useful to remember the following principle of designing complex systems:

everything is subjective

We still need to answer the question: is my definition of complex systems useful? We can break the problem into two parts: 1) how hard is it to avoid putting bugs into the system in the first place, and 2) how hard is it to find the bugs that do get put in? If a system has no bugs when it ships, then no matter how hard it was to avoid putting bugs in, it must have been feasible to find all of the ones that got in. Therefore, verification complexity is the key metric in determining whether a system is complex by my definition. OK then, how do we define verification complexity? Well, I hate to tell you this, but it’s subjective too. That, however, is the topic of another post.