
Category Archives: specification

In an article entitled "Really Rethinking Formal Methods" in a recent issue of Computer, David Parnas questions the current direction of formal methods research. His basic claim is that (stop me if this sounds familiar) formal methods have too low an ROI, and that researchers, rather than proclaiming successes, need to recognize this and adjust their direction. As he so eloquently puts it:

if [formal methods] were ready, their use would be widespread

I haven't spent a lot of time trying to figure out whether his prescriptions make sense or not, but one thing stood out to me. He talks about a gap between software development and older engineering disciplines. This is not a new insight. As far back as the 1960s, the "software crisis" was a concern as the first large, complex software systems started experiencing acute schedule and quality problems. This was attributed to the fact that programming was a new profession and did not have the rigor or level of professionalism of engineering disciplines that had been around much longer. Some of the criticisms heard were:

  • programmers are not required to have any degree, far less an engineering degree.
  • programmers are not required to be certified.
  • traditional engineering emphasizes using tried and true techniques, while programmers often invent new solutions for every problem.
  • traditional engineering follows a rigorous design process, while programming allows hacking.

These explanations are often used as the excuse when software (usually Microsoft software) is found to have obvious and annoying bugs. But is this really the truth? Let's look at an example of traditional engineering to see if this holds up.

Bridge building is a technology that is thousands of years old. Roman bridges built two thousand years ago are still in use today. Bridges are designed by civil engineers who are required to be degreed, certified engineers. Bridge design follows a very rigorous process and is done very conservatively using tried and true principles. Given that humanity has been designing bridges for thousands of years, you would think that we would have gotten it right by now.

You would be wrong.

Even today, bridges are built with design flaws that result in accidents and loss of life. One could argue that, even so, the incidence of design flaws is far lower in bridges than in software. But this is not really an apples-to-apples comparison. The consequences of a bug in, say, a web browser are far less severe than those of a design flaw in a bridge. In non-safety-critical software, economics is a more important factor in determining the level of quality. The fact is, most of the time, getting a product out before the competition does is economically more important than producing a quality product.

However, there are safety-critical software systems, such as airplanes, medical therapy machines, spacecraft, etc. It is fair to compare these systems to bridges in terms of catastrophic defect rates. Let's look at one area in particular, commercial aircraft. All commercial aircraft designed in the last 20 years rely heavily on software and, in fact, would be impossible to fly if massive software failures were to occur. Over the past 20 years, there have been roughly 50 incidents of computer-related malfunctions, but the number of fatal accidents directly attributed to software design faults is maybe two or three. This is about the same as the rate of fatal bridge accidents attributable to design faults. This seems to indicate that the gap between software design and traditional engineering is not so real.

The basic question seems to boil down to: are bridges complex systems? I define a complex system as one that has bugs in it when shipped. It is clear that bridges still have that characteristic and, therefore, must be considered complex systems from a design standpoint. The intriguing question is, given that they are complex systems, do they obey the laws of designing complex systems? I believe they do and will illustrate this by comparing two bugs, one a bridge design fault and the other a well-known software bug.

The London Millennium Footbridge was completed in 2000 as part of the millennium celebration. It was closed two days after it opened due to excessive sway when large numbers of people crossed the bridge. It took two years and millions of pounds to fix. The bridge design used the latest design techniques, including software simulation to verify the design. Sway is a normal characteristic of bridges. However, the designers failed to anticipate how people walking on the bridge would interact with the sway in a way that magnified it. The root cause of this problem is that, while the simulation model of the bridge was probably sufficiently accurate, the model of the environment, in this case people walking on the bridge, was not.

This is a very common syndrome in designing complex hardware systems. You simulate the chip thoroughly and then when you power it up in the lab, it doesn’t work in the real environment. I describe an example of this exact scenario in this post.

In conclusion, it does seem that bridges obey the laws of designing complex systems. The bad news is that the catastrophic failure rate of safety-critical software is of roughly the same magnitude as that of bridges. This means that we cannot expect significant improvements in the quality of software over the next thousand years or so. On the plus side, we no longer need to buy the excuse that software development is not as rigorous as “traditional” disciplines such as building bridges.

To find a Buddha, you have to find your nature.
– Bloodstream Sermon

We perceive the world as an abstraction of reality. When we look at a tree, we see a tree; we don't see all the individual branches and leaves that make it up. Even if we viewed a tree that way, branches and leaves are still abstractions. Even if it were somehow possible to view a tree as all the individual molecules that make it up, that is still an abstraction. There is no escaping abstraction in how we relate to the world.

A monk asked Dongshan Shouchu, “What is Buddha?” Dongshan said, “Three pounds of flax.”
– Zen Koan

The core of Zen Buddhism is trying to see past the abstractions that our minds crave so desperately in order to make sense of reality. There are very few people who find enlightenment because of the power that abstraction holds over our minds. To the enlightened, those who see reality as it is, questions framed in terms of orthodox abstractions make no sense.

This is the fundamental issue that makes creating anything hard. What we create is real, but how we view it is through the lens of abstraction, and there is no escaping this. We generally believe that we conceive at a high level of abstraction and implement at a low(er) level of abstraction. We strive to increase the level of abstraction because we believe this is the way to improve productivity.

[Figure: Productivity vs. Abstraction Level: Conventional Wisdom]

This turns out not to be entirely true. The closest concrete representation of how we conceive of a design is the written specification. As reviled as it is, this document best represents our intentions at the level of abstraction at which we conceive of the design. But if we look at an average written spec., we find that it encompasses many levels of abstraction. We find:

  • Textual descriptions written at a very high level of abstraction: "it's a bus that has a processor, memory controller, and I/O controller sitting on it."
    • structural, behavioral, data, and temporal abstraction.
  • Block diagrams of major subunits showing how they are connected.
    • unabstracted structure at a high level; structural abstraction of blocks; behavioral, data, and temporal abstraction.
  • Equations, code snippets, gate-level diagrams.
    • basically no abstraction.
  • Truth tables.
    • structural abstraction only.
  • Waveforms.
    • structural and behavioral abstraction; unabstracted data and time.

and many other levels of abstraction in between. What this indicates is that as we conceive of a design, we jump around to different levels of abstraction as we consider different aspects of the design. And this holds even if we try to move the level of abstraction up. A new abstraction level just adds to the set of abstraction levels that we can use. What this means is that there is a law of diminishing returns as we move up in abstraction.

[Figure: Productivity vs. Abstraction: Law of Diminishing Returns]

Even more importantly, productivity doesn’t increase monotonically as we raise the level of abstraction. Design is the process of translating from the abstractions that we conceive to those required by the implementation. The greater the gap between these, the lower our productivity in implementing the design.

The conventional wisdom that productivity increases as abstraction increases is based on analyzing just two points on the abstraction continuum. If we plot productivity vs. abstraction level, the only points that we really can count on to improve productivity are those that are close to the natural abstractions at which we conceive.

[Figure: Productivity vs. Abstraction: Accounting for Translation from Natural to Implementation Abstraction Level]

Productivity falls off rapidly as the gap between implementation and conception abstraction levels increases. This is one of the reasons that it has been such a struggle to raise the level of abstraction in design. Finding the right fit that matches how we think at the highest levels of abstraction is exceedingly difficult, but is key if we want to continue to improve productivity by raising the level of abstraction at which we design.

In previous posts, I have tried to illustrate that writing complete, unambiguous specifications is hard, if not impossible. A solution that is proposed often, but never seems to take hold, is to write “executable” specifications. That is, rather than writing text-based specifications, write code that tools can then use to automate the process of producing the design. Today, specification is still done the same way it was twenty years ago, using (electronic) paper and pencil. Why is this? Why do we continue to write text-based specifications despite dramatic increases in complexity and the obvious need for more automation?

I gained some insight into this problem from research carried out by Kanna Shimizu, my colleague at Stanford. In bus protocols such as PCI, the protocol rules are spelled out in the spec. The traditional way of verifying PCI would be to design a PCI controller first, then verify that all the properties held for that controller. Kanna's idea was to code the protocol rules into simple propositions that could be formally verified for consistency. The advantage of this method is that you did not need a design in order to verify that the protocol itself had no problems. The types of errors that could be detected were things like conflicts between rules where, by one rule, a signal was supposed to be asserted at some time, while by another, it was supposed to be deasserted. Surprisingly, a number of such inconsistencies were found in the PCI protocol using this approach.
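
As a minimal sketch of the approach (hypothetical rules over PCI-style signal names, not Kanna's actual tool or the real PCI rules), single-cycle rules can be encoded as propositions and handed to an SMT solver such as z3, which reports whether any scenario forces a signal to be both asserted and deasserted:

    # Hypothetical protocol rules encoded as propositions (requires the
    # z3-solver package); these are not the actual PCI rules.
    from z3 import Bool, Solver, Implies, And, Not, unsat

    frame, irdy, trdy, stop = Bool("frame"), Bool("irdy"), Bool("trdy"), Bool("stop")

    s = Solver()
    # Rule 1: while the initiator asserts frame and irdy, the target must assert trdy.
    s.add(Implies(And(frame, irdy), trdy))
    # Rule 2: while stop is asserted, trdy must be deasserted.
    s.add(Implies(stop, Not(trdy)))
    # Scenario: all three handshake signals asserted in the same cycle.
    s.add(And(frame, irdy, stop))

    # unsat means no value of trdy can satisfy both rules at once: the rules conflict.
    print("rules conflict" if s.check() == unsat else "rules consistent in this scenario")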

When I first heard this result, my reaction was that it did not seem right. If there were that many bugs in the PCI protocol, it would not be possible to design any working hardware, yet PCI was a widely used protocol and there was no evidence that these inconsistencies had caused problems.

I didn’t think much about this until Kanna came to me one day with a case she had discovered while working on an Intel processor bus protocol, which happened to be the same one used on the HAL MCU design that I had worked on and, therefore, was familiar with. She had discovered a case where a bus agent could hang the bus forever under certain conditions. I was certain this could not happen, but after analyzing it, found that the protocol did indeed allow such a thing to happen. Again there was a disconnect between what I knew about the protocol and what the spec. actually was saying.

I didn’t think much about it until much later when I realized what the problem was. You would have to go out of your way to design an agent to do this. In fact, it would have to be malicious. Even though the spec. had holes, there is the intent to build a bus that communicates data. This intent was missing from the set of properties being verified. The intent is actually written in the spec., but not in a way that is easy to translate to code. It says, simply, “It’s a bus with the following rules.” The rules were what Kanna verified. While it was easy to code the rules into properties that could automatically verified, coding the requirement “it’s a bus” is consderably more difficult. That one sentence corresponds to potentially thousands of lines of code.

When claims are made that executable specs are better in some way: more concise, easier to write, or less ambiguous, proponents usually point to these easy-to-code protocol rules, while neglecting the difficulty of translating inherently concise specifications like "it's a bus". I believe this is one of the main reasons that executable specifications have not caught on as a better way of specifying complex designs. This will continue into the future: specifications will remain text-based because there is no solution on the horizon to this problem.

In my previous post on abstraction, we were left with the question of what makes a valid abstraction. For example, suppose we have a gate-level design of an adder. If we write:

    a = b * c;

we can see that this is a higher level of abstraction, but we would not generally consider it a valid abstraction of an adder. A valid abstraction would be:

    a = b + c;

In general, functions are not as simple as adders and multipliers. We need to define some way of determining whether a design is a valid abstraction of another design.

A valid abstraction is one that preserves some property of the original design.

In RTL, the property of interest is that the abstract function should produce the same outputs as the original for all inputs, i.e. functionality is preserved. What abstractions are being used in RTL? Let's look at our adder example. There is structural abstraction because we have eliminated all gates. There is data abstraction from bits to bit vectors. There is temporal abstraction if you consider that the gates have non-zero delays. There is no behavioral abstraction because values are specified for all possible input values. This is a valid abstraction if all output values of the abstraction are the same as those of the non-abstracted version.
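
As a minimal sketch of what "same outputs for all inputs" means in practice (a toy 4-bit example, not a real equivalence-checking flow), we can build a gate-level ripple-carry adder out of nothing but AND/OR/XOR operations and compare it against the abstract "a = b + c" for every input value:

    def full_adder(a, b, cin):
        s = a ^ b ^ cin                          # sum bit
        cout = (a & b) | (a & cin) | (b & cin)   # carry out
        return s, cout

    def gate_level_add(b, c, width):
        out, carry = 0, 0
        for i in range(width):
            bit, carry = full_adder((b >> i) & 1, (c >> i) & 1, carry)
            out |= bit << i
        return out                               # final carry dropped, like a fixed-width adder

    WIDTH = 4
    for b in range(2 ** WIDTH):
        for c in range(2 ** WIDTH):
            assert gate_level_add(b, c, WIDTH) == (b + c) % (2 ** WIDTH)
    print("abstraction is valid: outputs match for all inputs")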

Abstraction works like a less-than-or-equal (<=) operator. Suppose you have designs A and B. If A is an abstraction of B, we can write A <= B. Suppose also that B is an abstraction of C, B <= C. We know that if A <= B and B <= C, then A <= C. This also holds for abstractions. Since abstraction is the hiding of irrelevant detail, you can think of the less than relation as meaning “less detailed than”.

Suppose you have designs D, E, and F, with D<=E and D<=F. We know D is an abstraction of E and F, but what do we know about the relative abstraction levels between E and F? We cannot consider one as being an abstraction of the other even though they are equivalent in some sense. This type of relationship is important in design where D is a specification and E and F are different implementations of the same specification.

The flip side of this is: suppose we have designs P, Q, and R, with P<=R and Q<=R. In other words, P and Q are different abstractions of R. Again there is nothing we can say about the relative abstraction levels between P and Q. This relationship is important in verification where P and Q are different abstract models of the design R.

One last note: it is generally an intractable problem to prove that an abstraction is valid. If you are familiar with equivalence checking, this is basically a method to prove that an abstraction is valid.

Abstraction: the suppression of irrelevant detail

Abstraction is the single most important tool in designing complex systems. There is simply no way to design a million lines of code, whether it be hardware or software, without using multiple levels of abstraction. But what exactly is abstraction? Most designers know intuitively that, for example, a high-level programming language, such as C, is a higher level of abstraction than assembly language. Equivalently, in hardware, RTL is a higher level of abstraction than gate-level. However, few designers understand the theoretical basis for abstraction. If we believe that the solution to designing ever more complex systems is higher levels of abstraction, then it is important to understand the basic theory of what makes one description of a design more or less abstract than another.

There are four types of abstraction that are used in building hardware/software systems:

  • structural
  • behavioral
  • data
  • temporal

Structural Abstraction

Structure refers to the concrete objects that make up a system and their composition. For example, the concrete objects that make up a chip are gates. If we write at the RTL level of abstraction:

    a = b + c;

this is describing an adder, but the details of all the gates and their connections are suppressed because they are not relevant at this level of description. In software, the concrete objects being hidden are the CPU registers, program counter, stack pointer, etc. For example, in a high-level language, a function call looks like:

    foo(a,b,c);

The equivalent machine-level code will have instructions to push and pop operands and jump to the specified subroutine. The high-level language hides these irrelevant details.

In general, structural abstraction means specifying functions in terms of inputs and outputs only. Structural abstraction is the most fundamental type of abstraction used in design. It is what enables a designer to enter large designs.

Behavioral Abstraction

Abstracting behavior means not specifying what should happen for certain inputs and/or states. Behavioral abstraction can really only be applied to functions that have been structurally abstracted. Structural abstraction means that a function is specified by a table mapping inputs to outputs. Behavioral abstraction means that the table is not completely filled in.

Behavioral abstraction is not used in design, but is extremely useful, in fact, necessary, in verification. Verification engineers instinctively use behavioral abstraction without even realizing it. A verification environment consists of two parts: a generator that generates input stimulus, and a checker, which checks that the output is correct. It is very common for checkers not to be able to check the output for all possible input values. For example, it is common to find code such as:

    if (response_received)
        if (response_data != expected_data)
           print("ERROR");

The checker only specifies the correct behavior if a response is received. It says nothing about the correct behavior if no response is received.

A directed test is an extreme example of behavioral abstraction. Suppose I write the following directed test for an adder:

    a = 2;
    b = 2;
    dut_adder(out,a,b);
    if (out != 4)
       print("ERROR");

The checker is the last two lines, but it only specifies the output for inputs, a=2, b=2, and says nothing about any other input values.

Data Abstraction

Data abstraction is a mapping from a lower-level type to a higher-level type. The most obvious data abstraction, which is common to both hardware and software, is the mapping of an N-bit vector onto the set of integers. Other data abstractions exist. In hardware, a binary digit is an abstraction of the analog values that exist on a signal. In software, a struct is an abstraction of its individual members.

An interesting fact about data abstraction is that the single most important abstraction, from bit vector to integer, is not actually a valid abstraction. When we treat values as integers, we expect them to obey the rules of arithmetic; however, fixed-width bit vectors do not, specifically when operations overflow. To avoid this, a bit width is chosen such that no overflow is possible, or special overflow handling is done.
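
A minimal sketch of the overflow problem (an arbitrary 8-bit width, modeled here by masking):

    WIDTH = 8
    MASK = (1 << WIDTH) - 1

    a, b = 200, 100
    hw_sum = (a + b) & MASK    # what an 8-bit adder actually produces: 44
    int_sum = a + b            # what the integer abstraction predicts: 300

    assert hw_sum != int_sum   # the abstraction breaks exactly when the add overflows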

Temporal Abstraction

This last abstraction type really only applies to hardware. Temporal abstraction means ignoring how long it takes to perform a function. A simple example of this is the zero-delay gate model often used in gate-level simulations. RTL also assumes all combinational operations take zero time.

It is also possible to abstract cycles. For example, a pipelined processor requires several cycles to complete an operation. In verification, it is common to create an unpipelined model of the processor that completes all operations in one cycle. At the end of a sequence of operations, the architecturally visible state of the two models should be the same. This is useful because an unpipelined model is usually much simpler to write than a pipelined one.
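
Here is a minimal sketch of the idea (a toy accumulator machine, not any real processor): a "pipelined" model whose results take one extra cycle to write back, and an unpipelined reference model that completes each operation immediately. Only the architecturally visible state, the accumulator, is compared at the end of the program.

    def unpipelined(program):
        acc = 0
        for op, val in program:
            acc = acc + val if op == "add" else acc - val
        return acc

    def pipelined(program):
        acc = 0
        in_flight = None                       # result computed but not yet written back
        for instr in list(program) + [None]:   # one extra cycle to drain the pipe
            if in_flight is not None:
                acc = in_flight                # write back the previous result
            if instr is None:
                in_flight = None
            else:
                op, val = instr
                in_flight = acc + val if op == "add" else acc - val
        return acc

    prog = [("add", 5), ("add", 7), ("sub", 3)]
    assert unpipelined(prog) == pipelined(prog) == 9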

The four abstractions described above comprise a basis for the majority of abstractions used in design and verification. That is, any abstraction we are likely to encounter is some combination of the above abstractions. However, we are still left with the question of what is a valid abstraction. I will defer answering this until the next post.

(note: this discussion is based on the paper, “Abstraction Mechanisms for Hardware Verification” by Tom Melham. There is slightly more high-level description of these abstractions in this paper along with a lot of boring technical detail that you probably want to skip.)

Achilles: I must confess, Mr. T, you got me with that Zeno nonsense. But, trick me no more you will! I have decided to build a clock. A most perfect clock that will measure the exact moment that I pass you in our race.

Tortoise: A great idea, Achilles, but how will you know your clock has the correct time?

Achilles: Simple, I will compare it against the best clock available.

Tortoise: But, how do you know that clock is correct?

Achilles: Hmm, I hadn’t thought about that.

Tortoise: I doubt that you can build a clock that will ever tell the right time, because I don’t think you can even describe what a correctly functioning clock looks like.

Achilles: Of course I can! Everybody knows how to tell if a clock is correct or not.

Tortoise: OK, then. Why don’t you give me a specification for your clock and I will build it for you.

Achilles: OK. The clock must have a round face with 60 evenly distributed marks around the edge. It must have the numerals 1-12 imprinted around the circumference evenly spaced every five tick marks with 12 at the top. It must have a big hand and a little hand. The little hand points to the hour and the big hand points to the minute. The hands must point to the correct time at all times.

Tortoise left and returned a while later with clock in hand. Achilles examined it.

Achilles: The hands are pointing to exactly 12PM, but it is now 4PM. Furthermore, there is nothing inside this clock. Its hands don’t move! This doesn’t even come close to meeting the requirements of my specification. Mr. T, I am disappointed. I thought you would do much better.

Tortoise: Not so fast, my friend. I claim my clock is more accurate than any clock you could dream up based on the specification you gave me.

Achilles: I don’t need to dream too hard to think up a clock that actually has hands that, you know, move.

Tortoise: OK, suppose I built a clock that had springs, gears, and an escapement, you know, all the things that clocks have. Let’s say that my clock is accurate to one minute a day. Furthermore, let’s say I set it to the exact correct time based on some reference clock that we agree upon.

Achilles: Now you are talking.

Tortoise: But, my dear Achilles, that clock would be far less accurate than the clock I just built you!

Achilles: How is that possible?

Tortoise: You are right about the clock I built. It doesn’t move. But, it is exactly right twice every day, no more, no less.

Achilles: I see that, but twice per day is nothing to write home about.

Tortoise: Now, let's analyze your clock. It is accurate at the time I set it, but it immediately becomes inaccurate because it loses one minute per day. It will only be accurate every 720 days. So you see, my clock is actually far more accurate.

Achilles: Ha, Ha. Very funny. You know that all you have to do is make my clock more accurate.

Tortoise: OK, let's say it only loses one second per day instead of one minute. Then, it will only be accurate once every 43,200 days. I would rather have the clock that loses one minute a day.

Achilles: OK, then make it more accurate.

Tortoise: But, this is not possible. No matter how accurate you make it, it is not possible to be exactly synchronous with the reference. You are doomed to failure. And, paradoxically, the closer you get, the less accurate your clock is!

Achilles: I think you are trying to get Zeno back into this conversation.

Tortoise: No, I am just trying to point out that your clock specification is very poor.

Achilles: OK, then how do we fix it?

Tortoise: How do you think we should fix it?

Achilles: We amend it to say that the clock can be off by plus or minus one minute with respect to the correct time.

Tortoise: But my dear Achilles, this just delays the inevitable. Your clock will be correct for the first day by that measure, but will not be accurate again for another 718 days. My clock will be accurate for four minutes every day. My clock is still twice as accurate as yours.

Achilles: Mr. T, you are missing the point. Yes, my clock may become inaccurate after a day, but the user can always reset it to the correct time.

Tortoise: Ah, but that was not part of the specification. Do you want to amend your specification yet again?

Achilles: Yes, I will amend it to say that the user can reset the clock whenever he or she wants.

Tortoise: OK, I will produce a new clock that meets your specification.

Tortoise left and, after a time, came back with a new clock.

Achilles: This clock is no better than your previous clock, Mr. T. It still doesn't move and there is nothing inside it to make it move.

Tortoise: But, it meets your specification. My first clock had the hands glued to the time 12 o'clock. They could not be moved. This clock has hands that can be moved and, therefore, can be reset at any time. And, it still has the advantage of being more accurate than your clock when the user decides not to bother with resetting it. On top of that, if the user is not happy with the time displayed by my clock, they can simply set it to any time they desire.

Achilles: You are trying to use technicalities to avoid admitting you are wrong. You know as well as I do what a clock should look like and how it should behave. The specification I gave you is good enough.

Tortoise: OK, I’ll give you one last chance. If it is that simple, you should be able to give me a specification that gives me no choice but to give you what you want. If I don’t have to think too hard about how to beat you, your specification can’t have been that good and you must concede the point. Do you agree?

Achilles: Alright, I'll give it one last shot. I am going to add one more condition to my specification. Assume we have a reference clock. It doesn't matter how accurate it is, as long as its accuracy is acceptable to me. And I am not going to specify that, because it doesn't matter. What I am going to specify is that the clock you give me must have its hands move at the same rate as the reference clock, plus or minus 1/60 of the reference clock's rate. That is, if we assume that the reference clock's big hand turns exactly 360 degrees in one hour, your clock's big hand can be off by one minute per hour. Similarly, the hour hand rate must match the reference clock's hour hand rate with the same accuracy.

Tortoise: Whew, that was quite a mouthful!

Achilles: I will concede that specifying things completely is harder than I thought, but I think this is finally bulletproof. You will have to build me a clock with hands that move, so I am not yet willing to concede the main point, that I cannot guarantee I get the clock I want.

At that, Tortoise left yet again, presently to return bearing two clocks.

Tortoise: First, I will tell you that you used one of the most common copouts when it comes to specification. You specified how it should work rather than its behavior. The function of a clock is to tell time, not to have hands that go round in circles. Those are implementation details. However, having said that, here is my clock.

Achilles: It is the same clock! The hands still don’t move. You violated my specification!

Tortoise: No, I didn’t. You specified the hands should move at the same rate as a reference clock. Here is my reference clock. It is an atomic clock with a digital display. It has no hands. Therefore, I am not violating your spec. Yet, atomic clocks are the most accurate in the world, so I think even you would concede that it is accurate enough to meet your specification.

Achilles: OK, I give up. I concede. There appears to be no way to completely specify even something as simple as a clock in a bulletproof way.

Tortoise: You are finally seeing the light, my friend. I suggest you stick to things you are good at, like foot races with slow moving creatures like me.

[with apologies to Lewis Carroll and Douglas Hofstadter]

I am reading Evan Harris Walker’s “The Physics of Consciousness: The Quantum Mind and the Meaning of Life”. It is a thought provoking book, but I am not convinced by his final arguments.

One of the most interesting things I learned from this book is how much quantum-level effects affect our macro world. In particular, the effect of the Heisenberg Uncertainty Principle is much larger than I would have expected. Walker talks about the ability to predict the roll of a die. Suppose you had an exact model of a die down to the atom and knew the exact vector of the roll. Classical physics says that the outcome can be predicted with 100% certainty. However, quantum mechanics says that at each bounce, a small uncertainty is introduced in the direction of the bounce due to Heisenberg uncertainty. The uncertainty at each bounce is extremely small, but after only 50 bounces, the outcome is completely unpredictable!

I have found in my career that specifications have the same property. No matter how bulletproof, ironclad, or complete they seem, there is always some uncertainty that creeps in that makes it impossible to know what is correct. There is a lot to say on this subject, but I am going to start with a story from my own experience.

The first time I had to deal with an external spec. was when I was working at Amdahl and had to design a controller for 3278-type terminals, which were used as consoles for the mainframe we were designing. The 3278 was the standard terminal used with IBM mainframes and was as ubiquitous then as PCs are today. The terminal interfaced to its controller through a single coax cable. The protocol was half-duplex: the controller initiated all actions, either writing the display buffer or polling the keyboard.

[Image: IBM 3278 terminal]

IBM had a reputation for writing stellar specifications. The 3278 serial protocol was no exception. I designed a controller that was mostly hard-wired. We were concerned about getting it correct because any error almost certainly meant a respin. I designed a state machine that would send a one-line display stream automatically upon reset. The idea was that this would be a fail-safe way of diagnosing connectivity problems. We taped out the chips, got them back, and powered up the controller for the first time. The diagnostic display showed up on the terminal on the first try! But when we pressed a key on the keyboard, nothing happened. The controller did not receive any valid key press responses to its polls. We hooked an oscilloscope up to the coax and looked at the waveforms. They matched the spec. precisely. We could see the display stream and the poll commands, and see that the terminal was responding to the polls. It just wasn't sending any valid data back in its responses.

We couldn't figure out what was going on, so we brought in an AE from the terminal manufacturer (there were many 3278 clone manufacturers at the time). The AE looked at the waveforms and immediately said, "I've never seen a controller poll that fast." It turns out that the standard IBM controller used with 3278s, the 3274, polled each terminal only once every 2 ms. Our controller was only designed to poll a single terminal and, because it was hard-wired, the delay between polls was less than 1 µs. The terminals were typically implemented using microcontrollers and their polling loop was in software, so there was no way they could keep up with our polling rate.

We fixed this problem in the short run by trying every 3278-compatible terminal type until we found one that could keep up with our controller. It turned out that vendors used the 3274 controller as a spec. for the performance of their terminals. The IBM spec. does not mention performance at all; it says nothing about how often back-to-back poll commands can be issued by the controller. Nobody working on this project had any experience dealing with this type of terminal before, so we didn't know this.

The bottom line was, we were following the spec. to the letter, but the system did not work. So, lesson 1 in using specifications is:

there are often additional constraints outside the spec. that only experts know about.

I am a big fan of Fred Brooks' "The Mythical Man-Month: Essays on Software Engineering". Brooks was the leader of one of the first large software projects. Along the way, he found that a lot of the conventional wisdom about software engineering was wrong, most famously coming up with the idea that adding manpower to a late project makes it later.

I have also found that a lot of the conventional wisdom about the causes of verification difficulties is wrong. So, in designing this blog, I decided to model it after the Mythical Man-Month. The essence of this style is:

  • short, easy-to-read essays on a single topic.
  • timelessness – focus on overarching issues, not on how to solve specific problems with specific programming languages.
  • back it up with real data whenever possible.

In some senses, this blog will attempt to fill in some obvious holes in the Mythical Man-Month. Brooks states that project effort can be roughly apportioned as:

  • 1/3 planning (specification)
  • 1/6 coding
  • 1/2 verification

but then proceeds to talk mostly about planning and coding and very little about verification. I think this blog will cover each of these areas in rough proportion to these numbers, so most of my posts will be on verification, a fair number will cover specification, and some will cover particular design issues.

Brooks' work is more than 30 years old, so it is worth re-examining some of his conclusions to see if they still hold up as design complexity has increased with time. One of the areas of contention is the percentage of time spent in verification. Brooks claimed that verification took 50% of the effort on large software projects. Today, there are claims that hardware verification is taking 70% of the effort. EDA vendors often point to this "growth" as proof that verification is becoming the bottleneck.

But is this the real story? Software verification is mostly done by the designers. In this kind of environment, verification consumes roughly 50% of the total effort. Twenty years ago, hardware verification was also roughly 50% of the effort because it was mostly the designers doing the verification. The shift to pre-silicon verification that came about due to the advent of HDLs and synthesis enabled the separation of verification and design. But separate verification is not as efficient as having the designer do the verification. So now verification is 70% of the effort instead of 50%. Rather than growing from 50% to 70% or more, it was more of a one-time jump due to the shift in methodology. But whether it is 50% or 70%, verification is the largest single piece of the overall design effort.

I wrote an article addressing this subject in more detail titled "Leveraging Design Insight for Intelligent Verification Methodologies". You can download it from the Nusym website.