
I am reading Evan Harris Walker’s “The Physics of Consciousness: The Quantum Mind and the Meaning of Life”. It is a thought-provoking book, but I am not convinced by his final arguments.

One of the most interesting things I learned from this book is how much quantum-level effects affect our macro world. In particular, the effect of the Heisenberg Uncertainty Principle is much larger than I would have expected. Walker talks about predicting the roll of a die. Suppose you had an exact model of a die down to the atom and knew the exact vector of the roll. Classical physics says that the outcome can be predicted with 100% certainty. However, quantum mechanics says that each bounce introduces a small uncertainty in the direction of the bounce, due to Heisenberg uncertainty. The uncertainty at each bounce is extremely small, but after only 50 bounces, the outcome is completely unpredictable!

I have found in my career that specifications have the same property. No matter how bulletproof, ironclad, or complete they seem, there is always some uncertainty that creeps in that makes it impossible to know what is correct. There is a lot to say on this subject, but I am going to start with a story from my own experience.

The first time I had to deal with an external spec. was when I was working at Amdahl and had to design a controller for 3278-type terminals, which were used as consoles for the mainframe we were designing. The 3278 was the standard terminal used with IBM mainframes and was as ubiquitous then as PCs are today. The terminal interfaced to its controller through a single coax cable. The protocol was half-duplex: the controller initiated all actions, either writing the display buffer or polling the keyboard.

IBM 3278 terminal

IBM had a reputation for writing stellar specifications. The 3278 serial protocol was no exception. I designed a controller that was mostly hard-wired. We were concerned about getting it correct because any error almost certainly meant a respin. I designed a state machine that would send a one-line display stream automatically upon reset. The idea was that this would be a fail-safe way of diagnosing connectivity problems. We taped out the chips, got them back, and powered up the controller for the first time. The diagnostic display showed up on the terminal on the first try! But when we pressed a key on the keyboard, nothing happened. The controller did not receive any valid key press responses to its polls. We hooked an oscilloscope up to the coax and looked at the waveforms. They matched the spec. precisely. We could see the display stream and the poll commands, and we could see that the terminal was responding to the polls. It just wasn’t sending any valid data back in its responses.

We couldn’t figure out what was going on, so we brought in an AE from the terminal manufacturer (there were many 3278 clone manufacturers at the time). The AE looked at the waveforms and immediately said, “I’ve never seen a controller poll that fast.” It turns out that the standard IBM controller used with 3278s, the 3274, polled each terminal only once every 2 ms. Our controller was designed to poll only a single terminal and, because it was hard-wired, the delay between polls was less than 1 µs. The terminals were typically implemented using microcontrollers, and their polling loop was in software, so there was no way they could keep up with our polling rate.
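
To make the failure mode concrete, here is a toy model of it. This is only a sketch, not the real 3278 firmware or protocol: the `ToyTerminal` class, the `run_controller` helper, and the 100 µs firmware loop time are all made-up assumptions. The only numbers taken from the story are the poll gaps (2 ms for a 3274-style controller, roughly 1 µs for ours). The model assumes the terminal can hand back a keystroke only if its software loop had enough quiet time since the previous poll.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyTerminal:
    """Hypothetical terminal whose keyboard handling runs in a software loop."""
    firmware_loop_us: float                      # assumed time the loop needs between polls
    pending_keys: List[str] = field(default_factory=list)
    _last_poll_us: Optional[float] = None

    def press(self, key: str) -> None:
        self.pending_keys.append(key)

    def poll(self, now_us: float) -> Optional[str]:
        """Answer a poll; return a key only if the firmware had time to stage it."""
        idle = None if self._last_poll_us is None else now_us - self._last_poll_us
        self._last_poll_us = now_us
        if self.pending_keys and idle is not None and idle >= self.firmware_loop_us:
            return self.pending_keys.pop(0)
        return None                              # a response, but with no valid key data


def run_controller(poll_gap_us: float, firmware_loop_us: float,
                   keys: str = "abc", n_polls: int = 20) -> List[str]:
    """Poll the toy terminal at a fixed gap and collect the keys it returns."""
    terminal = ToyTerminal(firmware_loop_us)
    for key in keys:
        terminal.press(key)
    received = []
    for i in range(n_polls):
        answer = terminal.poll(i * poll_gap_us)
        if answer is not None:
            received.append(answer)
    return received


slow = run_controller(poll_gap_us=2000.0, firmware_loop_us=100.0)  # 3274-style pacing
fast = run_controller(poll_gap_us=1.0, firmware_loop_us=100.0)     # hard-wired pacing
print(slow, fast)
```

In this model the slow controller collects all three keys, while the fast one gets a response to every poll but never any key data, which is the symptom we saw on the oscilloscope.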

We fixed this problem in the short run by trying every 3278-compatible terminal type until we found one that could keep up with our controller. It turns out that vendors used the 3274 controller as a spec. for the performance of their terminals. The IBM spec. does not mention performance at all; it says nothing about how often back-to-back poll commands can be issued by the controller. Nobody working on this project had any experience dealing with this type of terminal before, so we didn’t know this.

The bottom line was, we were following the spec. to the letter, but the system did not work. So, lesson 1 in using specifications is:

there are often additional constraints outside the spec. that only experts know about.

3 Comments

  1. Hi Chris,

    Interesting and thought provoking post. Brings back memories of my summer jobs using a 3270 at IBM in East Fishkill and at the Almaden Research Center in San Jose. You know I crashed the network on my first day at work emailing the entire games directory to a friend of mine? If I wasn’t a summer intern I probably would have been fired 🙂

    Those days were quite a while ago. Today, there would be a BFM model for the 3270 which would have likely mimicked the timing sensitive behavior.

    Still, your point is more general and well taken. If the requirement is not in the spec, then it won’t necessarily get verified. And with chips getting more complex, the likelihood of missing requirements increases. I think the simpler requirements are actually more likely to be missed because everyone is focused on the hard stuff, the corner cases, so the simple stuff slips through. Unfortunately, the only way to address this seems to be the good ole fashioned review process.

    harry the ASIC guy

  2. Hi Harry,

    Thanks for the comment, the first one for my blog!

    In the second version of this chip, we decided to add a delay counter in the polling loop to slow down the polls. We also decided to simulate the chip this time (the first version taped out without any simulation, if you can believe that). The spec. says that there should be five idle cycles at the beginning of a transmission to quiesce the coax. We noticed that the 3274 controller issued many more than five idle cycles and the terminals basically didn’t care how many idle cycles were sent. So we wrote a BFM that mimicked this exact behavior. It accepted any number of idle cycles.

    The design was supposed to delay for some time and then transmit five idle cycles as specified. However, there was a bug in the chip that caused it to transmit idle cycles throughout the delay period. We did not detect this in simulation because our BFM accepted any number of idle cycles. We didn’t discover it until we got the chip back and looked at the waveforms on the oscilloscope. Even so, the chip worked, because the terminals don’t care how many idle cycles are sent. I don’t think we ever fixed this bug.

    I’d forgotten this part until you reminded me by mentioning using a BFM.

    –chris

  3. Chris,

    I enjoyed this. I guess the question is what is meant by “specification” and “requirements”. (I was reading your post on the PCI specification and thinking the same thing, before I got to this post — I’m bouncing around right now but need to read top to bottom).

    Whether you call it “requirements” or “specification”, the full specification comprises the official specification as well as any other requirements needed to properly meet the target application.

    What’s the goal of the verification engineer? It’s not to ensure that a chip or piece of IP meets a design specification — it’s to ensure that the chip will meet the requirements of the target application. This means determining whether the specification itself is complete or right — and, often, figuring out what the real requirements are.

    My last chip company designed a storage networking chip with a Fibre Channel controller on it. Achieving interoperability with other Fibre Channel equipment is very difficult. The spec is ambiguous — and there are a lot of quirks in implementation. We couldn’t tape out without FPGA prototyping against many real Fibre Channel systems — and going to UNH’s Interoperability Lab multiple times. We could have easily released a chip that met all the requirements of the FC specification — but, ultimately, this would have been insufficient.
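
The BFM episode in the second comment above can be sketched as a toy check. Everything here is hypothetical: the symbol names (`QUIET`, `IDLE`, `POLL`), the function names, and the choice to model the strict reading of the spec as "exactly five idle cycles" are assumptions for illustration, not the real 3278 line encoding or the BFM that was actually written. The only detail taken from the comment is the count of five idle cycles.

```python
from typing import List

SPEC_IDLE_CYCLES = 5  # the count the comment says the spec calls for


def intended_transmit(delay_cycles: int, payload: List[str]) -> List[str]:
    """Intended design: stay quiet during the delay, then send five idle cycles."""
    return ["QUIET"] * delay_cycles + ["IDLE"] * SPEC_IDLE_CYCLES + payload


def buggy_transmit(delay_cycles: int, payload: List[str]) -> List[str]:
    """Buggy design: idle cycles are driven during the delay period as well."""
    return ["IDLE"] * (delay_cycles + SPEC_IDLE_CYCLES) + payload


def leading_idles(stream: List[str]) -> int:
    """Count idle cycles seen before the first payload symbol."""
    count = 0
    for symbol in stream:
        if symbol == "IDLE":
            count += 1
        elif symbol != "QUIET":
            break
    return count


def lenient_bfm_accepts(stream: List[str]) -> bool:
    """Mimics real terminals: any number of idle cycles (assumed at least one) is fine."""
    return leading_idles(stream) >= 1


def strict_bfm_accepts(stream: List[str]) -> bool:
    """Checks the letter of the spec: exactly five idle cycles (an assumed reading)."""
    return leading_idles(stream) == SPEC_IDLE_CYCLES


good = intended_transmit(10, ["POLL"])
bad = buggy_transmit(10, ["POLL"])
print(lenient_bfm_accepts(good), lenient_bfm_accepts(bad))
print(strict_bfm_accepts(good), strict_bfm_accepts(bad))
```

The lenient model passes both streams, so the bug is invisible in simulation; only the strict check tells them apart. A model that mirrors the behavior of real devices shows whether the chip will interoperate, while a model that mirrors the spec shows whether the design does what was intended, and the two can disagree.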


One Trackback/Pingback

  1. […] This is a very common syndrome in designing complex hardware systems. You simulate the chip thoroughly and then when you power it up in the lab, it doesn’t work in the real environment. I describe an example of this exact scenario in this post. […]
