
It looks like this was a timely post. Just a few days after my last post, Harry (theAsicGuy) Gries posted an entry speculating that having thousands of computers available would revolutionize EDA algorithms. I think that, with an understanding of how computational complexity affects EDA algorithms, he would not be making these claims.

Let’s start by looking at what he said:

One of the key benefits of the cloud is that anyone can have affordable access to 1000 CPUs if they want it. If that is the case, what sorts of new approaches could be implemented by the EDA tools in addressing design challenges? Could we implement place and route on 1000 CPUs and have it finish in an hour on a 100M gate design? Could we partition formal verification problems into smaller problems and solve what was formerly the unsolvable? Could we run lots more simulations to find the one key bug that will kill our chip? The cloud opens up a whole new set of possibilities.

Harry is wondering how parallelism can help EDA algorithms. The goal of parallelism is mainly to complete a given task faster. Problems fall into two classes: those that are embarrassingly parallel, and all the others.

Embarrassingly parallel problems are easy to parallelize because problem instances are completely independent. In EDA, there are a number of embarrassingly parallel problems:

  • regression testing – you have thousands of tests. They are all independent. You can run them on as many machines as are available to complete the regression as quickly as possible.
  • synthesis – different blocks can be synthesized in parallel.
  • spice – run different corners in parallel.
  • placement – place blocks in parallel.

This is done routinely today. Even small companies have dozens, if not hundreds, of servers to handle these and other embarrassingly parallel tasks.
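A minimal sketch of the regression-testing case, assuming fully independent tests (the test names and the trivial runner here are invented for illustration; a real flow would launch simulator jobs on a farm):

```python
from concurrent.futures import ThreadPoolExecutor

def run_test(test_name):
    # Stand-in for one independent regression test; a real runner
    # would invoke the simulator on this test and parse its log.
    return (test_name, "PASS")

tests = [f"test_{i:04d}" for i in range(1000)]

# Because the tests share no state, they can be fanned out across as
# many workers (in practice, as many farm machines) as are available.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(run_test, tests))

passed = sum(1 for status in results.values() if status == "PASS")
print(passed, "of", len(results), "tests passed")
```

The point is that no test ever waits on another, so the wall-clock time shrinks almost linearly with the number of machines.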

But the real question Harry is asking is: can having thousands of CPUs help with the other class of problems? Can a parallel simulator run a test a thousand times faster? Can we increase the capacity of formal verification by a factor of a thousand? Can we reduce place-and-route time down to minutes?

Sorry, Harry, but the answer to these questions is: no. Suffice it to say that, if it could have been done, it would have been done already. It is not for lack of trying that EDA algorithms are not parallelized today. It is not for lack of CPUs either. There are generally no gains from running parallelized algorithms on even 10 CPUs, let alone 100 or 1000. If any EDA company could design a parallel algorithm that gave a 10X speedup on, say, 20 CPUs, it would have a huge advantage over the competition.

In short, if it were easy to get large speedups just by throwing more CPUs at the problem, it would be embarrassing that nobody had done it yet.



  1. Hi Chris,

    Thanks for commenting on my blog post concerning 1000 CPUs. The post was intended to get people brainstorming and one of the brainstorms was to solve problems otherwise unsolvable by utilizing parallelism. I believe that utilizing parallelism effectively is not as unreasonable as you suggest:

    1) My first job out of school was working on a vector processor that had 6 parallel execution units. Scheduling the processor was indeed a pain, but utilization of over 75% was not uncommon for signal-processing algorithms. Today, most specialized signal processors exploit this type of parallelism effectively.

    2) Google search is the best example of parallelism reducing turnaround time, searching the entire web in seconds.

    3) VCS just announced a multicore version. The speedup is modest, but for time-to-market-critical products it can make a real difference.

    4) I know of one university effort to apply massively parallel processing to formal verification. I am told it is effective but I have no specific data.

    Of course, expecting an N speedup from N CPUs is not possible due to communication overhead, and we both know that is well documented. EDA tool algorithms are generally not symmetric and generally not nicely partitionable. Still, I think it does not hurt to keep thinking outside the box.

    Getting this speedup is not “easy”, but it is doable. Until now, only the larger companies have had the server farms to run these types of algorithms. Soon, anyone will be able to get 1000 CPUs.


    • Harry,

      I think one of the things you are not considering is that, to do the things you describe, it is not enough just to have 1000 CPUs. You also need a high-performance interconnect to realize any parallelization gains. But the cloud computing value proposition assumes commodity hardware and software. The interconnect between processors is basically TCP, which means high latency (low performance).

      Think of it this way: suppose instead of putting those 1000 CPUs in one room, they are spread throughout the world, say one CPU in every city. Now, do you think it is possible to parallelize an algorithm across 1000 CPUs and see significant speedup? Well, the time it takes for two CPUs in the same room to communicate using a commodity hardware/software stack is basically the same as if they were in different cities.
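      A back-of-the-envelope model (my own numbers, purely illustrative) shows why that latency matters. Split T seconds of serial work over N CPUs, and charge L seconds for each of the M synchronizing messages the algorithm sends:

```python
def speedup(t_serial, n_cpus, n_messages, latency):
    # Parallel runtime = divided compute time plus time spent
    # waiting on synchronizing messages between the CPUs.
    t_parallel = t_serial / n_cpus + n_messages * latency
    return t_serial / t_parallel

T = 3600.0        # one hour of serial work
M = 1_000_000     # fine-grained EDA algorithms communicate constantly

# Commodity TCP stack: roughly 100 microseconds per message.
print(round(speedup(T, 1000, M, 100e-6), 1))   # ~35x on 1000 CPUs

# Specialized low-latency interconnect: roughly 1 microsecond.
print(round(speedup(T, 1000, M, 1e-6), 1))     # ~783x on 1000 CPUs
```

      Under this toy model, a thousand commodity CPUs buy barely a 35x speedup, and the entire difference between the two runs is the interconnect, not the CPUs.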

      What this means is that computing clouds are really designed to run embarrassingly parallel applications only.

      And it gets worse. The application’s interface to the cloud is not direct, but through LSF (or some other load balancer) queues. Suppose you had an application that fires off a bunch of jobs, gathers the results and analyzes them, and then fires off more jobs based on the analysis. This is still embarrassingly parallel, but it does not fit the LSF model: if even one job gets stuck in an LSF queue, all the potential speedup is wiped out.

      When using LSF queues, there are limitations even on embarrassingly parallel applications. Basically, the only pattern that works is fire-and-forget, where you fire off a bunch of jobs and then look at the results later.
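      A quick illustration with invented numbers: one straggler in the queue sets the turnaround time for the whole round of jobs.

```python
# 100 independent jobs of 60 s each, with enough slots for all of
# them, should finish in 60 s. But if a single job sits in the
# queue for 30 minutes, the gather-and-analyze step waits with it.
job_runtime = 60.0
queue_delay = 1800.0

finish_times = [job_runtime] * 99 + [queue_delay + job_runtime]
makespan = max(finish_times)

ideal_speedup = 100 * job_runtime / job_runtime    # 100x
actual_speedup = 100 * job_runtime / makespan
print(round(actual_speedup, 2))                    # about 3.2x, not 100x
```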

      I think the only thing in EDA that fits this requirement is simulation jobs. There may be some value in running an order of magnitude more simulations than you can today. But then you would have to overcome another problem: the cost of simulation licenses may be prohibitive. It is possible that the cloud computing revolution won’t come about until somebody fundamentally changes the EDA licensing model.


  2. To speed up simulations, a hardware emulator, FPGA hardware, GPUs, or even a number of processors with a high-speed interconnect could be used in a specialized verification data center. I think this could speed up simulations and reduce costs by amortizing the hardware across several companies.

    At one large chip maker where I used to work, we used a Quickturn emulator located in Germany from our office in North America, and this significantly reduced simulation times.

    • Jeremy,

      Specialized hardware, GPUs, and multicore processors are all examples of low-latency multiprocessors. I agree that these are the only types of parallel processors that will give significant speedups on EDA algorithms. But these solutions are expensive and specialized. Harry is envisioning using CPUs owned by companies such as Amazon or Google, which are very cheap to deploy.

      There is no clear EDA problem that benefits from having large numbers of commodity CPUs available. But I can see value in timesharing a Quickturn box, for example. Small hardware companies that can’t afford to buy an emulator could rent time on one. That might enable them to boot their operating system in simulation, for example, something they couldn’t do otherwise.

