Wherein Mike Warot describes a novel approach to computing which follows George Gilder's call to waste transistors.

Wednesday, June 30, 2010

Unrolling programs instead of loops

A programming trick that used to work was unrolling loops, to avoid the pipeline penalties that occur when you branch. It worked well for a while. The bitgrid is based on the idea of unrolling the whole friggin' program. Instead of making a list with fewer branches... why not distribute each and every instruction of a program out into its own physical processing instance?
To make it feasible in hardware, use the simplest computing grid possible: a grid of cells, each having 4 inputs (one bit from each neighbor), 4 outputs (one bit TO each neighbor), and a 16-entry lookup table. Each cell is a pitiful unit of computation by itself... but in a grid sized to fit the application at hand, the cells can execute all of the instructions necessary to compute a result simultaneously.
Communication isn't shared, because every input and output has exactly one place to come from or go to. A signal only has to reach the next cell... so there are no long communication lines to worry about. Each cell can function as a router and a logic element at the same time.
Programming is a matter of setting the values in the lookup tables, which could be made of static RAM cells.
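To make the idea concrete, here's a minimal sketch of one cell in Python. The bit ordering of the inputs and of the LUT entries is an assumption I'm making for illustration, not a fixed spec:

```python
# Sketch of one bitgrid cell: 4 one-bit inputs (N, E, S, W), a 16-entry
# lookup table, and 4 one-bit outputs. Bit convention (an assumption):
# bit 3 = north, bit 2 = east, bit 1 = south, bit 0 = west, for both the
# LUT index and each LUT entry.

def step_cell(lut, n, e, s, w):
    """Apply one cell's 16-entry LUT to its four input bits.

    lut: list of 16 values, each 0..15.
    Returns the (n, e, s, w) output bits.
    """
    idx = (n << 3) | (e << 2) | (s << 1) | w
    out = lut[idx]
    return ((out >> 3) & 1, (out >> 2) & 1, (out >> 1) & 1, out & 1)

# Example program: AND the north and west inputs onto the east output.
and_lut = [((i >> 3) & (i & 1)) << 2 for i in range(16)]
```

"Programming" the grid is then nothing more than filling in tables like `and_lut` for every cell.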

Monday, June 28, 2010

Introducing Sim02


Now that Sim01 is capable of demonstrating how a single cell works in terms of bits, it's time to start building a grid simulator. This is my first effort at it. Using Sim01 to figure out the hex codes (a new feature I added to support this)... I was able to determine the proper codes to do a pass-through... and make that the default for this 2 cell array.

Here you see an input from the west side of the array propagated all the way through to the east side. Simultaneously, an input from the north of Cell 01 (the right cell) is propagated out the south side.

It's all prototype code, but it is functional. I intend to scale this up to a simulator capable of handling an arbitrarily sized grid. Then I'll add some input and output functionality to allow the processing of real data with a simulated bitgrid.
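A minimal sketch of how such a grid simulator can work: every cell reads its neighbors' outputs from the previous tick, so all cells update simultaneously and a signal advances one cell per tick. The names and data layout here are my own illustrative assumptions, not Sim02's actual internals:

```python
# Synchronous simulation of a 1 x N row of bitgrid cells.
# Bit convention (assumed): bit 3 = north, bit 2 = east, bit 1 = south,
# bit 0 = west, for both the LUT index and each LUT entry.

def step_cell(lut, n, e, s, w):
    """Look up the (n, e, s, w) output bits for one cell."""
    out = lut[(n << 3) | (e << 2) | (s << 1) | w]
    return ((out >> 3) & 1, (out >> 2) & 1, (out >> 1) & 1, out & 1)

# Pass-through program: each output copies the opposite input.
PASS = [(((i >> 1) & 1) << 3) | ((i & 1) << 2) |
        (((i >> 3) & 1) << 1) | ((i >> 2) & 1) for i in range(16)]

def step_row(luts, outputs, west_in, north_ins):
    """One tick of the row; outputs[i] is cell i's last (n, e, s, w)."""
    new = []
    for i, lut in enumerate(luts):
        w = west_in if i == 0 else outputs[i - 1][1]        # left cell's east out
        e = 0 if i == len(luts) - 1 else outputs[i + 1][3]  # right cell's west out
        new.append(step_cell(lut, north_ins[i], e, 0, w))
    return new

# A west-side input reaches the east edge of a 2-cell row after two ticks.
outs = [(0, 0, 0, 0)] * 2
for _ in range(2):
    outs = step_row([PASS, PASS], outs, 1, [0, 0])
```

Extending this to a full two-dimensional grid just means tracking four neighbors per cell instead of two.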

Thursday, June 24, 2010

Introducing Bitgrid-Sim version 0.03

I've created a basic simulator for 1 cell of a bitgrid. The simulator is currently a Windows-only application, but it should be fairly easy to port to Linux should that be required later. Here's the first public screen shot:


The simulation shown is a pass-through.... in this configuration the bitgrid cell acts as an expensive set of wires, passing signals straight through. It's useful as filler between active cells.
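Assuming a LUT encoding where the index packs the four input bits and each entry packs the four output bits in the same (N, E, S, W) order — an illustrative convention of mine, not necessarily Sim01's — the whole pass-through program can be generated in one line:

```python
# Build the pass-through LUT: each output bit copies the opposite input.
# Bit convention (an assumption): bit 3 = north, bit 2 = east,
# bit 1 = south, bit 0 = west, for both the index and each entry.
pass_thru = [(((i >> 1) & 1) << 3)    # north out <- south in
             | ((i & 1) << 2)         # east out  <- west in
             | (((i >> 3) & 1) << 1)  # south out <- north in
             | ((i >> 2) & 1)         # west out  <- east in
             for i in range(16)]

# A lone west input (0b0001) comes out as a lone east output (0b0100).
```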

In this simulator you can edit the program (shown in the 4 areas of checkboxes) and simulate inputs. The checkboxes corresponding to the current set of inputs are highlighted red as a programming aid. You can save your work as well.

Users can load a few examples, or create their own. In addition to creating a Google Code project site for it, I've also packaged up the source, executable, and examples and made them available as a zip file. It's small; you just unzip it, and it should be ready to use.

Wednesday, June 23, 2010

Beyond the Petaflop - How Bitgrid could meet DARPA's needs ahead of schedule.

I found this story about a new DARPA request for proposals via Digg. The challenge is to build a computer system which can process 10^18 floating point operations per second. Here's the back of the envelope calculation I posted in response:


I think it could be done in 3 years if it didn't have to fit in one rack. It could have already been done, had there not been such a heavy emphasis on the von Neumann architecture for the past 40 years.

Imagine a bit slice processor... with perhaps 1000 transistors. Put those on the same die in a 1000x1000 grid; this would require 10^9 transistors. You could clock them at the nice sane clock speed of 1 GHz. That would fit on a die the same size as a current generation CPU. That's 10^6 slices times 10^9 cycles/second, or 10^15 bit computes per second, on a practical size die, with current technology.

Even if you lost 99.9% of the compute efficiency in shuffling bits around to do a floating point operation, you could still do 10^12 floating point operations per second, on a prototype chip... today.

The chip would be easy to test, because all of the bit slices would be identical, so the testing of each part could be done in parallel... perhaps 1 second of test time per die. (Testing is a big part of the cost when it comes to chips.) The chips would cost somewhere around $10 each.

If you allow me to continue with my estimate of 10^12 flops per chip, and it were possible to build a grid of 1000x1000 of them... that takes you to the magic 10^18 flops that DARPA wants, for a cost of about $10,000,000.

10^18 operations per second, with 10^15 transistors, clocked at 1 GHz. Feasible... yes... but it does require you to give up sequential programming, and think in terms of graph flow.
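The back-of-the-envelope arithmetic above can be checked in a few lines:

```python
# Sanity check of the estimates in this post (illustrative arithmetic only).
transistors_per_slice = 1_000
slices_per_die = 1_000 * 1_000        # 1000 x 1000 grid of bit slices
clock_hz = 10**9                      # 1 GHz

die_transistors = transistors_per_slice * slices_per_die  # per chip
bit_ops_per_sec = slices_per_die * clock_hz               # per chip
flops_per_chip = bit_ops_per_sec // 1_000   # keep 0.1% after bit-shuffling

chips = 1_000 * 1_000                 # 1000 x 1000 grid of chips
system_flops = flops_per_chip * chips
total_transistors = die_transistors * chips
system_cost = chips * 10              # $10 per chip
```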

It's called bitgrid, I thought of it around 1981.... and I've written some of this up at  http://bitgrid.blogspot.com

About Me

I fix things, I take pictures, I write, and marvel at the joy of life. I'm trying to leave the world in better condition than when I found it.