Wherein Mike Warot describes a novel approach to computing which follows George Gilders call to waste transistors.

Thursday, March 10, 2005

The bitgrid story

The Von Neuman architecture - Our current standard

It all started with the realization that most of the transistors in a modern computer are just sitting idle 99% of the time. Everyone focuses on the CPU, with a single instruction stream being read and executed at a blinding speed. Everything optimizes the CPU, thus RAM is designed to be able to read a cell quickly, when needed, but is idle otherwise. This has advantages, including until very recently, the lack of heatsink requirements for memory chips. The laptop I write this entry on has 512 megabytes of RAM, which means there are 4 Billion transistors which are just sitting there, waiting for their turn in the computing process. Meanwhile most of the 30 million or so transistors in the CPU are also dedicated to the "cache", which is a faster RAM, also present to help optimise the flow of data to and from the CPU.

At the heart of everything in the CPU, there are registers (another form of RAM), and the ALU (Arithmetic Logic Unit), Execution pipeline, instruction decoders, and assorted hardware. There are also specialized hardware optimizations such as "speculative execution" which literally computes advances the program along both forks in the instruction flow, until well after it is known which one is the proper path, in order to know the correct answer that much faster.

I can't emphasize enough how everything is optimized to get data to the CPU, through the CPU, and back out from the CPU. Its all about getting flow through the core as quickly as possible. The problem is that the core can only get so fast... because it always has to wait to talk to RAM.

The next logical step has been known for a very long time (before 1970?), parallel processing. I'm surpised its taken as long as it has for the dual core chips to show up. I expected them back in the early 1990s. They will help performance, and push things along because there will be multiple tasks happening while the CPU waits for the next set of data. Its not optimal though, because the programming tools and operating systems all are still optimized for the single instruction flow, but this shift has been long anticipated, and much research has been done, so there are appreciable benefits to this "multi-core" approach. (Intel calls it Hyperthreading when done by certain Pentium 4 chips)

This is the future path most computing hardware will take. The model is mature, and universally understood. It entails a known set of trade-offs which I have summarized above.

The Bit Grid

Bitgrid is a radically different architecture. I see the first uses as a coprocessor, for doing a fixed task at very high speed. Tasks which might otherwise be dedicated to a custom hardware processor, such as signal processing, graphics, compression, etc. I've always figured it would make a really good radar signal processor, or text search engine.

The concept is pretty simple, but the implications are radical. The basic computing element in a BitGrid is a cell, which has 4 neighbors, up, left, down, right. For each of these neigbors, there is an input, and output, and a program. So, in summary a cell has
  • 4 single bit inputs, one from each neighbor
  • 4 single bit outputs, one to each neighbor
  • 4 programs, one for each output (16 bits each)
The flow is simple:
  • combine the 4 inputs from the neigbors into a nibble - n
  • select bit n from each "program" i.e. up(n), left(n), down(n), right(n)
  • output that bit to each neighbor
The key is that once the programs are loaded, the bitgrid because a hardware virtualization of the logic embedded in the program. A very fast virtualization. All parts of the design run at the same time, there is no program counter, no need to optimize around a specific set of registers, etc. All of the cells can be in used if the programmer is efficient. (I've put this model into a spreadsheet that works well - though it was interesting getting around the circular reference restrictions once I tried to make a grid of cells)


Interesting properties of the bitgrid.
Fault tollerance - A bitgrid chip doesn't need to be perfect. If a cell is bad, but doesn't lie near an edge, it can be routed around. Software to do this routing would eventually find it way into the normal set of tools associated with bitchip deployment. This should greatly reduce the cost of the chips, should they ever be manufactured in bulk.

Program transformation - A program for a bitgrid could be automatically rotated, compartmentize, and routed to best fit a given chipset. As above, I expect this software to be a basic part of any bitgrid deployment.

Duplexing - When I decided on a 4 in, 4 program, 4 out architecture, it seemed a reasonable choice. When modeling the circuit for a 8 bit programmable 2s complement generator, it became apparent that data flow in more than one direction across a chip was quite useful. I had the mode bit going down, with the carry bit going up the same column. In non-trivial applications, this could help boost the efficiency of utilization of the cells.

Inertia as default - When programming a grid, I've chosen a set of defaults for my tools that implement what appears to be inertia. Once a bit is in motion across the grid, it defaults to travel in that direction, without collision. The physics parallels are intriguing.

Interface as RAM - It would be quite desireable to make the bitgrid appear to be a RAM chip externally. The "program" store could appear as a conigous block of static RAM to the host processor. The IO pins for each cell could be similarly mapped. Thus the first top 8 input bits would just be written to at address 0, for example. The output bits would be a read from the same address. Interfacing to a host becomes trivial.



My story
As I've stated before, the bitgrid idea goes way back with me, to the my brief stint as an Engineering student (Rose-Hulman, 1981-1982). I had read about, and studied a bit about bit slice processors, and came up with the idea of building a vast array of them. I wrote page upon page of different ideas, and tweaks to the idea, but then I let it all drop. (A bad habit I'm working to change)

I've talked with friends about it from time to time, but figured I'd never be able to do it because I don't have the resources or skills to get a chip built. Let alone the whole patenting process...

I've hit middle age, and am figuring out just what my life goals are. 43 things was a nice catalyst, and one of my entries was about the bit grid. (Another is a refactoring compiler, which will be discussed elsewhere). At this point my goals for the bitgrid are:
  • Have at least one other person truely "get" the idea, so I have a collaborator
  • Actually get a good technical discussion going about the merits of this architecture
  • Determine the true value of the concept, where its niche in the world of computing is
Assuming that I'm not crazy, then we move on to longer term goals:
  • Build some software emulation programs to test out ideas
  • Write some code to solve a non-trivial computing problem (for some reason FFTs keep coming to mind)
  • Figure out just how fast and cheap the chips would be
  • Get chips made (I don't care who does this, as long as it stays open source)
  • Debug said chip designs
  • Get good chips made
  • Find real customers, get feedback, loop
Right now... I'm just trying to find others who think the idea has merit, and want to sheppard it along. I thank Doc Searls for his encouragement, and I thank you for your time and consideration.

--Mike-

1 comment:

KipIngram said...

This is a pretty nice idea. I think it bears a lot of resemblance to the "sea of gates" FPGA type. Your "bits" in the grid look a whole lot like Xilinx FPGA LUTs (look up tables) to me. I particularly like your notion of making the configuration space just look like a RAM range. One of my big beefs with contemporary FPGA vendors is the way they hold the "meaning" of the configuration bitstream proprietary.

With the "RAM view" of configuration, you could reconfigure parts of the grid on the fly, which I think opens the door to some very, very interesting possibilities. Also, we already know how to make RAM - it looks to me like making what you've described would be mostly just integrating the neighbor-to-neighbor logic into a pre-existing RAM structure. Which might not be "easy" - existing RAM designs have been optimized to pack the cells as close together physically as possible - they'd have to be "spread out" at least a little to slip the new connections in.

All in all, though, it's a clever idea and it would be interesting to see what could be done with a really large version of such a structure.

About Me

My photo
I fix things, I take pictures, I write, and marvel at the joy of life. I'm trying to leave the world in better condition than when I found it.