tag:blogger.com,1999:blog-72586352024-03-13T01:22:00.633-07:00BitGrid - brilliant, or waste of time??Wherein Mike Warot describes a novel approach to computing which follows George Gilder's <a href="http://www.wired.com/wired/archive/1.04/gilder_pr.html">call to waste transistors</a>.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.comBlogger32125tag:blogger.com,1999:blog-7258635.post-44997272095465293132023-12-03T10:56:00.000-08:002023-12-03T10:56:14.452-08:00BitGrid meets the Advent Of Code<p> I'm a big fan of the <a href="https://adventofcode.com/" target="_blank">Advent of Code</a>. In past years I used Lazarus/Free Pascal as my language of choice; you can see the <a href="https://adventofcode.com/" target="_blank">code here</a>. So, this year, I'm going to update the <a href="https://github.com/mikewarot/Bitgrid">BitGrid engine</a>, and do whatever is required to get code to run in the virtual BitGrid. You can follow my <a href="https://github.com/mikewarot/Advent_of_Code_in_BitGrid" target="_blank">progress here</a>.<br /><br />This requires an ecosystem to support me. I've got to add I/O, programming, and debug subsystems to the BitGrid engine. I've got to figure out a programming language, a compiler, a router, etc. I hope to get all of the 2023 problems done before Advent of Code 2024 starts.</p>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com2tag:blogger.com,1999:blog-7258635.post-85193539934640696712023-06-15T22:55:00.001-07:002023-06-15T22:57:23.255-07:00Making a fresh start, in Pascal<p>I've recently been made aware that there is a <a href="https://esolangs.org/wiki/Bitgrid" target="_blank">page devoted to BitGrid</a> in the Esoteric Languages wiki. Needless to say, this came as a bit of a shock to me.
I was particularly impressed with the working example of the Game of Life implemented in a BitGrid.<br /><br />I've let this project sit far too long, and since I've got nothing but free time these days, I'm pushing ahead on implementing a simulator for the BitGrid, in Lazarus, which is a GUI builder based on Free Pascal. The <a href="https://github.com/mikewarot/Bitgrid" target="_blank">GitHub repository for the project</a> has the latest code, which isn't much to look at for the moment.<br /><br />I'm hoping that I can sustain small bits of progress that add up over time.<br /><br />Eventually, I'll have a system that really allows programming and evaluation of programs. It would then be on to a chip design, and onward to Exaflop computing. ;-)</p>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-50288067378219391162021-05-20T23:09:00.001-07:002021-05-20T23:09:03.203-07:00A chance to make a bitgrid chip appears suddenly<p> This <a href="https://news.ycombinator.com/item?id=27215912">thread on Hacker News</a> pointed to a Google-funded <a href="https://efabless.com/open_shuttle_program/2">project to let people design their own chips</a>. The window is already halfway over, so I need to learn chip design in the next 2 weeks or so. It should be fun.<br /></p>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-68962312085325339492010-10-28T08:23:00.000-07:002010-10-28T08:23:42.735-07:00Applications - image processing, survey planeOne of the long-term issues with having a technology that should be much more powerful than existing hardware is finding an appropriate use scenario for it. Today one occurred to me on my commute.<br />
<br />
I've been doing a lot of experimenting with synthetic focus imagery. Having recently written a tool to help me do image matching, I've begun to appreciate why Hugin gets so bogged down generating and matching control points. The cross-correlation of 2D image sets is a huge resource hog. Fortunately, the bitgrid should be quite capable of handling it, because the computations are data-local, with the only global data being the source material, which loads once per frame, and the output maximum coordinates, again once per frame.<br />
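That cost is easy to make concrete. Below is a minimal pure-Python sketch of naive 2D cross-correlation (my own illustration, not code from Hugin or the matching tool described above; the function name and toy data are hypothetical). For an N x N image and an M x M template, every output point is a full M x M multiply-accumulate over a local window, which is what makes the operation both expensive and, as noted above, data-local.

```python
# Naive 2D cross-correlation: O(N^2 * M^2) multiply-accumulates.
# Each output point depends only on a local M x M neighborhood,
# which is why this kind of work maps onto a grid of tiny cells.

def cross_correlate(image, template):
    n, m = len(image), len(template)
    out = []
    for y in range(n - m + 1):
        row = []
        for x in range(n - m + 1):
            # One full M x M multiply-accumulate per output point.
            acc = sum(image[y + j][x + i] * template[j][i]
                      for j in range(m) for i in range(m))
            row.append(acc)
        out.append(row)
    return out

# Toy data: the template matches the center block of the image,
# so the correlation peak lands there.
image = [[0, 0, 0, 0],
         [0, 1, 2, 0],
         [0, 3, 4, 0],
         [0, 0, 0, 0]]
template = [[1, 2],
            [3, 4]]
result = cross_correlate(image, template)
```

Even this 4x4 toy does 36 multiplies; at camera resolutions the count explodes, which is the bottleneck described above.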
<br />
I imagine a remote-control glider with a pair (later an array) of cameras feeding into a system which correlates the images to generate altitude data. Once this 2D triangulation is done, it should be possible, with hints from the navigation system, to generate a 3D image of the area below, with altitude information at least as accurate as the pixel resolution allows.<br />
<br />
If the plane is slow and stable enough to allow multiple overlapping images of the same area, it should be possible to derive super-resolution images using Richardson-Lucy deconvolution.<br />
<br />
It all hinges on the power consumption of a single bitgrid cell, something I don't know, but which an experienced IC designer should be able to figure out over a lunch break.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com2tag:blogger.com,1999:blog-7258635.post-41535930897807657832010-08-14T06:05:00.000-07:002010-08-14T06:28:13.062-07:00Spreadsheet iteration and other linguistic hitsI continue to search for ideas that are close to the BitGrid, and I've come across Amir Hersch's mention of the need for a "spreadsheet iterator" in <a href="http://fpgacomputing.blogspot.com/2008/12/more-versus-faster.html">his blog post</a> titled "More versus Faster"... the BitGrid would be a good spreadsheet iterator.<br />
<br />
I'm still trying to figure out the cost/benefit ratio of getting rid of all routing in a real-world FPGA device. As an abstraction tool, it's totally cool and cost-effective, as there are no static or dynamic power costs in a thought experiment. 8)<br />
<br />
As an intermediate stage of compiling a design, there are time costs in translation, but they <i>might be worth it</i> for the ability to move elements of a system design orthogonally to other design decisions.<br />
<br />
Time and persistent effort to get the questions answered will tell. I'm glad I'm still asking questions and pursuing the goal of getting a BitGrid chip built.<br />
<br />
Oh... another linguistic hit "hardware spreadsheet" as mentioned <a href="http://fpgacomputing.blogspot.com/2008/09/parallel-programming-is-easy-making.html">here</a>.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-80285088945262114932010-07-27T06:38:00.000-07:002010-07-27T06:41:14.244-07:00Reconfigurable Systolic ArrayI've been searching, and searching, and searching for anything that matches the Bitgrid in architecture... and I've found nothing... today's Google search is <a href="http://www.google.com/search?q=reconfigurable+systolic+array">reconfigurable systolic array</a>.<br />
<br />
I get a lot of results, mostly academic (which means they are behind a paywall, and thus worthless). It does give me a better way to describe the bitgrid, though.<br />
<br />
The bitgrid is a fine-grained, homogeneous 2D reconfigurable systolic array and/or mesh. Its utility will be verified by simulation. I hope to popularize it with blogs, social media, and by making a game out of it.<br />
<br />
It is my belief that the flexibility of the LUT-based approach more than makes up for the lack of dedicated routing and compute blocks. Any inactive elements of the circuit are unclocked, and thus should draw very little power.<br />
<br />
I'm not sure if I'm going to be a good fit for the <a href="http://www.google.com/search?q=omnipresent+high+performance+computing">OHPC project</a> or not; I've got until August 6 to write a proposal.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-3925240797018625762010-07-09T20:10:00.000-07:002010-07-09T20:10:22.557-07:00Prior art - none found... and I've got a headacheEvery single article I checked trying to find a pure LUT-based FPGA had some sort of routing fabric in it.<br />
<br />
I can't find anything that is close to the bitgrid.<br />
<br />
I'm going to try to relax, and wait for the Excedrin to kick in.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-50138613641712428932010-07-06T14:48:00.000-07:002010-07-06T14:48:51.031-07:00Learning about chip designI really need to find out how much power a BitGrid cell will consume, in order to find out how well it could realistically deliver on the Exascale challenge. This is forcing me to learn all about VLSI design. Thanks to <a href="http://cmosedu.com/">CMOSedu</a>, I'm getting up to speed on <a href="http://www.staticfreesoft.com/">Electric</a> and <a href="http://www.linear.com/designtools/software/ltspice.jsp">LTspice</a> right now. I hope to learn enough to have an answer within an order of magnitude this week. I'm hoping I can get it in the range of 1 pJ per operation per cell for a naive design, though at 1 GHz a fully active 1000x1000 chip would then dissipate on the order of a kilowatt; getting down to 1 watt would take closer to 1 fJ per operation. I expect static power to be very low.<br />
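Dynamic power here is just cells × clock × energy per operation. A quick sanity check in Python (a sketch of the arithmetic only; the energy figures are illustrative targets, not measurements):

```python
# Dynamic power of a fully active grid:
#   power = (number of cells) * (clock rate) * (energy per operation)
# Static leakage is ignored, as the post expects it to be very low.

def chip_power_watts(cells, clock_hz, joules_per_op):
    return cells * clock_hz * joules_per_op

cells = 1000 * 1000   # a 1000x1000 grid of cells
clock = 1e9           # 1 GHz

watts_at_1pj = chip_power_watts(cells, clock, 1e-12)  # ~1 kW at 1 pJ/op
watts_at_1fj = chip_power_watts(cells, clock, 1e-15)  # ~1 W at 1 fJ/op
```

The three-orders-of-magnitude gap between those two numbers is why the per-cell energy estimate matters so much.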
<br />
<br />
I've got a month until the first DARPA deadline for submissions...<br />
<br />
wish me luck.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-10925401187937123442010-07-04T10:53:00.000-07:002010-07-04T10:54:22.763-07:00What's required for ExascaleI've been watching the google results for the key word "Exascale"... and found this at the bottom of <a href="http://www.genomeweb.com/blog/emergence-exascale">this article about bioinformatics</a>. (Emphasis mine)<br />
<br />
<blockquote>And in another opinion piece over at International Science Grid This Week, Irving Wladawsky-Berger, a 37-year IBM veteran, lays out some of the big challenges on the way to achieving exascale computing. Whereas the evolution from terascale to petascale went smoothly using tens to hundreds of thousands of processors from the PC and Unix markets, they will not get us to exascale, writes Wladawsky-Berger. <b>Exascale will require some other kind of major transition in chip architecture, not to mention a completely new programming paradigm</b>. </blockquote><div>Well... that precisely describes the Bitgrid, or it will, once I get the thing built.</div>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-27358408725931968522010-07-01T13:36:00.000-07:002010-07-01T13:36:15.171-07:00SimGrid - Up and running<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnNjr2X8pbK_oAkINuparVpaIGDy40N9An9ckoN9IjExRam7n0zDFsC8KsRawaUE7gMlQys6kYg26N-E2jszb5E48LEuuKcWvZ9WO8mzfqU-3a9Pk5ul6ZVqyaWvl04uB12OSyFA/s1600/SimGrid_Screenshot_20100701.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnNjr2X8pbK_oAkINuparVpaIGDy40N9An9ckoN9IjExRam7n0zDFsC8KsRawaUE7gMlQys6kYg26N-E2jszb5E48LEuuKcWvZ9WO8mzfqU-3a9Pk5ul6ZVqyaWvl04uB12OSyFA/s640/SimGrid_Screenshot_20100701.png" width="640" /></a></div><div style="text-align: left;">I managed to get a Grid Simulation working.... here is a screenshot of it. I named it SimGrid, and it works in conjunction with Sim01 as a programming tool. First you work out your entry on Sim01 for a single cell, then paste the hex code into the appropriate cell in the Grid. 
The code to generate it can scale to arbitrary dimensions, limited by the screen and Windows resource handles.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;">In this simulation, every cell is loaded with the code required to pass each bit through the cell, and have it emerge on the opposite side. It's a fairly easy way to check for logic flaws in the simulator. Here we see a bit from the left side propagating all the way across to the right.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;">I feel that it's reasonable to estimate I can emulate an arbitrary grid of these, even as many as 1000x1000, without a problem. The task now is to find a problem domain appropriate to the architecture, to use as a baseline for performance evaluations. I need to make sure it's going to be quick enough once cast into silicon to have any commercial value. I'm hoping it will be fast enough for the Exaflop realm, but I have no real way to tell at this point in time.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;">I'm starting to dig around for silicon simulation resources, as well as a place to do fabrication of this as an ASIC should I find funding.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;">I feel like Tesla probably felt when he got the idea of polyphase power straight in his head.... BitGrid has enormous potential, but now is the time for a lot of blood, sweat, and tears to make it come into being.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;"><br />
</div><br />
<div><br />
</div><div><br />
</div><div><br />
</div>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-40538162232255917812010-06-30T09:21:00.000-07:002010-06-30T09:21:58.704-07:00Unrolling programs instead of loops<div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">A programming trick that used to work was to unroll loops, to prevent the pipeline penalties that occur when you branch. It worked well for a while. The bitgrid is based on the idea of unrolling the whole friggin program. Instead of making a list with fewer branches... why not distribute each and every instruction of a program out into a physical processing instance?</div><div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">To make it feasible in hardware, use the simplest computing grid feasible: a grid of cells (each cell having 4 inputs (one bit from each neighbor), 4 outputs (one bit TO each neighbor) and a 16-entry lookup table), each of which is a pitiful unit of computation by itself.... in a grid sized to fit the application at hand, they can execute all of the instructions necessary to compute a result simultaneously.</div><div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">Communication isn't shared, because every input and output only has one place to go or come from. It only has to go to the next cell... so there are no long communication lines to worry about. 
Each cell can function as a router and logic element at the same time.</div><div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">Programming is a matter of setting the values in the lookup tables, which could be made of static RAM cells.</div><div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><br />
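As a concreteness check, here is a minimal sketch of the cell just described (get a bit from each neighbor, concatenate into a 4-bit index, one lookup table per output) in Python. This is my own illustration, not a spec: the class name and the N-high/W-low bit ordering of the index are assumptions.

```python
# One BitGrid cell: four 1-bit inputs (N, E, S, W), four 1-bit
# outputs, and one 16-entry lookup table per output direction.

class BitGridCell:
    def __init__(self, luts):
        # luts: dict mapping 'N','E','S','W' to a 16-entry list of 0/1.
        self.luts = luts

    def step(self, n, e, s, w):
        # Concatenate the four input bits into a 4-bit index.
        # (N as the high bit, W as the low bit -- an arbitrary choice.)
        index = (n << 3) | (e << 2) | (s << 1) | w
        return {d: self.luts[d][index] for d in 'NESW'}

# A pass-through program: each output repeats the opposite input,
# so the cell acts as plain wires between its neighbors.
passthrough = {
    'N': [(i >> 1) & 1 for i in range(16)],  # N out = S in
    'S': [(i >> 3) & 1 for i in range(16)],  # S out = N in
    'E': [i & 1 for i in range(16)],         # E out = W in
    'W': [(i >> 2) & 1 for i in range(16)],  # W out = E in
}

cell = BitGridCell(passthrough)
```

Setting the 64 table bits is the entirety of "programming" one cell, exactly as the paragraph above describes.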
</div>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-67168917136432694852010-06-28T09:32:00.000-07:002010-06-28T09:32:55.847-07:00Introducing Sim02<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQB0biZ8vTNJ8gQpAncfByl7tu5M-FaBygGp-LivNJXHSi-UwEnQDHd4r3aGj5QOGkwWgVvz_S4GLcAL6swyheRujte2jXtmtQiGhaRCIfA2cWCZ0sCFaGsJ4HHHy2AAoWp-SCMg/s1600/Sim02_ScreenShot_201006281124.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="332" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQB0biZ8vTNJ8gQpAncfByl7tu5M-FaBygGp-LivNJXHSi-UwEnQDHd4r3aGj5QOGkwWgVvz_S4GLcAL6swyheRujte2jXtmtQiGhaRCIfA2cWCZ0sCFaGsJ4HHHy2AAoWp-SCMg/s640/Sim02_ScreenShot_201006281124.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: left;">Now that Sim01 is capable of demonstrating how a single cell works in terms of bits, it's time to start building a grid simulator. This is my first effort at it. Using Sim01 to figure out the hex codes (a new feature I added to support this)... I was able to determine the proper codes to do a pass-through... and make that the default for this 2-cell array. </div><div class="separator" style="clear: both; text-align: left;"><br />
</div><div class="separator" style="clear: both; text-align: left;">Here you see an input from the west side of the array propagated all the way through to the east side. Simultaneously, an input from the north of Cell 01 (the right cell) is propagated out the south side.</div><div class="separator" style="clear: both; text-align: left;"><br />
</div><div class="separator" style="clear: both; text-align: left;">It's all prototype code, but it is functional. I intend to scale this up to a simulator capable of simulating an arbitrary size grid. Then I'll add some input and output functionality to allow the processing of real data with a simulated bitgrid.</div><div class="separator" style="clear: both; text-align: left;"><br />
</div>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-82743996255287233502010-06-24T14:47:00.000-07:002010-06-24T14:47:32.337-07:00Introducing Bitgrid-Sim version 0.03I've created a basic simulator for one cell of a bitgrid. The simulator is currently a Windows-only application, but it should be fairly easy to port to Linux should that be required later. Here's the first public screenshot:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1oR9MffjUNJXrW8jMtihAI6HVacYUtKOxKTM1FVBHmqWssqqiBpol6xWEfiZr3Qoj1A4objeq8BdADGHewKKfZfZDUwGrYDyfCmX6SnDOHiL71iy5fAuraz9xKL0dehUF2U-hSQ/s1600/BitGrid003_ScreenShot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="379" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1oR9MffjUNJXrW8jMtihAI6HVacYUtKOxKTM1FVBHmqWssqqiBpol6xWEfiZr3Qoj1A4objeq8BdADGHewKKfZfZDUwGrYDyfCmX6SnDOHiL71iy5fAuraz9xKL0dehUF2U-hSQ/s640/BitGrid003_ScreenShot.png" width="640" /></a></div><br />
The simulation shown is a pass-through.... in this configuration the bitgrid cell acts as an expensive set of wires, passing signals straight through. It's useful as filler between active cells.<br />
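The pass-through codes are easy to derive by hand or by script. The sketch below is my own illustration and makes assumptions Sim01 may not share (one 16-entry, one-bit table per output, packed LSB-first into a 16-bit word, with the four input bits forming the index): a table whose output is simply "bit k of the index" packs into one of four characteristic hex patterns.

```python
def lut_to_hex(table):
    # Pack a 16-entry list of 0/1 into a 16-bit word, with table[0]
    # as the least significant bit, formatted as four hex digits.
    word = 0
    for i, bit in enumerate(table):
        word |= bit << i
    return f"{word:04X}"

# "Output = input bit k" gives one pass-through table per direction.
# With this assumed packing, the four codes come out as the familiar
# bit-select masks: AAAA, CCCC, F0F0, FF00.
codes = {k: lut_to_hex([(i >> k) & 1 for i in range(16)])
         for k in range(4)}
```

Any other cell program is derived the same way: decide what each output bit should be for all 16 input combinations, then pack the table.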
<br />
In this simulator you can edit the program (shown in the four areas of checkboxes) and simulate inputs. The checkboxes corresponding to the current set of inputs are highlighted in red as a programming aid. You can save your work as well.<br />
<br />
Users can load a few examples, or create their own. In addition to creating a Google Code <a href="http://code.google.com/p/bitgrid-sim/">project site</a> for it, I've also packaged up the source, executable, and examples and made it <a href="http://bitgrid-sim.googlecode.com/files/BitGrid003.zip">available as a zip file</a>. It's small, and you just unzip it, and it should be ready to use.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-74663125285410817432010-06-23T22:36:00.000-07:002010-06-23T22:36:19.553-07:00Beyond the Petaflop - How Bitgrid could meet DARPA's needs ahead of schedule.I found<a href="http://www.networkworld.com/community/node/62808"> this story</a> about a new DARPA request for proposals <a href="http://digg.com/hardware/Beyond_the_petaflop_DARPA_wants_quintillion_speed_computers">via Digg</a>. The challenge is to build a computer system which can process 10^18 floating-point operations per second. Here's the back-of-the-envelope calculation I posted in response:<br />
<br />
<br />
I think it could be done in 3 years if it didn't have to fit in one rack; in fact, it could already have been done, had there not been such a heavy emphasis on the von Neumann architecture for the past 40 years.<br />
<br />
Imagine a bit-slice processor.... with perhaps 1000 transistors. Put those on the same die in a 1000x1000 grid; this would require 10^9 transistors. You could clock them at the nice, sane clock speed of 1 GHz. That would fit in a die the same size as a current-generation CPU. That's 10^6 slices times 10^9 cycles/second, or 10^15 bit computes per second, on a practical-size die, with current technology.<br />
<br />
Even if you lost 99.9% of the compute efficiency in shuffling bits around to do a floating-point operation, you could still do 10^12 floating-point operations per second, on a prototype chip... today.<br />
<br />
The chip would be easy to test, because all of the bit slices would be identical, so the testing of each part could be done in parallel... perhaps 1 second of test time per die. (Testing is a big part of the cost when it comes to chips.) The chips would cost somewhere around $10 each.<br />
<br />
If you allow me to continue with my estimate of 10^12 FLOPS per chip, and it were possible to build a grid of 1000x1000 of them... that takes you to the magic 10^18 FLOPS that DARPA wants, for a cost of about $10,000,000.<br />
<br />
10^18 operations per second, with 10^15 transistors, clocked at 1 GHz. Feasible... yes... but it does require you to give up sequential programming and think in terms of graph flow.<br />
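The arithmetic in the estimate above checks out; here it is as a script (the figures are the post's own round numbers, not measurements):

```python
# Back-of-the-envelope exascale arithmetic, using the post's estimates.

cells_per_chip = 1000 * 1000      # 10^6 bit slices per die
clock_hz = 10**9                  # 1 GHz
bit_ops_per_chip = cells_per_chip * clock_hz   # 10^15 bit computes/s

overhead = 1000                   # assume 99.9% lost turning bits into FLOPs
flops_per_chip = bit_ops_per_chip // overhead  # 10^12 FLOPS per chip

chips = 1000 * 1000               # a 1000x1000 grid of chips
total_flops = chips * flops_per_chip           # 10^18: the exascale target
cost = chips * 10                 # at roughly $10 per chip
```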
<br />
It's called bitgrid, I thought of it around 1981.... and I've written some of this up at http://bitgrid.blogspot.comMike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-41203529288840457102010-04-07T21:40:00.000-07:002010-04-07T21:43:20.493-07:00Enabling technology on its way from HPHP has continued to make progress on their Memristors, and this <a href="http://www.nytimes.com/2010/04/08/science/08chips.html?hpw">latest advance</a>, which allows them to be put into large arrays, would be perfect to use as the latch/demultiplexer components of the bitgrid cell. It would be amazing to have a bitgrid with 1000x1000 bits operating at 1 GHz.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-91961246189612342722008-07-31T12:13:00.000-07:002008-07-31T12:18:30.791-07:00HSRA - Close, but no cigarI came across <a href="http://brass.cs.berkeley.edu/documents/hsra_poster_fpga99.ppt">this PowerPoint</a> about HSRA: the High-Speed,<br />
Hierarchical Synchronous Reconfigurable Array, part of the BRASS Project at the University of California, Berkeley, found via a <a href="http://www.google.com/search?hl=en&q=systolic+array+bit+LUT">google search</a>.<br /><br />They figured out that long interconnects are a problem when you're trying to get speed out of a logic array, but don't seem to be willing to give up the big complex interconnection logic.<br /><br />I'd call this a step closer to the bit grid, but definitely not a hit.<br />Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-14971251560154688752008-07-29T11:19:00.000-07:002008-07-29T11:25:06.515-07:00Instruction Systolic ArrayThe BitGrid is 
based on the idea of data flowing through a grid of instructions. The logical inverse of this situation is to have the data remain in place, while instructions to modify it flow past it... this is the <a href="http://www.iti.fh-flensburg.de/lang/papers/isa/index.htm">Instruction Systolic Array</a>.<br /><br />The idea is intriguing because it offers a way to get the benefits of the systolic array without having to have all of the bandwidth necessary to update all of the cells at once. The web site is well thought out and informative as well. I like the <a href="http://www.iti.fh-flensburg.de/lang/papers/isa/isa2.htm">illustration of the matrix multiply</a> using their concept.<br /><br />There are a lot of architectures that got skipped along the way to our current crop of FPGA and other programmable logic circuits. I think that the systolic array warrants further consideration as well as the BitGrid.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-58447891911408883502008-07-18T22:10:00.001-07:002008-07-21T06:45:20.841-07:00BitGrid - A minimalist systolic arraySometimes the key to everything is to find the right words... the right words to explain a concept, the right words to feed into a search engine. I've learned some new words to explain the BitGrid, and they help tie it into the history of computing a bit better, and give context. The two words are<br /><a href="http://www.google.com/search?q=systolic+array"><blockquote>systolic array</blockquote></a><br />The bitgrid as I imagined it way back in the 1980s is a systolic array. It takes information, and processes pieces of it simultaneously. It's an extension of the then-common idea of a bit-slice processor, which was used to create <span style="font-style: italic;">really fast</span> custom processors before the microprocessor really took off.<br /><br />The BitGrid is a minimalist bidirectional systolic array. 
According to the <a href="http://en.wikipedia.org/wiki/Systolic_array">Wikipedia entry</a> on the subject, the pros and cons of systolic arrays are:<br /><p>Pros</p> <ul><li>Faster</li><li>Scalable</li></ul> <p>Cons</p> <ul><li>Expensive</li><li>Highly specialized for particular applications.</li><li>Difficult to build</li></ul><br />The fact that I want to process 4 bits at a time means that each cell is almost trivial, a 4-bit-wide, 4-address-line EEPROM table, for a total of 64 bits of information. This makes it cheap and easy to design and build, pretty much wiping out the Cons in the table above. I don't have a way to get silicon, <span style="font-style: italic;">yet</span>, but I expect it should be a matter of getting a cell and its addressing logic right, then replicating a big grid of these onto a single chip.<br /><br />I've figured out that an <span style="font-weight: bold;">n-bit</span> multiplier requires n*(n-1) cells. A divider takes the same number of cells. Adding and subtracting n bits requires n cells.<br /><br />It's amazing how little of this can be found via a straightforward Google search, unless you know exactly which <span style="font-style: italic;">magic words</span> to use. 
Semantic web searches will add value, should they ever actually get here.<br /><br /><a href="http://www.google.com/search?q=systolic+array"></a>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-1160928625765786912006-10-15T09:01:00.000-07:002006-10-15T09:27:37.520-07:00Logisim<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/331/438/1600/Idealized%20Cell.0.png"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://photos1.blogger.com/blogger/331/438/400/Idealized%20Cell.0.png" alt="" border="0" /></a><br /><br />Earlier this week I found<a href="http://ozark.hendrix.edu/%7Eburch/">Carl Burch</a>'s wonderful <a href="http://ozark.hendrix.edu/%7Eburch/logisim/">Logisim</a> which does digital logic simulation. It took a few hours of tweaking, and one flash of insight (below)... but here is what a single BitGrid cell looks like in an idealized format.<br /><br />The cell consists of a single 16x4 RAM cell (4 bits address, 4 bits data). I used a ROM in the simulation to allow it to persist across saves, and simplify the layout.<br /><br />There are any number of ways you could wire this thing up... the flash of insight I had was that I wanted it to be very simple to turn a cell into a simple pass-through repeater. I figured that if addresses 0-F were programmed with contents 0-F, and it just worked that way... it would be easiest to understand. This leads naturally to the layout you see pictured here.<br /><br />If you want to see for yourself, here is <a href="http://warot.com/bitgrid/bitgrid.xml">the Logisim Circuit file</a>. 
(You'll have to save it, then rename it to *.circ for Logisim due to limitations of my web host at 1and1.com)<br /><br />I welcome comments and suggestions.<br /> --Mike--Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com7tag:blogger.com,1999:blog-7258635.post-1140794357196642132006-02-24T07:16:00.000-08:002006-02-24T07:19:17.213-08:00Signs of lifeI ran into Joshua, who might be able to help me get a chip made! He's been through chip design, and is a EE at heart. I laid out what I wanted to do, answering questions along the way... and there were no obvious huge stumbling blocks with what I want to do.<br /><br />It's not dead yet! 8)Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-1111100579928544432005-03-17T14:52:00.000-08:002005-03-17T15:02:59.930-08:00BitGrid in 25 words or lessGet bit from each neighbor<br />concatenated 4 bit number becomes index<br />One lookup table per neighborMike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-1111033068523962492005-03-16T20:16:00.000-08:002005-03-16T20:17:48.820-08:00Bitgrid pro and con - AKA the Thesis<h3>If the <i>Bitgrid</i> is such a great idea, why haven’t I heard of it before?</h3> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">One word answer: “Efficiency”</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">My review of the current literature, aided by my trusty pal Google has shown that the past 25 years of programmable/custom logic design is focused on serving one God, efficiency. 
All of the designs I’ve seen (admittedly a small subset because I’m not a professional circuit engineer) optimize on some or all of these common goals:</p> <ul style="margin-top: 0in;" type="disc"> <li class="MsoNormal" style="">Speed</li><li class="MsoNormal" style="">Power</li><li class="MsoNormal" style="">Size</li><li class="MsoNormal" style="">Circuit complexity</li><li class="MsoNormal" style="">Unit cost</li> </ul> <p class="MsoNormal" style="margin-left: 0.25in;"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">They do this for a very good set of reasons. You want the lowest power dissipation because it makes it easier to feed and clean up after. You want the fastest speed because that is the driving factor for using hardware instead of software. You want the smallest design size so that you use less die area, and have less chance of losing a chip to a defect. The circuit complexity goal drives a huge investment in design tools to automate design tasks as much as possible.</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">The things that are often traded away for these goals in a chip design are:</p> <ul style="margin-top: 0in;" type="disc"> <li class="MsoNormal" style="">Flexibility</li><li class="MsoNormal" style="">Fault tolerance</li><li class="MsoNormal" style="">Engineering costs</li> </ul> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">The primary reason for going with hardware in the first place is usually speed. If speed is not an issue, then it is usually a good idea to do a given task in software. 
Software is infinitely malleable, and far easier to patch and update.</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">Fault tolerance is usually excluded from designs of custom chips because it is difficult to achieve, and is better addressed by testing and quality control measures. Only when a given feature of a design is homogenous such as in RAM or ROM, is the option to include spares included in a design.</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">Engineering costs are usually considered last in a mass produced chip, but they are never trivial. The processes are optimized to automate as much of the design work as possible away from human engineers, but there are always going to be complexity limitations imposed by the heterogeneity of the elements in a given Programmable Logic / Custom ASIC design.</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">When viewed from the perspective of the design community I’ve observed, it becomes obvious why nobody has built a bitgrid yet… it’s inefficient as hell by their criteria. 
An insider would never seriously consider a bitgrid design.</p> <p class="MsoNormal">That, gentle reader, is why you haven’t heard of the bitgrid before.</p> <h3>Reasons to consider the bitgrid</h3> <p class="MsoNormal">One word answer: “Efficiency”</p> <p class="MsoNormal">Only in this case, it’s a different set of parameters to optimize for:</p> <ul> <li>Flexibility</li><li>Fault tolerance</li><li>Testing time</li><li>End users</li> </ul> <p class="MsoNormal">The bitgrid is based on a single basic component, with a known, easily comprehended design, in an orthogonal grid. The homogeneous nature of the grid makes it trivial to relocate a given portion of an application program.</p> <p class="MsoNormal">It can route around a bad cell, if one is found. Each and every cell is a programmable wire at minimum utilization. As long as extensive faults are not present, and slack is available, it should be feasible to route around bad cells in almost any design.</p> <p class="MsoNormal">Because it’s possible to test the RAM that stores the programming, along with the bitgrid cells, one cell at a time, it should be easy to test a chip quickly and confidently with a minimum of test equipment complexity.</p> <p class="MsoNormal">Because of the simple nature of the bitgrid, a set of graphical tools for design and debugging can be built once and will be applicable to any implementation of the chip. The parameters of the IO pins and the array size are the main constraints to give to the tools. There are no design heterogeneity obstacles to complicate tool development.</p> <p class="MsoNormal">There are, of course, some big trade-offs made, including:</p> <ul> <li>Speed</li><li>Latency</li><li>Power</li><li>Die size</li><li>Unit cost</li> </ul> <p class="MsoNormal">Compared to an ASIC, a bitgrid will always be slower, use more power, consume more die area, and cost more per unit. A given bit will have to traverse at least 4 gates per cell, just to emulate a wire.
The end user will be compelled to expend some time optimizing their programming to fit the smallest applicable bitgrid device, of course.</p> <h3>Summary</h3> <p class="MsoNormal">I’m confident that as Moore’s law continues to drive down transistor prices, the bitgrid will come to be seen as a viable computing architecture for select applications, and may possibly mature into general use over time. The closest analogy I can evoke at this time is to the debates that accompanied the introduction of high-level computer languages, which occurred when the cost of computation was driven below the cost of the programmers. I believe this transition is on its way for silicon.</p> <p class="MsoNormal">I believe the best way to predict the future is to invent it. I hope you like my invention.</p> <p class="MsoNormal"> --Mike--</p> <p class="MsoNormal">March 16, 2005</p>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-1110556518216491142005-03-11T07:37:00.000-08:002005-03-11T08:11:57.806-08:00Why we need a Free Hardware FoundationWe have the Free Software Foundation as a result of Richard Stallman's irritation with closed source software. I propose a parallel organization to help promote and propagate open source, Free hardware. I want to be able to license the bitgrid under the equivalent of the GPL.
I'm certain that there are others with ideas that they want to share in the same way.<br /><br />Specifically, if my bitgrid idea actually pans out, I'd like to let anyone use it in a design according to an equivalent of the GPL for hardware. They would then be bound to do the same for their designs, so that improvements can get worked into the technology, and we all benefit.<br /><br />The main threat I want to hold off is that of submarine patents. If we can build an open database of ideas which can be shared by all, it could be a very powerful tool. I'm not a lawyer, so I don't know if patent or copyright law would be the best place to try to build a GPL equivalent, but I'm sure it's time to figure it out.<br /><br />[update]<br />An analogy made in <a href="http://features.linuxtoday.com/news_story.php3?ltsn=1999-06-22-005-05-NW-LF&reply=008&quote=1">this post</a> by Eric Ste-Marie in 1999:<br /><blockquote><br />On the other hand, the day that we can have a processor definition from the internet, download that definition in a "processor makng machine" [sic], add whatever material is needed, press a button, wait 5 minutes and "DING!" your processor is ready; well this day, maybe free hardware foundation will become popular and worth the effort. Until that time it will be a hobbyist thing and won't have any impact on the computer world compared to Free software.</blockquote><br />Perhaps the bitgrid could fill that role by being virtual hardware?Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-1110517100652246272005-03-10T20:03:00.000-08:002005-03-10T21:47:19.283-08:00The bitgrid story<span style="font-size:130%;">The Von Neumann architecture - Our current standard<br /></span><br />It all started with the realization that most of the transistors in a modern computer are just sitting idle 99% of the time.
Everyone focuses on the CPU, with a single instruction stream being read and executed at blinding speed. Everything optimizes for the CPU: RAM is designed to be able to read a cell quickly when needed, but is idle otherwise. This has advantages, including, until very recently, the lack of heatsink requirements for memory chips. The laptop I write this entry on has 512 megabytes of RAM, which means there are 4 billion transistors which are just sitting there, waiting for their turn in the computing process. Meanwhile, most of the 30 million or so transistors in the CPU are dedicated to the "cache", a faster RAM also present to help optimize the flow of data to and from the CPU.<br /><br />At the heart of everything in the CPU, there are registers (another form of RAM), the ALU (Arithmetic Logic Unit), the execution pipeline, instruction decoders, and assorted hardware. There are also specialized hardware optimizations such as "speculative execution", which advances the program along both forks in the instruction flow until it is known which one is the proper path, in order to have the correct answer that much sooner.<br /><br />I can't emphasize enough how everything is optimized to get data to the CPU, through the CPU, and back out from the CPU. It's all about getting flow through the core as quickly as possible. The problem is that the core can only get so fast... because it always has to wait to talk to RAM.<br /><br />The next logical step, parallel processing, has been known for a very long time (since before 1970?). I'm surprised it's taken as long as it has for the dual-core chips to show up. I expected them back in the early 1990s. They will help performance, and push things along, because there will be multiple tasks happening while the CPU waits for the next set of data.
It's not optimal, though, because the programming tools and operating systems are all still optimized for a single instruction flow; but this shift has been long anticipated, and much research has been done, so there are appreciable benefits to this "multi-core" approach. (Intel's Hyper-Threading, on certain Pentium 4 chips, is a related approach.)<br /><br />This is the future path most computing hardware will take. The model is mature, and universally understood. It entails a known set of trade-offs which I have summarized above.<br /><span style="font-size:130%;"><br />The Bit Grid</span><br />Bitgrid is a radically different architecture. I see the first uses as a coprocessor, for doing a fixed task at very high speed. Tasks which might otherwise be dedicated to a custom hardware processor, such as signal processing, graphics, compression, etc. I've always figured it would make a really good radar signal processor, or text search engine.<br /><br />The concept is pretty simple, but the implications are radical. The basic computing element in a BitGrid is a cell, which has 4 neighbors, <span style="font-style: italic;">up, left, down, right</span>. For each of these neighbors, there is an input, an output, and a program. So, in summary, a cell has:<br /><ul> <li>4 single-bit inputs, one from each neighbor</li> <li>4 single-bit outputs, one to each neighbor</li> <li>4 programs, one for each output (16 bits each)<br /></li> </ul> The flow is simple:<br /><ul> <li>combine the 4 inputs from the neighbors into a nibble - <span style="font-style: italic;">n</span><br /></li> <li>select bit <span style="font-style: italic;">n</span> from each "program", i.e. up(<span style="font-style: italic;">n</span>), left(<span style="font-style: italic;">n</span>), down(<span style="font-style: italic;">n</span>), right(<span style="font-style: italic;">n</span>)<br /></li> <li>output that bit to each neighbor</li> </ul> The key is that once the programs are loaded, the bitgrid becomes a hardware virtualization of the logic embedded in the program. A very <span style="font-style: italic;">fast</span> virtualization. All parts of the design run at the same time; there is no program counter, no need to optimize around a specific set of registers, etc. All of the cells can be in use if the programmer is efficient. (I've put this model into a spreadsheet that works well - though it was interesting getting around the circular reference restrictions once I tried to make a grid of cells.)<br /><span style="font-size:130%;"><br /><br />Interesting properties of the bitgrid<br /></span>Fault tolerance - A bitgrid chip doesn't need to be perfect. If a cell is bad, but doesn't lie near an edge, it can be routed around. Software to do this routing would eventually find its way into the normal set of tools associated with bitgrid chip deployment. This should greatly reduce the cost of the chips, should they ever be manufactured in bulk.<br /><br />Program transformation - A program for a bitgrid could be automatically rotated, compartmentalized, and routed to best fit a given chipset. As above, I expect this software to be a basic part of any bitgrid deployment.<br /><br />Duplexing - When I decided on a 4-in, 4-program, 4-out architecture, it seemed a reasonable choice. When modeling the circuit for an 8-bit programmable 2's complement generator, it became apparent that data flow in more than one direction across a chip was quite useful. I had the mode bit going down, with the carry bit going up the same column.
In non-trivial applications, this could help boost the utilization of the cells.<br /><br />Inertia as default - When programming a grid, I've chosen a set of defaults for my tools that implement what appears to be inertia. Once a bit is in motion across the grid, it defaults to traveling in that direction, without collision. The physics parallels are intriguing.<br /><br />Interface as RAM - It would be quite desirable to make the bitgrid appear to be a RAM chip externally. The "program" store could appear as a contiguous block of static RAM to the host processor. The IO pins for each cell could be similarly mapped. Thus the top 8 input bits would just be written at address 0, for example. The output bits would be a read from the same address. Interfacing to a host becomes trivial.<br /><br /><span style="font-size:130%;">My story<br /></span>As I've stated before, the bitgrid idea goes way back with me, to my brief stint as an Engineering student (Rose-Hulman, 1981-1982). I had read about and studied bit-slice processors a bit, and came up with the idea of building a vast array of them. I wrote page upon page of different ideas, and tweaks to the idea, but then I let it all drop. (A bad habit I'm working to change.)<br /><br />I've talked with friends about it from time to time, but figured I'd never be able to do it because I don't have the resources or skills to get a chip built. Let alone the whole patenting process...<br /><br />I've hit middle age, and am figuring out just what my life goals are. 43 Things was a nice catalyst, and one of my entries was about the bit grid. (Another is a refactoring compiler, which will be discussed elsewhere.)
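Before moving on, the cell model described above (four neighbor bits concatenated into a nibble that indexes four 16-bit lookup tables) can be sketched in a few lines of Python. This is my own illustrative sketch, not part of any BitGrid tool, and the bit ordering of the nibble is an assumption, since the post doesn't fix one:

```python
# Sketch of one BitGrid cell tick, per the description above.
# Illustrative code only; names and bit ordering are my own choices.

def cell_step(inputs, programs):
    """inputs: four bits (up, left, down, right) from the neighbors.
    programs: four 16-bit lookup tables, one per output direction.
    Returns the four output bits (up, left, down, right)."""
    # Concatenate the four input bits into a nibble n.
    # (The bit ordering here is an assumption; the post doesn't specify one.)
    n = (inputs[0]
         | (inputs[1] << 1)
         | (inputs[2] << 2)
         | (inputs[3] << 3))
    # Each output is bit n of that direction's 16-bit program.
    return tuple((p >> n) & 1 for p in programs)

# Example: program the "down" output to copy the "up" input (a wire).
# The output must be 1 whenever bit 0 of n is 1, i.e. for all odd n:
# binary 1010101010101010 = 0xAAAA.
WIRE_UP_TO_DOWN = 0xAAAA
print(cell_step((1, 0, 0, 0), (0, 0, WIRE_UP_TO_DOWN, 0)))  # (0, 0, 1, 0)
```

A full emulator would hold a 2-D array of these cells and evaluate every cell on each clock tick, which is what makes the whole grid active at once.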
At this point my goals for the bitgrid are:<br /><ul> <li>Have at least one other person truly "get" the idea, so I have a collaborator</li> <li>Actually get a good technical discussion going about the merits of this architecture</li> <li>Determine the true value of the concept, and where its niche in the world of computing is</li> </ul> Assuming that I'm not crazy, we then move on to longer-term goals:<br /><ul> <li>Build some software emulation programs to test out ideas</li> <li>Write some code to solve a non-trivial computing problem (for some reason FFTs keep coming to mind)</li> <li>Figure out just how fast and cheap the chips would be</li> <li>Get chips made (I don't care who does this, as long as it stays open source)<br /></li> <li>Debug said chip designs</li> <li>Get <span style="font-style: italic;">good</span> chips made</li> <li>Find real customers, get feedback, loop</li> </ul> Right now... I'm just trying to find others who think the idea has merit, and want to shepherd it along. I thank Doc Searls for his encouragement, and I thank <span style="font-weight: bold;">you</span> for your time and consideration.<br /><br />--Mike--Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-1110433964384284062005-03-09T21:48:00.000-08:002005-03-09T21:52:44.386-08:00The bitgrid futureAs I stated over at <a href="http://pluralsight.com/blogs/hsutter/archive/2004/12/17/3957.aspx">Herb Sutter's blog entry about concurrency</a>, I think the future of computing will include the bitgrid, in some form. The Von Neumann architecture has its definite efficiencies, but the fact that a given RAM cell just sits there waiting almost 100% of the time seems to me to be the ultimate in waste.<br /><br />I look forward to the future; it's going to be a wild ride getting there.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0