Wherein Mike Warot describes a novel approach to computing which follows George Gilder's call to waste transistors.

Wednesday, November 27, 2024

Reconsidering the expulsion of routing hardware from BitGrid

I often reconsider the decisions that have led to the current state of my ideal BitGrid design. This 1999 paper from Tyler J. Moeller at MIT has me reconsidering the expulsion of all routing hardware from the BitGrid.

Specifically, it's page 52, where he lays out a bit-level systolic array... and I see that it definitely needs more than one bit inbound from some directions.

I've come to realize (since adding Isolinear Memory) that the key property of a BitGrid that makes it valuable is homogeneity, not necessarily simplicity. If it were possible to add routing for the 4 bits into/out of a cell without loss of generality and homogeneity, it might be an acceptable tradeoff.

Since I'll have two layers of programmability in a BitGrid anyway (the LUT values in one control plane, the configuration in another), routing data via part of the configuration layer shouldn't take too many gates to pull off.
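To make that concrete, here's a rough Python sketch of the kind of thing I mean: each output gets a few configuration bits that either select the LUT result or pass one of the four inputs straight through. The names, field widths, and encoding here are all invented for illustration; nothing about the real configuration format is settled.

```python
from dataclasses import dataclass, field
from typing import List

N, E, S, W = range(4)  # direction indices for inputs and outputs

@dataclass
class CellConfig:
    # One 16-bit LUT per output (the control plane).
    luts: List[int] = field(default_factory=lambda: [0] * 4)
    # Hypothetical routing field: -1 = use the LUT, 0..3 = pass that input through.
    route: List[int] = field(default_factory=lambda: [-1] * 4)

def cell_outputs(cfg: CellConfig, inputs: List[int]) -> List[int]:
    """Compute a cell's 4 outputs from its 4 latched inputs."""
    addr = inputs[N] | inputs[E] << 1 | inputs[S] << 2 | inputs[W] << 3
    outs = []
    for d in range(4):
        if cfg.route[d] >= 0:
            outs.append(inputs[cfg.route[d]])       # routed: bypass the LUT entirely
        else:
            outs.append((cfg.luts[d] >> addr) & 1)  # normal: look up the output bit
    return outs
```

The appeal is that this costs roughly a small mux per output, rather than a general switching fabric.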

I think this change would allow far more efficient utilization of a BitGrid, but the complexity might not be worth it. I'm open to discussion, should the bus factor for BitGrid ever increase above 1.

Tuesday, November 19, 2024

BitGrid, where and how it got here

As far as I can tell, I've been working on the BitGrid since the early 1980s. I recall finding notebook pages about it from those days. I started this blog back in 2004. Since getting Long Covid in 2020, with its accompanying brain fog (plus being over 40..50..60), I don't remember exactly how it started. In 2022, someone wrote a page about it on the Esoteric Languages Wiki, complete with some examples. Here are the things I've learned that I'm fairly sure of, and all the context I can give you.


First a bit of relevant computing history

When the ENIAC, one of the earliest computers, was built, programming it was a matter of connecting cables in plugboards and throwing switches on "function tables". This effectively wired up a very expensive special-purpose computer. Programming could take weeks, but everything could then work in parallel at state-of-the-art speeds, equivalent to about 500 floating point operations per second.

Then the von Neumann architecture was grafted onto it, and the speed was cut by a factor of 6, because all of the inherent parallelism was lost.


Modern FPGAs are optimized for the lowest possible latency, which is why they have switching fabrics on them. This makes them expensive, which tends to drive the need for maximum utilization, which in turn drives adding "special features" like dedicated RAM blocks, multipliers, etc.

In the end, you've got a very weird, heterogeneous mess. The hardware design languages you're forced to use do their best to push your design into all the nooks and crannies, but it's never simple, nor fast. Compiles can take hours, and you're always going to be worried about timing issues.

Much like the ENIAC, whose problems eventually required a sacrifice in run-time performance to make programming easier... the BitGrid makes similar horrible, but useful, tradeoffs.


I've been toying with alternative architectures since the 1980s as a hobby. Over the decades I've become convinced that the best architecture is a grid of cells, like a chess board. Each cell would have 4 inputs and 4 outputs, one pair for each neighbor. Latches would store the outputs of each cell, so that computation is spread across 2 phases: every cell that is computing sees stable inputs, and there are no race conditions.

However, this makes it hard to think about... it might be worth using the conceptually easier model of latching all of the inputs in parallel, and also latching all of the outputs, at least for the first version of a chip. It's equivalent, as far as I can tell, but slower at run time.
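For anyone who wants to play with the idea, here's a minimal Python sketch of that second, fully parallel model: each cell has one 16-bit LUT per output, inputs are read from the neighbors' latched outputs, and the whole grid updates on one global tick. The data layout and the zeros-off-the-edge convention are my own choices for the sketch.

```python
import copy

N, E, S, W = range(4)  # direction indices

def step(luts, outputs):
    """One global tick: every cell recomputes from its neighbors' latched outputs."""
    rows, cols = len(luts), len(luts[0])
    new = copy.deepcopy(outputs)
    for r in range(rows):
        for c in range(cols):
            # Inputs come from the facing outputs of the four neighbors (0 off-grid).
            i_n = outputs[r - 1][c][S] if r > 0 else 0
            i_e = outputs[r][c + 1][W] if c < cols - 1 else 0
            i_s = outputs[r + 1][c][N] if r < rows - 1 else 0
            i_w = outputs[r][c - 1][E] if c > 0 else 0
            addr = i_n | i_e << 1 | i_s << 2 | i_w << 3
            new[r][c] = [(luts[r][c][d] >> addr) & 1 for d in range(4)]
    return new  # becomes the latched state for the next tick
```

The two-phase version would, as I understand it, update alternating cells of the checkerboard on alternating ticks instead: faster, but harder to reason about.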

Then there's the problem of memory. I started thinking about this about 6 months ago, as it seemed that I might actually be able to make an ASIC of the BitGrid.


In FPGAs, they add dedicated blocks of RAM that you can aggregate in various ways. I rejected this approach because it would add the von Neumann bottleneck and ruin the homogeneity that makes the BitGrid universal.

In trying to actualize the BitGrid design, I learned that the LUTs on FPGAs are effectively a string of D flip-flops, daisy-chained so that the bits can be streamed in at device programming time, then run through a multiplexer. If you add some more logic around this string of flip-flops, you can use it as a serial RAM, without losing generality or introducing the von Neumann bottleneck. I called it Isolinear Memory back in September.
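Behaviorally, the trick looks something like the following sketch: the 16 LUT bits live in a chain of D flip-flops that you stream into at programming time and read through a multiplexer at run time, and a recirculation path turns the same chain into a tiny serial RAM. This is an illustration of the behavior, not RTL, and the indexing convention is arbitrary.

```python
class LutShiftRegister:
    """A 16-bit LUT modeled as a daisy chain of D flip-flops."""

    def __init__(self, length=16):
        self.bits = [0] * length      # bits[0] is the newest stage in the chain

    def shift_in(self, bit):
        """Programming mode: clock one bit in; the oldest bit falls off the end."""
        out = self.bits[-1]
        self.bits = [bit & 1] + self.bits[:-1]
        return out

    def read(self, addr):
        """Run mode: the 4 input bits act as a multiplexer select."""
        return self.bits[addr]

    def rotate_write(self, addr, bit):
        """Serial-RAM mode: recirculate the whole chain once, swapping in one bit."""
        n = len(self.bits)
        for i in range(n):
            recirc = self.bits[-1]    # bit about to leave the chain
            # The bit inserted at step i lands at index n-1-i after the rotation,
            # so inject the new bit at step n-1-addr and recirculate the rest.
            self.shift_in(bit & 1 if i == n - 1 - addr else recirc)
```

Sixteen shift_in calls program the LUT; after that, rotate_write acts like a 1-bit-wide, 16-deep RAM for the cost of little more than a mux and the recirculation gating.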

Tuesday, November 12, 2024

BitGrid - and interesting tangents

If you've read about von Neumann, and wondered what it was like to bootstrap computing, and whether there could be another way... here's an interesting Birthday Gift from me to you: the BitGrid, and a writeup, in an HN thread:


https://news.ycombinator.com/item?id=42115107

Thursday, September 26, 2024

BitGrid and Isolinear Memory

In exploring the design of a BitGrid cell to fit within the confines of a TinyTapeout project, it quickly became apparent that the shift registers needed to hold all of the programming data for a cell were the majority of the silicon.

It has also become apparent that, in order to use the BitGrid in an actual application, there need to be blocks of memory embedded in the cells.

I decided that it might be interesting to allow the use of the shift registers that normally hold the contents of a BitGrid cell's LUT as memory. Adding some multiplexing and a few more shift registers could greatly increase the functionality of a cell without destroying the homogeneity and logical consistency that make the BitGrid so easy to think about and use. For now, I'm calling it Isolinear Memory, as it would add a line of memory into any BitGrid array on demand, in any direction, uniformly.
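Here's a toy Python sketch of how I picture a cell with that addition: a mode flag (standing in for whatever configuration bits would really control this) decides whether the LUT shift register computes as usual or clocks data straight through, so that a run of adjacent cells becomes a serial line of memory. The single shared LUT and the widths are simplifications for the sketch.

```python
class IsolinearCell:
    def __init__(self, depth=16):
        self.lut = [0] * depth      # doubles as the compute LUT and as storage
        self.memory_mode = False    # hypothetical configuration bit

    def tick(self, inputs):
        """One clock: either compute through the LUT or shift data through it."""
        if self.memory_mode:
            out = self.lut[-1]                          # bit emerging after 16 ticks
            self.lut = [inputs[0] & 1] + self.lut[:-1]  # data bit enters the chain
            return out
        addr = inputs[0] | inputs[1] << 1 | inputs[2] << 2 | inputs[3] << 3
        return self.lut[addr]                           # normal LUT lookup
```

Chaining k cells in memory mode this way would give a 16k-bit delay line running in whatever direction the data flows, which is the uniform, any-direction property I'm after.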


Here is the cell design at present, as I learn to use KiCad 8.0. The cells get their inputs from IN_A through IN_D, which are clocked into latch U6 at the rising edge of Clock_A. The state of the LUT is stored in U2 and U3, and the selected value is fed through either U4 or U5 to the output multiplexer.

The last output cell is also sent to the multiplexer.

I'm going to add logic to allow shifting the LUT contents out, subject to the contents of U1.
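As a reading aid, here's how I'd model that datapath in Python: U6 latches the four inputs on the rising edge of Clock_A, U2 and U3 each hold half of the 16 LUT bits, and the fourth input bit picks between the U4 and U5 paths at the output multiplexer. The register roles and widths are inferred from the description above, so treat this as approximate.

```python
class CellDatapath:
    """Behavioral sketch of the schematic described above (roles inferred)."""

    def __init__(self):
        self.u2 = [0] * 8    # low half of the LUT contents
        self.u3 = [0] * 8    # high half of the LUT contents
        self.u6 = [0] * 4    # input latch

    def clock_a(self, in_a, in_b, in_c, in_d):
        """Rising edge of Clock_A: latch IN_A..IN_D, then select the output bit."""
        self.u6 = [in_a & 1, in_b & 1, in_c & 1, in_d & 1]
        sel = self.u6[0] | self.u6[1] << 1 | self.u6[2] << 2  # low 3 select bits
        u4_out = self.u2[sel]    # selection from the low half (U4 path)
        u5_out = self.u3[sel]    # selection from the high half (U5 path)
        return u5_out if self.u6[3] else u4_out  # output multiplexer
```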

That's it for today.

About Me

I fix things, I take pictures, I write, and marvel at the joy of life. I'm trying to leave the world in better condition than when I found it.