tag:blogger.com,1999:blog-72586352024-03-13T01:22:00.633-07:00BitGrid - brilliant, or waste of time??Wherein Mike Warot describes a novel approach to computing which follows George Gilder's <a href="http://www.wired.com/wired/archive/1.04/gilder_pr.html">call to waste transistors</a>.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.comBlogger32125tag:blogger.com,1999:blog-7258635.post-44997272095465293132023-12-03T10:56:00.000-08:002023-12-03T10:56:14.452-08:00BitGrid meets the Advent Of Code<p> I'm a big fan of the <a href="https://adventofcode.com/" target="_blank">Advent of Code</a>. In past years I used Lazarus/Free Pascal as my language of choice; you can see the <a href="https://adventofcode.com/" target="_blank">code here</a>. So, this year, I'm going to update the <a href="https://github.com/mikewarot/Bitgrid">BitGrid engine</a>, and do whatever is required to get code to run in the virtual BitGrid. You can follow my <a href="https://github.com/mikewarot/Advent_of_Code_in_BitGrid" target="_blank">progress here</a>.<br /><br />This requires an ecosystem to support me. I've got to add I/O, programming, and debug subsystems to the BitGrid engine. I've got to figure out a programming language, a compiler, a router, etc. I hope to get all of the 2023 problems done before Advent of Code 2024 starts.</p>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com2tag:blogger.com,1999:blog-7258635.post-85193539934640696712023-06-15T22:55:00.001-07:002023-06-15T22:57:23.255-07:00Making a fresh start, in Pascal<p>I've recently been made aware that there is a <a href="https://esolangs.org/wiki/Bitgrid" target="_blank">page devoted to BitGrid</a> in the Esoteric Languages wiki. Needless to say, this came as a bit of a shock to me.
I was particularly impressed with the working example of the Game of Life implemented in a BitGrid.<br /><br />I've let this project sit far too long, and since I've got nothing but free time these days, I'm pushing ahead on implementing a simulator for the BitGrid, in Lazarus, which is a GUI builder based on Free Pascal. The <a href="https://github.com/mikewarot/Bitgrid" target="_blank">GitHub repository for the project</a> has the latest code, which isn't much to look at for the moment.<br /><br />I'm hoping that I can sustain small bits of progress that add up over time.<br /><br />Eventually, I'll have a system that really allows programming and evaluation of programs. It would then be on to a chip design, and onward to Exaflop computing. ;-)</p>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-50288067378219391162021-05-20T23:09:00.001-07:002021-05-20T23:09:03.203-07:00A chance to make a bitgrid chip appears suddenly<p> This <a href="https://news.ycombinator.com/item?id=27215912">thread on Hacker News</a> pointed to a Google-funded <a href="https://efabless.com/open_shuttle_program/2">project to let people design their own chips</a>. The window is already halfway over, so I need to learn chip design in the next 2 weeks or so. It should be fun.<br /></p>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-68962312085325339492010-10-28T08:23:00.000-07:002010-10-28T08:23:42.735-07:00Applications - image processing, survey planeOne of the long-term issues with having a technology that should be much more powerful than existing hardware is finding an appropriate use scenario for it. Today one occurred to me on my commute.<br />
<br />
I've been doing a lot of experimenting with synthetic focus imagery. Having recently written a tool to help me do image matching, I've begun to appreciate why Hugin gets so bogged down generating and matching control points. The cross-correlation of 2D image sets is a huge resource hog. Fortunately, the bitgrid should be quite capable of handling it, because the computations are data-local, with the only global data being the source material, which loads once per frame, and the output maximum coordinates, again once per frame.<br />
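That cost is easy to make concrete. Below is a minimal pure-Python sketch of naive 2D cross-correlation (my own illustration, not code from Hugin or the matching tool described above; the function name and toy data are hypothetical). For an N x N image and an M x M template, every output point is a full M x M multiply-accumulate over a local window, which is what makes the operation both expensive and, as noted above, data-local.

```python
# Naive 2D cross-correlation: O(N^2 * M^2) multiply-accumulates.
# Each output point depends only on a local M x M neighborhood,
# which is why this kind of work maps onto a grid of tiny cells.

def cross_correlate(image, template):
    n, m = len(image), len(template)
    out = []
    for y in range(n - m + 1):
        row = []
        for x in range(n - m + 1):
            # One full M x M multiply-accumulate per output point.
            acc = sum(image[y + j][x + i] * template[j][i]
                      for j in range(m) for i in range(m))
            row.append(acc)
        out.append(row)
    return out

# Toy data: the template matches the center block of the image,
# so the correlation peak lands there.
image = [[0, 0, 0, 0],
         [0, 1, 2, 0],
         [0, 3, 4, 0],
         [0, 0, 0, 0]]
template = [[1, 2],
            [3, 4]]
result = cross_correlate(image, template)
```

Even this 4x4 toy does 36 multiplies; at camera resolutions the count explodes, which is the bottleneck described above.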
<br />
I imagine a remote-control glider with a pair (later an array) of cameras feeding into a system which correlates the images to generate altitude data. Once this 2D triangulation is done, it should be possible, with hints from the navigation system, to generate a 3D image of the area below, with altitude information at least as accurate as the pixel resolution allows.<br />
<br />
If the plane is slow and stable enough to allow multiple overlapping images of the same area, it should be possible to derive super-resolution images using Richardson-Lucy deconvolution.<br />
<br />
It all hinges on the power consumption of a single bitgrid cell, something I don't know, but which an experienced IC designer should be able to figure out over a lunch break.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com2tag:blogger.com,1999:blog-7258635.post-41535930897807657832010-08-14T06:05:00.000-07:002010-08-14T06:28:13.062-07:00Spreadsheet iteration and other linguistic hitsI continue to search for ideas that are close to the BitGrid, and I've come across Amir Hersch's mention of the need for a "spreadsheet iterator" in <a href="http://fpgacomputing.blogspot.com/2008/12/more-versus-faster.html">his blog post</a> titled "More versus Faster"... the BitGrid would be a good spreadsheet iterator.<br />
<br />
I'm still trying to figure out the cost/benefit ratio of getting rid of all routing in a real-world FPGA device. As an abstraction tool, it's totally cool and cost-effective, as there are no static or dynamic power costs in a thought experiment. 8)<br />
<br />
As an intermediate stage of compiling a design, there are time costs in translation, but they <i>might be worth it</i> for the ability to move elements of a system design orthogonally to other design decisions.<br />
<br />
Time and persistent effort to get the questions answered will tell. I'm glad I'm still asking questions and pursuing the goal of getting a BitGrid chip built.<br />
<br />
Oh... another linguistic hit "hardware spreadsheet" as mentioned <a href="http://fpgacomputing.blogspot.com/2008/09/parallel-programming-is-easy-making.html">here</a>.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-80285088945262114932010-07-27T06:38:00.000-07:002010-07-27T06:41:14.244-07:00Reconfigurable Systolic ArrayI've been searching, and searching, and searching for anything that matches the Bitgrid in architecture... and I've found nothing... today's Google search is <a href="http://www.google.com/search?q=reconfigurable+systolic+array">reconfigurable systolic array</a>.<br />
<br />
I get a lot of results, mostly academic (which means they are behind a paywall, and thus worthless). It does give me a better way to describe the bitgrid, though.<br />
<br />
The bitgrid is a fine-grained, homogeneous 2D reconfigurable systolic array and/or mesh. Its utility will be verified by simulation. I hope to popularize it with blogs, social media, and by making a game out of it.<br />
<br />
It is my belief that the flexibility of the LUT-based approach more than makes up for the lack of dedicated routing and compute blocks. Any inactive elements of the circuit are unclocked, and thus should draw very little power.<br />
<br />
I'm not sure if I'm going to be a good fit for the <a href="http://www.google.com/search?q=omnipresent+high+performance+computing">OHPC project</a> or not; I've got until August 6 to write a proposal.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-3925240797018625762010-07-09T20:10:00.000-07:002010-07-09T20:10:22.557-07:00Prior art - none found... and I've got a headacheEvery single article I checked trying to find a pure LUT-based FPGA had some sort of routing fabric in it.<br />
<br />
I can't find anything that is close to the bitgrid.<br />
<br />
I'm going to try to relax, and wait for the Excedrin to kick in.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-50138613641712428932010-07-06T14:48:00.000-07:002010-07-06T14:48:51.031-07:00Learning about chip designI really need to find out how much power a BitGrid cell will consume, in order to find out how well it could realistically deliver on the Exascale challenge. This is forcing me to learn all about VLSI design. Thanks to <a href="http://cmosedu.com/">CMOSedu</a>, I'm getting up to speed on <a href="http://www.staticfreesoft.com/">Electric</a> and <a href="http://www.linear.com/designtools/software/ltspice.jsp">LTspice</a> right now. I hope to learn enough to have an answer within an order of magnitude this week. I'm hoping I can get it in the range of 1 pJ per operation per cell for a naive design, though at 1 GHz a fully active 1000x1000 chip would then dissipate on the order of a kilowatt; getting down to 1 watt would take closer to 1 fJ per operation. I expect static power to be very low.<br />
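Dynamic power here is just cells × clock × energy per operation. A quick sanity check in Python (a sketch of the arithmetic only; the energy figures are illustrative targets, not measurements):

```python
# Dynamic power of a fully active grid:
#   power = (number of cells) * (clock rate) * (energy per operation)
# Static leakage is ignored, as the post expects it to be very low.

def chip_power_watts(cells, clock_hz, joules_per_op):
    return cells * clock_hz * joules_per_op

cells = 1000 * 1000   # a 1000x1000 grid of cells
clock = 1e9           # 1 GHz

watts_at_1pj = chip_power_watts(cells, clock, 1e-12)  # ~1 kW at 1 pJ/op
watts_at_1fj = chip_power_watts(cells, clock, 1e-15)  # ~1 W at 1 fJ/op
```

The three-orders-of-magnitude gap between those two numbers is why the per-cell energy estimate matters so much.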
<br />
<br />
I've got a month until the first DARPA deadline for submissions...<br />
<br />
wish me luck.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-10925401187937123442010-07-04T10:53:00.000-07:002010-07-04T10:54:22.763-07:00What's required for ExascaleI've been watching the google results for the key word "Exascale"... and found this at the bottom of <a href="http://www.genomeweb.com/blog/emergence-exascale">this article about bioinformatics</a>. (Emphasis mine)<br />
<br />
<blockquote>And in another opinion piece over at International Science Grid This Week, Irving Wladawsky-Berger, a 37-year IBM veteran, lays out some of the big challenges on the way to achieving exascale computing. Whereas the evolution from terascale to petascale went smoothly using tens to hundreds of thousands of processors from the PC and Unix markets, they will not get us to exascale, writes Wladawsky-Berger. <b>Exascale will require some other kind of major transition in chip architecture, not to mention a completely new programming paradigm</b>. </blockquote><div>Well... that precisely describes the Bitgrid, or it will, once I get the thing built.</div>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-27358408725931968522010-07-01T13:36:00.000-07:002010-07-01T13:36:15.171-07:00SimGrid - Up and running<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnNjr2X8pbK_oAkINuparVpaIGDy40N9An9ckoN9IjExRam7n0zDFsC8KsRawaUE7gMlQys6kYg26N-E2jszb5E48LEuuKcWvZ9WO8mzfqU-3a9Pk5ul6ZVqyaWvl04uB12OSyFA/s1600/SimGrid_Screenshot_20100701.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="384" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgnNjr2X8pbK_oAkINuparVpaIGDy40N9An9ckoN9IjExRam7n0zDFsC8KsRawaUE7gMlQys6kYg26N-E2jszb5E48LEuuKcWvZ9WO8mzfqU-3a9Pk5ul6ZVqyaWvl04uB12OSyFA/s640/SimGrid_Screenshot_20100701.png" width="640" /></a></div><div style="text-align: left;">I managed to get a Grid Simulation working.... here is a screenshot of it. I named it SimGrid, and it works in conjunction with Sim01 as a programming tool. First you work out your entry on Sim01 for a single cell, then paste the hex code into the appropriate cell in the Grid. 
The code to generate it can scale to arbitrary dimensions, limited by the screen and Windows resource handles.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;">In this simulation, every cell is loaded with the code required to pass each bit through the cell, and have it emerge on the opposite side. It's a fairly easy way to check for logic flaws in the simulator. Here we see a bit from the left side propagating all the way across to the right.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;">I feel that it's reasonable to estimate I can emulate an arbitrary grid of these, even as many as 1000x1000, without a problem. The task now is to find a problem domain appropriate to the architecture, to use as a baseline for performance evaluations. I need to make sure it's going to be quick enough once cast into silicon to have any commercial value. I'm hoping it will be fast enough for the Exaflop realm, but I have no real way to tell at this point in time.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;">I'm starting to dig around for silicon simulation resources, as well as a place to do fabrication of this as an ASIC should I find funding.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;">I feel like Tesla probably felt when he got the idea of polyphase power straight in his head.... BitGrid has enormous potential, but now is the time for a lot of blood, sweat, and tears to make it come into being.</div><div style="text-align: left;"><br />
</div><div style="text-align: left;"><br />
</div><br />
<div><br />
</div><div><br />
</div><div><br />
</div>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-40538162232255917812010-06-30T09:21:00.000-07:002010-06-30T09:21:58.704-07:00Unrolling programs instead of loops<div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">A programming trick that used to work was to unroll loops, to prevent the pipeline penalties that occur when you branch. It worked well for a while. The bitgrid is based on the idea of unrolling the whole friggin program. Instead of making a list with fewer branches... why not distribute each and every instruction of a program out into a physical processing instance?</div><div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">To make it feasible in hardware, use the simplest computing grid feasible: a grid of cells (each cell having 4 inputs (one bit from each neighbor), 4 outputs (one bit TO each neighbor) and a 16-entry lookup table), each of which is a pitiful unit of computation by itself.... in a grid sized to fit the application at hand, they can execute all of the instructions necessary to compute a result simultaneously.</div><div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">Communication isn't shared, because every input and output only has one place to go or come from. It only has to go to the next cell... so there are no long communication lines to worry about. 
Each cell can function as a router and logic element at the same time.</div><div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;">Programming is a matter of setting the values in the lookup tables, which could be made of static RAM cells.</div><div style="color: #111111; font-family: sans-serif; font-size: 13px; line-height: 20px; margin-bottom: 1em; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"><br />
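As a concreteness check, here is a minimal sketch of the cell just described (get a bit from each neighbor, concatenate into a 4-bit index, one lookup table per output) in Python. This is my own illustration, not a spec: the class name and the N-high/W-low bit ordering of the index are assumptions.

```python
# One BitGrid cell: four 1-bit inputs (N, E, S, W), four 1-bit
# outputs, and one 16-entry lookup table per output direction.

class BitGridCell:
    def __init__(self, luts):
        # luts: dict mapping 'N','E','S','W' to a 16-entry list of 0/1.
        self.luts = luts

    def step(self, n, e, s, w):
        # Concatenate the four input bits into a 4-bit index.
        # (N as the high bit, W as the low bit -- an arbitrary choice.)
        index = (n << 3) | (e << 2) | (s << 1) | w
        return {d: self.luts[d][index] for d in 'NESW'}

# A pass-through program: each output repeats the opposite input,
# so the cell acts as plain wires between its neighbors.
passthrough = {
    'N': [(i >> 1) & 1 for i in range(16)],  # N out = S in
    'S': [(i >> 3) & 1 for i in range(16)],  # S out = N in
    'E': [i & 1 for i in range(16)],         # E out = W in
    'W': [(i >> 2) & 1 for i in range(16)],  # W out = E in
}

cell = BitGridCell(passthrough)
```

Setting the 64 table bits is the entirety of "programming" one cell, exactly as the paragraph above describes.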
</div>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-67168917136432694852010-06-28T09:32:00.000-07:002010-06-28T09:32:55.847-07:00Introducing Sim02<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQB0biZ8vTNJ8gQpAncfByl7tu5M-FaBygGp-LivNJXHSi-UwEnQDHd4r3aGj5QOGkwWgVvz_S4GLcAL6swyheRujte2jXtmtQiGhaRCIfA2cWCZ0sCFaGsJ4HHHy2AAoWp-SCMg/s1600/Sim02_ScreenShot_201006281124.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="332" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQB0biZ8vTNJ8gQpAncfByl7tu5M-FaBygGp-LivNJXHSi-UwEnQDHd4r3aGj5QOGkwWgVvz_S4GLcAL6swyheRujte2jXtmtQiGhaRCIfA2cWCZ0sCFaGsJ4HHHy2AAoWp-SCMg/s640/Sim02_ScreenShot_201006281124.png" width="640" /></a></div><div class="separator" style="clear: both; text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: left;">Now that Sim01 is capable of demonstrating how a single cell works in terms of bits, it's time to start building a grid simulator. This is my first effort at it. Using Sim01 to figure out the hex codes (a new feature I added to support this)... I was able to determine the proper codes to do a pass-through... and make that the default for this 2-cell array. </div><div class="separator" style="clear: both; text-align: left;"><br />
</div><div class="separator" style="clear: both; text-align: left;">Here you see an input from the west side of the array propagated all the way through to the east side. Simultaneously, an input from the north of Cell 01 (the right cell) is propagated out the south side.</div><div class="separator" style="clear: both; text-align: left;"><br />
</div><div class="separator" style="clear: both; text-align: left;">It's all prototype code, but it is functional. I intend to scale this up to a simulator capable of simulating an arbitrary size grid. Then I'll add some input and output functionality to allow the processing of real data with a simulated bitgrid.</div><div class="separator" style="clear: both; text-align: left;"><br />
</div>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-82743996255287233502010-06-24T14:47:00.000-07:002010-06-24T14:47:32.337-07:00Introducing Bitgrid-Sim version 0.03I've created a basic simulator for one cell of a bitgrid. The simulator is currently a Windows-only application, but it should be fairly easy to port to Linux should that be required later. Here's the first public screenshot:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1oR9MffjUNJXrW8jMtihAI6HVacYUtKOxKTM1FVBHmqWssqqiBpol6xWEfiZr3Qoj1A4objeq8BdADGHewKKfZfZDUwGrYDyfCmX6SnDOHiL71iy5fAuraz9xKL0dehUF2U-hSQ/s1600/BitGrid003_ScreenShot.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="379" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh1oR9MffjUNJXrW8jMtihAI6HVacYUtKOxKTM1FVBHmqWssqqiBpol6xWEfiZr3Qoj1A4objeq8BdADGHewKKfZfZDUwGrYDyfCmX6SnDOHiL71iy5fAuraz9xKL0dehUF2U-hSQ/s640/BitGrid003_ScreenShot.png" width="640" /></a></div><br />
The simulation shown is a pass-through.... in this configuration the bitgrid cell acts as an expensive set of wires, passing signals straight through. It's useful as filler between active cells.<br />
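The pass-through codes are easy to derive by hand or by script. The sketch below is my own illustration and makes assumptions Sim01 may not share (one 16-entry, one-bit table per output, packed LSB-first into a 16-bit word, with the four input bits forming the index): a table whose output is simply "bit k of the index" packs into one of four characteristic hex patterns.

```python
def lut_to_hex(table):
    # Pack a 16-entry list of 0/1 into a 16-bit word, with table[0]
    # as the least significant bit, formatted as four hex digits.
    word = 0
    for i, bit in enumerate(table):
        word |= bit << i
    return f"{word:04X}"

# "Output = input bit k" gives one pass-through table per direction.
# With this assumed packing, the four codes come out as the familiar
# bit-select masks: AAAA, CCCC, F0F0, FF00.
codes = {k: lut_to_hex([(i >> k) & 1 for i in range(16)])
         for k in range(4)}
```

Any other cell program is derived the same way: decide what each output bit should be for all 16 input combinations, then pack the table.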
<br />
In this simulator you can edit the program (shown in the four areas of checkboxes) and simulate inputs. The checkboxes corresponding to the current set of inputs are highlighted in red as a programming aid. You can save your work as well.<br />
<br />
Users can load a few examples, or create their own. In addition to creating a Google Code <a href="http://code.google.com/p/bitgrid-sim/">project site</a> for it, I've also packaged up the source, executable, and examples and made it <a href="http://bitgrid-sim.googlecode.com/files/BitGrid003.zip">available as a zip file</a>. It's small, and you just unzip it, and it should be ready to use.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-74663125285410817432010-06-23T22:36:00.000-07:002010-06-23T22:36:19.553-07:00Beyond the Petaflop - How Bitgrid could meet DARPA's needs ahead of schedule.I found<a href="http://www.networkworld.com/community/node/62808"> this story</a> about a new DARPA request for proposals <a href="http://digg.com/hardware/Beyond_the_petaflop_DARPA_wants_quintillion_speed_computers">via Digg</a>. The challenge is to build a computer system which can process 10^18 floating-point operations per second. Here's the back-of-the-envelope calculation I posted in response:<br />
<br />
<br />
I think it could be done in 3 years if it didn't have to fit in one rack; in fact, it could already have been done, had there not been such a heavy emphasis on the von Neumann architecture for the past 40 years.<br />
<br />
Imagine a bit-slice processor.... with perhaps 1000 transistors. Put those on the same die in a 1000x1000 grid; this would require 10^9 transistors. You could clock them at the nice, sane clock speed of 1 GHz. That would fit in a die the same size as a current-generation CPU. That's 10^6 slices times 10^9 cycles/second, or 10^15 bit computes per second, on a practical-size die, with current technology.<br />
<br />
Even if you lost 99.9% of the compute efficiency in shuffling bits around to do a floating-point operation, you could still do 10^12 floating-point operations per second, on a prototype chip... today.<br />
<br />
The chip would be easy to test, because all of the bit slices would be identical, so the testing of each part could be done in parallel... perhaps 1 second of test time per die. (Testing is a big part of the cost when it comes to chips.) The chips would cost somewhere around $10 each.<br />
<br />
If you allow me to continue with my estimate of 10^12 FLOPS per chip, and it were possible to build a grid of 1000x1000 of them... that takes you to the magic 10^18 FLOPS that DARPA wants, for a cost of about $10,000,000.<br />
<br />
10^18 operations per second, with 10^15 transistors, clocked at 1 GHz. Feasible... yes... but it does require you to give up sequential programming and think in terms of graph flow.<br />
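The arithmetic in the estimate above checks out; here it is as a script (the figures are the post's own round numbers, not measurements):

```python
# Back-of-the-envelope exascale arithmetic, using the post's estimates.

cells_per_chip = 1000 * 1000      # 10^6 bit slices per die
clock_hz = 10**9                  # 1 GHz
bit_ops_per_chip = cells_per_chip * clock_hz   # 10^15 bit computes/s

overhead = 1000                   # assume 99.9% lost turning bits into FLOPs
flops_per_chip = bit_ops_per_chip // overhead  # 10^12 FLOPS per chip

chips = 1000 * 1000               # a 1000x1000 grid of chips
total_flops = chips * flops_per_chip           # 10^18: the exascale target
cost = chips * 10                 # at roughly $10 per chip
```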
<br />
It's called bitgrid, I thought of it around 1981.... and I've written some of this up at http://bitgrid.blogspot.comMike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-41203529288840457102010-04-07T21:40:00.000-07:002010-04-07T21:43:20.493-07:00Enabling technology on its way from HPHP has continued to make progress on their Memristors, and this <a href="http://www.nytimes.com/2010/04/08/science/08chips.html?hpw">latest advance</a>, which allows them to be put into large arrays, would be perfect to use as the latch/demultiplexer components of the bitgrid cell. It would be amazing to have a bitgrid with 1000x1000 bits operating at 1 GHz.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-91961246189612342722008-07-31T12:13:00.000-07:002008-07-31T12:18:30.791-07:00HSRA - Close, but no cigarI came across <a href="http://brass.cs.berkeley.edu/documents/hsra_poster_fpga99.ppt">this PowerPoint</a> about HSRA: the High-Speed,<br />
Hierarchical Synchronous Reconfigurable Array, part of the BRASS Project at the University of California, Berkeley, found via a <a href="http://www.google.com/search?hl=en&q=systolic+array+bit+LUT">google search</a>.<br /><br />They figured out that long interconnects are a problem when you're trying to get speed out of a logic array, but don't seem to be willing to give up the big complex interconnection logic.<br /><br />I'd call this a step closer to the bit grid, but definitely not a hit.<br />Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-14971251560154688752008-07-29T11:19:00.000-07:002008-07-29T11:25:06.515-07:00Instruction Systolic ArrayThe BitGrid is 
based on the idea of data flowing through a grid of instructions. The logical inverse of this situation is to have the data remain in place, while instructions to modify it flow past it... this is the <a href="http://www.iti.fh-flensburg.de/lang/papers/isa/index.htm">Instruction Systolic Array</a>.<br /><br />The idea is intriguing because it offers a way to get the benefits of the systolic array without having to have all of the bandwidth necessary to update all of the cells at once. The web site is well thought out and informative as well. I like the <a href="http://www.iti.fh-flensburg.de/lang/papers/isa/isa2.htm">illustration of the matrix multiply</a> using their concept.<br /><br />There are a lot of architectures that got skipped along the way to our current crop of FPGA and other programmable logic circuits. I think that the systolic array warrants further consideration as well as the BitGrid.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-58447891911408883502008-07-18T22:10:00.001-07:002008-07-21T06:45:20.841-07:00BitGrid - A minimalist systolic arraySometimes the key to everything is to find the right words... the right words to explain a concept, the right words to feed into a search engine. I've learned some new words to explain the BitGrid, and they help tie it into the history of computing a bit better, and give context. The two words are<br /><a href="http://www.google.com/search?q=systolic+array"><blockquote>systolic array</blockquote></a><br />The bitgrid as I imagined it way back in the 1980s is a systolic array. It takes information, and processes pieces of it simultaneously. It's an extension of the then-common idea of a bit-slice processor, which was used to create <span style="font-style: italic;">really fast</span> custom processors before the microprocessor really took off.<br /><br />The BitGrid is a minimalist bidirectional systolic array. 
According to the <a href="http://en.wikipedia.org/wiki/Systolic_array">Wikipedia entry</a> on the subject, the pros and cons of systolic arrays are:<br /><p>Pros</p> <ul><li>Faster</li><li>Scalable</li></ul> <p>Cons</p> <ul><li>Expensive</li><li>Highly specialized for particular applications.</li><li>Difficult to build</li></ul><br />The fact that I want to process 4 bits at a time means that each cell is almost trivial, a 4-bit-wide, 4-address-line EEPROM table, for a total of 64 bits of information. This makes it cheap and easy to design and build, pretty much wiping out the Cons in the table above. I don't have a way to get silicon, <span style="font-style: italic;">yet</span>, but I expect it should be a matter of getting a cell and its addressing logic right, then replicating a big grid of these onto a single chip.<br /><br />I've figured out that an <span style="font-weight: bold;">n-bit</span> multiplier requires n*(n-1) cells. A divider takes the same number of cells. Adding and subtracting n bits requires n cells.<br /><br />It's amazing how little of this can be found via a straightforward Google search, unless you know exactly which <span style="font-style: italic;">magic words</span> to use. 
Semantic web searches will add value, should they ever actually get here.<br /><br /><a href="http://www.google.com/search?q=systolic+array"></a>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-1160928625765786912006-10-15T09:01:00.000-07:002006-10-15T09:27:37.520-07:00Logisim<a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/331/438/1600/Idealized%20Cell.0.png"><img style="margin: 0pt 0pt 10px 10px; float: right; cursor: pointer;" src="http://photos1.blogger.com/blogger/331/438/400/Idealized%20Cell.0.png" alt="" border="0" /></a><br /><br />Earlier this week I found<a href="http://ozark.hendrix.edu/%7Eburch/">Carl Burch</a>'s wonderful <a href="http://ozark.hendrix.edu/%7Eburch/logisim/">Logisim</a> which does digital logic simulation. It took a few hours of tweaking, and one flash of insight (below)... but here is what a single BitGrid cell looks like in an idealized format.<br /><br />The cell consists of a single 16x4 RAM cell (4 bits address, 4 bits data). I used a ROM in the simulation to allow it to persist across saves, and simplify the layout.<br /><br />There are any number of ways you could wire this thing up... the flash of insight I had was that I wanted it to be very simple to turn a cell into a simple pass-through repeater. I figured that if addresses 0-F were programmed with contents 0-F, and it just worked that way... it would be easiest to understand. This leads naturally to the layout you see pictured here.<br /><br />If you want to see for yourself, here is <a href="http://warot.com/bitgrid/bitgrid.xml">the Logisim Circuit file</a>. 
(You'll have to save it, then rename it to *.circ for Logisim due to limitations of my web host at 1and1.com)<br /><br />I welcome comments and suggestions.<br /> --Mike--Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com7tag:blogger.com,1999:blog-7258635.post-1140794357196642132006-02-24T07:16:00.000-08:002006-02-24T07:19:17.213-08:00Signs of lifeI ran into Joshua, who might be able to help me get a chip made! He's been through chip design, and is a EE at heart. I laid out what I wanted to do, answering questions along the way... and there were no obvious huge stumbling blocks with what I want to do.<br /><br />It's not dead yet! 8)Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-1111100579928544432005-03-17T14:52:00.000-08:002005-03-17T15:02:59.930-08:00BitGrid in 25 words or lessGet bit from each neighbor<br />concatenated 4 bit number becomes index<br />One lookup table per neighborMike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-1111033068523962492005-03-16T20:16:00.000-08:002005-03-16T20:17:48.820-08:00Bitgrid pro and con - AKA the Thesis<h3>If the <i>Bitgrid</i> is such a great idea, why haven’t I heard of it before?</h3> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">One word answer: “Efficiency”</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">My review of the current literature, aided by my trusty pal Google has shown that the past 25 years of programmable/custom logic design is focused on serving one God, efficiency. 
All of the designs I’ve seen (admittedly a small subset because I’m not a professional circuit engineer) optimize on some or all of these common goals:</p> <ul style="margin-top: 0in;" type="disc"> <li class="MsoNormal" style="">Speed</li><li class="MsoNormal" style="">Power</li><li class="MsoNormal" style="">Size</li><li class="MsoNormal" style="">Circuit complexity</li><li class="MsoNormal" style="">Unit cost</li> </ul> <p class="MsoNormal" style="margin-left: 0.25in;"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">They do this for a very good set of reasons. You want the lowest power dissipation because it makes it easier to feed and clean up after. You want the fastest speed because that is the driving factor for using hardware instead of software. You want the smallest design size so that you use less die area, and have less chance of losing a chip to a defect. The circuit complexity goal drives a huge investment in design tools to automate design tasks as much as possible.</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">The things that are often traded away for these goals in a chip design are:</p> <ul style="margin-top: 0in;" type="disc"> <li class="MsoNormal" style="">Flexibility</li><li class="MsoNormal" style="">Fault tolerance</li><li class="MsoNormal" style="">Engineering costs</li> </ul> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">The primary reason for going with hardware in the first place is usually speed. If speed is not an issue, then it is usually a good idea to do a given task in software. 
Software is infinitely malleable, and far easier to patch and update.</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">Fault tolerance is usually excluded from designs of custom chips because it is difficult to achieve, and is better addressed by testing and quality control measures. Only when a given feature of a design is homogenous such as in RAM or ROM, is the option to include spares included in a design.</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">Engineering costs are usually considered last in a mass produced chip, but they are never trivial. The processes are optimized to automate as much of the design work as possible away from human engineers, but there are always going to be complexity limitations imposed by the heterogeneity of the elements in a given Programmable Logic / Custom ASIC design.</p> <p class="MsoNormal"><!--[if !supportEmptyParas]--> <!--[endif]--><o:p></o:p></p> <p class="MsoNormal">When viewed from the perspective of the design community I’ve observed, it becomes obvious why nobody has built a bitgrid yet… it’s inefficient as hell by their criteria. 
An insider would never seriously consider a bitgrid design.</p> <p class="MsoNormal">That, gentle reader, is why you haven’t heard of the bitgrid before.</p> <h3>Reasons to consider the bitgrid</h3> <p class="MsoNormal">One word answer: “Efficiency”</p> <p class="MsoNormal">Only in this case, it’s a different set of parameters to optimize for:</p> <ul> <li>Flexibility</li><li>Fault tolerance</li><li>Testing time</li><li>End users</li> </ul> <p class="MsoNormal">The bitgrid is based on a single basic component, with a known, easily comprehended design, in an orthogonal grid. The homogeneous nature of the grid makes it trivial to relocate a given portion of an application program.</p> <p class="MsoNormal">It can route around a bad cell, if one is found. Each and every cell is a programmable wire at minimum utilization. As long as extensive faults are not present, and slack is available, it should be feasible to route around bad cells in almost any design.</p> <p class="MsoNormal">Because it’s possible to test the RAM that stores the programming, along with the bitgrid cells, one cell at a time, it should be easy to test a chip quickly and confidently with a minimum of test equipment complexity.</p> <p class="MsoNormal">Because of the simple nature of the bitgrid, a set of graphical tools for design and debugging can be built once and will be applicable to any implementation of the chip. The parameters of the IO pins and the array size are the main constraints to give to the tools. There are no design heterogeneity obstacles to complicate tool development.</p> <p class="MsoNormal">There are, of course, some big trade-offs made, including:</p> <ul> <li>Speed</li><li>Latency</li><li>Power</li><li>Die size</li><li>Unit cost</li> </ul> <p class="MsoNormal">Compared to an ASIC, a bitgrid will always be slower, use more power, consume more die area, and cost more per unit. A given bit will have to traverse at least 4 gates per cell, just to emulate a wire.
The end user will be compelled to expend some time optimizing their programming to fit the smallest applicable bitgrid device, of course.</p> <h3>Summary</h3> <p class="MsoNormal">I’m confident that as Moore’s law continues to drive down transistor prices, the bitgrid will come to be seen as a viable computing architecture for select applications, and may possibly mature into general use over time. The closest analogy I can evoke at this time is to the debates that accompanied the introduction of high-level computer languages, which occurred when the cost of computation was driven below the cost of the programmers. I believe this transition is on its way for silicon.</p> <p class="MsoNormal">I believe the best way to predict the future is to invent it. I hope you like my invention.</p> <p class="MsoNormal"> --Mike--</p> <p class="MsoNormal">March 16, 2005</p>Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-1110556518216491142005-03-11T07:37:00.000-08:002005-03-11T08:11:57.806-08:00Why we need a Free Hardware FoundationWe have the Free Software Foundation as a result of Richard Stallman's irritation with closed source software. I propose a parallel organization to help promote and propagate open source, Free hardware. I want to be able to license the bitgrid under the equivalent of the GPL.
I'm certain that there are others with ideas that they want to share in the same way.<br /><br />Specifically, if my bitgrid idea actually pans out, I'd like to let anyone use it in a design according to an equivalent of the GPL for hardware. They would then be bound to do the same for their designs, so that improvements can get worked into the technology, and we all benefit.<br /><br />The main threat I want to hold off is that of submarine patents. If we can build an open database of ideas which can be shared by all, it could be a very powerful tool. I'm not a lawyer, so I don't know if patent or copyright law would be the best place to try to build a GPL equivalent, but I'm sure it's time to figure it out.<br /><br />[update]<br />An analogy made in <a href="http://features.linuxtoday.com/news_story.php3?ltsn=1999-06-22-005-05-NW-LF&reply=008&quote=1">this post</a> by Eric Ste-Marie in 1999:<br /><blockquote><br />On the other hand, the day that we can have a processor definition from the internet, download that definition in a "processor makng machine" [sic], add whatever material is needed, press a button, wait 5 minutes and "DING!" your processor is ready; well this day, maybe free hardware foundation will become popular and worth the effort. Until that time it will be a hobbyist thing and won't have any impact on the computer world compared to Free software.</blockquote><br />Perhaps the bitgrid could fill that role by being virtual hardware?Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0tag:blogger.com,1999:blog-7258635.post-1110517100652246272005-03-10T20:03:00.000-08:002005-03-10T21:47:19.283-08:00The bitgrid story<span style="font-size:130%;">The Von Neumann architecture - Our current standard<br /></span><br />It all started with the realization that most of the transistors in a modern computer are just sitting idle 99% of the time.
Everyone focuses on the CPU, with a single instruction stream being read and executed at blinding speed. Everything optimizes for the CPU: RAM is designed to be able to read a cell quickly when needed, but is idle otherwise. This has advantages, including, until very recently, the lack of heatsink requirements for memory chips. The laptop I write this entry on has 512 megabytes of RAM, which means there are 4 billion transistors which are just sitting there, waiting for their turn in the computing process. Meanwhile, most of the 30 million or so transistors in the CPU are dedicated to the "cache", a faster RAM also present to help optimize the flow of data to and from the CPU.<br /><br />At the heart of everything in the CPU, there are registers (another form of RAM), the ALU (Arithmetic Logic Unit), the execution pipeline, instruction decoders, and assorted hardware. There are also specialized hardware optimizations such as "speculative execution", which advances the program along both forks in the instruction flow until it is known which one is the proper path, in order to have the correct answer that much sooner.<br /><br />I can't emphasize enough how everything is optimized to get data to the CPU, through the CPU, and back out from the CPU. It's all about getting flow through the core as quickly as possible. The problem is that the core can only get so fast... because it always has to wait to talk to RAM.<br /><br />The next logical step, parallel processing, has been known for a very long time (since before 1970?). I'm surprised it's taken as long as it has for the dual-core chips to show up. I expected them back in the early 1990s. They will help performance, and push things along, because there will be multiple tasks happening while the CPU waits for the next set of data.
It's not optimal, though, because the programming tools and operating systems are all still optimized for a single instruction flow; but this shift has been long anticipated, and much research has been done, so there are appreciable benefits to this "multi-core" approach. (Intel's Hyper-Threading, on certain Pentium 4 chips, is a related approach.)<br /><br />This is the future path most computing hardware will take. The model is mature, and universally understood. It entails a known set of trade-offs which I have summarized above.<br /><span style="font-size:130%;"><br />The Bit Grid</span><br />Bitgrid is a radically different architecture. I see the first uses as a coprocessor, for doing a fixed task at very high speed. Tasks which might otherwise be dedicated to a custom hardware processor, such as signal processing, graphics, compression, etc. I've always figured it would make a really good radar signal processor, or text search engine.<br /><br />The concept is pretty simple, but the implications are radical. The basic computing element in a BitGrid is a cell, which has 4 neighbors, <span style="font-style: italic;">up, left, down, right</span>. For each of these neighbors, there is an input, an output, and a program. So, in summary, a cell has:<br /><ul> <li>4 single-bit inputs, one from each neighbor</li> <li>4 single-bit outputs, one to each neighbor</li> <li>4 programs, one for each output (16 bits each)<br /></li> </ul> The flow is simple:<br /><ul> <li>combine the 4 inputs from the neighbors into a nibble - <span style="font-style: italic;">n</span><br /></li> <li>select bit <span style="font-style: italic;">n</span> from each "program", i.e. up(<span style="font-style: italic;">n</span>), left(<span style="font-style: italic;">n</span>), down(<span style="font-style: italic;">n</span>), right(<span style="font-style: italic;">n</span>)<br /></li> <li>output that bit to each neighbor</li> </ul> The key is that once the programs are loaded, the bitgrid becomes a hardware virtualization of the logic embedded in the program. A very <span style="font-style: italic;">fast</span> virtualization. All parts of the design run at the same time; there is no program counter, no need to optimize around a specific set of registers, etc. All of the cells can be in use if the programmer is efficient. (I've put this model into a spreadsheet that works well - though it was interesting getting around the circular reference restrictions once I tried to make a grid of cells.)<br /><span style="font-size:130%;"><br /><br />Interesting properties of the bitgrid<br /></span>Fault tolerance - A bitgrid chip doesn't need to be perfect. If a cell is bad, but doesn't lie near an edge, it can be routed around. Software to do this routing would eventually find its way into the normal set of tools associated with bitgrid chip deployment. This should greatly reduce the cost of the chips, should they ever be manufactured in bulk.<br /><br />Program transformation - A program for a bitgrid could be automatically rotated, compartmentalized, and routed to best fit a given chipset. As above, I expect this software to be a basic part of any bitgrid deployment.<br /><br />Duplexing - When I decided on a 4-in, 4-program, 4-out architecture, it seemed a reasonable choice. When modeling the circuit for an 8-bit programmable 2's complement generator, it became apparent that data flow in more than one direction across a chip was quite useful. I had the mode bit going down, with the carry bit going up the same column.
In non-trivial applications, this could help boost the utilization of the cells.<br /><br />Inertia as default - When programming a grid, I've chosen a set of defaults for my tools that implement what appears to be inertia. Once a bit is in motion across the grid, it defaults to traveling in that direction, without collision. The physics parallels are intriguing.<br /><br />Interface as RAM - It would be quite desirable to make the bitgrid appear to be a RAM chip externally. The "program" store could appear as a contiguous block of static RAM to the host processor. The IO pins for each cell could be similarly mapped. Thus the top 8 input bits would just be written at address 0, for example. The output bits would be a read from the same address. Interfacing to a host becomes trivial.<br /><br /><span style="font-size:130%;">My story<br /></span>As I've stated before, the bitgrid idea goes way back with me, to my brief stint as an Engineering student (Rose-Hulman, 1981-1982). I had read about and studied bit-slice processors a bit, and came up with the idea of building a vast array of them. I wrote page upon page of different ideas, and tweaks to the idea, but then I let it all drop. (A bad habit I'm working to change.)<br /><br />I've talked with friends about it from time to time, but figured I'd never be able to do it because I don't have the resources or skills to get a chip built. Let alone the whole patenting process...<br /><br />I've hit middle age, and am figuring out just what my life goals are. 43 Things was a nice catalyst, and one of my entries was about the bit grid. (Another is a refactoring compiler, which will be discussed elsewhere.)
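Before moving on, the cell model described above (four neighbor bits concatenated into a nibble that indexes four 16-bit lookup tables) can be sketched in a few lines of Python. This is my own illustrative sketch, not part of any BitGrid tool, and the bit ordering of the nibble is an assumption, since the post doesn't fix one:

```python
# Sketch of one BitGrid cell tick, per the description above.
# Illustrative code only; names and bit ordering are my own choices.

def cell_step(inputs, programs):
    """inputs: four bits (up, left, down, right) from the neighbors.
    programs: four 16-bit lookup tables, one per output direction.
    Returns the four output bits (up, left, down, right)."""
    # Concatenate the four input bits into a nibble n.
    # (The bit ordering here is an assumption; the post doesn't specify one.)
    n = (inputs[0]
         | (inputs[1] << 1)
         | (inputs[2] << 2)
         | (inputs[3] << 3))
    # Each output is bit n of that direction's 16-bit program.
    return tuple((p >> n) & 1 for p in programs)

# Example: program the "down" output to copy the "up" input (a wire).
# The output must be 1 whenever bit 0 of n is 1, i.e. for all odd n:
# binary 1010101010101010 = 0xAAAA.
WIRE_UP_TO_DOWN = 0xAAAA
print(cell_step((1, 0, 0, 0), (0, 0, WIRE_UP_TO_DOWN, 0)))  # (0, 0, 1, 0)
```

A full emulator would hold a 2-D array of these cells and evaluate every cell on each clock tick, which is what makes the whole grid active at once.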
At this point my goals for the bitgrid are:<br /><ul> <li>Have at least one other person truly "get" the idea, so I have a collaborator</li> <li>Actually get a good technical discussion going about the merits of this architecture</li> <li>Determine the true value of the concept, and where its niche in the world of computing is</li> </ul> Assuming that I'm not crazy, we then move on to longer-term goals:<br /><ul> <li>Build some software emulation programs to test out ideas</li> <li>Write some code to solve a non-trivial computing problem (for some reason FFTs keep coming to mind)</li> <li>Figure out just how fast and cheap the chips would be</li> <li>Get chips made (I don't care who does this, as long as it stays open source)<br /></li> <li>Debug said chip designs</li> <li>Get <span style="font-style: italic;">good</span> chips made</li> <li>Find real customers, get feedback, loop</li> </ul> Right now... I'm just trying to find others who think the idea has merit, and want to shepherd it along. I thank Doc Searls for his encouragement, and I thank <span style="font-weight: bold;">you</span> for your time and consideration.<br /><br />--Mike--Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com1tag:blogger.com,1999:blog-7258635.post-1110433964384284062005-03-09T21:48:00.000-08:002005-03-09T21:52:44.386-08:00The bitgrid futureAs I stated over at <a href="http://pluralsight.com/blogs/hsutter/archive/2004/12/17/3957.aspx">Herb Sutter's blog entry about concurrency</a>, I think the future of computing will include the bitgrid, in some form. The Von Neumann architecture has its definite efficiencies, but the fact that a given RAM cell just sits there waiting almost 100% of the time seems to me to be the ultimate in waste.<br /><br />I look forward to the future; it's going to be a wild ride getting there.Mike Warothttp://www.blogger.com/profile/12975818268596648269noreply@blogger.com0