A programming trick that used to work was to unroll loops, to prevent the pipeline penalties that occur when you branch. It worked well for a while. The bitgrid is based on the idea of unrolling the whole friggin program. Instead of making a list with less branches... why not distribute each and every instruction of a program out into a physical processing instance?
To make it feasible in hardware, use the simplest computing grid feasible, a grid cells (each cell having 4 inputs (one bit from each neighbor), 4 outputs (one bit TO each neighbor) and a 16 entry look up table), each of which is a pitiful unit of computation by itself.... in a grid size to fit the application at hand, they can execute all of the instructions necessary to compute a result simultaneously.
Communication isn't shared, because every input and output only has 1 place to go or come from. It only has to go to the next cell... so there are no long communication lines to worry about. Each cell can function as a router and logic element at the same time.
Programming is a matter of setting the values in the lookup tables, which could be made of static RAM cells.