# Reproducing "Connectionist-Symbolic Machine Intelligence using Cellular Automata based Reservoir-Hyperdimensional Computing"
by Ozgur Yilmaz, arxiv.org/abs/1503.00851
So, maybe I misread something, but one experiment is supposed to input a few binary values, run the CA for a few hundred iterations, and feed a batch of the last CA states to a ridge linear readout to recover the inputs. It's supposed to show that the CA can hold values over a longer period and is therefore suitable as a computing reservoir.
Initially I found it interesting, but then I realized that a lot of CA rules would satisfy this without much computing going on. Imagine a rule that simply copies the alive cell in the middle (e.g. rule 204, the identity rule, which maps every cell to its own previous value), while all input bits are placed with a little distance to each other. Then the readout actually gets the unmodified input as source to recover the input, which isn't proof of anything.
So I probably misunderstood something. Anyway, I ran the following setup to test the idea a bit:
The dataset consists of the 9-bit binary representations of the numbers 0 to 511. 100 samples are randomly selected and moved to the validation set.
A single fixed random mapping of the input values to the CA state space is used to spread the values across a 200-cell-wide input state such that no input bit is closer than 5 cells to another. The edges of the state are toroidal, i.e. they wrap around. The initial state is zero, except for the input bits.
All 256 possible rules of the classic Wolfram binary-state, 3-neighbourhood CA are run for 500 iterations, and the last 200 iterations are fed to a `sklearn.linear_model.Ridge` readout. The training set is used to train the readout.
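Concretely, a minimal sketch of this pipeline (function names and defaults are my own, and only a single sample is shown; the real runs batch all samples):

```python
import numpy as np
from sklearn.linear_model import Ridge

def ca_step(state: np.ndarray, rule: int) -> np.ndarray:
    """One update of an elementary CA on a toroidal 1-D state."""
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = (left << 2) | (state << 1) | right  # neighbourhood as 3-bit index
    return (rule >> idx) & 1                  # rule number is the lookup table

def reservoir_features(bits, positions, rule, cells=200, iters=500, out_steps=200):
    """Place the input bits, run the CA, return the flattened last states."""
    state = np.zeros(cells, dtype=np.int64)
    state[positions] = bits
    history = []
    for t in range(iters):
        state = ca_step(state, rule)
        if t >= iters - out_steps:
            history.append(state.copy())
    return np.concatenate(history)

# one feature row per sample, then a single ridge readout for all target bits:
# X = np.stack([reservoir_features(b, positions, rule) for b in train_bits])
# readout = Ridge().fit(X, train_bits)
```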
When simply querying for the reproduction of the input values, in fact 169 rules have zero training error and 145 even generalize to zero validation error. (csv)
## 5-bit xor
Next, the readout was trained to produce the XOR of the first and last 5 bits of a 10-bit number. The train and test sets have 924 and 100 samples, respectively.
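In other words, with the target constructed like this (a hypothetical one-liner, just to pin down the arithmetic):

```python
n = 0b1011001101                   # example 10-bit input
target = (n >> 5) ^ (n & 0b11111)  # 10110 XOR 01101 = 11011
```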
Here are the top-50 rules (mae is mean absolute error, csv):
rule | correct_train | mae_train | correct_test | mae_test |
---|---|---|---|---|
153 | 924 | 0.000018 | 100 | 0.000153 |
60 | 924 | 0.000018 | 100 | 0.000153 |
195 | 924 | 0.000018 | 100 | 0.000153 |
102 | 924 | 0.000018 | 100 | 0.000153 |
150 | 924 | 0.000027 | 100 | 0.032076 |
105 | 924 | 0.000027 | 100 | 0.032076 |
146 | 924 | 0.000045 | 81 | 0.155052 |
18 | 924 | 0.000045 | 81 | 0.155052 |
161 | 924 | 0.000046 | 25 | 0.407795 |
129 | 924 | 0.000046 | 25 | 0.407795 |
126 | 924 | 0.000046 | 25 | 0.407795 |
149 | 924 | 0.000052 | 2 | 0.510267 |
86 | 924 | 0.000052 | 2 | 0.510204 |
135 | 924 | 0.000052 | 2 | 0.502302 |
30 | 924 | 0.000052 | 2 | 0.501841 |
89 | 924 | 0.000052 | 2 | 0.498935 |
45 | 924 | 0.000052 | 1 | 0.497252 |
101 | 924 | 0.000052 | 3 | 0.507690 |
75 | 924 | 0.000052 | 2 | 0.501961 |
22 | 924 | 0.000060 | 7 | 0.484349 |
182 | 924 | 0.000064 | 64 | 0.298419 |
124 | 924 | 0.000137 | 5 | 0.440620 |
193 | 924 | 0.000137 | 5 | 0.441169 |
110 | 920 | 0.001217 | 13 | 0.425420 |
137 | 920 | 0.001217 | 13 | 0.426449 |
109 | 835 | 0.025168 | 18 | 0.317240 |
73 | 818 | 0.033937 | 17 | 0.357383 |
147 | 653 | 0.114803 | 14 | 0.403177 |
54 | 653 | 0.114803 | 14 | 0.402360 |
169 | 471 | 0.195880 | 25 | 0.326195 |
225 | 417 | 0.222927 | 12 | 0.354916 |
210 | 390 | 0.211516 | 13 | 0.290926 |
82 | 390 | 0.211516 | 13 | 0.290926 |
181 | 390 | 0.211516 | 13 | 0.290991 |
26 | 366 | 0.220765 | 7 | 0.329513 |
154 | 366 | 0.220765 | 7 | 0.329513 |
167 | 366 | 0.220765 | 7 | 0.329575 |
122 | 327 | 0.197325 | 6 | 0.299221 |
218 | 175 | 0.352447 | 8 | 0.373356 |
90 | 160 | 0.297512 | 0 | 0.323019 |
165 | 160 | 0.297512 | 0 | 0.323019 |
57 | 115 | 0.395584 | 0 | 0.441891 |
230 | 113 | 0.415542 | 6 | 0.456920 |
188 | 99 | 0.414891 | 0 | 0.469630 |
99 | 96 | 0.407152 | 2 | 0.449875 |
145 | 87 | 0.434149 | 2 | 0.472446 |
158 | 87 | 0.450478 | 1 | 0.472240 |
118 | 86 | 0.435258 | 1 | 0.467948 |
214 | 86 | 0.443312 | 1 | 0.477272 |
62 | 84 | 0.435100 | 2 | 0.469588 |
Note that Stephen Wolfram's favorite rule 30 is not able to generalize to the validation data. Only 6 rules generalize completely and 23 have zero training error. The 6 perfectly generalizing rules (60, 102, 105, 150, 153, 195) are exactly the additive ones, where the new cell value is the XOR of two or three neighbourhood cells, or its complement; for these, every reservoir cell is itself just a XOR of input bits, which plausibly explains why the linear readout can pick out the XOR target. (Curiously, rules 90 and 165, i.e. left XOR right and its complement, are additive too, yet do not generalize here.)
## 8-bit square
Here, the 9-bit numbers are used again, and the target values are the 8-bit binary representations of `(n * n) mod 251`. Only 50 samples are moved to the validation set, so the training set has size 462. Can the CA reservoir generalize to calculate the square of a number?
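The target construction, spelled out (my own sketch of the description above):

```python
n = 300                                         # example 9-bit input
square = (n * n) % 251                          # 90000 mod 251 = 142
target = [(square >> i) & 1 for i in range(8)]  # 8-bit binary target vector
```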
(csv)
rule | correct_train | mae_train | correct_test | mae_test |
---|---|---|---|---|
86 | 462 | 0.000050 | 0 | 0.496036 |
149 | 462 | 0.000050 | 0 | 0.496256 |
30 | 462 | 0.000050 | 0 | 0.487999 |
135 | 462 | 0.000050 | 0 | 0.487980 |
101 | 462 | 0.000050 | 0 | 0.481407 |
75 | 462 | 0.000050 | 2 | 0.480249 |
89 | 462 | 0.000051 | 0 | 0.487788 |
45 | 462 | 0.000051 | 0 | 0.488288 |
161 | 462 | 0.000054 | 1 | 0.490340 |
126 | 462 | 0.000054 | 1 | 0.490340 |
129 | 462 | 0.000054 | 1 | 0.490340 |
22 | 462 | 0.000063 | 1 | 0.490240 |
182 | 462 | 0.000113 | 0 | 0.479602 |
124 | 462 | 0.000125 | 0 | 0.486095 |
193 | 462 | 0.000125 | 0 | 0.486428 |
18 | 462 | 0.000155 | 0 | 0.510122 |
146 | 462 | 0.000155 | 0 | 0.510122 |
105 | 462 | 0.000372 | 0 | 0.658249 |
150 | 462 | 0.000372 | 0 | 0.658249 |
153 | 462 | 0.012338 | 1 | 2.193423 |
102 | 462 | 0.012338 | 1 | 2.193423 |
195 | 462 | 0.012338 | 1 | 2.193423 |
60 | 462 | 0.012338 | 1 | 2.193423 |
110 | 457 | 0.003107 | 0 | 0.493481 |
137 | 457 | 0.003107 | 0 | 0.493090 |
109 | 430 | 0.038524 | 0 | 0.530648 |
73 | 424 | 0.030511 | 0 | 0.514033 |
54 | 246 | 0.176413 | 0 | 0.523882 |
147 | 246 | 0.176414 | 0 | 0.523725 |
154 | 143 | 0.296260 | 0 | 0.581172 |
26 | 143 | 0.296260 | 0 | 0.581172 |
167 | 143 | 0.296260 | 0 | 0.581173 |
210 | 134 | 0.282733 | 0 | 0.550056 |
82 | 134 | 0.282733 | 0 | 0.550056 |
181 | 134 | 0.282733 | 0 | 0.550097 |
169 | 131 | 0.296666 | 0 | 0.581449 |
225 | 114 | 0.317405 | 0 | 0.528717 |
122 | 84 | 0.335783 | 1 | 0.501831 |
218 | 50 | 0.418594 | 0 | 0.496316 |
230 | 48 | 0.390327 | 1 | 0.495561 |
188 | 43 | 0.404201 | 0 | 0.501767 |
131 | 39 | 0.408533 | 1 | 0.499596 |
145 | 35 | 0.405773 | 0 | 0.502563 |
118 | 35 | 0.410958 | 0 | 0.502439 |
62 | 34 | 0.412358 | 0 | 0.504509 |
158 | 27 | 0.442094 | 0 | 0.502145 |
214 | 25 | 0.438685 | 0 | 0.491097 |
222 | 24 | 0.447798 | 0 | 0.494096 |
133 | 18 | 0.430502 | 0 | 0.500568 |
94 | 18 | 0.430502 | 0 | 0.500568 |
It seems: no. The best result among all 256 rules is 2 correct validation samples. Again, 23 rules have zero training error. At least there's that.
The 23 rules with zero training error are identical in both experiments:
18, 22, 30, 45, 60, 75, 86, 89, 101, 102, 105, 124,
126, 129, 135, 146, 149, 150, 153, 161, 182, 193, 195
Shown below as binary representations, i.e. the lookup values over the 3-neighbourhood (* marks rules that generalized perfectly in the XOR experiment):
18: [0, 1, 0, 0, 1, 0, 0, 0]
22: [0, 1, 1, 0, 1, 0, 0, 0]
30: [0, 1, 1, 1, 1, 0, 0, 0]
45: [1, 0, 1, 1, 0, 1, 0, 0]
60: [0, 0, 1, 1, 1, 1, 0, 0] *
75: [1, 1, 0, 1, 0, 0, 1, 0]
86: [0, 1, 1, 0, 1, 0, 1, 0]
89: [1, 0, 0, 1, 1, 0, 1, 0]
101: [1, 0, 1, 0, 0, 1, 1, 0]
102: [0, 1, 1, 0, 0, 1, 1, 0] *
105: [1, 0, 0, 1, 0, 1, 1, 0] *
124: [0, 0, 1, 1, 1, 1, 1, 0]
126: [0, 1, 1, 1, 1, 1, 1, 0]
129: [1, 0, 0, 0, 0, 0, 0, 1]
135: [1, 1, 1, 0, 0, 0, 0, 1]
146: [0, 1, 0, 0, 1, 0, 0, 1]
149: [1, 0, 1, 0, 1, 0, 0, 1]
150: [0, 1, 1, 0, 1, 0, 0, 1] *
153: [1, 0, 0, 1, 1, 0, 0, 1] *
161: [1, 0, 0, 0, 0, 1, 0, 1]
182: [0, 1, 1, 0, 1, 1, 0, 1]
193: [1, 0, 0, 0, 0, 0, 1, 1]
195: [1, 1, 0, 0, 0, 0, 1, 1] *
The bits correspond to these neighbourhood states:
000 100 010 110 001 101 011 111
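To reproduce these listings from a rule number (assuming the same bit ordering):

```python
order = [0b000, 0b100, 0b010, 0b110, 0b001, 0b101, 0b011, 0b111]

def lookup_bits(rule: int) -> list:
    """Rule number unpacked in the neighbourhood order shown above."""
    return [(rule >> n) & 1 for n in order]

print(lookup_bits(18))  # -> [0, 1, 0, 0, 1, 0, 0, 0]
```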
Below are plots of the state history (100 iterations, toroidal 101-cell state) for these particular rules, starting from a single seed cell:
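To regenerate such a plot, a minimal matplotlib sketch (reusing `ca_step` from the sketch above; rule 30 taken as an example):

```python
import numpy as np
import matplotlib.pyplot as plt

state = np.zeros(101, dtype=np.int64)
state[50] = 1                       # single seed cell in the middle
rows = [state.copy()]
for _ in range(100):                # 100 iterations, wrap-around via np.roll
    state = ca_step(state, 30)
    rows.append(state.copy())

plt.imshow(np.array(rows), cmap="binary", interpolation="nearest")
plt.show()
```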
## more tests
Tried a couple of things :rofl:
Among them is feeding the input data sequentially to the CA state. For the following table, the input sequence is placed in the first state slice as above; then, every `seq_input_stride` steps, the input is inserted again into the running CA state, either as a copy of the first input slice (`repeat`) or sequentially: every input bit one after another, at their respective fixed random positions. The training objective was to reproduce the binary representation of the input number (0 to 511). I only tried the 23 strong rules from above, and `iterations` and `output_steps` were fixed after preliminary tests.
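A minimal sketch of the two insertion modes (names match the `seq_input` column in the table; this reuses `ca_step` and `numpy` from the sketch above):

```python
def run_with_reinsertion(input_bits, positions, rule, iterations, stride, mode,
                         cells=200):
    """CA run that re-inserts the input every `stride` steps (sketch)."""
    state = np.zeros(cells, dtype=np.int64)
    state[positions] = input_bits
    for t in range(1, iterations + 1):
        state = ca_step(state, rule)
        if stride and t % stride == 0:
            if mode == "repeat":        # re-write the whole input slice
                state[positions] = input_bits
            elif mode == "sequential":  # one bit per insertion, cycling
                i = (t // stride) % len(input_bits)
                state[positions[i]] = input_bits[i]
    return state
```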
rule | iterations | output_steps | wrap | seq_input | seq_input_stride | readout_var | correct_train | mae_train | correct_test | mae_test |
---|---|---|---|---|---|---|---|---|---|---|
105 | 200 | 100 | False | none | 0 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
105 | 200 | 100 | False | sequential | 1 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
105 | 200 | 100 | False | repeat | 7 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
105 | 200 | 100 | False | sequential | 7 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
150 | 200 | 100 | False | repeat | 1 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
150 | 200 | 100 | False | sequential | 1 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
105 | 200 | 100 | False | repeat | 1 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
150 | 200 | 100 | False | repeat | 3 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
105 | 200 | 100 | False | repeat | 3 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
150 | 200 | 100 | False | repeat | 7 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
150 | 200 | 100 | False | sequential | 7 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
150 | 200 | 100 | False | none | 0 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
150 | 200 | 100 | False | sequential | 3 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
105 | 200 | 100 | False | sequential | 3 | 0.000482 | 412 | 0.000057 | 96 | 0.032667 |
105 | 200 | 100 | True | sequential | 3 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
18 | 200 | 100 | False | repeat | 3 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
150 | 200 | 100 | True | sequential | 1 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
150 | 200 | 100 | True | repeat | 1 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
150 | 200 | 100 | True | sequential | 3 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
146 | 200 | 100 | False | none | 0 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
150 | 200 | 100 | True | sequential | 7 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
18 | 200 | 100 | False | none | 0 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
105 | 200 | 100 | True | repeat | 3 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
146 | 200 | 100 | False | repeat | 1 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
150 | 200 | 100 | True | repeat | 3 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
105 | 200 | 100 | True | repeat | 1 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
105 | 200 | 100 | True | sequential | 7 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
105 | 200 | 100 | True | sequential | 1 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
18 | 200 | 100 | False | sequential | 3 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
146 | 200 | 100 | False | repeat | 3 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
18 | 200 | 100 | False | repeat | 1 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
105 | 200 | 100 | True | repeat | 7 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
18 | 200 | 100 | False | sequential | 1 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
146 | 200 | 100 | False | sequential | 1 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
146 | 200 | 100 | False | sequential | 3 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
18 | 200 | 100 | False | repeat | 7 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
105 | 200 | 100 | True | none | 0 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
150 | 200 | 100 | True | none | 0 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
18 | 200 | 100 | False | sequential | 7 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
146 | 200 | 100 | False | sequential | 7 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
150 | 200 | 100 | True | repeat | 7 | 0.000511 | 412 | 0.000058 | 91 | 0.043323 |
146 | 200 | 100 | False | repeat | 7 | 0.000642 | 412 | 0.000077 | 91 | 0.033158 |
60 | 200 | 100 | True | sequential | 1 | 0.000764 | 412 | 0.016511 | 90 | 0.040892 |
102 | 200 | 100 | True | sequential | 1 | 0.000764 | 412 | 0.016511 | 90 | 0.040892 |
102 | 200 | 100 | True | none | 0 | 0.000764 | 412 | 0.016511 | 90 | 0.040892 |
153 | 200 | 100 | True | repeat | 3 | 0.000764 | 412 | 0.016511 | 90 | 0.040892 |
153 | 200 | 100 | True | repeat | 7 | 0.000764 | 412 | 0.016511 | 90 | 0.040892 |
195 | 200 | 100 | True | repeat | 7 | 0.000764 | 412 | 0.016511 | 90 | 0.040892 |
102 | 200 | 100 | True | sequential | 3 | 0.000764 | 412 | 0.016511 | 90 | 0.040892 |
60 | 200 | 100 | True | sequential | 3 | 0.000764 | 412 | 0.016511 | 90 | 0.040892 |
(`readout_var` is the variance of the linear readout matrix)
It can generalize to a high degree in this task, although neither the sequential-vs-static input nor the choice of rule changes the numbers much. For the square-of-a-number experiment above, this setup still does not get better than 6 correct validation samples out of 100.
Mhh, so what about higher-order CAs? If we allow 3 states instead of 2, the 1D CA suddenly has an astonishing 3^3^3 = 3^27 possible rules. A do-nothing Python loop through that number already takes about 300 hours:
```python
from tqdm import tqdm

# 3 ** 27 = 7,625,597,484,987 rules
for r in tqdm(range(7_625_597_484_987)):
    pass
```
```
0%| | 567,067,242/7,625,597,484,987 [01:24<314:10:54, 6,741,520.98it/s]
```
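For scale: each rule is a lookup table with 27 base-3 entries, one per neighbourhood, which is where the 3^27 comes from. A sketch of a single 3-state update (my own generalization of the `ca_step` above):

```python
import numpy as np

def ca3_step(state: np.ndarray, table: np.ndarray) -> np.ndarray:
    """One update of a 3-state, 3-neighbourhood CA.

    `table` holds 27 values in {0, 1, 2}, one per neighbourhood;
    the rule number would be this table read as a base-3 numeral.
    """
    left, right = np.roll(state, 1), np.roll(state, -1)
    idx = left * 9 + state * 3 + right  # neighbourhood as base-3 index
    return table[idx]
```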