Card Network

Business Card Neural Network

I recently came across Andrew Kensler's fascinating Business Card Raytracer. His code, in just 1337 bytes, generates a full raytraced image complete with depth of field, reflections, a sky gradient, and a textured floor. In a similar vein, I set out to create a full neural network that could fit on the back of a business card. The code below is the result, creating a 3-layer fully-connected neural network with leaky-relu activations and training it to generate a small image of my name.

#include <bits/stdc++.h>

typedef float f;typedef std::vector<f> v;
f K=0.01,R=0.01;struct L{int w,h,i;v W,x,
z;L(int i,int o){W.resize(i*o);for(f&a:W)
a=(rand()/(1.1-(1<<31))-0.5)/sqrt(i+o+0.)
;w=i;h=o;}v F(v X){x=X;v a(h);i=h*w;while
(i--)a[i/w]+=W[i]*x[i%w];z=a;for(f&i:a)i=
K>i?K:i;return a;}v B(v Y){v X(w);i=w*h;
while(i--){f Z=Y[i/w]*(z[i/w]>0?1:K);X[i%
w]+=Z*W[i];W[i]-=R*Z*x[i%w];}return X;}};
int main(){L n[]{L(40,50),L(50,50),L(50,1
)};int l,z,k,i=500337;while(i--){z=242-i%
243;k=40;v x(k),y;while(k--)x[k]=sin((k%2
?z/27/3.:z%27/5.)+6.14*k/20);y=x;for(l=0;
l<3;l++)y=n[l].F(y);v Y{2*(y[0]-("\x1e\0\
\0\b\x01\0@H\374\377B\x12@\x18\x12G\302\
\x10@\x12\372\377\221\x10\0\0\200\0\0\0\
\x02"[z/8]>>z%8&1))};while(l--)Y=n[l].B(Y
);putchar(" .,-*oO##"[(int)(y[0]*8)%9]);
if(i%27<1)puts(i?i%243?"":"\033[9A":"\r\
   --- Oisin Carroll ---\n\
 gh:Oisincar  web:imois.in");}}

You can view the output here by clicking the play button. (I hate to admit it's not actually running the code in the browser :P, but you'll see the same or similar output if you run the code yourself).

> ./card_network

The code is only 878 characters, and (as mentioned previously) creates and trains a small neural network. I'm sure it's possible to make it shorter still, but I was happy enough to get this far! I think making it any shorter would make it impossible to reason about, now at least new layers can be added or removed easily and the forward/back propagation is pretty neat! (I'm not biased I swear).

Here's a small breakdown of how it works.

#include <bits/stdc++.h>

typedef float f;typedef std::vector<f> v;
f K=0.01,R=0.01;

Define simple imports and typedefs. The two floats define constants used for the 'leak' of leaky relu, (I.e. the \(K\) in \(f(x) = max(Kx, x)\)), and the learning rate.

		struct L{int w,h,i;v W,x,
z;L(int i,int o){W.resize(i*o);for(f&a:W)
a=(rand()/(1.1-(1<<31))-0.5)/sqrt(i+o+0.)
;w=i;h=o;}

Start of the layer struct L. The constructor takes the input and output sizes, and initializes a weights matrix in an approximation of Xavier Initilization. Ideally, the random variable would be normally distributed, but linearly seems to work decently too!

	  v F(v X){x=X;v a(h);i=h*w;while
(i--)a[i/w]+=W[i]x[i%w];z=a;for(f&i:a)i=
K>i?K:i;return a;}v B(v Y){v X(w);i=w*h;
while(i--){f Z=Y[i/w]*(z[i/w]>0?1:K);X[i%
w]+=Z*W[i];W[i]-=R*Z*x[i%w];}return X;}};

Two functions for the layer struct. F performs forward propagation, taking the input vector and returning the output. Some values are cached for later. B performs backward propagation, taking the derivative of the output, updating the weights matrix, and returning the derivative of the input. Both are using leaky relu.

int main(){L n[]{L(40,50),L(50,50),L(50,1
)};int l,z,k,i=500337;while(i--){

Start of main, create a 3 layer network. The network takes 40 values for input, and has two hidden layers with 50 nodes each, and returns single value at the end. The input is based on the x,y coordinate currently predicted, and the output is a single value for whether this pixel should be a space or '#'. At the end we open the training loop, which runs for 500336 iterations, which is a multiple of the output size; 243.

				 z=242-i%
243;k=40;v x(k),y;while(k--)x[k]=sin((k%2
?z/27/3.:z%27/5.)+6.14*k/20);

Generate the input vector x. Each element of x is based on the sin of either the x or y coordinate of the pixel currently being predicted.

			     y=x;for(l=0;
l<3;l++)y=n[l].F(y);

Forward propagation

		    v Y{2*(y[0]-("\x1e\0\
\0\b\x01\0@H\374\377B\x12@\x18\x12G\302\
\x10@\x12\372\377\221\x10\0\0\200\0\0\0\
\x02"[z/8]>>z%8&1))};

From the output of the network, y, calculate \(\frac{dE}{dy}\). We use mean squared error, so the derivative is just \(2(y-\hat{y})\), where \(\hat{y}\) is the target value and is encoded in a string. When written with non-ascii characters this is a lot shorter.

		     while(l--)Y=n[l].B(Y
);

Back propagation

  putchar(" .,-*oO##"[(int)(y[0]*8)%9]);
if(i%27<1)puts(i?i%243?"":"\033[9A":"\r\
   --- Oisin Carroll ---\n\
 gh:Oisincar  web:imois.in");}}

Output: Choose different chars based on how high/low the value predicted is. Then print special characters if we're at the end of a line (i%27==0) or we've done a full epoch (i%243==0). Finally, when we're finished, i==0 and we print the end text.

Optimizations:

On reddit /u/pm_me_P_vs_NP_papers managed to shorten the code even further - by 50 bytes! They used some fantastically clever byte packing, improved loops and more. Their final length is only 828 bytes and 22 lines. You can read about their optimizations HERE!

UPDATE (OCT/2022). I decided to replicate their updates here, in case their post gets deleted in the future.

/u/pm_me_P_vs_NP_papers' reddit post:

Awesome demo! I managed to get it from 878 bytes 23 lines down to 828 bytes 22 lines:

#include <bits/stdc++.h>

using f=float;using v=std::vector<f>;f K,
R=K=.01,Z;struct L{int w,h,i;v W,x,z;L(int
i,int o){W.resize(i*o);for(f&a:W)a=(rand(
)/(1.1-(1<<31))-.5)/sqrt(i+o+0.);w=i;h=o;
}v F(v X){x=X;v a(h);for(i=h*w;i--;)a[i/w
]+=W[i]*x[i%w];z=a;for(f&i:a)i=K>i?K:i;
return a;}v B(v Y){v X(w);for(i=h*w;i
--;X[i%w]+=Z*W[i],W[i]-=R*Z*x[i%w])Z=Y[i/
w]*(z[i/w]>0?1:K);return X;}};int main(){
L n[]{L(40,50),L(50,50),L(50,1)};int T[]=
{30,66,67093636,16795915,38212120,9438256
,18874273,66,128,128},i=500337,z,k;while(
i--){z=242-i%243;k=40;v x(k),y;while(k--)
x[k]=sin((k%2?z/27/3.:z%27/5.)+.307*k);y=
x;for(L&l:n)y=l.F(y);v Y{2*(y[0]-(T[z/26]
>>z%26&1))};for(z=3;z--;)Y=n[z].B(Y);
putchar(" .,-*oO##"[(int)(y[0]*8)%9]);if(
i%27<1)puts(i?i%243?"":"\033[9A":"\r\
   --- Oisin Carroll ---\n\
 gh:Oisincar  web:imois.in");}}

By far the biggest shave was compressing the "training data" string. My first idea was to store an array of 16 bit integers instead of a string of inefficiently encoded 8 bit values, but then I figured there might be some other even more efficient number of bits per value, and with this python script:

data = b"\x1e\0\0\b\x01\0@H\374\377B\x12@\x18\x12G\302\x10@\x12\372\377\221\x10\0\0\200\0\0\0\x02" + b"\0"*32

data_bits = ""
for c in data:
	data_bits += bin(c+256)[-8:][::-1] # reverse each byte so we can reverse again later (gah endianness)

for bits_per in range(1, 32):
	v = []
	for i in range(0, len(data_bits), bits_per):
		v.append(int(data_bits[i:i+bits_per][::-1], 2))
	while v[-1] == 0: v = v[:-1] # shave trailing zeros
	s = "{"
	for x in v:
		if len(str(x)) < len(hex(x)):
			s += str(x) + ","
		else:
			s += hex(x) + ","
	s = s[:-1] + "}"
	print(bits_per, s)

I found the most efficient encoding was 26 bits / element, with this array: {30,66,67093636,16795915,38212120,9438256,18874273,66,128,128} which I called T in main. The lookup was also slightly adjusted to T[z/26]>>z%26&1.

Some other less dramatic shaves are:

Shorter type aliases:

typedef float f;typedef std::vector<f> v;
using f=float;using v=std::vector<f>;

Shorter float declarations (including changing some 0.5 to .5, etc.):

f K=0.01,R=0.01;
f K,R=K=.01;

Shorter loops:

i=h*w;while(i--)
for(i=h*w;i--;)

Shorter loops v2:

for(l=0;l<3;l++)y=n[l].F(y); [snip] while(l--)Y=n[l].B(Y);
for(l=0;l<3;)y=n[l++].F(y); [snip] while(l--)Y=n[l].B(Y);
for(L&l:n)y=l.F(y); [snip] for(l=3;l--;)Y=n[l].B(Y);
//                             ^ reuse z here instead of l -> one less declaration

Shorter constants:

6.14*k/20
.307*k

Shorter loops v3: (get rid of brackets by bunching expression statements in the increment field)

The only downside is that Z must be declared outside the for loop, and you can either play it safe by declaring it right before the loop, or play it slightly risky and declare Z globally (because there was already a global float declaration which we can take advantage of)

for(i=h*w;i--;){f Z=Y[i/w]*(z[i/w]>0?1:K);X[i%w]+=Z*W[i];W[i]-=R*Z*x[i%w];}
f Z;for(i=h*w;i--;X[i%w]+=Z*W[i],W[i]-=R*Z*x[i%w])Z=Y[i/w]*(z[i/w]>0?1:K);

-> declare Z globally since there are never going to be two instances of this function ever running recursively (-4+2 chars)

End…

Thanks for reading! I'd love to hear suggestions on how to make it shorter, or any other ideas.