## Business Card Neural Network

I recently came across Andrew Kensler's fascinating Business Card Raytracer. His code, in just 1337 bytes, generates a full raytraced image complete with depth of field, reflections, a sky gradient, and a textured floor. In a similar vein, I set out to create a full neural network that could fit on the back of a business card. The code below is the result, creating a 3-layer fully-connected neural network with leaky-relu activations and training it to generate a small image of my name.

```cpp
#include <bits/stdc++.h>

typedef float f;typedef std::vector<f> v;
f K=0.01,R=0.01;struct L{int w,h,i;v W,x,
z;L(int i,int o){W.resize(i*o);for(f&a:W)
a=(rand()/(1.1-(1<<31))-0.5)/sqrt(i+o+0.)
;w=i;h=o;}v F(v X){x=X;v a(h);i=h*w;while
(i--)a[i/w]+=W[i]*x[i%w];z=a;for(f&i:a)i=
K>i?K:i;return a;}v B(v Y){v X(w);i=w*h;
while(i--){f Z=Y[i/w]*(z[i/w]>0?1:K);X[i%
w]+=Z*W[i];W[i]-=R*Z*x[i%w];}return X;}};
int main(){L n[]{L(40,50),L(50,50),L(50,1
)};int l,z,k,i=500337;while(i--){z=242-i%
243;k=40;v x(k),y;while(k--)x[k]=sin((k%2
?z/27/3.:z%27/5.)+6.14*k/20);y=x;for(l=0;
l<3;l++)y=n[l].F(y);v Y{2*(y[0]-("\x1e\0\
\0\b\x01\0@H\374\377B\x12@\x18\x12G\302\
\x10@\x12\372\377\221\x10\0\0\200\0\0\0\
\x02"[z/8]>>z%8&1))};while(l--)Y=n[l].B(Y
);putchar(" .,-*oO##"[(int)(y[0]*8)%9]);
if(i%27<1)puts(i?i%243?"":"\033[9A":"\r\
--- Oisin Carroll ---\n\
gh:Oisincar  web:imois.in");}}
```


You can view the output here by clicking the play button. (I hate to admit it's not actually running the code in the browser :P, but you'll see the same or similar output if you run the code yourself).

```
> ./card_network
```

The code is only 878 characters, and (as mentioned previously) creates and trains a small neural network. I'm sure it's possible to make it shorter still, but I was happy enough to get this far! I think making it any shorter would make it impossible to reason about; as it stands, new layers can be added or removed easily, and the forward/back propagation is pretty neat! (I'm not biased, I swear.)

Here's a small breakdown of how it works.

```cpp
#include <bits/stdc++.h>

typedef float f;typedef std::vector<f> v;
f K=0.01,R=0.01;
```


Imports and typedefs. The two floats define constants: the 'leak' of the leaky ReLU (i.e. the $K$ in $f(x) = \max(Kx, x)$), and the learning rate $R$.
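As a reference point, here's the textbook activation those constants describe, un-golfed (the function names are my own):

```cpp
#include <algorithm>

// Textbook leaky ReLU: f(x) = max(K*x, x) for a small leak K.
float leaky_relu(float x, float K = 0.01f) {
    return std::max(K * x, x);
}

// Its derivative: 1 for x > 0, K otherwise. This is exactly the
// z[i/w]>0?1:K factor used in the card's backward pass.
float leaky_relu_grad(float x, float K = 0.01f) {
    return x > 0 ? 1.0f : K;
}
```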

```cpp
		struct L{int w,h,i;v W,x,
z;L(int i,int o){W.resize(i*o);for(f&a:W)
a=(rand()/(1.1-(1<<31))-0.5)/sqrt(i+o+0.)
;w=i;h=o;}
```


Start of the layer struct L. The constructor takes the input and output sizes, and initializes a weight matrix with an approximation of Xavier initialization. Ideally the random values would be normally distributed, but a uniform distribution seems to work decently too!
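Un-golfed, the initializer might look like the following sketch; the names (`init_weights`, `fan_in`, `fan_out`) and the use of `<random>` are my own, standing in for the golfed `rand()` trick:

```cpp
#include <vector>
#include <random>
#include <cmath>

// Sketch: uniform weights in [-0.5, 0.5) scaled by 1/sqrt(fan_in+fan_out).
// This approximates Xavier/Glorot initialization, which normally draws
// from a normal distribution (or uniform over sqrt(6/(fan_in+fan_out))).
std::vector<float> init_weights(int fan_in, int fan_out, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> u(-0.5f, 0.5f);
    std::vector<float> W(fan_in * fan_out);
    for (float &w : W)
        w = u(gen) / std::sqrt(float(fan_in + fan_out));
    return W;
}
```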

```cpp
	  v F(v X){x=X;v a(h);i=h*w;while
(i--)a[i/w]+=W[i]*x[i%w];z=a;for(f&i:a)i=
K>i?K:i;return a;}v B(v Y){v X(w);i=w*h;
while(i--){f Z=Y[i/w]*(z[i/w]>0?1:K);X[i%
w]+=Z*W[i];W[i]-=R*Z*x[i%w];}return X;}};
```


Two functions for the layer struct. F performs forward propagation, taking the input vector and returning the output (some values are cached for later). B performs backward propagation, taking the derivative of the error with respect to the output, updating the weight matrix, and returning the derivative with respect to the input. Both use the leaky-ReLU activation.

```cpp
int main(){L n[]{L(40,50),L(50,50),L(50,1
)};int l,z,k,i=500337;while(i--){
```


Start of main: create a 3-layer network. The network takes 40 input values, has two hidden layers with 50 nodes each, and returns a single value at the end. The input is based on the x,y coordinate of the pixel currently being predicted, and the output is a single value saying whether this pixel should be a space or a '#'. At the end we open the training loop, which runs for 500337 iterations, a multiple of the output size of 243 pixels.
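A quick sanity check on that count (the constant names are mine):

```cpp
// 27 columns x 9 rows = 243 pixels per frame; the iteration count is a
// whole number of frames, so training stops cleanly on a full image.
constexpr int COLS = 27, ROWS = 9, PIXELS = COLS * ROWS;  // 243
constexpr int ITERS = 500337;                             // 243 * 2059
static_assert(ITERS % PIXELS == 0, "training ends on a frame boundary");
```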

```cpp
				 z=242-i%
243;k=40;v x(k),y;while(k--)x[k]=sin((k%2
?z/27/3.:z%27/5.)+6.14*k/20);
```


Generate the input vector x. Each element of x is the sine of either the x or the y coordinate of the pixel currently being predicted, taken at a different frequency.
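Expanded out, the featurization might read like this sketch (`make_input` is my name for it; 6.14 is presumably a golfed stand-in for 2π):

```cpp
#include <vector>
#include <cmath>

// Pixel index z (0..242) maps to column z%27 and row z/27. Odd features
// are sines of the (scaled) row, even features sines of the (scaled)
// column, each phase-shifted by 6.14*k/20 so the frequencies differ.
std::vector<float> make_input(int z) {
    std::vector<float> x(40);
    for (int k = 0; k < 40; ++k)
        x[k] = std::sin((k % 2 ? z / 27 / 3.0 : z % 27 / 5.0) + 6.14 * k / 20);
    return x;
}
```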

```cpp
			     y=x;for(l=0;
l<3;l++)y=n[l].F(y);
```


Forward propagation

```cpp
		    v Y{2*(y[0]-("\x1e\0\
\0\b\x01\0@H\374\377B\x12@\x18\x12G\302\
\x10@\x12\372\377\221\x10\0\0\200\0\0\0\
\x02"[z/8]>>z%8&1))};
```


From the output of the network, y, calculate $\frac{dE}{dy}$. We use mean squared error, so the derivative is just $2(y-\hat{y})$, where $\hat{y}$ is the target value, encoded bit-by-bit in a string. Written with non-ASCII escape characters, this is a lot shorter.
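Un-golfed, the target lookup and the gradient might look like this (function names are mine):

```cpp
// The 243 target pixels are packed one bit per pixel into the string
// literal; pixel z's bit lives in byte z/8 at bit position z%8.
int target_bit(const char *bitmap, int z) {
    return (bitmap[z / 8] >> (z % 8)) & 1;
}

// Mean squared error E = (y - t)^2, so dE/dy = 2*(y - t).
float mse_grad(float y, float t) {
    return 2.0f * (y - t);
}
```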

```cpp
		     while(l--)Y=n[l].B(Y
);
```


Back propagation

```cpp
  putchar(" .,-*oO##"[(int)(y[0]*8)%9]);
if(i%27<1)puts(i?i%243?"":"\033[9A":"\r\
--- Oisin Carroll ---\n\
gh:Oisincar  web:imois.in");}}
```


Output: choose a character based on how high or low the predicted value is. Then print special characters if we're at the end of a line (i%27==0) or have finished a full frame (i%243==0; "\033[9A" is the ANSI sequence to move the cursor up 9 lines, redrawing the image in place). Finally, when training finishes, i==0 and we print the end text.
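The character ramp, pulled out into a hypothetical helper:

```cpp
// Map a network output (roughly 0..1) onto a 9-character brightness
// ramp; the %9 keeps the index in range for outputs slightly above 1,
// which happens during early training.
char shade(float y) {
    return " .,-*oO##"[(int)(y * 8) % 9];
}
```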

On reddit, /u/pm_me_P_vs_NP_papers managed to shorten the code even further, shaving off another 50 bytes and bringing the final length to just 828 bytes in 22 lines. You can read about their optimizations HERE!

Thanks for reading! I'd love to hear suggestions on how to make it shorter, or any other ideas.