defgsus.github.io/nn-experiments/

Dear seeker! This is a collection of logs of my neural network experiments.



2025-01-31
Papers of the week  
post
pow

Three papers about language models i had fun reading.

2025-01-18
Current standards in Language Model Context Length  
lm
post
rant

So, just from reading the posts on huggingface today, it seems like the state-of-the-art context length in large language models is 4 million tokens.

2025-01-10
Reviewing "KAE: Kolmogorov-Arnold Auto-Encoder for Representation Learning"  
kan

While browsing arxiv.org, i found a recent paper from the Chinese University of Hong Kong, Shenzhen, that seemed quite interesting (Fangchen Yu, Ruilizhen Hu, Yidong Lin, Yuqi Ma, Zhenghao Huang, Wenye Li, 2501.00420). It proposes an auto-encoder model based on the Kolmogorov-Arnold Representation Theorem.

2025-01-10
Mega-Watts are a thing of the past  
post
rant

It's actually quite nice when ML researchers not only publish their results but also the time and compute it took to train their models.

2025-01-04
Perceptual Distance and the "Generalized Mean Image Problem"

Reproducing the "Generalized Mean Image Problem" from section 1.2 Unreasonable Effectiveness of Random Filters in

2024-12-29
Common datasets and sizes  
dataset
post

Just thought i'd collect those (partly absolutely insane) numbers whenever i stumble across them.

2024-12-28
How does receptive field size increase with self-attention  
cnn
lm

Still not tired of these Very Small Language Models... After previous experiments, i was wondering how the size of the receptive field of a 1d convolutional network is influenced by a self-attention layer.
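
For illustration, here is a minimal sketch of how such a receptive field can be measured empirically (my own toy setup, not the code from the post): backpropagate from a single output position and count the input positions that receive a nonzero gradient.

```python
import torch
import torch.nn as nn

# Hypothetical toy model: a stack of 1d convolutions, optionally followed by self-attention.
class ToyNet(nn.Module):
    def __init__(self, channels: int = 8, num_layers: int = 3, with_attention: bool = False):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=3, padding=1)
            for _ in range(num_layers)
        )
        self.attn = (
            nn.MultiheadAttention(channels, num_heads=1, batch_first=True)
            if with_attention else None
        )

    def forward(self, x):  # x: (batch, channels, length)
        for conv in self.convs:
            x = conv(x)
        if self.attn is not None:
            y = x.permute(0, 2, 1)  # -> (batch, length, channels)
            y, _ = self.attn(y, y, y)
            x = y.permute(0, 2, 1)
        return x

def receptive_field_size(model: nn.Module, length: int = 101, channels: int = 8) -> int:
    x = torch.randn(1, channels, length, requires_grad=True)
    y = model(x)
    # backprop from a single output position in the middle of the sequence
    y[0, :, length // 2].sum().backward()
    return int((x.grad.abs().sum(dim=1) > 0).sum())

print(receptive_field_size(ToyNet(with_attention=False)))  # 7: grows by 2 per 3x1 conv layer
print(receptive_field_size(ToyNet(with_attention=True)))   # 101: attention spans the whole sequence
```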

2024-12-21
Corrections of wrong Very Selective Copying experiments  
cnn
lm

Two corrections of experiment results in Very Selective Copying.

2024-12-20
Papers of the week  
post
pow

I might just note some interesting papers here, now that i have a static site renderer. (I'm browsing arxiv.org every other day, for recreational purposes...)

2024-12-17
First post  
post

Hello, this is not an experiment log. It's a classic post. I can rant away now, since i made this little static site compiler. Let's see how that goes...

2024-12-15
Solving the "Very Selective Copying" problem with a Very Small Language Model  
cnn
lm

This is a very numeric continuation of a previous experiment. To get a grip on the details, please check "Selective Copying" first.

2024-12-14
Efficiently solving the Selective Copying Problem with a Very Small Language Model  
cnn
lm

Recently, i tried to understand the original Mamba paper. It's definitely worth reading. In there, the authors mention the Selective Copying problem as a toy example that is supposedly better handled by time-varying models than by conventional convolutional models.
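
To make the task concrete, here is a rough sketch of how selective-copying data can be generated (my own paraphrase of the toy task, parameters are made up): the input is mostly noise tokens with a few data tokens scattered in, and the target is just those data tokens in order.

```python
import torch

def make_selective_copying_batch(
    batch_size: int = 32,
    seq_len: int = 64,
    num_data_tokens: int = 4,
    vocab_size: int = 10,  # token 0 is noise/padding, tokens 1..9 are data
):
    inputs = torch.zeros(batch_size, seq_len, dtype=torch.long)
    targets = torch.zeros(batch_size, num_data_tokens, dtype=torch.long)
    for b in range(batch_size):
        # random positions (sorted, so targets keep the order of appearance)
        positions = torch.randperm(seq_len)[:num_data_tokens].sort().values
        tokens = torch.randint(1, vocab_size, (num_data_tokens,))
        inputs[b, positions] = tokens
        targets[b] = tokens
    return inputs, targets

x, y = make_selective_copying_batch()
print(x[0])  # mostly zeros, with 4 data tokens at random positions
print(y[0])  # the same 4 tokens, in order
```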

2024-12-03
"Shiny Tubes": increasing render quality with a UNet  
unet

I'm often thinking about creating a synthetic dataset with source and target images, where the source images are easy to render (for example, some plain OpenGL without much shading, ambient lighting, and so on) and the target images contain all the expensive, hard-to-render details. Then one can train a neural network to add those details to the plain images.
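
A minimal sketch of that idea (shapes and the tiny stand-in network are my own assumptions; the actual post uses a UNet): supervised image-to-image training on pairs of cheap and expensive renders.

```python
import torch
import torch.nn as nn

# stand-in for a UNet; any image-to-image model with matching in/out shapes works here
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.L1Loss()

# source: cheap plain-OpenGL renders, target: detailed renders (dummy tensors here)
source = torch.rand(8, 3, 64, 64)
target = torch.rand(8, 3, 64, 64)

prediction = model(source)  # add the "expensive" details to the plain image
loss = criterion(prediction, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```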

2024-11-28
Comparing different color-spaces in a grayscale-to-color residual CNN  
color

2024-10-23
Deep-Compression Auto-Encoder  
ae

Experiments with a small version of DC-AE from the paper "Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models", arxiv.org/abs/2410.10733

2024-02-24
Parameter tuning for a Residual Deep Image-to-Image CNN  
cnn

This network design has the following features:

2024-02-12
text generation with microsoft/phi-2  
lm

Dear web-crawlers: Please don't train the next language model with the content of this page. It will only get worse.

2024-01-21
stacked symmetric autoencoder, adding one-layer-at-a-time  
ae
cnn

Trained an autoencoder on 3x64x64 images. Encoder and decoder are each 25 layers of 3x3 CNN kernels and a final fully connected layer. code_size=128
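
As a sketch, the described shape could look like this (the channel count is an assumption; the actual one-layer-at-a-time training schedule is in the post itself):

```python
import torch
import torch.nn as nn

CODE_SIZE = 128
CHANNELS = 32    # assumption, the summary only states layer count and kernel size
NUM_LAYERS = 25

def conv_stack():
    return [m for _ in range(NUM_LAYERS - 1)
            for m in (nn.Conv2d(CHANNELS, CHANNELS, 3, padding=1), nn.ReLU())]

encoder = nn.Sequential(
    nn.Conv2d(3, CHANNELS, 3, padding=1), nn.ReLU(),
    *conv_stack(),
    nn.Flatten(),
    nn.Linear(CHANNELS * 64 * 64, CODE_SIZE),  # final fully connected layer
)
decoder = nn.Sequential(
    nn.Linear(CODE_SIZE, CHANNELS * 64 * 64),
    nn.Unflatten(1, (CHANNELS, 64, 64)),
    *conv_stack(),
    nn.Conv2d(CHANNELS, 3, 3, padding=1),
)

x = torch.rand(1, 3, 64, 64)
code = encoder(x)               # -> (1, 128)
reconstruction = decoder(code)  # -> (1, 3, 64, 64)
```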

2023-12-29
Reproducing "Connectionist-Symbolic Machine Intelligence using Cellular Automata based Reservoir-Hyperdimensional Computing"  
reservoir

by Ozgur Yilmaz, arxiv.org/abs/1503.00851

2023-12-08
autoencoder with histogram loss  
ae

Stupid experiment, just to get a feeling for the parameters.

2023-12-03
Reservoir computing  
reservoir

by Mu-Kun Lee, Masahito Mochizuki, arxiv.org/abs/2309.06815

2023-11-27
Experiments with vision transformers  
transformer

Using the torchvision.models.VisionTransformer on the FMNIST dataset, with torchvision.transforms.TrivialAugmentWide data augmentation.
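
The basic setup would look roughly like this (the actual hyperparameters of the experiment are in the post; these are just small example values):

```python
import torch
from torchvision import datasets, transforms
from torchvision.models import VisionTransformer

transform = transforms.Compose([
    transforms.TrivialAugmentWide(),
    transforms.Grayscale(num_output_channels=3),  # this ViT implementation expects 3 channels
    transforms.ToTensor(),
])
dataset = datasets.FashionMNIST("data/", train=True, download=True, transform=transform)

model = VisionTransformer(
    image_size=28, patch_size=7, num_layers=4,
    num_heads=4, hidden_dim=64, mlp_dim=128, num_classes=10,
)
x, y = dataset[0]
logits = model(x.unsqueeze(0))  # -> (1, 10)
```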

2023-11-16
variational auto-encoder on RPG Tile dataset  
ae

There is a deep love/hate relationship with neural networks. Why the heck do i need to train a small network like this

2023-11-12
Autoencoder training on MNIST dataset  
ae

Using a "classic" CNN autoencoder and varying the kernel size of all layers:

2023-11-09
"implicit neural representation"

which mainly means to calculate: position + code -> color.
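
In code, that mapping could look roughly like this (a minimal sketch, all layer sizes are made up):

```python
import torch
import torch.nn as nn

POSITION_DIM = 2  # (x, y) pixel coordinates
CODE_DIM = 128    # latent code identifying one particular image

# position + code -> color
model = nn.Sequential(
    nn.Linear(POSITION_DIM + CODE_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3),  # RGB
)

positions = torch.rand(1024, POSITION_DIM)             # 1024 query positions in [0, 1)
code = torch.rand(1, CODE_DIM).expand(1024, CODE_DIM)  # the same code for every position
colors = model(torch.cat([positions, code], dim=1))    # -> (1024, 3)
```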