defgsus.github.io/nn-experiments/

Dear seeker! This is a collection of logs of my neural network experiments.

2025-01-10
Reviewing "KAE: Kolmogorov-Arnold Auto-Encoder for Representation Learning"

While browsing arxiv.org, i found a recent paper from the Chinese University of Hong Kong, Shenzhen, that seemed quite interesting (Fangchen Yu, Ruilizhen Hu, Yidong Lin, Yuqi Ma, Zhenghao Huang, Wenye Li, 2501.00420). It proposes an auto-encoder model based on the Kolmogorov-Arnold Representation Theorem.

2025-01-10
Mega-Watts are a thing of the past

It's actually quite nice when ML researchers publish not only their results but also the time and compute it took to train their models.

2025-01-04
Perceptual Distance and the "Generalized Mean Image Problem"

Reproducing the "Generalized Mean Image Problem" from section 1.2 Unreasonable Effectiveness of Random Filters in
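
Roughly, the idea (a sketch with everything assumed, not the experiment's code): the pixel-wise mean of a dataset minimizes the average L2 distance to all images; the "generalized" version instead searches for the image that minimizes the average perceptual distance, here measured with a small stack of fixed random convolution filters.

```python
# Minimal sketch of a "generalized mean image": optimize an image to minimize
# the average feature distance to a batch of dataset images, where the features
# come from a stack of *random*, untrained conv filters.
# All shapes and hyper-parameters are assumptions for illustration.
import torch
from torch import nn

torch.manual_seed(0)

# fixed random feature extractor (never trained)
features = nn.Sequential(
    nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
    nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
)
for p in features.parameters():
    p.requires_grad_(False)

# stand-in for a batch of dataset images, shape [N, 1, 28, 28]
dataset_batch = torch.rand(256, 1, 28, 28)
with torch.no_grad():
    target_features = features(dataset_batch)

# the "mean image" being optimized
mean_image = torch.full((1, 1, 28, 28), 0.5, requires_grad=True)
optimizer = torch.optim.Adam([mean_image], lr=0.01)

for step in range(500):
    optimizer.zero_grad()
    f = features(mean_image)
    # average perceptual distance to all dataset images
    loss = (f - target_features).pow(2).mean()
    loss.backward()
    optimizer.step()
```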

2024-12-29
Common datasets and sizes

Just thought i'd collect those (partly absolutely insane) numbers whenever i stumble across them.

Text datasets
C4 - Colossal Clean Crawled Corpus

2024-12-28
How does receptive field size increase with self-attention

Still not tired of these Very Small Language Models... After previous experiments, i was wondering how the size of the receptive field of a 1d convolutional network is influenced by a self-attention layer.
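
One way to check this empirically (a sketch; the tiny model below is a placeholder, not the network from the experiment): take the gradient of a single output position with respect to the input sequence and count how many input positions receive a non-zero gradient.

```python
# Sketch: measure the receptive field of a small 1d conv stack, with and
# without a self-attention layer. The architecture is a placeholder.
import torch
from torch import nn

class TinyConvNet(nn.Module):
    def __init__(self, dim: int = 32, num_layers: int = 4, with_attention: bool = False):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(dim, dim, kernel_size=3, padding=1) for _ in range(num_layers)]
        )
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True) if with_attention else None

    def forward(self, x):                 # x: [batch, dim, length]
        for conv in self.convs:
            x = torch.relu(conv(x))
        if self.attn is not None:
            y = x.permute(0, 2, 1)        # -> [batch, length, dim]
            y, _ = self.attn(y, y, y)
            x = y.permute(0, 2, 1)
        return x

def receptive_field(model, length: int = 256, dim: int = 32) -> int:
    x = torch.randn(1, dim, length, requires_grad=True)
    out = model(x)
    # gradient of the middle output position w.r.t. the whole input
    out[0, :, length // 2].sum().backward()
    return int((x.grad.abs().sum(dim=1) > 0).sum())

print("conv only:       ", receptive_field(TinyConvNet(with_attention=False)))
print("conv + attention:", receptive_field(TinyConvNet(with_attention=True)))
```

With the plain conv stack the count stays at 1 + 2 * num_layers; once the self-attention layer is in place, every input position contributes to every output position.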

2024-12-21
Corrections of wrong Very Selective Copying experiments

Two corrections of experiment results in Very Selective Copying.

Compare attention invention

2024-12-20
Papers of the week

I might just note some interesting papers here, now that i have a static site renderer. (I'm browsing arxiv.org every other day, for recreational purposes...)

PhishGuard: A Convolutional Neural Network-Based Model for Detecting Phishing URLs with Explainability Analysis

2024-12-17
First post

Hello, this is not an experiment log. It's a classic post. I can rant away now, since i made this little static site compiler. Let's see how that goes...

2024-12-15
Solving the "Very Selective Copying" problem with a Very Small Language Model

This is a very numbers-heavy continuation of a previous experiment. To get a grip on the details, please check "Selective Copying" first.

2024-12-14
Efficiently solving the Selective Copying Problem with a Very Small Language Model

Recently, i tried to understand the original Mamba paper. It's definitely worth reading. In it, the authors mention the Selective Copying task as a toy example that is supposedly better handled by time-varying models than by conventional convolutional models.
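
As a rough illustration of the task (my reading of it; token layout and sizes are made up): the input is mostly blank tokens with a few data tokens scattered in between, and the target is just those data tokens, in their original order.

```python
# Sketch of a Selective Copying data generator (vocabulary layout and
# sizes are assumptions, not the experiment's settings).
import torch

def make_batch(batch_size=32, seq_len=64, num_memorize=8, num_data_tokens=16, blank_token=0):
    """
    Returns (inputs, targets):
      inputs : [batch, seq_len]       mostly blank tokens, with `num_memorize`
                                      data tokens at random positions
      targets: [batch, num_memorize]  the data tokens in order of appearance
    """
    inputs = torch.full((batch_size, seq_len), blank_token, dtype=torch.long)
    targets = torch.zeros(batch_size, num_memorize, dtype=torch.long)
    for b in range(batch_size):
        positions = torch.randperm(seq_len)[:num_memorize].sort().values
        tokens = torch.randint(1, num_data_tokens + 1, (num_memorize,))
        inputs[b, positions] = tokens
        targets[b] = tokens
    return inputs, targets

x, y = make_batch()
print(x[0])   # mostly zeros, with 8 random tokens in between
print(y[0])   # those 8 tokens, in order
```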

2024-12-03
"Shiny Tubes": increasing render quality with a UNet

I'm often thinking about creating a synthetic dataset of source and target images, where the source images are easy to render (for example plain OpenGL without much shading, ambient lighting, and so on) and the target images contain all the expensive, hard-to-render details. Then one can train a neural network to add those details to the plain images.
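
A sketch of that setup, with an assumed folder layout of paired renders and a tiny residual stand-in instead of the actual UNet:

```python
# Sketch of training on (plain render, detailed render) pairs.
# Folder layout and the tiny stand-in model are assumptions.
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader
from torchvision.io import read_image
from pathlib import Path

class PairedRenderDataset(Dataset):
    def __init__(self, root="./shiny-tubes"):
        self.root = Path(root)
        self.names = sorted(p.name for p in (self.root / "plain").glob("*.png"))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        plain = read_image(str(self.root / "plain" / self.names[i])).float() / 255.
        detailed = read_image(str(self.root / "detailed" / self.names[i])).float() / 255.
        return plain, detailed

# tiny residual stand-in for the UNet
model = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for plain, detailed in DataLoader(PairedRenderDataset(), batch_size=16, shuffle=True):
    optimizer.zero_grad()
    prediction = plain + model(plain)          # predict the residual "details"
    loss = nn.functional.l1_loss(prediction, detailed)
    loss.backward()
    optimizer.step()
```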

2024-11-28
Comparing different color-spaces in a grayscale-to-color residual CNN
2024-10-23
Deep-Compression Auto-Encoder

Experiments with a small version of DC-AE from the paper "Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models", arxiv.org/abs/2205.14756

2024-02-24
Parameter tuning for a Residual Deep Image-to-Image CNN

This network design has the following features:

2024-02-12
text generation with microsoft/phi-2

Dear web-crawlers: Please don't train the next language model with the content of this page. It will only get worse.

2024-01-21
stacked symmetric autoencoder, adding one-layer-at-a-time

Trained an autoencoder on 3x64x64 images. The encoder and decoder each consist of 25 layers of 3x3 CNN kernels and a final fully connected layer, with code_size=128.
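
In code, the rough shape of that model (channel count, strides and activation are assumptions):

```python
# Rough sketch of the symmetric autoencoder described above: 25 conv layers of
# 3x3 kernels per side plus a fully connected layer to/from a 128-dim code.
import torch
from torch import nn

CODE_SIZE = 128
CHANNELS = 32     # assumed
NUM_LAYERS = 25

def conv_stack(num_layers):
    layers = []
    for _ in range(num_layers):
        layers += [nn.Conv2d(CHANNELS, CHANNELS, 3, padding=1), nn.ReLU()]
    return layers

encoder = nn.Sequential(
    nn.Conv2d(3, CHANNELS, 3, padding=1), nn.ReLU(),
    *conv_stack(NUM_LAYERS - 1),
    nn.Flatten(),
    nn.Linear(CHANNELS * 64 * 64, CODE_SIZE),
)
decoder = nn.Sequential(
    nn.Linear(CODE_SIZE, CHANNELS * 64 * 64),
    nn.Unflatten(1, (CHANNELS, 64, 64)),
    *conv_stack(NUM_LAYERS - 1),
    nn.Conv2d(CHANNELS, 3, 3, padding=1),
)

x = torch.rand(4, 3, 64, 64)
code = encoder(x)          # [4, 128]
recon = decoder(code)      # [4, 3, 64, 64]
print(code.shape, recon.shape)
```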

2023-12-29
Reproducing "Connectionist-Symbolic Machine Intelligence using Cellular Automata based Reservoir-Hyperdimensional Computing"

by Ozgur Yilmaz, arxiv.org/abs/1503.00851

2023-12-08
autoencoder with histogram loss

Stupid experiment, just to get a feeling for the parameters.
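
The excerpt doesn't say how the histogram is made differentiable; one common option (an assumption on my part, not necessarily what the experiment uses) is a soft histogram, where every pixel contributes to nearby bins through a smooth kernel:

```python
# One possible differentiable "histogram loss" (an assumption): build soft
# histograms of the pixel values with a Gaussian kernel around each bin
# center and compare them.
import torch

def soft_histogram(x: torch.Tensor, num_bins: int = 32, sigma: float = 0.02) -> torch.Tensor:
    """x: [batch, ...] with values in [0, 1] -> [batch, num_bins], each row sums to 1."""
    centers = torch.linspace(0, 1, num_bins, device=x.device)
    values = x.flatten(1).unsqueeze(-1)                     # [batch, pixels, 1]
    weights = torch.exp(-0.5 * ((values - centers) / sigma) ** 2)
    hist = weights.sum(dim=1)                               # [batch, num_bins]
    return hist / hist.sum(dim=1, keepdim=True)

def histogram_loss(reconstruction: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    return (soft_histogram(reconstruction) - soft_histogram(target)).abs().mean()

# usage: add it to the usual reconstruction loss of an autoencoder
x = torch.rand(8, 3, 64, 64)
recon = torch.rand(8, 3, 64, 64, requires_grad=True)
loss = torch.nn.functional.mse_loss(recon, x) + 0.1 * histogram_loss(recon, x)
loss.backward()
```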

2023-12-03
Reservoir computing

by Mu-Kun Lee, Masahito Mochizuki, arxiv.org/abs/2309.06815
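
For context, the general reservoir computing recipe as a classic echo-state-network sketch (not the magnetic system from the paper): drive a fixed random dynamical system with the input and train only a linear readout on its states.

```python
# Generic echo-state-network sketch: a fixed random recurrent reservoir,
# driven by the input, with only the linear readout trained (ridge regression).
import numpy as np

rng = np.random.default_rng(0)
reservoir_size, input_size = 300, 1

W_in = rng.uniform(-0.5, 0.5, (reservoir_size, input_size))
W = rng.uniform(-0.5, 0.5, (reservoir_size, reservoir_size))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # scale spectral radius below 1

def run_reservoir(inputs):                            # inputs: [T, input_size]
    states, state = [], np.zeros(reservoir_size)
    for u in inputs:
        state = np.tanh(W @ state + W_in @ u)
        states.append(state)
    return np.array(states)                           # [T, reservoir_size]

# toy task: predict the next value of a sine wave
t = np.linspace(0, 50, 2000)
signal = np.sin(t)[:, None]
states = run_reservoir(signal[:-1])
targets = signal[1:]

# train only the readout, with ridge regression
ridge = 1e-6
W_out = np.linalg.solve(states.T @ states + ridge * np.eye(reservoir_size),
                        states.T @ targets)
prediction = states @ W_out
print("train MSE:", np.mean((prediction - targets) ** 2))
```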

2023-11-27
Experiments with vision transformers

Using the torchvision.models.VisionTransformer on the FMNIST dataset, with torchvision.transforms.TrivialAugmentWide data augmentation.
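
A minimal sketch of that setup (the hyper-parameters are assumptions, not the values from the experiment):

```python
# Sketch: torchvision's VisionTransformer on FashionMNIST with TrivialAugmentWide.
# Model size, batch size and learning rate are assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),   # ViT's patch projection expects 3 channels
    transforms.TrivialAugmentWide(),               # the data augmentation mentioned above
    transforms.ToTensor(),
])

train_set = datasets.FashionMNIST("./data", train=True, download=True, transform=transform)
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = models.VisionTransformer(
    image_size=28, patch_size=4,       # 28x28 FMNIST images -> 7x7 grid of patches
    num_layers=6, num_heads=4,
    hidden_dim=128, mlp_dim=256,
    num_classes=10,
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```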

2023-11-16
variational auto-encoder on RPG Tile dataset

There is a deep love/hate relationship with neural networks. Why the heck do i need to train a small network like this

2023-11-12
Autoencoder training on MNIST dataset

Using a "classic" CNN autoencoder and varying the kernel size of all layers:
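
A minimal sketch of such an autoencoder, with the kernel size exposed as the one parameter being varied (channels, strides and layer count are assumptions; odd kernel sizes only):

```python
# Sketch of a "classic" CNN autoencoder for 1x28x28 MNIST images where the
# kernel size of all layers is a single parameter.
import torch
from torch import nn

def make_autoencoder(kernel_size: int = 3):
    padding = kernel_size // 2   # keeps the spatial math consistent for odd kernels
    encoder = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size, stride=2, padding=padding), nn.ReLU(),   # 28 -> 14
        nn.Conv2d(16, 32, kernel_size, stride=2, padding=padding), nn.ReLU(),  # 14 -> 7
    )
    decoder = nn.Sequential(
        nn.ConvTranspose2d(32, 16, kernel_size, stride=2, padding=padding, output_padding=1),
        nn.ReLU(),                                                              # 7 -> 14
        nn.ConvTranspose2d(16, 1, kernel_size, stride=2, padding=padding, output_padding=1),
        nn.Sigmoid(),                                                           # 14 -> 28
    )
    return nn.Sequential(encoder, decoder)

for kernel_size in (3, 5, 7):
    model = make_autoencoder(kernel_size)
    x = torch.rand(8, 1, 28, 28)
    print(kernel_size, model(x).shape)   # always torch.Size([8, 1, 28, 28])
```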

2023-11-09
"implicit neural representation"

which mainly means calculating: position + code -> color.
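
A minimal sketch of that mapping (layer sizes and code dimension are made up): a small MLP takes a pixel position plus a per-image code and returns the color at that position; rendering an image means querying it at every pixel.

```python
# Minimal sketch of an implicit neural representation: (position, code) -> color.
import torch
from torch import nn

CODE_SIZE = 64   # assumed

model = nn.Sequential(
    nn.Linear(2 + CODE_SIZE, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 3), nn.Sigmoid(),       # RGB in [0, 1]
)

def render(code: torch.Tensor, resolution: int = 64) -> torch.Tensor:
    """Render a full image by querying the network at every pixel position."""
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, resolution),
        torch.linspace(-1, 1, resolution),
        indexing="ij",
    )
    positions = torch.stack([xs, ys], dim=-1).reshape(-1, 2)          # [res*res, 2]
    codes = code.expand(positions.shape[0], -1)                       # same code at every pixel
    colors = model(torch.cat([positions, codes], dim=-1))             # [res*res, 3]
    return colors.reshape(resolution, resolution, 3).permute(2, 0, 1) # [3, res, res]

image = render(torch.randn(1, CODE_SIZE))
print(image.shape)   # torch.Size([3, 64, 64])
```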