- Reviewing "KAE: Kolmogorov-Arnold Auto-Encoder for Representation Learning"
- Preface
- Kolmogorov-Arnold Auto-Encoder
- Testing the baseline MLP model
- Testing the KAE (p=3) model
- Testing MLP with 64 hidden dims
- Testing different activation functions
- MLP with ReLU6
- Results of l2 reconstruction on MNIST
- Results of l2 reconstruction on MNIST including extra tasks
- Appendix
- MLP, relu/sigmoid, ADAM, lr=0.0001, batch size=256
- MLP, relu, ADAMW, lr=0.0003, batch size=64
- MLP, relu6, ADAM, lr=0.0001, batch size=256
- MLP, relu6, ADAMW, lr=0.0003, batch size=32
- MLP, relu6, ADAMW, lr=0.0003, batch size=64
- MLP, relu6, ADAMW, lr=0.0003, batch size=128
- MLP (hid=64), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- MLP (hid=64), relu6, ADAMW, lr=0.0003, batch size=64
- MLP (hid=128), relu6, ADAMW, lr=0.0003, batch size=64
- MLP (hid=256), relu6, ADAMW, lr=0.0003, batch size=64
- KAE (p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (p=4), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (p=5), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (p=6), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (p=3), relu6/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=64
- KAE (p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=512
- KAE (p=3), relu6, ADAMW, lr=0.0003, batch size=64
- KAE (p=3), relu/sigmoid, ADAMW, lr=0.0003, batch size=64
- KAE (hid=64, p=2), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=128, p=2), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=256, p=2), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=64, p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=128, p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=256, p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=64, p=4), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=128, p=4), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=256, p=4), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=64, p=5), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=128, p=5), relu/sigmoid, ADAM, lr=0.0001, batch size=256
- KAE (hid=256, p=5), relu/sigmoid, ADAM, lr=0.0001, batch size=256
Reviewing "KAE: Kolmogorov-Arnold Auto-Encoder for Representation Learning" ←
While browsing arxiv.org, i found a recent paper from the Chinese University of Hong Kong, Shenzhen that seemed quite interesting (Fangchen Yu, Ruilizhen Hu, Yidong Lin, Yuqi Ma, Zhenghao Huang, Wenye Li, 2501.00420). It proposes an auto-encoder model based on the Kolmogorov-Arnold representation theorem.
The Kolmogorov-Arnold Network (KAN) is a relatively new approach to neural networks (2404.19756), where activation functions are learned edge-wise instead of node-wise (or not at all). It is claimed to have higher representational capacity than standard linear layers or MLPs.
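To make the edge-wise idea concrete, here is a minimal sketch of such a layer in plain Python. This is my own illustrative re-implementation, not the authors' KAELayer: i assume each edge (i → j) learns p+1 polynomial coefficients c_0..c_p and applies phi_ij(x) = sum_k c_k * x^k, while each output node just sums its incoming edge polynomials plus a bias. A linear layer is the special case p=1 with c_0 = 0.

```python
def kae_layer(x, coeffs, bias):
    """Polynomial KAN-style layer (illustrative sketch).

    x: input vector (list of floats)
    coeffs[j][i]: polynomial coefficients c_0..c_p of edge i -> j
    bias[j]: bias of output node j
    """
    out = []
    for j, b in enumerate(bias):
        acc = b
        for i, xi in enumerate(x):
            # evaluate the learned edge polynomial sum_k c_k * xi**k
            acc += sum(c * xi ** k for k, c in enumerate(coeffs[j][i]))
        out.append(acc)
    return out

# one input, one output, p=3: the edge learns phi(x) = x^2
print(kae_layer([3.0], [[[0.0, 0.0, 1.0, 0.0]]], [0.0]))  # [9.0]
```

With p+1 coefficients per edge, the layer has n_in * n_out * (p + 1) + n_out parameters, which is where the larger model sizes below come from.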
Preface ←
Well, this is a little rant, which i add here without the intent to diminish the quality of this particular paper. I actually like it.
Now, there is, in my opinion, an unhealthy trend in AI research to put the word superior in the paper abstract and proclaim that the new model is superior to all state-of-the-art (SOTA) models, or at least to the standard baselines. It might be true, but it might also be true only for a very particular experimental setup and nowhere else.
Another trend is to release the unreviewed preprint on arxiv.org and nowhere else. I frequently stumble across papers which have, for example, tables with performance values where the text claims that the proposed model is superior, while the actual numbers show the opposite. Once you find such a detail, you realize that you have very likely wasted your time reading the paper up to this point. I still wonder why one would release such a paper. Does tying your name to the words superior and SOTA often enough provide a higher academic ranking?
I'm not a paid researcher, so i generally have no access to Elsevier or Springer papers. Those papers are reviewed by researchers and, we all hope, none of those hoax papers make it into a professional journal. The fact that reviewers are not paid for their reviewing work, while professional publishers sell access to those papers (at very high rates!) although the research is generally funded by taxpayers' money, is the worst trend of all. See, e.g., here.
Anyways, that's just a side note to provide some context. Out of curiosity, i will examine the paper myself.
Kolmogorov-Arnold Auto-Encoder ←
I like auto-encoders and did a lot of experiments with them. Since the authors of the KAE paper provided their code (github.com/SciYu/KAE), i naturally did not hesitate, copied the KAE model into my own experimental framework and tried a few things. The results, however, were chastening. The model is slower and does not perform nearly as well as my baseline CNN models. But, to be fair, it's only a single-layer model by default. So... back to base research.
Table 2 in the paper shows the superiority of the KAE model. The table looks pretty convincing. And indeed, using the authors' code, it can be completely reproduced. Generally, the paper is very clean and tidy, and the conducted experiments are insightful.
I cloned the repo (at commit bce71dca from Dec 31, 2024) and started a jupyter lab to reproduce the results of Table 2:
```python
from typing import Optional
import math

import torch

import ExpToolKit


def run_experiment(
    config_path: str,
    num_trials: int = 10,
    is_print: bool = False,
    overrides: Optional[dict] = None,
):
    print(config_path)
    config = ExpToolKit.load_config(config_path)
    if overrides:
        for key, value in overrides.items():
            config[key].update(value)
        print("updated config:")
        display(config)
    print()

    test_losses = []
    for trial in range(num_trials):
        torch.cuda.empty_cache()
        config["TRAIN"]["random_seed"] = 2024 + trial
        train_setting = ExpToolKit.create_train_setting(config)

        if trial == 0:
            display(train_setting)
            num_params = sum(
                math.prod(p.shape)
                for p in train_setting["model"].parameters()
                if p.requires_grad
            )
            print(f"\nmodel parameters: {num_params:,}\n")

        print(f"trial {trial + 1}/{num_trials}")
        model, train_loss_epoch, train_loss_batch, epoch_time, test_loss_epoch = \
            ExpToolKit.train_and_test(**train_setting, is_print=is_print)
        print(f"test loss: {test_loss_epoch[-1]}, seconds: {epoch_time[-1]}")
        test_losses.append(test_loss_epoch[-1])

    print(f"average/best test loss: {sum(test_losses)/len(test_losses)} / {min(test_losses)}")
```
Testing the baseline MLP model ←
The shipped configuration files enable testing all the models from Table 2 on the l2 reconstruction loss of the MNIST dataset, with a latent dimension of 16 and the Adam optimizer at learning rate 0.0001 and weight decay 0.0001. In the paper, the baseline model is called AE, but i switched to MLP for this article.
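For clarity, the reported test loss is the l2 reconstruction error between input pixels and reconstructed pixels. A sketch of the metric, assuming a per-pixel mean over [0, 1]-scaled images (the exact reduction used by the repo may differ):

```python
def l2_reconstruction_loss(originals, reconstructions):
    """Mean squared error between original and reconstructed pixel values."""
    total, n = 0.0, 0
    for orig, recon in zip(originals, reconstructions):
        for a, b in zip(orig, recon):
            total += (a - b) ** 2
            n += 1
    return total / n

# a perfect reconstruction scores 0
print(l2_reconstruction_loss([[0.0, 1.0]], [[0.0, 1.0]]))  # 0.0
```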
run_experiment("model_config/config0.yaml")
model_config/config0.yaml {'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=16, bias=True) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=784, bias=True) ) (1): Sigmoid() ) ), 'train_loader': <torch.utils.data.dataloader.DataLoader at 0x7f0dcc999fd0>, 'test_loader': <torch.utils.data.dataloader.DataLoader at 0x7f0dcc999f70>, 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'epochs': 10, 'device': device(type='cuda'), 'random_seed': 2024} model parameters: 25,888 trial 1/10 test loss: 0.05721132168546319, seconds: 51.601099491119385 trial 2/10 test loss: 0.05495429076254368, seconds: 50.91784954071045 trial 3/10 test loss: 0.054549571592360735, seconds: 51.24936056137085 trial 4/10 test loss: 0.05354054784402251, seconds: 51.37986373901367 trial 5/10 test loss: 0.05453997133299708, seconds: 52.185614347457886 trial 6/10 test loss: 0.05438828486949206, seconds: 52.00752115249634 trial 7/10 test loss: 0.05443032365292311, seconds: 52.78800082206726 trial 8/10 test loss: 0.05106307221576571, seconds: 52.04966950416565 trial 9/10 test loss: 0.049927499424666164, seconds: 52.67065501213074 trial 10/10 test loss: 0.04967067549005151, seconds: 51.24051094055176 average/best test loss: 0.05342755588702858 / 0.04967067549005151
First of all, it's nice to see that my aged NVIDIA GeForce GTX 1660 Ti seems to be only about twice as slow as the NVIDIA TITAN V used by the authors. So i can run some tests of my own.
The average test loss of 0.053 roughly matches the 0.056 ±0.002 reported in Table 2. Note that i only ran the lr=0.0001 setup, while the paper reports the average over all runs, including learning rate 0.00001 and two different weight decay settings.
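The ± numbers in the result tables are mean and standard deviation over the 10 trial losses. A sketch of that aggregation, using the (rounded) trial losses from the run above and assuming the sample standard deviation (the paper might use the population variant instead):

```python
import statistics

# rounded test losses of the 10 baseline MLP trials above
losses = [0.0572, 0.0550, 0.0545, 0.0535, 0.0545,
          0.0544, 0.0544, 0.0511, 0.0499, 0.0497]

print(f"{statistics.mean(losses):.4f} ±{statistics.stdev(losses):.4f}")  # 0.0534 ±0.0024
```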
Testing the KAE (p=3) model ←
run_experiment("model_config/config6.yaml")
model_config/config6.yaml {'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): Sigmoid() ) ), 'train_loader': <torch.utils.data.dataloader.DataLoader at 0x7f0ee02d3af0>, 'test_loader': <torch.utils.data.dataloader.DataLoader at 0x7f0dc2b159d0>, 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'epochs': 10, 'device': device(type='cuda'), 'random_seed': 2024} model parameters: 101,152 trial 1/10 test loss: 0.024415594851598145, seconds: 64.95739388465881 trial 2/10 test loss: 0.024201368540525438, seconds: 64.82365393638611 trial 3/10 test loss: 0.024129191506654026, seconds: 65.10156893730164 trial 4/10 test loss: 0.025636526104062796, seconds: 64.00057244300842 trial 5/10 test loss: 0.0220940746832639, seconds: 64.52272725105286 trial 6/10 test loss: 0.021915703685954212, seconds: 63.072824239730835 trial 7/10 test loss: 0.027073210990056395, seconds: 61.848424196243286 trial 8/10 test loss: 0.02452602991834283, seconds: 61.77449369430542 trial 9/10 test loss: 0.02409802973270416, seconds: 63.03647565841675 trial 10/10 test loss: 0.024723049299791456, seconds: 62.12168884277344 average/best test loss: 0.024281277931295336 / 0.021915703685954212
The test loss of 0.024 also matches the result in the paper. I tested the other models as well, and all reported results could be reproduced.
Now, there is an obvious detail that jumps out: the KAE model has four times as many parameters as the MLP model. This is addressed in section 4.4 of the paper. The only way to increase the number of parameters in the MLP auto-encoder is to add another layer.
Testing MLP with 64 hidden dims ←
The authors add a hidden layer with 64 dimensions, which increases the MLP model's parameter count to 103,328, almost exactly matching the KAE (p=3) model size.
(From now on, i'll spare you the full textual output, which is listed in the Appendix.)
run_experiment("model_config/config0.yaml", overrides={"MODEL": {"hidden_dims": [64]}})
StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=64, bias=True) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): Linear(in_features=64, out_features=16, bias=True) ) (3): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=64, bias=True) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): Linear(in_features=64, out_features=784, bias=True) ) (3): Sigmoid() ) ) model parameters: 103,328 average/best test loss: 0.049578257398679854 / 0.047026059683412315
So, comparing the KAE and a similar sized MLP gives:
model | params | test loss (10 runs) |
---|---|---|
MLP (h=64) | 103,328 | 0.050 |
KAE (p=3) | 101,152 | 0.024 |
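These parameter counts can be verified with quick arithmetic. I assume the KAELayer stores p+1 polynomial coefficients per edge plus one bias per output dimension; that assumption exactly reproduces the sizes printed by the training code:

```python
def linear_params(n_in, n_out):
    # weight matrix plus bias
    return n_in * n_out + n_out

def kae_params(n_in, n_out, p):
    # assumption: p+1 polynomial coefficients per edge, plus output bias
    return n_in * n_out * (p + 1) + n_out

# MLP: 784 -> 16 -> 784
mlp = linear_params(784, 16) + linear_params(16, 784)
# KAE (p=3): same shape, polynomial edges
kae = kae_params(784, 16, 3) + kae_params(16, 784, 3)
# MLP with a 64-dim hidden layer: 784 -> 64 -> 16 -> 64 -> 784
mlp_h64 = (linear_params(784, 64) + linear_params(64, 16)
           + linear_params(16, 64) + linear_params(64, 784))

print(mlp, kae, mlp_h64)  # 25888 101152 103328
```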
Testing different activation functions ←
Whenever i see a Sigmoid activation, i think about the good ol' Geoff Hinton times, when this activation was used a lot. It squashes everything between zero and one, which intuitively seems to be a good choice for generating images. However, personally, i never had a good experience with it. The common activation function today is ReLU or some variant of it. For example, in this MNIST auto-encoder experiment, the best functions were ReLU6 and LeakyReLU.
The authors' code does not allow setting the activation functions in the config file, so i adjusted the code to test different functions.
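For reference, here are the activation functions discussed in this article as plain functions. ReLU6 simply caps ReLU at 6, which bounds the outputs without the vanishing gradients of Sigmoid:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(x, 0.0)

def relu6(x):
    # ReLU clamped to [0, 6]
    return min(max(x, 0.0), 6.0)

def leaky_relu(x, slope=0.01):
    # small negative slope keeps gradients alive for x < 0
    return x if x > 0.0 else slope * x

print([relu6(x) for x in (-2.0, 3.0, 9.0)])  # [0.0, 3.0, 6.0]
```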
MLP with ReLU6 ←
Replacing both the ReLU and the Sigmoid with ReLU6:
model_config/config0.yaml StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=16, bias=True) ) (1): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=784, bias=True) ) (1): ReLU6(inplace=True) ) ) average/best test loss: 0.03945172145031392 / 0.03638914376497269
It's significantly better than the original MLP from above. Adjusting the optimizer and the training batch size to my personal defaults, we can almost reach the KAE loss:
```python
run_experiment("model_config/config0.yaml", overrides={
    "TRAIN": {
        "batch_size": 64,
        "optim_type": "ADAMW",
        "lr": 0.0003,
    }
})
```
average/best test loss: 0.029271703445987333 / 0.02781044684682682
Adding the 64-dim hidden layer to match the KAE model size, however, produces a slightly worse loss of 0.03.
Using the same optimizer and batch size settings for the KAE model, performance drops from 0.024 to 0.035. So that does not help.
Switching the activation back to the original ReLU and Sigmoid but keeping the optimizer and batch size only yields a test loss of 0.030.
Obviously, the authors have found good hyper-parameters for training the KAE model. They just don't hold for training the MLP.
I ran a couple more experiments, but the reporting and the switching of activation functions got a bit tedious. So i forked the repository and added the code necessary to run the experiments automatically. To reproduce the results:
```shell
git clone https://github.com/defgsus/KAE
# setup your virtualenv and install requirements.txt, torch and torchvision, then
python defgsus train
python defgsus test
```
The details are listed in the Appendix (rendered with `python defgsus markdown`).
Following is the table of all experiment results in compact form.
Results of l2 reconstruction on MNIST ←
model | act | params | optim | lr | batch size | test loss (10 runs)↓ | train time (10 ep) |
---|---|---|---|---|---|---|---|
MLP | relu/sigmoid | 25,888 | Adam | 0.0001 | 256 | 0.0532 ±0.0020 | 48.4 sec |
MLP | relu | 25,888 | Adamw | 0.0003 | 64 | 0.0294 ±0.0010 | 55.2 sec |
MLP | relu6 | 25,888 | Adam | 0.0001 | 256 | 0.0392 ±0.0015 | 49.4 sec |
MLP | relu6 | 25,888 | Adamw | 0.0003 | 32 | 0.0298 ±0.0011 | 64.1 sec |
MLP | relu6 | 25,888 | Adamw | 0.0003 | 64 | 0.0293 ±0.0010 | 55.1 sec |
MLP | relu6 | 25,888 | Adamw | 0.0003 | 128 | 0.0304 ±0.0009 | 51.1 sec |
MLP (hid=64) | relu/sigmoid | 103,328 | Adam | 0.0001 | 256 | 0.0496 ±0.0018 | 50.1 sec |
MLP (hid=64) | relu6 | 103,328 | Adamw | 0.0003 | 64 | 0.0301 ±0.0015 | 59.1 sec |
MLP (hid=128) | relu6 | 205,856 | Adamw | 0.0003 | 64 | 0.0267 ±0.0013 | 69.1 sec |
MLP (hid=256) | relu6 | 410,912 | Adamw | 0.0003 | 64 | 0.0239 ±0.0011 | 69.4 sec |
KAE (p=3) | relu/sigmoid | 101,152 | Adam | 0.0001 | 256 | 0.0243 ±0.0014 | 62.3 sec |
KAE (p=4) | relu/sigmoid | 126,240 | Adam | 0.0001 | 256 | 0.0228 ±0.0011 | 67.2 sec |
KAE (p=5) | relu/sigmoid | 151,328 | Adam | 0.0001 | 256 | 0.0224 ±0.0009 | 75.2 sec |
KAE (p=6) | relu/sigmoid | 176,416 | Adam | 0.0001 | 256 | 0.0227 ±0.0011 | 81.8 sec |
KAE (p=3) | relu6/sigmoid | 101,152 | Adam | 0.0001 | 256 | 0.0235 ±0.0012 | 63.7 sec |
KAE (p=3) | relu/sigmoid | 101,152 | Adam | 0.0001 | 64 | 0.0256 ±0.0020 | 72.5 sec |
KAE (p=3) | relu/sigmoid | 101,152 | Adam | 0.0001 | 512 | 0.0255 ±0.0007 | 61.5 sec |
KAE (p=3) | relu6 | 101,152 | Adamw | 0.0003 | 64 | 0.0346 ±0.0023 | 70.8 sec |
KAE (p=3) | relu/sigmoid | 101,152 | Adamw | 0.0003 | 64 | 0.0308 ±0.0032 | 70.5 sec |
KAE (hid=64, p=2) | relu/sigmoid | 308,128 | Adam | 0.0001 | 256 | 0.0256 ±0.0014 | 94.6 sec |
KAE (hid=128, p=2) | relu/sigmoid | 615,456 | Adam | 0.0001 | 256 | 0.0218 ±0.0015 | 157.9 sec |
KAE (hid=256, p=2) | relu/sigmoid | 1,230,112 | Adam | 0.0001 | 256 | 0.0226 ±0.0029 | 250.1 sec |
KAE (hid=64, p=3) | relu/sigmoid | 410,528 | Adam | 0.0001 | 256 | 0.0222 ±0.0016 | 105.7 sec |
KAE (hid=128, p=3) | relu/sigmoid | 820,256 | Adam | 0.0001 | 256 | 0.0176 ±0.0007 | 196.8 sec |
KAE (hid=256, p=3) | relu/sigmoid | 1,639,712 | Adam | 0.0001 | 256 | 0.0159 ±0.0010 | 331.3 sec |
KAE (hid=64, p=4) | relu/sigmoid | 512,928 | Adam | 0.0001 | 256 | 0.0199 ±0.0019 | 142.4 sec |
KAE (hid=128, p=4) | relu/sigmoid | 1,025,056 | Adam | 0.0001 | 256 | 0.0165 ±0.0008 | 258.2 sec |
KAE (hid=256, p=4) | relu/sigmoid | 2,049,312 | Adam | 0.0001 | 256 | 0.0149 ±0.0006 | 437.4 sec |
KAE (hid=64, p=5) | relu/sigmoid | 615,328 | Adam | 0.0001 | 256 | 0.0182 ±0.0009 | 155.1 sec |
KAE (hid=128, p=5) | relu/sigmoid | 1,229,856 | Adam | 0.0001 | 256 | 0.0155 ±0.0007 | 332.4 sec |
KAE (hid=256, p=5) | relu/sigmoid | 2,458,912 | Adam | 0.0001 | 256 | 0.0141 ±0.0007 | 553.6 sec |
There surely is a way to increase the performance of the proposed KAE (p=3) model with the right batch size, optimizer settings, input and layer normalization, or other means. I have not found it yet, but adding an extra layer shows significant performance gains!
However, the performance difference between a well-trained, much smaller MLP auto-encoder and the KAN auto-encoder (p=5) is only 0.007 in my experiments, which does not really justify using the term superior five times in the document. That might be my personal distaste, but i would rather just call it increased performance.
Now, despite all the numbers, what do the models actually do? They squeeze the 28x28 MNIST images through a 16-dim latent vector and reproduce them. The compression ratio is 49 (784 pixels / 16 dims)! Let's look at some images from the MNIST validation set (odd columns are originals, even columns are reproductions):
example images | loss | model |
---|---|---|
![]() | 0.0532 | MLP, relu/sigmoid, ADAM, lr=0.0001, batch size=256. The simple MLP used in Table 2 of the paper |
![]() | 0.0293 | MLP, relu6, ADAMW, lr=0.0003, batch size=64. The improved MLP |
![]() | 0.0243 | KAE (p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=256. Original KAE from paper |
![]() | 0.0218 | KAE (hid=128, p=2), relu/sigmoid, ADAM, lr=0.0001, batch size=256. Improved KAE with hidden layer |
![]() | 0.0141 | KAE (hid=256, p=5), relu/sigmoid, ADAM, lr=0.0001, batch size=256. Best model in these experiments, although completely oversized ;) |
Results of l2 reconstruction on MNIST including extra tasks ←
Below is the same table, including the test results for classification, retrieval and denoising, as detailed in the authors' README file. (You can click the headers to sort the table.)
model | act | params | optim/lr/bs | test loss (10 runs)↓ | train time (10 ep) | classifier accuracy↑ | retriever recall@5↑ | denoiser salt&pepper↓ |
---|---|---|---|---|---|---|---|---|
MLP | relu/sigmoid | 25,888 | Adam/0.0001/256 | 0.0532 ±0.0020 | 48.4 sec | 0.8859 | 0.4021 | 0.0870 |
MLP | relu | 25,888 | Adamw/0.0003/64 | 0.0294 ±0.0010 | 55.2 sec | 0.9524 | 0.5261 | 0.0750 |
MLP | relu6 | 25,888 | Adam/0.0001/256 | 0.0392 ±0.0015 | 49.4 sec | 0.9266 | 0.5021 | 0.0772 |
MLP | relu6 | 25,888 | Adamw/0.0003/32 | 0.0298 ±0.0011 | 64.1 sec | 0.9515 | 0.5407 | 0.0762 |
MLP | relu6 | 25,888 | Adamw/0.0003/64 | 0.0293 ±0.0010 | 55.1 sec | 0.9523 | 0.5296 | 0.0750 |
MLP | relu6 | 25,888 | Adamw/0.0003/128 | 0.0304 ±0.0009 | 51.1 sec | 0.9492 | 0.5157 | 0.0742 |
MLP (hid=64) | relu/sigmoid | 103,328 | Adam/0.0001/256 | 0.0496 ±0.0018 | 50.1 sec | 0.8084 | 0.3531 | 0.0851 |
MLP (hid=64) | relu6 | 103,328 | Adamw/0.0003/64 | 0.0301 ±0.0015 | 59.1 sec | 0.9392 | 0.5348 | 0.0746 |
MLP (hid=128) | relu6 | 205,856 | Adamw/0.0003/64 | 0.0267 ±0.0013 | 69.1 sec | 0.9468 | 0.5641 | 0.0726 |
MLP (hid=256) | relu6 | 410,912 | Adamw/0.0003/64 | 0.0239 ±0.0011 | 69.4 sec | 0.9581 | 0.6025 | 0.0714 |
KAE (p=3) | relu/sigmoid | 101,152 | Adam/0.0001/256 | 0.0243 ±0.0014 | 62.3 sec | 0.9523 | 0.5446 | 0.0672 |
KAE (p=4) | relu/sigmoid | 126,240 | Adam/0.0001/256 | 0.0228 ±0.0011 | 67.2 sec | 0.9540 | 0.5375 | 0.0681 |
KAE (p=5) | relu/sigmoid | 151,328 | Adam/0.0001/256 | 0.0224 ±0.0009 | 75.2 sec | 0.9533 | 0.5375 | 0.0702 |
KAE (p=6) | relu/sigmoid | 176,416 | Adam/0.0001/256 | 0.0227 ±0.0011 | 81.8 sec | 0.9541 | 0.5608 | 0.0704 |
KAE (p=3) | relu6/sigmoid | 101,152 | Adam/0.0001/256 | 0.0235 ±0.0012 | 63.7 sec | 0.9592 | 0.6092 | 0.0665 |
KAE (p=3) | relu/sigmoid | 101,152 | Adam/0.0001/64 | 0.0256 ±0.0020 | 72.5 sec | 0.9530 | 0.5546 | 0.0676 |
KAE (p=3) | relu/sigmoid | 101,152 | Adam/0.0001/512 | 0.0255 ±0.0007 | 61.5 sec | 0.9399 | 0.5172 | 0.0688 |
KAE (p=3) | relu6 | 101,152 | Adamw/0.0003/64 | 0.0346 ±0.0023 | 70.8 sec | 0.9365 | 0.5522 | 0.0887 |
KAE (p=3) | relu/sigmoid | 101,152 | Adamw/0.0003/64 | 0.0308 ±0.0032 | 70.5 sec | 0.9370 | 0.5342 | 0.0788 |
KAE (hid=64, p=2) | relu/sigmoid | 308,128 | Adam/0.0001/256 | 0.0256 ±0.0014 | 94.6 sec | 0.9339 | 0.5227 | 0.0692 |
KAE (hid=128, p=2) | relu/sigmoid | 615,456 | Adam/0.0001/256 | 0.0218 ±0.0015 | 157.9 sec | 0.9527 | 0.5730 | 0.0649 |
KAE (hid=256, p=2) | relu/sigmoid | 1,230,112 | Adam/0.0001/256 | 0.0226 ±0.0029 | 250.1 sec | 0.9511 | 0.5753 | 0.0652 |
KAE (hid=64, p=3) | relu/sigmoid | 410,528 | Adam/0.0001/256 | 0.0222 ±0.0016 | 105.7 sec | 0.9529 | 0.5846 | 0.0706 |
KAE (hid=128, p=3) | relu/sigmoid | 820,256 | Adam/0.0001/256 | 0.0176 ±0.0007 | 196.8 sec | 0.9542 | 0.6200 | 0.0684 |
KAE (hid=256, p=3) | relu/sigmoid | 1,639,712 | Adam/0.0001/256 | 0.0159 ±0.0010 | 331.3 sec | 0.9582 | 0.6357 | 0.0656 |
KAE (hid=64, p=4) | relu/sigmoid | 512,928 | Adam/0.0001/256 | 0.0199 ±0.0019 | 142.4 sec | 0.9587 | 0.6048 | 0.0717 |
KAE (hid=128, p=4) | relu/sigmoid | 1,025,056 | Adam/0.0001/256 | 0.0165 ±0.0008 | 258.2 sec | 0.9592 | 0.6278 | 0.0676 |
KAE (hid=256, p=4) | relu/sigmoid | 2,049,312 | Adam/0.0001/256 | 0.0149 ±0.0006 | 437.4 sec | 0.9608 | 0.6482 | 0.0668 |
KAE (hid=64, p=5) | relu/sigmoid | 615,328 | Adam/0.0001/256 | 0.0182 ±0.0009 | 155.1 sec | 0.9566 | 0.6170 | 0.0698 |
KAE (hid=128, p=5) | relu/sigmoid | 1,229,856 | Adam/0.0001/256 | 0.0155 ±0.0007 | 332.4 sec | 0.9607 | 0.6241 | 0.0671 |
KAE (hid=256, p=5) | relu/sigmoid | 2,458,912 | Adam/0.0001/256 | 0.0141 ±0.0007 | 553.6 sec | 0.9612 | 0.6313 | 0.0675 |
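For reference, the corruption used in the denoising column can be sketched like this: salt & pepper noise independently forces each pixel to 1 (salt) or 0 (pepper) with some probability. The noise rate of 0.1 here is an assumption for illustration; see the authors' repo for the exact setting:

```python
import random

def salt_and_pepper(pixels, rate=0.1, rng=None):
    """Corrupt a flat list of [0, 1] pixel values with salt & pepper noise."""
    rng = rng or random.Random(0)
    noisy = []
    for p in pixels:
        r = rng.random()
        if r < rate / 2:
            noisy.append(1.0)   # salt
        elif r < rate:
            noisy.append(0.0)   # pepper
        else:
            noisy.append(p)     # pixel survives
    return noisy

clean = [0.5] * 8
print(salt_and_pepper(clean, rate=1.0))  # only 0.0s and 1.0s remain
```

The denoising auto-encoder is then trained or evaluated to map the corrupted images back to the clean originals.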
In conclusion, i would argue that this particular polynomial KAN-based auto-encoder is an interesting new approach. The model code is easy to read and certainly invites further experimentation.
Appendix ←
Just listing the lengthy experiment setups / outputs here. It's not much to see, just a proof of reproducibility. You can repeat the experiments with
```shell
git clone https://github.com/defgsus/KAE
# setup your virtualenv and install requirements.txt, torch and torchvision, then
python defgsus train
python defgsus test
# render table and output
python defgsus markdown
```
It takes a couple of hours, though!
MLP, relu/sigmoid, ADAM, lr=0.0001, batch size=256 ←
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=16, bias=True) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=784, bias=True) ) (1): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6afa90>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6af1f0>} model parameters: 25,888 trial 1/10 test loss: 0.054844805505126715, seconds: 48.74358034133911 trial 2/10 test loss: 0.05495429076254368, seconds: 47.40274238586426 trial 3/10 test loss: 0.054549571592360735, seconds: 48.13005805015564 trial 4/10 test loss: 0.05354054784402251, seconds: 47.44930839538574 trial 5/10 test loss: 0.05453997133299708, seconds: 47.6246395111084 trial 6/10 test loss: 0.05438828486949206, seconds: 47.376731872558594 trial 7/10 test loss: 0.05443032365292311, seconds: 48.409247636795044 trial 8/10 test loss: 0.05106307221576571, seconds: 47.933077812194824 trial 9/10 test loss: 0.049927499424666164, seconds: 50.86521673202515 trial 10/10 test loss: 0.04967067549005151, seconds: 50.02432680130005 average/best test loss: 0.053190904268994935 / 0.04967067549005151
MLP, relu, ADAMW, lr=0.0003, batch size=64 ←
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': 'relu', 'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 64, 'epochs': 10, 'lr': 0.0003, 'optim_type': 'ADAMW', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=16, bias=True) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=784, bias=True) ) (1): ReLU(inplace=True) ) ), 'optimizer': AdamW ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0003 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d1351cd0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6af880>} model parameters: 25,888 trial 1/10 test loss: 0.029944378443679233, seconds: 55.40973496437073 trial 2/10 test loss: 0.027822728844205287, seconds: 54.660828590393066 trial 3/10 test loss: 0.03028506580384294, seconds: 54.57886362075806 trial 4/10 test loss: 0.029726570009425947, seconds: 54.942195415496826 trial 5/10 test loss: 0.028753836965484985, seconds: 55.46592450141907 trial 6/10 test loss: 0.02827958514688501, seconds: 54.46421766281128 trial 7/10 test loss: 0.03000373778876605, seconds: 55.709019899368286 trial 8/10 test loss: 0.029764349957939924, seconds: 54.3574538230896 trial 9/10 test loss: 0.028051003173088573, seconds: 56.52020335197449 trial 10/10 test loss: 0.03092550927666342, seconds: 55.56819820404053 average/best test loss: 0.02935567654099814 / 0.027822728844205287
MLP, relu6, ADAM, lr=0.0001, batch size=256 ←
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': 'relu6', 'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=16, bias=True) ) (1): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=784, bias=True) ) (1): ReLU6(inplace=True) ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d1351760>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6afe20>} model parameters: 25,888 trial 1/10 test loss: 0.03937941854819656, seconds: 49.730987310409546 trial 2/10 test loss: 0.03887385474517942, seconds: 48.26471924781799 trial 3/10 test loss: 0.03937122141942382, seconds: 48.90603280067444 trial 4/10 test loss: 0.04118271768093109, seconds: 48.22850513458252 trial 5/10 test loss: 0.03768241703510285, seconds: 49.30234146118164 trial 6/10 test loss: 0.03638914376497269, seconds: 50.181209087371826 trial 7/10 test loss: 0.04111840622499585, seconds: 50.58647346496582 trial 8/10 test loss: 0.03880140176042914, seconds: 50.34295725822449 trial 9/10 test loss: 0.03865907033905387, seconds: 48.52420377731323 trial 10/10 test loss: 0.04104078523814678, seconds: 49.796762466430664 average/best test loss: 0.03924984367564321 / 0.03638914376497269
MLP, relu6, ADAMW, lr=0.0003, batch size=32 ←
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': 'relu6', 'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 32, 'epochs': 10, 'lr': 0.0003, 'optim_type': 'ADAMW', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=16, bias=True) ) (1): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=784, bias=True) ) (1): ReLU6(inplace=True) ) ), 'optimizer': AdamW ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0003 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6afdc0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d1351d90>} model parameters: 25,888 trial 1/10 test loss: 0.030145346975555053, seconds: 63.46911311149597 trial 2/10 test loss: 0.028754215681562407, seconds: 65.45769238471985 trial 3/10 test loss: 0.03130609097595984, seconds: 63.62731218338013 trial 4/10 test loss: 0.02995197093501068, seconds: 63.822922229766846 trial 5/10 test loss: 0.028888749393125693, seconds: 65.16088700294495 trial 6/10 test loss: 0.027876560507824246, seconds: 64.96088886260986 trial 7/10 test loss: 0.030093948038431784, seconds: 63.83512210845947 trial 8/10 test loss: 0.030830478617034782, seconds: 63.844937801361084 trial 9/10 test loss: 0.02906663065996414, seconds: 63.632938861846924 trial 10/10 test loss: 0.030966824111037742, seconds: 63.29023098945618 average/best test loss: 0.029788081589550635 / 0.027876560507824246
MLP, relu6, ADAMW, lr=0.0003, batch size=64 ←
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': 'relu6', 'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 64, 'epochs': 10, 'lr': 0.0003, 'optim_type': 'ADAMW', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=16, bias=True) ) (1): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=784, bias=True) ) (1): ReLU6(inplace=True) ) ), 'optimizer': AdamW ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0003 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6affd0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d13516a0>} model parameters: 25,888 trial 1/10 test loss: 0.029864317351940332, seconds: 55.62854290008545 trial 2/10 test loss: 0.02781044684682682, seconds: 54.88072180747986 trial 3/10 test loss: 0.030311387293278032, seconds: 53.7932653427124 trial 4/10 test loss: 0.029717421194740162, seconds: 55.48655819892883 trial 5/10 test loss: 0.02875516924318994, seconds: 55.250420808792114 trial 6/10 test loss: 0.028260746875860887, seconds: 55.32818794250488 trial 7/10 test loss: 0.029995871423061485, seconds: 54.1175856590271 trial 8/10 test loss: 0.02970408286401041, seconds: 55.907835483551025 trial 9/10 test loss: 0.02802042883767444, seconds: 55.862600564956665 trial 10/10 test loss: 0.03092010900568051, seconds: 55.07930874824524 average/best test loss: 0.029335998093626296 / 0.02781044684682682
MLP, relu6, ADAMW, lr=0.0003, batch size=128
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': 'relu6', 'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 128, 'epochs': 10, 'lr': 0.0003, 'optim_type': 'ADAMW', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=16, bias=True) ) (1): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=784, bias=True) ) (1): ReLU6(inplace=True) ) ), 'optimizer': AdamW ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0003 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cc640>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cc280>} model parameters: 25,888 trial 1/10 test loss: 0.030623565604792364, seconds: 51.72880673408508 trial 2/10 test loss: 0.02996114940865885, seconds: 51.676554679870605 trial 3/10 test loss: 0.03071878032310854, seconds: 50.302478313446045 trial 4/10 test loss: 0.02936490311558488, seconds: 49.930197954177856 trial 5/10 test loss: 0.029585700600018985, seconds: 50.63589954376221 trial 6/10 test loss: 0.02945860029681574, seconds: 52.13498044013977 trial 7/10 test loss: 0.03143414425887639, seconds: 50.9267463684082 trial 8/10 test loss: 0.03150970132762118, seconds: 49.932409048080444 trial 9/10 test loss: 0.02962162767690194, seconds: 51.81868243217468 trial 10/10 test loss: 0.032096313666316524, seconds: 51.64334535598755 average/best test loss: 0.030437448627869547 / 0.02936490311558488
MLP (hid=64), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [64], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=64, bias=True) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): Linear(in_features=64, out_features=16, bias=True) ) (3): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=64, bias=True) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): Linear(in_features=64, out_features=784, bias=True) ) (3): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d1351cd0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cc040>} model parameters: 103,328 trial 1/10 test loss: 0.051934999600052836, seconds: 50.947391748428345 trial 2/10 test loss: 0.04881709320470691, seconds: 48.80104160308838 trial 3/10 test loss: 0.0477697504684329, seconds: 50.15599513053894 trial 4/10 test loss: 0.04974388424307108, seconds: 51.431276082992554 trial 5/10 test loss: 0.048030237294733526, seconds: 51.55111837387085 trial 6/10 test loss: 0.04976865621283651, seconds: 48.58534121513367 trial 7/10 test loss: 0.047026059683412315, seconds: 49.52819037437439 trial 8/10 test loss: 0.05237543722614646, seconds: 51.40358114242554 trial 9/10 test loss: 0.04865111084654927, seconds: 48.65099883079529 trial 10/10 test loss: 0.05166534520685673, seconds: 49.85130286216736 average/best test loss: 0.049578257398679854 / 0.047026059683412315
MLP (hid=64), relu6, ADAMW, lr=0.0003, batch size=64
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': 'relu6', 'hidden_dims': [64], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 64, 'epochs': 10, 'lr': 0.0003, 'optim_type': 'ADAMW', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=64, bias=True) ) (1): ReLU6(inplace=True) (2): DenseLayer( (layer): Linear(in_features=64, out_features=16, bias=True) ) (3): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=64, bias=True) ) (1): ReLU6(inplace=True) (2): DenseLayer( (layer): Linear(in_features=64, out_features=784, bias=True) ) (3): ReLU6(inplace=True) ) ), 'optimizer': AdamW ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0003 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cc5b0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10ccac0>} model parameters: 103,328 trial 1/10 test loss: 0.031225492110013204, seconds: 58.489842653274536 trial 2/10 test loss: 0.03038751133450657, seconds: 57.93046164512634 trial 3/10 test loss: 0.030448248562444546, seconds: 59.022109031677246 trial 4/10 test loss: 0.03345142389131579, seconds: 59.17180943489075 trial 5/10 test loss: 0.0289119388789508, seconds: 59.28048276901245 trial 6/10 test loss: 0.03027447387813383, seconds: 59.327950954437256 trial 7/10 test loss: 0.028075553085299056, seconds: 59.53023934364319 trial 8/10 test loss: 0.029163281296848493, seconds: 60.20081067085266 trial 9/10 test loss: 0.03053849844178956, seconds: 60.13843059539795 trial 10/10 test loss: 0.028196997003285748, seconds: 58.36703062057495 average/best test loss: 0.030067341848258766 / 0.028075553085299056
MLP (hid=128), relu6, ADAMW, lr=0.0003, batch size=64
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': 'relu6', 'hidden_dims': [128], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 64, 'epochs': 10, 'lr': 0.0003, 'optim_type': 'ADAMW', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=128, bias=True) ) (1): ReLU6(inplace=True) (2): DenseLayer( (layer): Linear(in_features=128, out_features=16, bias=True) ) (3): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=128, bias=True) ) (1): ReLU6(inplace=True) (2): DenseLayer( (layer): Linear(in_features=128, out_features=784, bias=True) ) (3): ReLU6(inplace=True) ) ), 'optimizer': AdamW ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0003 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f8baf281970>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f8baf2816d0>} model parameters: 205,856 trial 1/10 test loss: 0.02757667852150407, seconds: 70.6561987400055 trial 2/10 test loss: 0.029030630614157695, seconds: 66.79325604438782 trial 3/10 test loss: 0.025214034945342193, seconds: 67.67058849334717 trial 4/10 test loss: 0.026811098003653205, seconds: 71.8734381198883 trial 5/10 test loss: 0.02679019872170345, seconds: 73.1500997543335 trial 6/10 test loss: 0.024743097069062244, seconds: 68.46631598472595 trial 7/10 test loss: 0.02485356943764884, seconds: 68.00167441368103 trial 8/10 test loss: 0.027746190607642673, seconds: 69.04577732086182 trial 9/10 test loss: 0.026676727375786774, seconds: 69.02845931053162 trial 10/10 test loss: 0.02724222306185847, seconds: 65.86859202384949 average/best test loss: 0.02666844483583596 / 0.024743097069062244
MLP (hid=256), relu6, ADAMW, lr=0.0003, batch size=64
model_config/config0.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': 'relu6', 'hidden_dims': [256], 'latent_dim': 16, 'layer_type': 'LINEAR', 'model_type': 'AE'}, 'TRAIN': {'batch_size': 64, 'epochs': 10, 'lr': 0.0003, 'optim_type': 'ADAMW', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=784, out_features=256, bias=True) ) (1): ReLU6(inplace=True) (2): DenseLayer( (layer): Linear(in_features=256, out_features=16, bias=True) ) (3): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): Linear(in_features=16, out_features=256, bias=True) ) (1): ReLU6(inplace=True) (2): DenseLayer( (layer): Linear(in_features=256, out_features=784, bias=True) ) (3): ReLU6(inplace=True) ) ), 'optimizer': AdamW ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0003 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f8baf281130>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f8baf281a60>} model parameters: 410,912 trial 1/10 test loss: 0.02382377028512727, seconds: 67.96561050415039 trial 2/10 test loss: 0.024177915114126387, seconds: 66.49497652053833 trial 3/10 test loss: 0.0236373558687936, seconds: 67.77597880363464 trial 4/10 test loss: 0.02233525288475167, seconds: 67.86813306808472 trial 5/10 test loss: 0.026097432799210216, seconds: 69.33182644844055 trial 6/10 test loss: 0.025345706956306842, seconds: 72.02309012413025 trial 7/10 test loss: 0.02403836791065468, seconds: 70.12623643875122 trial 8/10 test loss: 0.02219296034401769, seconds: 68.07407331466675 trial 9/10 test loss: 0.023282034619218985, seconds: 71.18422937393188 trial 10/10 test loss: 0.024291369779284592, seconds: 72.9444727897644 average/best test loss: 0.023922216656149194 / 0.02219296034401769
KAE (p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 3}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cc550>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10ccdf0>} model parameters: 101,152 trial 1/10 test loss: 0.024415594851598145, seconds: 61.92895293235779 trial 2/10 test loss: 0.024201368540525438, seconds: 60.32642364501953 trial 3/10 test loss: 0.024129191506654026, seconds: 62.38642168045044 trial 4/10 test loss: 0.025636526104062796, seconds: 61.56795644760132 trial 5/10 test loss: 0.0220940746832639, seconds: 62.635332107543945 trial 6/10 test loss: 0.021915703685954212, seconds: 62.71131467819214 trial 7/10 test loss: 0.027073210990056395, seconds: 63.939212799072266 trial 8/10 test loss: 0.02452602991834283, seconds: 63.21239233016968 trial 9/10 test loss: 0.02409802973270416, seconds: 62.5829131603241 trial 10/10 test loss: 0.024723049299791456, seconds: 61.32445979118347 average/best test loss: 0.024281277931295336 / 0.021915703685954212
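The "model parameters" values in these logs can be reproduced with a short back-of-the-envelope script. The formulas below are inferred from the logged counts, not taken from the KAE source code: the numbers are consistent with a `KAELayer` of order `p` carrying `(p + 1)` coefficients per input-output edge plus one bias per output unit, while `Linear` layers have the usual weight matrix plus bias.

```python
# Hedged sketch: closed-form parameter counts that reproduce the logged
# "model parameters" values. The KAE formula is an inference from the
# logs, not the paper's implementation.

def linear_params(d_in, d_out):
    # standard nn.Linear: d_in*d_out weights plus d_out biases
    return d_in * d_out + d_out

def kae_params(d_in, d_out, order):
    # assumed: (order + 1) coefficients per edge, one bias per output
    return d_in * d_out * (order + 1) + d_out

def ae_params(dims, layer_fn):
    # dims e.g. [784, 16]: encoder 784->16, decoder mirrors it 16->784
    enc = list(zip(dims[:-1], dims[1:]))
    dec = [(b, a) for a, b in reversed(enc)]
    return sum(layer_fn(a, b) for a, b in enc + dec)

print(ae_params([784, 16], linear_params))                      # linear AE
print(ae_params([784, 16], lambda a, b: kae_params(a, b, 3)))   # KAE p=3
```

This reproduces 25,888 for the plain linear AE, 101,152 / 126,240 / 151,328 / 176,416 for KAE with p=3..6, and 308,128 for the hid=64, p=2 KAE below, matching the logs.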
KAE (p=4), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 4}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=4) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=4) ) (1): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cc3a0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6af9a0>} model parameters: 126,240 trial 1/10 test loss: 0.0213471460621804, seconds: 65.92803621292114 trial 2/10 test loss: 0.022648265538737177, seconds: 67.24404215812683 trial 3/10 test loss: 0.022140852641314268, seconds: 68.13891673088074 trial 4/10 test loss: 0.02260885932482779, seconds: 67.84793710708618 trial 5/10 test loss: 0.024192711059004068, seconds: 67.24179553985596 trial 6/10 test loss: 0.02392191574908793, seconds: 66.79944491386414 trial 7/10 test loss: 0.024204003950580956, seconds: 68.41916704177856 trial 8/10 test loss: 0.02161008436232805, seconds: 67.58119535446167 trial 9/10 test loss: 0.023785065673291684, seconds: 65.27455425262451 trial 10/10 test loss: 0.021693919878453018, seconds: 67.33229947090149 average/best test loss: 0.022815282423980534 / 0.0213471460621804
KAE (p=5), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 5}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=5) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=5) ) (1): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cce80>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d1351a90>} model parameters: 151,328 trial 1/10 test loss: 0.023980671400204302, seconds: 74.61662101745605 trial 2/10 test loss: 0.022810124233365058, seconds: 74.86650276184082 trial 3/10 test loss: 0.021235586237162353, seconds: 74.01571750640869 trial 4/10 test loss: 0.02169646918773651, seconds: 74.23548221588135 trial 5/10 test loss: 0.02105184900574386, seconds: 75.06123161315918 trial 6/10 test loss: 0.023493063217028976, seconds: 76.60210585594177 trial 7/10 test loss: 0.023077776981517674, seconds: 75.59740281105042 trial 8/10 test loss: 0.02147274692542851, seconds: 74.46156001091003 trial 9/10 test loss: 0.022462127590551974, seconds: 76.81111788749695 trial 10/10 test loss: 0.022454827837646008, seconds: 75.66343092918396 average/best test loss: 0.022373524261638522 / 0.02105184900574386
KAE (p=6), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 6}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=6) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=6) ) (1): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cc7c0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6afd60>} model parameters: 176,416 trial 1/10 test loss: 0.02248598770238459, seconds: 79.81454825401306 trial 2/10 test loss: 0.022188912145793438, seconds: 79.89576363563538 trial 3/10 test loss: 0.024314309703186154, seconds: 81.67849159240723 trial 4/10 test loss: 0.024558632262051107, seconds: 81.62779688835144 trial 5/10 test loss: 0.02227715775370598, seconds: 81.00719881057739 trial 6/10 test loss: 0.02273458130657673, seconds: 83.1842896938324 trial 7/10 test loss: 0.023906563920900226, seconds: 81.46982002258301 trial 8/10 test loss: 0.02215076144784689, seconds: 83.68938827514648 trial 9/10 test loss: 0.021122285490855576, seconds: 83.52689623832703 trial 10/10 test loss: 0.021565554104745387, seconds: 81.79217624664307 average/best test loss: 0.02273047458380461 / 0.021122285490855576
KAE (p=3), relu6/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': ['relu6', 'sigmoid'], 'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 3}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10ccbe0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6afdc0>} model parameters: 101,152 trial 1/10 test loss: 0.02309356229379773, seconds: 64.85483431816101 trial 2/10 test loss: 0.02193830031901598, seconds: 64.11653161048889 trial 3/10 test loss: 0.024399833753705025, seconds: 65.46109867095947 trial 4/10 test loss: 0.023626095009967686, seconds: 64.1012167930603 trial 5/10 test loss: 0.02228605728596449, seconds: 64.01207900047302 trial 6/10 test loss: 0.022312369430437684, seconds: 63.67721509933472 trial 7/10 test loss: 0.025964177446439862, seconds: 63.69110441207886 trial 8/10 test loss: 0.023402246087789534, seconds: 61.47948455810547 trial 9/10 test loss: 0.02332183890976012, seconds: 63.08373951911926 trial 10/10 test loss: 0.025107008311897515, seconds: 62.94281458854675 average/best test loss: 0.023545148884877562 / 0.02193830031901598
KAE (p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=64
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 3}, 'TRAIN': {'batch_size': 64, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cc310>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6afd90>} model parameters: 101,152 trial 1/10 test loss: 0.02615715488554186, seconds: 72.68000745773315 trial 2/10 test loss: 0.02310845555417287, seconds: 71.651535987854 trial 3/10 test loss: 0.02798851725354696, seconds: 71.62791204452515 trial 4/10 test loss: 0.027769463587623493, seconds: 73.73475170135498 trial 5/10 test loss: 0.023415053381946434, seconds: 72.88235187530518 trial 6/10 test loss: 0.02224666504248692, seconds: 73.22373843193054 trial 7/10 test loss: 0.027987814644814295, seconds: 73.20678901672363 trial 8/10 test loss: 0.02645594084481145, seconds: 73.04287147521973 trial 9/10 test loss: 0.026078407195912805, seconds: 73.70239424705505 trial 10/10 test loss: 0.02469016543951384, seconds: 69.29945802688599 average/best test loss: 0.025589763783037095 / 0.02224666504248692
KAE (p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=512
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 3}, 'TRAIN': {'batch_size': 512, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d10cce20>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f36ef6afb80>} model parameters: 101,152 trial 1/10 test loss: 0.025707779359072445, seconds: 59.64279365539551 trial 2/10 test loss: 0.02481404719874263, seconds: 60.674640417099 trial 3/10 test loss: 0.026527704764157535, seconds: 60.42444372177124 trial 4/10 test loss: 0.025667520519346, seconds: 62.54549741744995 trial 5/10 test loss: 0.025927974842488766, seconds: 61.43470597267151 trial 6/10 test loss: 0.02496118126437068, seconds: 63.49837613105774 trial 7/10 test loss: 0.026538813766092063, seconds: 62.94175696372986 trial 8/10 test loss: 0.024276717472821473, seconds: 62.804282665252686 trial 9/10 test loss: 0.025847604498267174, seconds: 60.39290261268616 trial 10/10 test loss: 0.02517965892329812, seconds: 60.71731376647949 average/best test loss: 0.025544900260865692 / 0.024276717472821473
KAE (p=3), relu6, ADAMW, lr=0.0003, batch size=64
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'activation': ['relu6', 'relu6'], 'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 3}, 'TRAIN': {'batch_size': 64, 'epochs': 10, 'lr': 0.0003, 'optim_type': 'ADAMW', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): ReLU6(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): ReLU6(inplace=True) ) ), 'optimizer': AdamW ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0003 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d1351c70>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d1351130>} model parameters: 101,152 trial 1/10 test loss: 0.03023416113559228, seconds: 70.26620602607727 trial 2/10 test loss: 0.039044849622021816, seconds: 71.62893223762512 trial 3/10 test loss: 0.0326307240375288, seconds: 71.98599314689636 trial 4/10 test loss: 0.03546572939320734, seconds: 70.04723477363586 trial 5/10 test loss: 0.03381784916351176, seconds: 70.74350953102112 trial 6/10 test loss: 0.03386436209414795, seconds: 70.8341600894928 trial 7/10 test loss: 0.03602883121247884, seconds: 70.6436402797699 trial 8/10 test loss: 0.036401670034618895, seconds: 70.8380868434906 trial 9/10 test loss: 0.035208017191594575, seconds: 70.97638368606567 trial 10/10 test loss: 0.03290520119629088, seconds: 69.85479497909546 average/best test loss: 0.034560139508099316 / 0.03023416113559228
KAE (p=3), relu/sigmoid, ADAMW, lr=0.0003, batch size=64
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 3}, 'TRAIN': {'batch_size': 64, 'epochs': 10, 'lr': 0.0003, 'optim_type': 'ADAMW', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=3) ) (1): Sigmoid() ) ), 'optimizer': AdamW ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0003 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d1351e80>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f37d13512b0>} model parameters: 101,152 trial 1/10 test loss: 0.027002361751380998, seconds: 69.63095498085022 trial 2/10 test loss: 0.03218813594074765, seconds: 68.59184980392456 trial 3/10 test loss: 0.0328096655571157, seconds: 71.18175435066223 trial 4/10 test loss: 0.03489779873163837, seconds: 70.70620441436768 trial 5/10 test loss: 0.026414862899169042, seconds: 70.2962257862091 trial 6/10 test loss: 0.02531469639414435, seconds: 70.63081979751587 trial 7/10 test loss: 0.03441416723712994, seconds: 69.97365498542786 trial 8/10 test loss: 0.030439759812252536, seconds: 71.21950697898865 trial 9/10 test loss: 0.03216738462637944, seconds: 70.41167783737183 trial 10/10 test loss: 0.032150394182391226, seconds: 72.3699688911438 average/best test loss: 0.030779922713234924 / 0.02531469639414435
KAE (hid=64, p=2), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [64], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 2}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=2) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): KAELayer(order=2) ) (3): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=2) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): KAELayer(order=2) ) (3): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7fb46abbdbe0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7fb46abbdc10>} model parameters: 308,128 trial 1/10 test loss: 0.024810216249898077, seconds: 87.20655822753906 trial 2/10 test loss: 0.025054814480245113, seconds: 90.82882761955261 trial 3/10 test loss: 0.029328040825203062, seconds: 93.76674461364746 trial 4/10 test loss: 0.026678384700790047, seconds: 94.57724380493164 trial 5/10 test loss: 0.024223004141822456, seconds: 94.48876214027405 trial 6/10 test loss: 0.024324131943285466, seconds: 93.90495753288269 trial 7/10 test loss: 0.02501974389888346, seconds: 96.939692735672 trial 8/10 test loss: 0.026064124144613742, seconds: 96.95923495292664 trial 9/10 test loss: 0.025096864346414803, seconds: 98.49881863594055 trial 10/10 test loss: 0.025496953958645464, seconds: 98.96546983718872 average/best test loss: 0.02560962786898017 / 0.024223004141822456
KAE (hid=128, p=2), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [128], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 2}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=2) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): KAELayer(order=2) ) (3): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=2) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): KAELayer(order=2) ) (3): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7fb549e842b0>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7fb46abbd9d0>} model parameters: 615,456 trial 1/10 test loss: 0.02336266408674419, seconds: 158.22575211524963 trial 2/10 test loss: 0.019091003900393845, seconds: 158.271062374115 trial 3/10 test loss: 0.019837975315749646, seconds: 159.3715991973877 trial 4/10 test loss: 0.02341140890493989, seconds: 155.35668230056763 trial 5/10 test loss: 0.02207847419194877, seconds: 152.05787658691406 trial 6/10 test loss: 0.020837245509028435, seconds: 158.38157606124878 trial 7/10 test loss: 0.02213256466202438, seconds: 162.78448700904846 trial 8/10 test loss: 0.020774537371471523, seconds: 161.09157872200012 trial 9/10 test loss: 0.023839181195944546, seconds: 156.59118175506592 trial 10/10 test loss: 0.022777973441407084, seconds: 157.24475932121277 average/best test loss: 0.02181430285796523 / 0.019091003900393845
KAE (hid=256, p=2), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config: {'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [256], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 2}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}} {'device': device(type='cuda'), 'epochs': 10, 'model': StandardAE( (encoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=2) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): KAELayer(order=2) ) (3): ReLU(inplace=True) ) (decoder): Sequential( (0): DenseLayer( (layer): KAELayer(order=2) ) (1): ReLU(inplace=True) (2): DenseLayer( (layer): KAELayer(order=2) ) (3): Sigmoid() ) ), 'optimizer': Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: None lr: 0.0001 maximize: False weight_decay: 0.0001 ), 'random_seed': 2024, 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7fb46abbdd90>, 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7fb46abbd490>} model parameters: 1,230,112 trial 1/10 test loss: 0.026064584124833347, seconds: 250.39859056472778 trial 2/10 test loss: 0.024486870411783455, seconds: 249.52806568145752 trial 3/10 test loss: 0.02816329142078757, seconds: 250.88722562789917 trial 4/10 test loss: 0.01849869654979557, seconds: 250.4723780155182 trial 5/10 test loss: 0.023125345911830665, seconds: 250.70067381858826 trial 6/10 test loss: 0.022905606171116234, seconds: 246.0978467464447 trial 7/10 test loss: 0.021652239840477705, seconds: 250.5683355331421 trial 8/10 test loss: 0.019134251540526746, seconds: 251.5078740119934 trial 9/10 test loss: 0.021603700052946807, seconds: 250.71569180488586 trial 10/10 test loss: 0.020234312349930405, seconds: 250.3537197113037 average/best test loss: 0.02258688983740285 / 0.01849869654979557
KAE (hid=64, p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config:
{'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [64], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 3}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}}

{'device': device(type='cuda'),
 'epochs': 10,
 'model': StandardAE(
    (encoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=3) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=3) )
      (3): ReLU(inplace=True)
    )
    (decoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=3) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=3) )
      (3): Sigmoid()
    )
  ),
 'optimizer': Adam (
    Parameter Group 0
      amsgrad: False
      betas: (0.9, 0.999)
      capturable: False
      differentiable: False
      eps: 1e-08
      foreach: None
      fused: None
      lr: 0.0001
      maximize: False
      weight_decay: 0.0001
  ),
 'random_seed': 2024,
 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7fa16d6daee0>,
 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7fa16d6da130>}

model parameters: 410,528
trial 1/10 test loss: 0.022235007444396614, seconds: 101.52159070968628
trial 2/10 test loss: 0.022399124642834067, seconds: 102.52322673797607
trial 3/10 test loss: 0.02662791032344103, seconds: 105.62324666976929
trial 4/10 test loss: 0.022848214395344256, seconds: 104.23346161842346
trial 5/10 test loss: 0.02125693657435477, seconds: 107.69089484214783
trial 6/10 test loss: 0.021431715320795776, seconds: 106.03088760375977
trial 7/10 test loss: 0.02070729318074882, seconds: 107.04297685623169
trial 8/10 test loss: 0.022259943094104527, seconds: 106.59215521812439
trial 9/10 test loss: 0.021174417017027734, seconds: 106.26171398162842
trial 10/10 test loss: 0.021046917978674175, seconds: 109.13478350639343
average/best test loss: 0.022198747997172176 / 0.02070729318074882
KAE (hid=128, p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config:
{'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [128], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 3}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}}

{'device': device(type='cuda'),
 'epochs': 10,
 'model': StandardAE(
    (encoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=3) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=3) )
      (3): ReLU(inplace=True)
    )
    (decoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=3) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=3) )
      (3): Sigmoid()
    )
  ),
 'optimizer': Adam (
    Parameter Group 0
      amsgrad: False
      betas: (0.9, 0.999)
      capturable: False
      differentiable: False
      eps: 1e-08
      foreach: None
      fused: None
      lr: 0.0001
      maximize: False
      weight_decay: 0.0001
  ),
 'random_seed': 2024,
 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f5243ed50a0>,
 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f516422b250>}

model parameters: 820,256
trial 1/10 test loss: 0.018272698298096655, seconds: 189.71533584594727
trial 2/10 test loss: 0.018411319074220955, seconds: 190.0505406856537
trial 3/10 test loss: 0.01766556976363063, seconds: 206.22142052650452
trial 4/10 test loss: 0.01832906869240105, seconds: 196.5188329219818
trial 5/10 test loss: 0.01683416806627065, seconds: 196.67028522491455
trial 6/10 test loss: 0.016869719396345316, seconds: 202.8109085559845
trial 7/10 test loss: 0.016718478570692242, seconds: 199.81436681747437
trial 8/10 test loss: 0.016854225541464984, seconds: 194.35895204544067
trial 9/10 test loss: 0.018173357425257563, seconds: 194.3624300956726
trial 10/10 test loss: 0.018231959571130572, seconds: 197.3516390323639
average/best test loss: 0.017636056439951062 / 0.016718478570692242
KAE (hid=256, p=3), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config:
{'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [256], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 3}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}}

{'device': device(type='cuda'),
 'epochs': 10,
 'model': StandardAE(
    (encoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=3) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=3) )
      (3): ReLU(inplace=True)
    )
    (decoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=3) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=3) )
      (3): Sigmoid()
    )
  ),
 'optimizer': Adam (
    Parameter Group 0
      amsgrad: False
      betas: (0.9, 0.999)
      capturable: False
      differentiable: False
      eps: 1e-08
      foreach: None
      fused: None
      lr: 0.0001
      maximize: False
      weight_decay: 0.0001
  ),
 'random_seed': 2024,
 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f187cf11340>,
 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f187cf11040>}

model parameters: 1,639,712
trial 1/10 test loss: 0.016384680382907392, seconds: 326.47234320640564
trial 2/10 test loss: 0.015581317734904588, seconds: 331.4330863952637
trial 3/10 test loss: 0.017017113068141042, seconds: 331.6513261795044
trial 4/10 test loss: 0.015830345428548755, seconds: 327.08414101600647
trial 5/10 test loss: 0.017262230068445204, seconds: 328.4695780277252
trial 6/10 test loss: 0.014666992449201643, seconds: 329.8385400772095
trial 7/10 test loss: 0.014286837540566921, seconds: 337.83651876449585
trial 8/10 test loss: 0.016350188408978283, seconds: 335.5007619857788
trial 9/10 test loss: 0.014879045519046485, seconds: 333.5552954673767
trial 10/10 test loss: 0.0171417145524174, seconds: 331.24688816070557
average/best test loss: 0.01594004651531577 / 0.014286837540566921
KAE (hid=64, p=4), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config:
{'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [64], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 4}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}}

{'device': device(type='cuda'),
 'epochs': 10,
 'model': StandardAE(
    (encoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=4) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=4) )
      (3): ReLU(inplace=True)
    )
    (decoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=4) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=4) )
      (3): Sigmoid()
    )
  ),
 'optimizer': Adam (
    Parameter Group 0
      amsgrad: False
      betas: (0.9, 0.999)
      capturable: False
      differentiable: False
      eps: 1e-08
      foreach: None
      fused: None
      lr: 0.0001
      maximize: False
      weight_decay: 0.0001
  ),
 'random_seed': 2024,
 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c9638b0>,
 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c9638e0>}

model parameters: 512,928
trial 1/10 test loss: 0.019172179815359413, seconds: 147.43125009536743
trial 2/10 test loss: 0.01767512778751552, seconds: 139.19849252700806
trial 3/10 test loss: 0.017656653327867387, seconds: 142.9882152080536
trial 4/10 test loss: 0.019164289673790337, seconds: 138.85816836357117
trial 5/10 test loss: 0.02301151561550796, seconds: 139.53188729286194
trial 6/10 test loss: 0.022694176388904454, seconds: 142.0449194908142
trial 7/10 test loss: 0.01957593164406717, seconds: 147.18947196006775
trial 8/10 test loss: 0.017888742545619608, seconds: 142.23271703720093
trial 9/10 test loss: 0.02021964266896248, seconds: 141.51794171333313
trial 10/10 test loss: 0.022180935135111213, seconds: 143.4875738620758
average/best test loss: 0.019923919460270556 / 0.017656653327867387
KAE (hid=128, p=4), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config:
{'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [128], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 4}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}}

{'device': device(type='cuda'),
 'epochs': 10,
 'model': StandardAE(
    (encoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=4) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=4) )
      (3): ReLU(inplace=True)
    )
    (decoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=4) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=4) )
      (3): Sigmoid()
    )
  ),
 'optimizer': Adam (
    Parameter Group 0
      amsgrad: False
      betas: (0.9, 0.999)
      capturable: False
      differentiable: False
      eps: 1e-08
      foreach: None
      fused: None
      lr: 0.0001
      maximize: False
      weight_decay: 0.0001
  ),
 'random_seed': 2024,
 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c963040>,
 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c963bb0>}

model parameters: 1,025,056
trial 1/10 test loss: 0.01637132160831243, seconds: 255.22766876220703
trial 2/10 test loss: 0.016493089473806323, seconds: 252.3838324546814
trial 3/10 test loss: 0.015356263937428593, seconds: 252.7017102241516
trial 4/10 test loss: 0.015355625795200467, seconds: 256.5977442264557
trial 5/10 test loss: 0.017132615856826305, seconds: 264.0748360157013
trial 6/10 test loss: 0.016231764946132897, seconds: 263.48304653167725
trial 7/10 test loss: 0.015862736152485013, seconds: 261.52497696876526
trial 8/10 test loss: 0.01712557568680495, seconds: 264.40426325798035
trial 9/10 test loss: 0.017479192721657454, seconds: 254.95627284049988
trial 10/10 test loss: 0.01787043015938252, seconds: 256.4294128417969
average/best test loss: 0.0165278616338037 / 0.015355625795200467
KAE (hid=256, p=4), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config:
{'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [256], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 4}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}}

{'device': device(type='cuda'),
 'epochs': 10,
 'model': StandardAE(
    (encoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=4) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=4) )
      (3): ReLU(inplace=True)
    )
    (decoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=4) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=4) )
      (3): Sigmoid()
    )
  ),
 'optimizer': Adam (
    Parameter Group 0
      amsgrad: False
      betas: (0.9, 0.999)
      capturable: False
      differentiable: False
      eps: 1e-08
      foreach: None
      fused: None
      lr: 0.0001
      maximize: False
      weight_decay: 0.0001
  ),
 'random_seed': 2024,
 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c963100>,
 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c963490>}

model parameters: 2,049,312
trial 1/10 test loss: 0.01558066001161933, seconds: 441.3702812194824
trial 2/10 test loss: 0.014235765440389514, seconds: 434.6101531982422
trial 3/10 test loss: 0.015248811431229114, seconds: 428.8143413066864
trial 4/10 test loss: 0.01436346652917564, seconds: 427.17020058631897
trial 5/10 test loss: 0.015836665499955417, seconds: 429.63030791282654
trial 6/10 test loss: 0.015187899000011384, seconds: 429.3934314250946
trial 7/10 test loss: 0.01566753287333995, seconds: 442.68628454208374
trial 8/10 test loss: 0.014760070107877254, seconds: 447.96484541893005
trial 9/10 test loss: 0.014156273938715458, seconds: 444.4883916378021
trial 10/10 test loss: 0.014120391220785677, seconds: 447.82466983795166
average/best test loss: 0.014915753605309872 / 0.014120391220785677
KAE (hid=64, p=5), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config:
{'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [64], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 5}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}}

{'device': device(type='cuda'),
 'epochs': 10,
 'model': StandardAE(
    (encoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=5) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=5) )
      (3): ReLU(inplace=True)
    )
    (decoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=5) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=5) )
      (3): Sigmoid()
    )
  ),
 'optimizer': Adam (
    Parameter Group 0
      amsgrad: False
      betas: (0.9, 0.999)
      capturable: False
      differentiable: False
      eps: 1e-08
      foreach: None
      fused: None
      lr: 0.0001
      maximize: False
      weight_decay: 0.0001
  ),
 'random_seed': 2024,
 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f1daaab4fd0>,
 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f1daaab4250>}

model parameters: 615,328
trial 1/10 test loss: 0.01908638181630522, seconds: 145.28206825256348
trial 2/10 test loss: 0.01730156985577196, seconds: 148.1776647567749
trial 3/10 test loss: 0.01892505888827145, seconds: 151.5980896949768
trial 4/10 test loss: 0.019929102598689498, seconds: 159.41308116912842
trial 5/10 test loss: 0.018801589566282927, seconds: 161.51790499687195
trial 6/10 test loss: 0.01736636960413307, seconds: 160.77034425735474
trial 7/10 test loss: 0.017208101460710168, seconds: 161.73465490341187
trial 8/10 test loss: 0.018564854608848692, seconds: 157.03404307365417
trial 9/10 test loss: 0.017425758531317115, seconds: 152.50793719291687
trial 10/10 test loss: 0.01755934504326433, seconds: 152.81868195533752
average/best test loss: 0.018216813197359443 / 0.017208101460710168
KAE (hid=128, p=5), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config:
{'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [128], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 5}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}}

{'device': device(type='cuda'),
 'epochs': 10,
 'model': StandardAE(
    (encoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=5) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=5) )
      (3): ReLU(inplace=True)
    )
    (decoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=5) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=5) )
      (3): Sigmoid()
    )
  ),
 'optimizer': Adam (
    Parameter Group 0
      amsgrad: False
      betas: (0.9, 0.999)
      capturable: False
      differentiable: False
      eps: 1e-08
      foreach: None
      fused: None
      lr: 0.0001
      maximize: False
      weight_decay: 0.0001
  ),
 'random_seed': 2024,
 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c9638e0>,
 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c963280>}

model parameters: 1,229,856
trial 1/10 test loss: 0.015412552421912551, seconds: 329.07769083976746
trial 2/10 test loss: 0.014418959734030068, seconds: 324.79892349243164
trial 3/10 test loss: 0.01620185296051204, seconds: 326.28697967529297
trial 4/10 test loss: 0.014521781401708723, seconds: 320.42595863342285
trial 5/10 test loss: 0.015382755897007883, seconds: 340.09007692337036
trial 6/10 test loss: 0.015597944264300168, seconds: 340.1423282623291
trial 7/10 test loss: 0.015459689102135599, seconds: 341.27795457839966
trial 8/10 test loss: 0.01595883092377335, seconds: 334.1474413871765
trial 9/10 test loss: 0.015460120351053774, seconds: 340.4838047027588
trial 10/10 test loss: 0.016732382122427225, seconds: 327.4674618244171
average/best test loss: 0.015514686917886137 / 0.014418959734030068
KAE (hid=256, p=5), relu/sigmoid, ADAM, lr=0.0001, batch size=256
model_config/config6.yaml updated config:
{'DATA': {'type': 'MNIST'}, 'MODEL': {'hidden_dims': [256], 'latent_dim': 16, 'layer_type': 'KAE', 'model_type': 'AE', 'order': 5}, 'TRAIN': {'batch_size': 256, 'epochs': 10, 'lr': 0.0001, 'optim_type': 'ADAM', 'random_seed': 2024, 'weight_decay': 0.0001}}

{'device': device(type='cuda'),
 'epochs': 10,
 'model': StandardAE(
    (encoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=5) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=5) )
      (3): ReLU(inplace=True)
    )
    (decoder): Sequential(
      (0): DenseLayer( (layer): KAELayer(order=5) )
      (1): ReLU(inplace=True)
      (2): DenseLayer( (layer): KAELayer(order=5) )
      (3): Sigmoid()
    )
  ),
 'optimizer': Adam (
    Parameter Group 0
      amsgrad: False
      betas: (0.9, 0.999)
      capturable: False
      differentiable: False
      eps: 1e-08
      foreach: None
      fused: None
      lr: 0.0001
      maximize: False
      weight_decay: 0.0001
  ),
 'random_seed': 2024,
 'test_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c963a60>,
 'train_loader': <torch.utils.data.dataloader.DataLoader object at 0x7f195c963790>}

model parameters: 2,458,912
trial 1/10 test loss: 0.014011546922847628, seconds: 569.4026217460632
trial 2/10 test loss: 0.014504674705676734, seconds: 571.4599032402039
trial 3/10 test loss: 0.015789117990061642, seconds: 568.842474937439
trial 4/10 test loss: 0.014785659755580128, seconds: 547.6136360168457
trial 5/10 test loss: 0.01338328819256276, seconds: 547.1618297100067
trial 6/10 test loss: 0.013469906221143902, seconds: 546.7046883106232
trial 7/10 test loss: 0.013546474021859467, seconds: 546.1979439258575
trial 8/10 test loss: 0.013565254909917713, seconds: 546.9919567108154
trial 9/10 test loss: 0.014034933387301862, seconds: 546.0972487926483
trial 10/10 test loss: 0.014213305548764765, seconds: 545.7550501823425
average/best test loss: 0.014130416165571657 / 0.01338328819256276
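A note on the "model parameters" counts above: they are all consistent with each `KAELayer(order=p)` from n_in to n_out units learning `n_in * n_out * (p + 1)` polynomial coefficients plus one bias per output. This is a sketch inferred purely from the logged counts (the formula and the helper names `kae_layer_params`/`kae_ae_params` are my own, not from the KAE code):

```python
def kae_layer_params(n_in: int, n_out: int, order: int) -> int:
    # Assumption: (order + 1) polynomial coefficients per input-output
    # pair, plus one bias per output unit.
    return n_in * n_out * (order + 1) + n_out

def kae_ae_params(hidden: int, latent: int, order: int, n_in: int = 28 * 28) -> int:
    # Encoder 784 -> hidden -> latent, decoder mirrors it.
    dims = [(n_in, hidden), (hidden, latent), (latent, hidden), (hidden, n_in)]
    return sum(kae_layer_params(i, o, order) for i, o in dims)

print(kae_ae_params(128, 16, 2))  # 615456, matching KAE (hid=128, p=2)
print(kae_ae_params(256, 16, 3))  # 1639712, matching KAE (hid=256, p=3)
print(kae_ae_params(64, 16, 5))   # 615328, matching KAE (hid=64, p=5)
```

Every hidden-dim/order combination in this appendix reproduces its logged count under this formula, so the parameter growth over a plain linear layer is roughly a factor of (p + 1).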