Reference
verbose
int
default: 2
Verbosity level
0
= off1
= show progress2
= show statistics
output
str
default: ./
Directory or filename of the output.
- If a directory, it must end with
/
. In that case, the filename will be the name of the yaml config file. - If a filename, it must end with
.png
. Note that a number is attached to the filename or is automatically increased, if the file already exists.
snapshot_interval
int, float
default: 20.0
Interval after which a snapshot of the currently trained image is saved.
A float number specifies the interval in seconds. An integer number specifies the interval in number-of-epochs.
A zero (either int of float) disables storage of snapshots.
epochs
int
default: 300
The number of training steps before stopping the training, not including batch sizes.
For example, if the number of epochs is 100
and a target has a batch_size of 10
,
then 1000
training steps will be performed.
start_epoch
int
default: 0
The number of epochs to skip in the beginning.
This is used by the GUI application to continue training after config changes.
resolution
list of length 2 of int
default: [224, 224]
expression variables: time
Resolution of the image to create. A single number for square images or two numbers for width and height.
It supports expression variables so you can actually change the resolution during training, e.g:
resolution: - 224 if t < .2 else 448
would change the resolution from 224x224 to 448x448 at 20% of training time.
The interpolation method defaults to 'cubic' and can be changed with interpolation.
interpolation
str
default: cubic
expression variables: time
Filter used when resizing the training image.
Can be one of:
nearest
: no interpolationlinear
: bilinear interpolationcubic
: bicubic interpolation
model
str
default: ViT-B/32
The pre-trained CLIP model to use. Options are RN50
, RN101
, RN50x4
, ViT-B/32
The models are downloaded from openaipublic.azureedge.net
and stored in the user's ~/.cache/
directory
device
str
default: auto
The device to run the training on. Can be cpu
, cuda
, cuda:1
etc.
learnrate
float
default: 1.0
expression variables: resolution, time
The learning rate of the optimizer.
Different optimizers have different learning rates that work well.
However, this value is scaled by hand so that 1.0
translates to
about the same learning rate for each optimizer.
The learnrate value is available to other expressions as lr
or learnrate
.
learnrate_scale
float
default: 1.0
expression variables: resolution, time
A scaling parameter for the actual learning rate.
It's for convenience in the case when learnrate_scale is an expression like 1. - t
.
The actual learnrate can be overridden with fixed values like 2
or 3
in
different experiments.
The learnrate_scale value is available to other expressions as lrs
or learnrate_scale
.
optimizer
str
default: adam
The torch optimizer to perform the gradient descent.
init
Defines the way, the pixels are initialized. Default is random pixels.
init.resolution
list of length 2 of int
no default
This can alter the resolution of the noise or loaded image before it is converted to the resolution of the training image.
init.mean
list of length 3 of float
default: [0.5, 0.5, 0.5]
The mean (brightness) of the initial pixel noise.
Can be a single number for gray or three numbers for RGB.
init.std
list of length 3 of float
default: [0.1, 0.1, 0.1]
The standard deviation (randomness) of the initial pixel noise.
A single number will be copied to the RGB values.
init.image
str
no default
A filename of an image to use as starting point.
The image will be scaled to the desired resolution if necessary.
init.image_tensor
list
no default
A 3-dimensional matrix of pixel values in the range [0, 1]
The layout is the same as used in
torchvision,
namely [C, H, W]
, where C
is number of colors (3),
H
is height and W
is width.
This is used by the GUI application to continue training after config changes.
targets
This is a list of targets that define the desired image.
Most important are the features where texts or images are defined which get converted into CLIP features and then drive the image creation process.
It's possible to add additional constraints which alter image creation without using CLIP, e.g. the image mean, saturation or gaussian blur.
targets.active
bool
default: True
A boolean to turn off the target during development.
This is just a convenience parameter. To turn of a target
during testing without deleting all the parameters, simply
put active: false
inside.
targets.name
str
default: target
The name of the target.
Currently this is just displayed in the statistics dump and has no functionality.
targets.start
int, float
default: 0.0
Start frame of the target. The whole target is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.end
int, float
default: 1.0
End frame of the target. The whole target is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
Weight factor that is multiplied with all the weights of features and constraints.
targets.batch_size
int
default: 1
The number of image frames to process during one epoch.
In machine learning the batch size is one of the important and magic hyper-parameters. They control how many different training samples are included into one weight update.
With CLIPig we are not training a neural network or anything complicated, we just adjust pixel colors, so different batch sizes probably do not make as much difference to the outcome.
However, increasing the batch size certainly reduces the overall computation time. E.g. you can run an experiment for 1000 epochs with batch size 1, or for 100 epochs with a batch size of 10. The latter is much faster. Basically, you can increase the batch size until memory is exhausted.
targets.select
str
default: all
Selects the way how multiple features are handled.
all
: All feature losses (multiplied with their individual weights) are added together.best
: The similarity between the features of the current image pixels and each desired feature is calculated and the feature with the highest similarity is chosen to adjust the pixels in it's direction.worst
: Similar to thebest
selection mode, the current similarity is calculated and then the worst matching feature is selected. Whilebest
mode will generally increase the influence of one or a few features, theworst
mode will try to increase the influence of all features equally.mix
: All individual features are averaged together (respecting their individual weights) and the resulting feature is compared with the features of the current image. This actually works quite well!
targets.feature_scale
str
default: equal
Adjusts the initial scaling of the similarity between training image and this feature.
equal
: All factors are 1.fair
: Factors are set such that the initial frame of the training image has the same similarity with each feature.
targets.features
A list of features to drive the image creation.
The CLIP network is used to convert texts or images into a 512-dimensional vector of latent variables.
In the image creation process each target takes a section of the current image, shows it to CLIP and compares the resulting feature vector with the vector of each defined feature.
Through backpropagation each pixel is then slightly adjusted in a way that would make the CLIP feature more similar to the defined features.
targets.features.text
str
no default
A word, sentence or paragraph that describes the desired image contents.
CLIP does understand english language fairly good, also some phrases in other languages.
targets.features.image
str
no default
Path or URL to an image file (supported formats).
Alternatively to text an image can be converted into the target feature.
Currently the image is resized to 224x224, ignoring the aspect-ratio to fit into the CLIP input window.
If the path starts with http://
or https://
it's treated as an URL and the image
is downloaded and cached in ~/.cache/img/<md5-hash-of-url>
.
targets.features.start
int, float
default: 0.0
Start frame of the specific feature
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.features.end
int, float
default: 1.0
End frame of the specific feature
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.features.weight
float
default: 1.0
expression variables: learnrate, resolution, target feature, time
A weight parameter to control the influence of a specific feature of a target.
Note that you can use negative weights as well which translates roughly to: Generate an image that is the least likely to that feature.
targets.features.scale
float
default: 1.0
expression variables: learnrate, resolution, target feature, time
A scaling parameter that is multiplied with the similarity value to yield the actual similarity used, e.g., for best_match select.
targets.features.loss
str
default: cosine
The loss function used to calculate the difference (or error) between current and desired feature.
cosine
: The loss function is1 - cosine_similarity(current, target)
. The CLIP network was trained using cosine similarity so that is the default setting.l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.
targets.transforms
Transforms shape the area of the trained image before showing it to CLIP for evaluation.
targets.transforms.add
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Adds a fixed value to all pixels.
Three numbers specify red, green and blue while a single number specifies a gray-scale color.
targets.transforms.blur
A gaussian blur is applied to the pixels.
See torchvision gaussian_blur.
targets.transforms.blur.kernel_size
list of length 2 of int
default: [3, 3]
expression variables: learnrate, resolution, time
The size of the pixel window. Must be an odd, positive integer.
Two numbers define width and height separately.
targets.transforms.blur.sigma
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Gaussian kernel standard deviation. The larger, the more blurry.
If not specified it will default to 0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8
.
Two numbers define sigma for x and y separately.
targets.transforms.border
Draws a border on the edge of the image. The resolution is not changed.
targets.transforms.border.size
list of length 2 of int
default: [1, 1]
expression variables: learnrate, resolution, time
One integer two specify width and height at the same time, or two integers to specify them separately.
targets.transforms.border.color
list of length 3 of float
default: [0.0, 0.0, 0.0]
expression variables: learnrate, resolution, time
The color of the border as float numbers in the range [0, 1]
.
Three numbers for red, green and blue or a single number to specify a gray-scale.
targets.transforms.bwnoise
float
no default
expression variables: learnrate, resolution, time
Adds gray-scale noise to the image.
The noise has a scalable normal distribution around zero.
Specifies the standard deviation of the noise distribution.
targets.transforms.center_crop
list of length 2 of int
no default
expression variables: learnrate, resolution, time
Crops an image of the given resolution from the center.
One integer for square images, two numbers to specify width and height.
targets.transforms.clamp
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Clamps the pixels into a fixed range.
First number is the minimum allowed value for all color channels, second is the maximum allowed value.
An image displayed on screen or converted to a file does only include
values in the range of [0, 1]
.
targets.transforms.contrast
float
default: 1.0
expression variables: learnrate, resolution, time
Adjust the image saturation.
In the target.transforms stage, decreasing the contrast before evaluation by CLIP will increase the dark and bright portions of the training image.
How much to adjust the contrast. Can be any non negative number. 0 gives a solid gray image, 1 gives the original image while 2 increases the contrast by a factor of 2.
targets.transforms.crop
list of length 4 of float
no default
expression variables: learnrate, resolution, time
Crops a specified section from the image.
4 numbers: x and y of top-left corner followed by width and height.
A number between 0 and 1 is considered a fraction of the full resolution. A number greater or equal to 1 is considered a pixel coordinate
targets.transforms.edge
This removes everything except edges and generally has a bad effect on image quality. It might be useful, however.
A gaussian blur is used to detect the edges:
edge = amount * abs(image - blur(image))
targets.transforms.edge.kernel_size
list of length 2 of int
default: [3, 3]
expression variables: learnrate, resolution, time
The size of the pixel window used for gaussian blur. Must be an odd, positive integer.
Two numbers define width and height separately.
targets.transforms.edge.sigma
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Gaussian kernel standard deviation. The larger, the more blurry.
If not specified it will default to 0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8
.
Two numbers define sigma for x and y separately.
targets.transforms.edge.amount
list of length 3 of float
default: [1.0, 1.0, 1.0]
expression variables: learnrate, resolution, time
A multiplier for the edge value. Three numbers to specify red, green and blue separately.
targets.transforms.fnoise
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Adds noise to the image's fourier space.
It's just a bit different than the normal noise.
The noise has a scalable normal distribution around zero.
Specifies the standard deviation of the noise distribution.
The actual value is multiplied by 15.0
to give a visually
similar distribution as the normal noise.
One value or three values to specify red, green and blue separately.
targets.transforms.mean
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Adjust the mean color value.
Three numbers specify red, green and blue while a single number specifies a gray-scale color.
targets.transforms.mul
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Multiplies all pixels by a fixed value.
Three numbers specify red, green and blue while a single number specifies a gray-scale color.
targets.transforms.noise
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Adds noise to the image.
The noise has a scalable normal distribution around zero.
Specifies the standard deviation of the noise distribution.
One value or three values to specify red, green and blue separately.
targets.transforms.pad
Pads the image with additional pixels at the borders.
targets.transforms.pad.size
list of length 2 of int
no default
expression variables: learnrate, resolution, time
The number of columns/rows to add.
One integer to specify x and y at the same time, or two integers to specify them separately.
E.g. 1, 2
would add 1 column left and one column right of
the image and two rows on top and bottom respectively.
targets.transforms.pad.color
list of length 3 of float
default: [0.0, 0.0, 0.0]
expression variables: learnrate, resolution, time
The color of the pixels that are padded around the image.
targets.transforms.pad.mode
str
default: fill
expression variables: learnrate, resolution, time
The way the padded area is filled.
fill
: fills everything with thecolor
valueedge
: repeats the edge pixelswrap
: repeats the image from the opposite edge
targets.transforms.quantize
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Quantize the color values.
This defines a fixed step-size for each color value.
Generally, do not use in target.transforms because it will remove the small gradient steps of the training. It might be useful in the post-processing stage.
The step-size. Three numbers specify red, green and blue while a single number specifies a gray-scale color.
targets.transforms.random_crop
list of length 2 of int
no default
expression variables: learnrate, resolution, time
Crops a section of the specified resolution from a random position in the image.
One integer for square images, two numbers to specify width and height.
targets.transforms.random_rotate
Randomly rotates the image.
Degree and center of rotation are chosen randomly between in the range of the specified values.
The resolution is not changed and areas outside of the image are filled with black (zero).
targets.transforms.random_rotate.degree
list of length 2 of float
no default
expression variables: learnrate, resolution, time
The minimum and maximum counter-clockwise angle of ration in degrees.
targets.transforms.random_rotate.center
list of length 2 of float
default: [0.5, 0.5]
expression variables: learnrate, resolution, time
The minimum and maximum center of rotation (for x and y) in the range [0, 1]
.
targets.transforms.random_scale
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Randomly scales an image in the range specified.
The resolution does not change, only contents are scaled. Areas outside of the image are filled with black (zero).
Minimum and maximum scale, where 0.5
means half and 2.0
means double.
targets.transforms.random_shift
list of length 2 or 4 of float
no default
expression variables: learnrate, resolution, time
This randomly translates the pixels of the image.
Pixels that are moved outside get attached on the other side.
Specifies the random range of translation.
A number larger 1 or smaller -1 translates by the actual pixels.
A number between -1 and 1 translates by the fraction of the image resolution.
E.g., shift: 0 1
would randomly translate the image to every possible position
given it's resolution.
Two numbers specify minimum and maximum shift both axis, four numbers specify minimum and maximum shift for axis x and y respectively.
targets.transforms.random_translate
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Randomly translates an image in the specified range.
The resolution does not change. Areas outside of the image are filled with black (zero).
Maximum absolute fraction for horizontal and vertical translations.
For example: random_translate: a, b
, then horizontal shift is randomly sampled in
the range -img_width * a < dx < img_width * a
and vertical shift is randomly sampled in the range
-img_height * b < dy < img_height * b
.
targets.transforms.repeat
list of length 2 of int
no default
expression variables: learnrate, resolution, time
Repeats the image a number of times in the right and bottom direction.
One integer to specify x and y at the same time, or two integers to specify them separately.
targets.transforms.resize
list of length 2 of int
no default
expression variables: learnrate, resolution, time
The resolution of the image is changed.
One integer for square images, two numbers to specify width and height.
targets.transforms.rnoise
Adds noise with a different resolution to the image.
The noise has a scalable normal distribution around zero.
targets.transforms.rnoise.std
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Specifies the standard deviation of the noise distribution.
One value or three values to specify red, green and blue separately.
targets.transforms.rnoise.resolution
list of length 2 of int
no default
expression variables: learnrate, resolution, time
The resolution of the noise image. It will be resized to the processed image.
targets.transforms.rotate
Rotates the image.
The resolution is not changed and areas outside of the image are filled with black (zero).
targets.transforms.rotate.degree
float
no default
expression variables: learnrate, resolution, time
The counter-clockwise angle of ration in degrees ([0, 360]
).
targets.transforms.rotate.center
list of length 2 of float
default: [0.5, 0.5]
expression variables: learnrate, resolution, time
The center of rotation in the range [0, 1]
.
Two numbers to specify x and y separately.
targets.transforms.saturation
float
default: 1.0
expression variables: learnrate, resolution, time
Adjust the image saturation.
In the target.transforms stage, decreasing the saturation before evaluation by CLIP will actually increase the contrast of the training image.
How much to adjust the saturation. 0 will give a black and white image, 1 will give the original image while 2 will enhance the saturation by a factor of 2.
targets.transforms.shift
list of length 2 of float
no default
expression variables: learnrate, resolution, time
This translates the image while wrapping the edges around.
Pixels that are moved outside get attached on the other side.
A number larger 1 or smaller -1 translates by the actual pixels.
A number between -1 and 1 translates by the fraction of the image resolution.
E.g., shift: .5
would move the center of the image to the previous bottom-right
corner.
A single number specifies translation on both x and y axes while two numbers specify them separately.
postproc
A list of post-processing effects that are applied every epoch and change the image pixels directly without interfering with the backpropagation stage.
All transforms that do not change the resolution are available as post processing effects.
postproc.active
bool
default: True
A boolean to turn of the post-processing stage during development.
This is just a convenience parameter. To turn of a stage
during testing without deleting all the parameters, simply
put active: false
inside.
postproc.start
int, float
default: 0.0
Start frame for the post-processing stage. The stage is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
postproc.end
int, float
default: 1.0
End frame for the post-processing stage. The stage is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
postproc.add
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Adds a fixed value to all pixels.
Three numbers specify red, green and blue while a single number specifies a gray-scale color.
postproc.blur
A gaussian blur is applied to the pixels.
See torchvision gaussian_blur.
postproc.blur.kernel_size
list of length 2 of int
default: [3, 3]
expression variables: learnrate, resolution, time
The size of the pixel window. Must be an odd, positive integer.
Two numbers define width and height separately.
postproc.blur.sigma
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Gaussian kernel standard deviation. The larger, the more blurry.
If not specified it will default to 0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8
.
Two numbers define sigma for x and y separately.
postproc.border
Draws a border on the edge of the image. The resolution is not changed.
postproc.border.size
list of length 2 of int
default: [1, 1]
expression variables: learnrate, resolution, time
One integer two specify width and height at the same time, or two integers to specify them separately.
postproc.border.color
list of length 3 of float
default: [0.0, 0.0, 0.0]
expression variables: learnrate, resolution, time
The color of the border as float numbers in the range [0, 1]
.
Three numbers for red, green and blue or a single number to specify a gray-scale.
postproc.bwnoise
float
no default
expression variables: learnrate, resolution, time
Adds gray-scale noise to the image.
The noise has a scalable normal distribution around zero.
Specifies the standard deviation of the noise distribution.
postproc.clamp
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Clamps the pixels into a fixed range.
First number is the minimum allowed value for all color channels, second is the maximum allowed value.
An image displayed on screen or converted to a file does only include
values in the range of [0, 1]
.
postproc.contrast
float
default: 1.0
expression variables: learnrate, resolution, time
Adjust the image saturation.
In the target.transforms stage, decreasing the contrast before evaluation by CLIP will increase the dark and bright portions of the training image.
How much to adjust the contrast. Can be any non negative number. 0 gives a solid gray image, 1 gives the original image while 2 increases the contrast by a factor of 2.
postproc.edge
This removes everything except edges and generally has a bad effect on image quality. It might be useful, however.
A gaussian blur is used to detect the edges:
edge = amount * abs(image - blur(image))
postproc.edge.kernel_size
list of length 2 of int
default: [3, 3]
expression variables: learnrate, resolution, time
The size of the pixel window used for gaussian blur. Must be an odd, positive integer.
Two numbers define width and height separately.
postproc.edge.sigma
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Gaussian kernel standard deviation. The larger, the more blurry.
If not specified it will default to 0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8
.
Two numbers define sigma for x and y separately.
postproc.edge.amount
list of length 3 of float
default: [1.0, 1.0, 1.0]
expression variables: learnrate, resolution, time
A multiplier for the edge value. Three numbers to specify red, green and blue separately.
postproc.fnoise
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Adds noise to the image's fourier space.
It's just a bit different than the normal noise.
The noise has a scalable normal distribution around zero.
Specifies the standard deviation of the noise distribution.
The actual value is multiplied by 15.0
to give a visually
similar distribution as the normal noise.
One value or three values to specify red, green and blue separately.
postproc.mean
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Adjust the mean color value.
Three numbers specify red, green and blue while a single number specifies a gray-scale color.
postproc.mul
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Multiplies all pixels by a fixed value.
Three numbers specify red, green and blue while a single number specifies a gray-scale color.
postproc.noise
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Adds noise to the image.
The noise has a scalable normal distribution around zero.
Specifies the standard deviation of the noise distribution.
One value or three values to specify red, green and blue separately.
postproc.quantize
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Quantize the color values.
This defines a fixed step-size for each color value.
Generally, do not use in target.transforms because it will remove the small gradient steps of the training. It might be useful in the post-processing stage.
The step-size. Three numbers specify red, green and blue while a single number specifies a gray-scale color.
postproc.random_rotate
Randomly rotates the image.
Degree and center of rotation are chosen randomly between in the range of the specified values.
The resolution is not changed and areas outside of the image are filled with black (zero).
postproc.random_rotate.degree
list of length 2 of float
no default
expression variables: learnrate, resolution, time
The minimum and maximum counter-clockwise angle of ration in degrees.
postproc.random_rotate.center
list of length 2 of float
default: [0.5, 0.5]
expression variables: learnrate, resolution, time
The minimum and maximum center of rotation (for x and y) in the range [0, 1]
.
postproc.random_scale
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Randomly scales an image in the range specified.
The resolution does not change, only contents are scaled. Areas outside of the image are filled with black (zero).
Minimum and maximum scale, where 0.5
means half and 2.0
means double.
postproc.random_shift
list of length 2 or 4 of float
no default
expression variables: learnrate, resolution, time
This randomly translates the pixels of the image.
Pixels that are moved outside get attached on the other side.
Specifies the random range of translation.
A number larger 1 or smaller -1 translates by the actual pixels.
A number between -1 and 1 translates by the fraction of the image resolution.
E.g., shift: 0 1
would randomly translate the image to every possible position
given it's resolution.
Two numbers specify minimum and maximum shift both axis, four numbers specify minimum and maximum shift for axis x and y respectively.
postproc.random_translate
list of length 2 of float
no default
expression variables: learnrate, resolution, time
Randomly translates an image in the specified range.
The resolution does not change. Areas outside of the image are filled with black (zero).
Maximum absolute fraction for horizontal and vertical translations.
For example: random_translate: a, b
, then horizontal shift is randomly sampled in
the range -img_width * a < dx < img_width * a
and vertical shift is randomly sampled in the range
-img_height * b < dy < img_height * b
.
postproc.rnoise
Adds noise with a different resolution to the image.
The noise has a scalable normal distribution around zero.
postproc.rnoise.std
list of length 3 of float
no default
expression variables: learnrate, resolution, time
Specifies the standard deviation of the noise distribution.
One value or three values to specify red, green and blue separately.
postproc.rnoise.resolution
list of length 2 of int
no default
expression variables: learnrate, resolution, time
The resolution of the noise image. It will be resized to the processed image.
postproc.rotate
Rotates the image.
The resolution is not changed and areas outside of the image are filled with black (zero).
postproc.rotate.degree
float
no default
expression variables: learnrate, resolution, time
The counter-clockwise angle of ration in degrees ([0, 360]
).
postproc.rotate.center
list of length 2 of float
default: [0.5, 0.5]
expression variables: learnrate, resolution, time
The center of rotation in the range [0, 1]
.
Two numbers to specify x and y separately.
postproc.saturation
float
default: 1.0
expression variables: learnrate, resolution, time
Adjust the image saturation.
In the target.transforms stage, decreasing the saturation before evaluation by CLIP will actually increase the contrast of the training image.
How much to adjust the saturation. 0 will give a black and white image, 1 will give the original image while 2 will enhance the saturation by a factor of 2.
postproc.shift
list of length 2 of float
no default
expression variables: learnrate, resolution, time
This translates the image while wrapping the edges around.
Pixels that are moved outside get attached on the other side.
A number larger 1 or smaller -1 translates by the actual pixels.
A number between -1 and 1 translates by the fraction of the image resolution.
E.g., shift: .5
would move the center of the image to the previous bottom-right
corner.
A single number specifies translation on both x and y axes while two numbers specify them separately.
targets.constraints
Constraints do influence the trained image without using CLIP.
They only affect the pixels that are processed by the transforms of the target.
targets.constraints.blur
Adds the difference between the image and a blurred version to the training loss.
This is much more helpful than using the gaussian blur as a post-processing step. When added to the training loss, the blurring keeps in balance with the actual image creation.
Areas that CLIP is excited about will be constantly updated and will stand out of the blur, while unexciting areas get blurred a lot.
targets.constraints.blur.kernel_size
list of length 2 of int
default: [3, 3]
expression variables: learnrate, resolution, target constraint, time
The size of the pixel window. Must be an odd, positive integer.
Two numbers define width and height separately.
targets.constraints.blur.sigma
list of length 2 of float
no default
expression variables: learnrate, resolution, target constraint, time
Gaussian kernel standard deviation. The larger, the more blurry.
If not specified it will default to 0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8
.
Two numbers define sigma for x and y separately.
targets.constraints.blur.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
A multiplier for the resulting loss value of the constraint.
targets.constraints.blur.start
int, float
default: 0.0
expression variables: learnrate, resolution, target constraint, time
Start frame of the constraints. The constraint is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.blur.end
int, float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
End frame of the constraints. The constraint is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.blur.loss
str
default: l2
expression variables: learnrate, resolution, target constraint, time
The loss function used to calculate the difference (or error) between current and desired image.
l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.
targets.constraints.border
Adds a border with a specific size and color to the training loss.
targets.constraints.border.size
list of length 2 of int
default: [1, 1]
expression variables: learnrate, resolution, target constraint, time
One integer two specify width and height at the same time, or two integers to specify them separately.
targets.constraints.border.color
list of length 3 of float
default: [0.0, 0.0, 0.0]
expression variables: learnrate, resolution, target constraint, time
The color of the border as float numbers in the range [0, 1]
.
Three numbers for red, green and blue or a single number to specify a gray-scale.
targets.constraints.border.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
A multiplier for the resulting loss value of the constraint.
targets.constraints.border.start
int, float
default: 0.0
expression variables: learnrate, resolution, target constraint, time
Start frame of the constraints. The constraint is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.border.end
int, float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
End frame of the constraints. The constraint is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.border.loss
str
default: l2
expression variables: learnrate, resolution, target constraint, time
The loss function used to calculate the difference (or error) between current and desired image.
l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.
targets.constraints.contrast
Pushes the contrast above or below a threshold value.
The contrast is currently calculated in the following way:
The image pixels are divided into the ones that are above and below the pixel mean values. The contrast value is then the difference between the mean of the lower and the mean of the higher pixels.
targets.constraints.contrast.above
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
below the above
value.
targets.constraints.contrast.below
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
above the below
value.
targets.constraints.contrast.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
A multiplier for the resulting loss value of the constraint.
targets.constraints.contrast.start
int, float
default: 0.0
expression variables: learnrate, resolution, target constraint, time
Start frame of the constraints. The constraint is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.contrast.end
int, float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
End frame of the constraints. The constraint is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.contrast.loss
str
default: l2
expression variables: learnrate, resolution, target constraint, time
The loss function used to calculate the difference (or error) between current and desired image.
l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.
targets.constraints.edge_mean
Adds the difference between the current image and and an edge-detected version to the training loss.
A gaussian blur is used to detect the edges:
edge = amount * abs(image - blur(image))
targets.constraints.edge_mean.above
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
below the above
value.
targets.constraints.edge_mean.below
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
above the below
value.
targets.constraints.edge_mean.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
A multiplier for the resulting loss value of the constraint.
targets.constraints.edge_mean.start
int, float
default: 0.0
expression variables: learnrate, resolution, target constraint, time
Start frame of the constraints. The constraint is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.edge_mean.end
int, float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
End frame of the constraints. The constraint is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.edge_mean.loss
str
default: l2
expression variables: learnrate, resolution, target constraint, time
The loss function used to calculate the difference (or error) between current and desired image.
l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.
targets.constraints.edge_mean.kernel_size
list of length 2 of int
default: [3, 3]
expression variables: learnrate, resolution, target constraint, time
The size of the pixel window of the gaussian blur. Must be an odd, positive integer.
Two numbers define width and height separately.
targets.constraints.edge_mean.sigma
list of length 2 of float
no default
expression variables: learnrate, resolution, target constraint, time
Gaussian kernel standard deviation. The larger, the more blurry.
If not specified it will default to 0.3 * ((kernel_size - 1) * 0.5 - 1) + 0.8
.
Two numbers define sigma for x and y separately.
targets.constraints.mean
Pushes the image color mean above or below a threshold value.
targets.constraints.mean.above
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
below the above
value.
targets.constraints.mean.below
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
above the below
value.
targets.constraints.mean.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
A multiplier for the resulting loss value of the constraint.
targets.constraints.mean.start
int, float
default: 0.0
expression variables: learnrate, resolution, target constraint, time
Start frame of the constraints. The constraint is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.mean.end
int, float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
End frame of the constraints. The constraint is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.mean.loss
str
default: l2
expression variables: learnrate, resolution, target constraint, time
The loss function used to calculate the difference (or error) between current and desired image.
l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.
targets.constraints.noise
Adds the difference between the current image and a noisy image to the training loss.
targets.constraints.noise.std
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
Specifies the standard deviation of the noise distribution.
One value or three values to specify red, green and blue separately.
targets.constraints.noise.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
A multiplier for the resulting loss value of the constraint.
targets.constraints.noise.start
int, float
default: 0.0
expression variables: learnrate, resolution, target constraint, time
Start frame of the constraints. The constraint is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.noise.end
int, float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
End frame of the constraints. The constraint is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.noise.loss
str
default: l2
expression variables: learnrate, resolution, target constraint, time
The loss function used to calculate the difference (or error) between current and desired image.
l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.
targets.constraints.normalize
Adds image normalization to the training loss.
The normalized version is found by moving the image colors into the range of min and max.
targets.constraints.normalize.min
list of length 3 of float
default: [0.0, 0.0, 0.0]
expression variables: learnrate, resolution, target constraint, time
The desired lowest value in the image.
One color for gray-scale, three colors for red, green and blue.
targets.constraints.normalize.max
list of length 3 of float
default: [1.0, 1.0, 1.0]
expression variables: learnrate, resolution, target constraint, time
The desired highest value in the image.
One color for gray-scale, three colors for red, green and blue.
targets.constraints.normalize.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
A multiplier for the resulting loss value of the constraint.
targets.constraints.normalize.start
int, float
default: 0.0
expression variables: learnrate, resolution, target constraint, time
Start frame of the constraints. The constraint is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.normalize.end
int, float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
End frame of the constraints. The constraint is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.normalize.loss
str
default: l2
expression variables: learnrate, resolution, target constraint, time
The loss function used to calculate the difference (or error) between current and desired image.
l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.
targets.constraints.saturation
Pushes the saturation above or below a threshold value.
The saturation is currently calculated as the difference of each color channel to the mean of all channels.
targets.constraints.saturation.above
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
below the above
value.
targets.constraints.saturation.below
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
above the below
value.
targets.constraints.saturation.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
A multiplier for the resulting loss value of the constraint.
targets.constraints.saturation.start
int, float
default: 0.0
expression variables: learnrate, resolution, target constraint, time
Start frame of the constraints. The constraint is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.saturation.end
int, float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
End frame of the constraints. The constraint is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.saturation.loss
str
default: l2
expression variables: learnrate, resolution, target constraint, time
The loss function used to calculate the difference (or error) between current and desired image.
l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.
targets.constraints.std
Pushes the standard deviation above or below a threshold value.
targets.constraints.std.above
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
below the above
value.
targets.constraints.std.below
list of length 3 of float
no default
expression variables: learnrate, resolution, target constraint, time
If specified, the training loss increases if the current value is
above the below
value.
targets.constraints.std.weight
float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
A multiplier for the resulting loss value of the constraint.
targets.constraints.std.start
int, float
default: 0.0
expression variables: learnrate, resolution, target constraint, time
Start frame of the constraints. The constraint is inactive before this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.std.end
int, float
default: 1.0
expression variables: learnrate, resolution, target constraint, time
End frame of the constraints. The constraint is inactive after this time.
- an
int
number defines the time as epoch frame - a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.an
int
number defines the time as epoch frame- a
float
number defines the time as ratio between 0.0 and 1.0, where 1.0 is the final epoch. percent
(e.g.23.5%
) defines the time as percentage of the number of epochs.
targets.constraints.std.loss
str
default: l2
expression variables: learnrate, resolution, target constraint, time
The loss function used to calculate the difference (or error) between current and desired image.
l1
ormae
: Mean absolute error is the mean of the absolute difference of each vector variable.l2
ormse
: Mean squared error is the mean of the squared difference of each vector variable. Compared to mean absolute error, it produces a smaller loss for small differences (below 1.0) and a larger loss for large differences.