Help: Generate images with VQGAN+CLIP / English
How to generate images with VQGAN+CLIP
VQGAN is a generative adversarial network. Generative adversarial networks (GANs) are a class of machine learning frameworks designed by Ian Goodfellow and his colleagues in 2014. Two neural networks contest with each other in a zero-sum game, where one agent's gain is another agent's loss.
This technique can produce images that appear authentic to human observers. For example, a synthetic image of a cat that manages to fool the discriminator (one of the functional parts of the algorithm) is likely to lead some people to accept it as a real photograph. What sets VQGAN apart from previous GANs is that it allows high-resolution outputs.
CLIP (Contrastive Language–Image Pretraining) is another artificial intelligence model that relates texts to images. That is, in 'VQGAN+CLIP', CLIP is what feeds the text input to VQGAN and steers the image towards it. Here we explain how to use it.
VQGAN+CLIP in Google Colaboratory
By opening VQGAN+CLIP (z+quantize method con augmentations).ipynb, a Google Colaboratory notebook by Katherine Crowson, you can run a VQGAN model preconfigured with default values and combined with a CLIP model. Here we explain how to make it work.
Previous steps
- 2) At the top right, click on Conectar (that means Connect) to be assigned a machine.
- 3) On the page there are black circles with an arrow that looks like "Play". Click on these buttons to run each of the cells.
- 4) Click on the cell with the text: Licensed under the MIT License. We will get a warning; accept it to continue.
- 5) Click on the cell with the text: !nvidia-smi. The data of the remote PC that will run the VQGAN+CLIP model appears here. The VRAM can look like this: 0MiB / 15109MiB. The more VRAM, the more rendering power. With less than 15109MiB it might not be worth using the machine (roughly 4 seconds per iteration on average, that is, about four times longer than on a 15 GiB graphics card).
- 6) Click on the cell with the text: Instalación de bibliotecas (that means Installation of libraries). You will see progress bars appear in that cell: those are the installations and downloads in progress. Wait for them to finish[1]. (This process no longer shows the same output, but it is equivalent.)
- 7) Click on the cell with the text: Selección de modelos a descargar (that means Selection of models to download). You can choose to download other models, but the default model imagenet_16384 is good. imagenet_1024 is also light. Time total, time spent and time left indicate how much download time is left. Wait for it to finish downloading.
- 8) Click on the cell with the text: Carga de bibliotecas y definiciones (that means Loading of libraries and definitions).
- 9) Click on the cell with the text: Parámetros (that means Parameters). To the right of the parameters cell there is a text box that lets you customize them more easily. Every time you modify the Parámetros ("Parameters") you have to rerun the cell so that they get updated.
Parameters
Name of the parameter | English translation | Default value | Description |
---|---|---|---|
textos | texts | A fantasy world | This parameter is the text that VQGAN+CLIP will interpret as the concept of the image. If you write "fire" it will draw fire, and if you write "water" it will represent water. More information in the section "Text and context". |
ancho | width | 480 | The width of the image that VQGAN+CLIP will generate inside the Colab. The recommendation is not to go above 600px because the virtual machine has limited memory; it is better to upscale afterwards with bigjpg (or waifu2x or any other resizer). You can change the proportions so the image won't be square (a little help here: proportions calculator). |
alto | height | 480 | The height of the image that will be generated in the Colab. The same recommendation applies: do not go above 600px, upscale afterwards, and you can also change the proportion. |
modelo | model | imagenet_16384 | This parameter decides which VQGAN model will be run. There are boxes that allow you to select a model; the one you select must have been previously downloaded. The number indicates the size of the codebook[r 9], so imagenet_16384 is (supposedly) better than imagenet_1024 (although heavier). See the FAQ "What model is better for me?". |
intervalo_imagenes | image_interval | 50 | Tells the program to show the image result on the page every this many iterations. If you type 50, it will print the results of iterations 0, 50, 100, 150, 200, etc. |
imagen_inicial | initial_image | None | To use an initial image, you only have to upload a file to the Colab environment (on the left side) and then set imagen_inicial ("initial_image") to the exact filename. Example: sample.png. See Upload images. |
imagenes_objetivo | target_images | None | One or more images that the AI will take as a "target", fulfilling the same function as a text input. That is, the AI will try to imitate the image or images. They are separated with vertical bars. |
seed | seed | -1 | The seed of the image. -1 means the seed will be random each time. With -1 you will only see the chosen seed in the Colaboratory interface, in the cell "Hacer la ejecución" ("Make the execution"), like this: Using seed: 7613718202034261325 (example). If you want to find out the iterations and seeds of the images you have downloaded, they are in the image comments. On Linux, normal viewers can show the comments. The default Windows viewers cannot show this metadata, but Jeffrey's Image Metadata Viewer can[r 1][r 2]. |
max_iteraciones | max_iterations | -1 | The maximum number of iterations before the program stops. The default is -1, which means the program will not stop unless it crashes or is stopped for some other reason. It is recommended to change it to a value like 500, 600, 1000 or 3000; a higher number is sometimes not necessary (variability decreases as the number of iterations grows). Remember that these computations are energetically very expensive (and if you leave the session computing for too long, Google Colaboratory will limit you). |
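As the seed row above notes, the iteration and seed travel in the image comments. A minimal sketch for inspecting them with Pillow, assuming they were written as standard PNG text chunks (the filename is illustrative; if the notebook used steganography instead, see the stegano library mentioned in the references):

```python
# Inspect the metadata of a downloaded frame (hypothetical filename).
from PIL import Image

img = Image.open("0050.png")
print(img.text)  # PNG text chunks, where a comment with the seed may live
print(img.info)  # all metadata Pillow decoded, as a fallback
```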
Text and context
Language
The AI is trained mostly on English, so the context is often better when the input is entered in English, although it understands other languages somewhat. This can be seen in the Examples: the text input greek temples in space gives a better result than Templos griegos en el espacio (Spanish).
Separate entries
You can separate concepts with vertical bars, also known as pipes (|), producing different text entries, each with its own "hinge loss"[r 3]. This allows you to assign effects or adjectives independently to different elements.
In the execution cell they can be seen separated by commas and in quotes.
- Text: Cosmic egg. → Name='cosmic egg' (this is what the program runs).
- Text: Bronze | space → Name='bronze', 'space'. It will produce a different result than:
- Text: Bronze, space → Name='bronze, space'.
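A minimal sketch of what this separation amounts to, assuming the notebook simply splits the textos field on | (the variable names are illustrative):

```python
# Splitting the text input into independent prompts on "|".
textos = "Bronze | space"
prompts = [parte.strip() for parte in textos.split("|")]
print(prompts)  # ['Bronze', 'space'] -> two entries, each with its own loss

# A comma does not separate anything; it stays a single prompt:
print("Bronze, space".split("|"))  # ['Bronze, space']
```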
Use adjectives
Adjectives and styles can be used to vary the image without varying the objects we want it to draw. There is not a fixed number of styles: there are as many styles as we can think of.
By artist
- Beksinski style / Dali style / Van Gogh style / Giger style / Monet style / Klimt / Katsuhiro Otomo style / Goya[r 4] / Michelangelo (Sistine Chapel style) / Joaquín Sorolla / Moebius / in Raphael style (sometimes it helps to improve the lines of the faces).
- Artists can also be mixed. With the wikiart models, even better results are achieved.
By art style
- Camera qualities and distortions: 4k / chromatic aberration effect / cinematic effect / diorama / dof / depth of field / field of view / fisheye lens effect / photorealistic / hyperrealistic / raytracing / stop motion / tilt-shift photography / ultrarealistic / vignetting.
- cel shading / flat colors / full of color / electric colors.
- anime / comic / graphic novel / visual novel.
- Materials: acrylic painting style / clay / coffee paint / collage / glitch / graphite drawing / gouache / illuminated manuscript / ink drawing / medieval parchment / detailed oil painting / tempera / watercolor.
- isometric / lineart / lofi / lowpoly / photoshop / pixel art / vector art / voxel art.
- Historical periods: baroque / German Romanticism / impressionism / Luminism / pointillism / postimpressionism / Vienna Secession.
- See more here Art movement.
- You can also use types of painting by region:
- Chinese painting / Indian art / Tibetan paintings / Nordic mythology style / etc.
By movies or video games
- in Ghost in the Shell style / in Star Wars style / in Ghibli style / in Metropolis film style / Death Stranding style
- (the more the input text has to do with the movie or video game, the more it will look like it). See The 1000 best films of history (2020).
By rendering program
It mimics the result; it is not as if the AI were actually using those graphics engines[r 5]:
- rendered in Unreal Engine[r 6][r 7], the most used.
- rendered in Vray.
- rendered in povray.
- zbrush / cycles.
- rendered in rtx.
- rendered in octane.
- rendered in cinema4d.
- rendered in autodesk 3ds max.
- rendered in houdini.
- etc.
- They can be combined.
- Pending, to check: https://www.g2.com/products/octanerender/competitors/alternatives
- Visual reference of 4 prompts with many styles
Specific effects
- Add brdf / caustics / global illumination / non-photorealistic / path tracing / physically based rendering / raytracing / etc.
Effects / lights
- Fog / fire / lava / shining / glow / red-hot / incandescent / iridescent / etc.
Other modifiers
- Trending on (website): for example "trending on artstation"[r 8]
- minimalistic / dismal / grim / liminal / surprising / black hole / diamond.
Assign weights
Weights can also be assigned: CLIP will interpret decimals (0.1, 0.5, 0.8) as the weight of that concept in the drawing (1 being the total). You can also use "percentages" (numbers without the percent symbol). Negative weights can be used to remove a color, for example.
- It is not recommended to use weights below -1.
- The weights are relative to each other (the total is renormalized and does not have to coincide with the numbers that have been set). That is why it is recommended that they add up to 100%, or 1 (mostly so that we ourselves know the real weights).
Examples of weights in decimals (the parentheses indicate the error):
- Text: rubber:0.5 | rainbow:0.5. Equivalent to rubber:50 | rainbow:50.
- Badly done: "0.5 rubber | 0.5 rainbow". (The weight goes after the concept, following :.)
- Badly done: "awesome:100 adorable:300". (It is not separated by |.)
- Badly done: "rubber:50% | rainbow:50%". (The % symbol is not admitted.)
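Why the badly-done forms fail can be seen in the notebook's own parse_prompt function (it is visible in the error tracebacks later in this article); this sketch reproduces it:

```python
def parse_prompt(prompt):
    # Split off up to two ':'-separated trailing fields: weight and stop value.
    vals = prompt.rsplit(':', 2)
    vals = vals + ['', '1', '-inf'][len(vals):]  # defaults: weight 1, stop -inf
    return vals[0], float(vals[1]), float(vals[2])

print(parse_prompt("rubber:0.5"))   # ('rubber', 0.5, -inf)
print(parse_prompt("rubber"))       # ('rubber', 1.0, -inf): weight defaults to 1
print(parse_prompt("0.5 rubber"))   # ('0.5 rubber', 1.0, -inf): the 0.5 is read as text
# parse_prompt("rubber:50%") raises ValueError: '50%' is not a valid float
```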
- Badly done: "0.5 rubber | 0.5 rainbow". (The allocation of weights goes after the concept and after
Another example of weights (total=100):
- Text: sky:35 | fire:35 | torment:20 | dinosaurs:10
Example:
- Text: fantasy world | pink:0
- Result: It doesn't have pink.
Note: Deleting a word using negative values can completely change the image, with unexpected results. If you are very specific you can achieve the desired results, but the image will still change a lot. (Not checked yet.)
Note 2: It is better to eliminate a concept using values of 0. For example, to remove the Unreal logo:
- Text: … | logo of unreal:-1. It could give a satisfactory result. (Not checked yet.)
- Text: … | logo of unreal:0. Could work better.
While:
- Text: … | logo:-1. It will give a totally different result by being too unspecific.
Other advice
- For astronomical images, a better result is achieved when weights are assigned to the parameters; that way the elements (for example a galaxy) are defined much better.
- Texts that are too short tend to go wrong, but less so if they are very specific.
- The AI warps people's faces when you name someone specific.
- Humans are the AI's weak point.
- One trick that seems to work well is to use an image of a human face produced by Artbreeder (a website that also uses GAN-based AIs).
- Starting from images produced by VQGAN itself is also very efficient.
(Using older images from VQGAN+CLIP itself gives very good results, by Anarius.)
Upload images
To use an initial image you first have to upload it.
- Go to the left side and click on "Archivos" ("Files").
- Select the upload icon, "Subir al almacenamiento de sesión" (something like "Upload to the session storage").
- Upload the image you want from your file system (give it a recognizable name).
- The image will only remain during the session; afterwards it will be deleted.
- Then you have to modify the field imagen_inicial (initial_image) or imagenes_objetivo (target_images), putting the exact filename. In imagenes_objetivo (target_images) you can put several images, using | as separator.
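If dragging files into the panel fails, the same upload can be done from a code cell with Colab's files API; a sketch (see Open new entries for commands for how to add the cell):

```python
# Upload a file through a dialog instead of the Files panel.
from google.colab import files

uploaded = files.upload()  # opens a file picker in the browser
for name, data in uploaded.items():
    print(name, len(data), "bytes")  # use this exact name in imagen_inicial
```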
Other techniques
Adapt images to specific shapes
To adapt the final image to a specific shape, you can use starting images with color masks making that shape. (Optional: it can simply be a white square, although it will be more accurate if you start from iteration 0.) Then, in the text input, choose things that conform to that shape (for example, for a round mask: a watch, a pizza, a crystal ball, etc.).
- You take an image produced by iteration 0.
- In an editing program (like GIMP or Photoshop) you choose a shape, making the background uniform (of the color you choose). This image will be used as imagen_inicial ("initial_image").
- Example: with the text input fisheye view of a corridor | graphite draw and using the mask as imagen_inicial ("initial_image") (by Abulafia).
You can add or download masks from the Collaborative Drive of DotHub.
Guide the AI to a result
- When you reach an iteration where it deviates from what you want, you can stop and use that last frame as the new imagen_inicial ("initial_image"), modifying the description a bit. That way, to a certain extent, you can "guide" the AI towards what you want.
Post-execution
Once you have decided the parameters:
- Click on the cell with the text: Hacer la ejecución… ("Make the execution…").
- Wait for images to appear in that cell.
- When you want to save an image, right-click it, choose Save image, and save it with the name you want.
Generate a video
Press the corresponding "Play" button.
If a range is not specified for the video, it will generate a video of all frames, which may take a while. To avoid that, you can change the parameters in the cell, like init_frame, last_frame and also the FPS.
When the process ends, sometimes it does not load and it is not evident where the generated video is. It is in Archivos ("Files") (left sidebar).
Update: There is a new specific cell called Descargar vídeo ("Download video"), which performs the download automatically.
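For reference, what the video cell does is roughly to join the frames saved in /steps (named 0000.png, 0001.png, … as seen in the zip section) with ffmpeg. A hedged sketch with illustrative values; the flags in the actual cell may differ:

```python
init_frame = 0    # first frame to include
last_frame = 300  # last frame to include
fps = 30
n_frames = last_frame - init_frame + 1

# Assemble the selected frames into an mp4 (run inside a Colab cell).
!ffmpeg -y -start_number {init_frame} -i steps/%04d.png -frames:v {n_frames} -r {fps} -pix_fmt yuv420p video.mp4
```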
Create a zip with all the images
This has not been activated in the online version yet, but with the code shown it is fully functional.
If we have generated too many images and it is very tedious to save them one by one, or if we have simply deleted or stopped the cell where the images were shown, they can still be downloaded.
The intermediate steps generated (although they are not shown) are placed in the /steps folder. Downloading all of them by hand is not feasible, so we are going to put them into a single file and download them more easily.
- This applies after images have been generated.
- If the machine has been disconnected, there will (most of the time) be no files in /steps, so this procedure will be useless. The images shown in the main interface may be preserved, so they could be saved as a group using the extension Download All Images. Links to this plugin are at More tools.
(Create zip cell.)

Name of the parameter | Default value | Description |
---|---|---|
initial_image | 0 | Determines the first image that will be included in the zip. |
final_image | -1 | Determines the last image that will be included in the zip. -1 means "up to the last one". |
step | 50 | Determines the interval between one image and the next. By default it is 50, the same as in the menu above. If you set the interval to 1 it saves all the intermediate steps; the resulting zip will be very large, depending on the number of images. It also influences the download time: if the zip is too large, it may take a long time and/or the machine may disconnect in the middle of the download. Normally the files are still there when reconnecting (but not always). |
filename | files.zip | This is the name of the file to be downloaded. The images inside will also carry this name plus a number. It is recommended to use distinctive names. |
Once the zip archive is generated, it appears in the left sidebar under Archivos (Files), and it should download automatically.
Open new entries for commands
To introduce a new entry, move the mouse to the edge of a cell; as you pass over it, two tabs will appear: + Código (+ Code) and + Texto (+ Text). Click on + Código (+ Code). Another option is to use Control+M B[2].
(By adding a new cell we can execute custom commands.)
And paste this:
Code for generating the zip

```python
# @title Create a zip with all of the images (or some)
initial_image = 0 #@param {type:"integer"}
final_image = -1 #@param {type:"integer"}
# Note: the original cell named this parameter "pass", which is a reserved
# word in Python; "step" is used here instead.
step = 50 #@param {type:"integer"}
filename = "files.zip" #@param {type:"string"}

import zipfile
from tqdm.notebook import tqdm  # progress bar (preinstalled in Colab)

if final_image == -1:
    final_image = i  # `i` is the global iteration counter from the run cell

zipf = zipfile.ZipFile(filename, 'w', zipfile.ZIP_DEFLATED)
for n in tqdm(range(initial_image, final_image + 1, step)):
    fname = f"{n:04}.png"  # the frames in /steps are named 0000.png, 0050.png, …
    zipf.write(f"steps/{fname}", f"{filename.split('.')[0]}-{fname}")
zipf.close()

print(filename, "created. Downloading… A download dialog will open when the download is ready.")
from google.colab import files
files.download(filename)
```
If you make more than one zip, sometimes you must give your browser permission to download multiple files.
Create a zip with all the images (2)
This section was an outdated version of the previous one.
See Open new entries for commands.
Important when downloading all the images: if we have many files and little space, it can fail (especially when combined with very heavy models, such as COCO-stuff, or after generating a video). If that is the case and we are not going to use the machine again with that model, we can delete the specific model, or the video once downloaded, to make space. A sketch of how to do this from a cell is shown below.
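A sketch of how such a cleanup could look from a new cell (the filenames are illustrative; check the real names in the Files panel or with !ls first):

```python
!ls -lh                      # see which files take up space
!rm -f coco.ckpt coco.yaml   # hypothetical model files, if no longer needed
!rm -f video.mp4             # the generated video, once downloaded
```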
Control the notebook from the keyboard / automated execution
Control the notebook from the keyboard
To control the notebook from the keyboard we need to understand the concept of "focus". The focus is where the "selection" is at any given moment. If a window, element, cell, text box, etc. has focus, it means it can directly receive commands from the keyboard.
The key combinations we will use to navigate the notebook are few: Control+Enter to run a cell, ↓ to change cells, and ↹ (the Tab key, above Caps Lock) to move through menus or within a cell.
Steps:
1) Center the focus on the first cell (going down once with ↓).
2) Run the second cell (Licensed under the MIT License). A menu will appear that we have to accept (↹ and Enter); we then go to the next cell and execute them successively[1] until we come to Parámetros ("Parameters"), which is a text cell, so we don't execute it right away. In Parámetros ("Parameters") we choose the necessary fields, moving with the tabulator (↹).
3) We can move backwards (Shift+↹) or forwards with the tabulator until we reach the heading of a cell, and keep moving down with ↓ in order to execute Hacer la ejecución (Make the execution). If we want to save as a zip (see Create a zip with all the images) or Guardar como vídeo (Save as video), we move to their respective cells. Currently we would have to add the zip cell manually (see Open new entries for commands).
Automated execution
Go to the section Parámetros (Parameters) (or Selección de modelos a descargar, Selection of models to download, if you want to use a model different from the default one) and, once the desired parameters have been filled in, go to the menu Runtime → Run before or press Control+F8.
Mount Google Drive
In the side menu, click on Activar Drive (Mount Drive). The button will automatically add a code block. If the side menu does not load or work, you can try Open new entries for commands, with this code:

```python
from google.colab import drive
drive.mount('/content/drive')
```
VQGAN+CLIP FAQ
How to stop the execution of the program?
By default max_iteraciones is -1, which means the program is not going to stop iterating. If you want to stop the process you can press the circular grey button with a white X whose tooltip says "borrar resultado" ("delete result"). Before doing so, make sure you have saved everything you want.
The shortcut to stop a cell is Control+M I
[2].
- It can also be stopped with the X that appears in the bottom status bar.
Sometimes a cell "hangs". Then you can delete it with the delete button (trash can symbol). To restore it, use Control+M Z[2].
How do I know if a cell has been executed?
If you position yourself on the "Play" button of a cell, you can see if it has been executed and the result of the execution.
- Execution status.
I uploaded an image but it is distorted from the beginning
That is because 480x480 is a square image. If you upload an image with a proportion other than 1:1, the generated image will be distorted from the beginning. Calculate the proportion of the image you are going to upload and use the same proportion in the fields ancho (width) and alto (height) if you want to avoid this distortion. You can use this proportions calculator (Calculadora de proporciones), or the sketch below.
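A small sketch that computes ancho/alto with the same proportion as an uploaded image while keeping roughly the recommended 480x480 pixel budget (the filename is illustrative):

```python
from PIL import Image

img = Image.open("sample.png")
w, h = img.size
scale = (480 * 480 / (w * h)) ** 0.5  # keep about 480*480 total pixels
ancho, alto = int(w * scale), int(h * scale)
print(ancho, alto)  # copy these values into the Parámetros cell
```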
Sometimes I specify a value, but it doesn't take it
This is because you have not executed the cell again. Any change in the fields of Parámetros ("Parameters") requires executing that cell again.
I specified a limit value but I want to continue with more iterations
You do not need to iterate from the beginning again to continue from a result (although you should know the seed, or find it out using the tool Steganography Online): you can use the last iterated image as imagen_inicial ("initial_image"). See Guide the AI to a result.
I don't know if I have to execute everything from the beginning (the machine has disconnected)
Although the packages and definitions are sometimes preserved in memory even after some time has passed, sometimes they are not and you have to start from the beginning. It can easily be checked by looking at whether the cell has been executed in the current session (see How do I know if a cell has been executed) or by trying to execute the cell Parámetros ("Parameters") and looking at the error.
What model is better for me?
CONTENT WARNING: This section is a draft, which is (evidently) incomplete. Be patient.
Most of the models have no obvious advantages between them. They have been trained with different sets of images so they will produce different results, not necessarily better or worse.
- ImageNet: The ImageNet project is a large visual database designed for use in visual object recognition software research. The project has hand-annotated over 14 million images to indicate which objects are depicted, and in at least one million of the images bounding boxes are also provided. It contains over 20,000 categories, with a typical category such as "balloon" or "strawberry" consisting of several hundred images (via Wikipedia).
Contrary to what one might assume, having a larger codebook does not exactly mean the model is more powerful; it simply allows it to capture more characteristics of the images. That may or may not be good, depending on what kind of results you want to achieve. The 1024 one is a bit "freer", so to speak, when it comes to generating images: it tends to create things that are more abstract, more chaotic and more artistic. Its "world view" has fewer categories, forcing it to abstract more.
16384 is much better for soft or minimalist backgrounds.
You can see a comparison between some images of 1024 vs 16384 in this colab: Reconstruction usage.
- COCO-Stuff (Project - 7.86 GiB - FID[3]: 20.4): COCO-Stuff is a modification with "augmentations" of Microsoft's COCO dataset, with everyday images (streets, people, animals, interiors…).
- faceshq (3.70 GiB): Specialized in faces.
- Wikiart: The images in the WikiArt dataset were obtained from WikiArt.org. License: for non-commercial research purposes only. That is, it is a set trained on art paintings, so the results will generally be paintings. A similar result could be achieved with the imagenet datasets by using the styles of famous painters.
  - wikiart_1024 (913.75 MiB): wikiart version with a codebook of 1024 elements.
  - wikiart_16384 (958.75 MiB): wikiart version with a codebook of 16384 elements.
- s-flckr (3.97 GiB): Dataset from Flickr.
- ade20k (4.61 GiB - FID[3]: 35.5) (Note: it is not included by default, see How can I add new models?): A semantic segmentation dataset from MIT. It contains more than 20K scene-centric images exhaustively annotated with pixel-level objects and object-part labels. There are a total of 150 semantic categories, including things like sky, road and grass, and discrete objects like people, cars, beds, etc.
I have left VQGAN open overnight and now it won't connect to a machine with a GPU
Although "it is not doing anything" if the connection exists, it is already speding assigned time, so if you leave the machine stopped for a long time, the next time you will not be assigned a machine with GPU.
Solution:
- Avoid leaving the machine connected longer than necessary.
- Wait long enough (1 day or so) for the limitation to pass.
- Use a different gmail account.
- If you have infinite patience, you can use the machine with only CPU. It takes about 30 times longer. Not recommended.
What are the limits of use of Colab?
Colab may provide free resources in part by having dynamic usage limits that sometimes fluctuate and by not providing guaranteed or unlimited resources. This means that general usage limits, as well as idle periods, maximum virtual machine lifespan, available GPU types, and other factors vary over time. Colab does not publish these limits, in part because they can (and sometimes do) change rapidly.
GPUs and TPUs are sometimes prioritized for users using Colab interactively over long-running computations, or for users who have recently used fewer resources in Colab. As a result, users using Colab for long-running computations, or users who have recently used more resources in Colab, are more likely to run into usage caps and have their access to GPUs and TPUs temporarily restricted. Users with heavy computational needs may be interested in using the Colab UI with a local runtime running on their own hardware. Users interested in having higher and stable usage limits may be interested in Colab Pro[r 10].
How can I upload an image from the internet directly?
If you have a poor connection, sometimes the functionality of uploading an image fails or does not load. This code can be used to download the image from an internet URL:

```
!wget https://url.to.your/image.jpg
```

It downloads a URL from the Internet to the notebook. See Open new entries for commands. If the URL has strange symbols, you can quote it instead:

```
!wget "https://url.to.your/image.jpg"
```
How can I run VQGAN+CLIP locally?
CONTENT WARNING: This section is a draft, which is (evidently) incomplete. It will be completed soon.
The problem is that running it locally requires a very powerful graphics card. You need roughly 15 GB of GPU memory for it to run decently; with a 10 GB minimum it will run, but extremely slowly (this may be outdated).
There are people who have managed to make it usable with 6 GB graphics cards (a 1060). A possible optimization is to modify the parameter cutn (for example to 32). If this parameter (which is related to quality) is lowered too much, it does not give good results, but if it is too high, it consumes a lot of memory.
In Local execution there are Docker and github versions.
How can I add new models?
The best way is to report the existence of new models on the DotHub discord and wait for people to implement them.
How can I disable the augmentations?
The augmentations are random variations in each step that improve the final quality of the image. That is, if they are deactivated, quality is reduced, but the execution will be faster (although not by much: 1.01 seconds per iteration vs. 1.31 seconds per iteration on a Tesla T4). It can also change the content substantially.
In the cell Carga de bibliotecas (Loading of libraries), double-click and the editable code will appear. Search (Control+F) for the code class MakeCutouts(nn.Module): and remove everything between the parentheses of self.augs = nn.Sequential(), leaving it this way:
```python
class MakeCutouts(nn.Module):
    def __init__(self, cut_size, cutn, cut_pow=1.):
        super().__init__()
        self.cut_size = cut_size
        self.cutn = cutn
        self.cut_pow = cut_pow
        self.augs = nn.Sequential()
        self.noise_fac = 0.1
```
-
Normal state
-
Final state without "augmentations"
Results:
- medusa hydra chimera hybrid monster flying in saturn | vray unreal engine style, i=550. With augmentations (default).
- medusa hydra chimera hybrid monster flying in saturn | vray unreal engine style, i=550. Without augmentations.
VQGAN+CLIP error fixing

Here are some of the errors you may encounter and how to fix them.
NameError: name 'textos' is not defined
```
NameError                                 Traceback (most recent call last)
<ipython-input-6-b687f6112952> in <module>()
      2 device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
      3 print('Using device:', device)
----> 4 if textos:
      5     print('Using texts:', textos)
      6 if imagenes_objetivo:

NameError: name 'textos' is not defined
```
Solution: You pressed Hacer la ejecución… (Make the execution…) before loading the parameters. Stop the cell Hacer la ejecución… (Make the execution…), run the cell Parámetros (Parameters), and then run "Hacer la ejecución…" (Make the execution…) again.
NameError: name 'argparse' is not defined
```
NameError                                 Traceback (most recent call last)
<ipython-input-8-9ad04e66b81c> in <module>()
     29 
     30 
---> 31 args = argparse.Namespace(
     32     prompts=textos,
     33     image_prompts=imagenes_objetivo,

NameError: name 'argparse' is not defined
```
Solution: This means that you have not run the cell "Carga de bibliotecas y definiciones" (Loading of libraries and definitions). (It can also be because the virtual machine has expired.)
ModuleNotFoundError: No module named 'transformers'
```
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-12-a7b339e6dfb6> in <module>()
     13 print('Using seed:', seed)
     14 
---> 15 model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
     16 perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)
     17 

11 frames
/content/taming-transformers/taming/modules/transformer/mingpt.py in <module>()
     15 import torch.nn as nn
     16 from torch.nn import functional as F
---> 17 from transformers import top_k_top_p_filtering
     18 
     19 logger = logging.getLogger(__name__)

ModuleNotFoundError: No module named 'transformers'
```
Solution: It happens with coco, faceshq and sflickr. You have to open a cell before the Carga de bibliotecas y definiciones (Loading of libraries and definitions) cell and write:

```
!pip install transformers
```

And run that cell.
ModuleNotFoundError: No module named 'taming'
```
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-10-c0ac0bf55e51> in <module>()
     11 from omegaconf import OmegaConf
     12 from PIL import Image
---> 13 from taming.models import cond_transformer, vqgan
     14 import torch
     15 from torch import nn, optim

ModuleNotFoundError: No module named 'taming'
```
Solution: Maybe restart the environment, or see the solution to ModuleNotFoundError: No module named 'transformers'.
ModuleNotFoundError: No module named 'taming.modules.misc'
```
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-11-b687f6112952> in <module>()
     13 print('Using seed:', seed)
     14 
---> 15 model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
     16 perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)
     17 

12 frames
/usr/lib/python3.7/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'taming.modules.misc'
```
Solution: One of the packages needed to run the program is failing. Run the installation cell again and try again.
- This error could appear if you chose the model "faceshq". It was corrected (2021-06-11).
FileNotFoundError: [Errno 2] No such file or directory
```
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-10-f0ccea6d731d> in <module>()
     13 print('Using seed:', seed)
     14 
---> 15 model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
     16 perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)
     17 

1 frames
/usr/local/lib/python3.7/dist-packages/omegaconf/omegaconf.py in load(file_)
    181 
    182     if isinstance(file_, (str, pathlib.Path)):
--> 183         with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
    184             obj = yaml.load(f, Loader=get_yaml_loader())
    185     elif getattr(file_, "read", None):

FileNotFoundError: [Errno 2] No such file or directory: '/content/wikiart_16384.yaml'
```

(Alternative: the same error with Orange.png.)
Solution: This could mean two things:
- You have chosen a model that has not been downloaded. Check that the chosen model has been downloaded. It may also be that the machine has expired (that is, everything has been erased); in that case you would have to run everything from the beginning.
- You have put a name in imagen_inicial ("initial_image") that does not correspond to the image you uploaded. This program is case-sensitive, so Orange.png is different from orange.png. Put the correct name in imagen_inicial ("initial_image").
- If by accident you uploaded an image with a very complicated name, you can rename it from the interface itself: in the Files panel, to the right of each file there is a column of dots that can be clicked to change the name.
RuntimeError: CUDA out of memory
```
RuntimeError                              Traceback (most recent call last)
<ipython-input-13-f0ccea6d731d> in <module>()
    131 with tqdm() as pbar:
    132     while True:
--> 133         train(i)
    134         if i == max_iteraciones:
    135             break

8 frames
/usr/local/lib/python3.7/dist-packages/taming/modules/diffusionmodules/model.py in nonlinearity(x)
     29 def nonlinearity(x):
     30     # swish
---> 31     return x*torch.sigmoid(x)
     32 
     33 

RuntimeError: CUDA out of memory. Tried to allocate […] (GPU 0; […] total capacity; […] already allocated; […] free; […] reserved in total by PyTorch)
```
Solution: This can mean several things:
- You have chosen image dimensions that are too large. The size 480x480px is enough (although in theory it could support up to 420000 pixels in total, i.e. ~648x648). To enlarge the image afterwards, use the tools linked in Image resizers.
- You have run out of memory from using it for a long time. You will have to start a new session.
- Google has assigned you a low memory GPU (<15109MiB). You will have to start a new session.
RuntimeError […] is too long for context length X
```
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-f0ccea6d731d> in <module>()
     46 for prompt in args.prompts:
     47     txt, weight, stop = parse_prompt(prompt)
---> 48     embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
     49     pMs.append(Prompt(embed, weight, stop).to(device))
     50 

/content/CLIP/clip/clip.py in tokenize(texts, context_length)
    188 for i, tokens in enumerate(all_tokens):
    189     if len(tokens) > context_length:
--> 190         raise RuntimeError(f"Input {texts[i]} is too long for context length {context_length}")
    191     result[i, :len(tokens)] = torch.tensor(tokens)
    192 

RuntimeError: Input […] is too long for context length 77
```
Solution: The input text is too long. Enter a shorter text; it has to be under about 350 characters[r 11]. See Lettercount. A sketch for checking the token count is shown below.
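Since the limit is really counted in CLIP tokens (context length 77, as the traceback shows), here is a sketch to test a prompt before running, using the same tokenizer the notebook loads (run it after the libraries cell, so the clip module exists):

```python
import clip

texto = "greek temples in space"
tokens = clip.tokenize(texto)  # raises RuntimeError if the text is too long
print(tokens.shape)            # torch.Size([1, 77]) -> the prompt fits
```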
TypeError: randint() received an invalid combination of arguments
```
TypeError                                 Traceback (most recent call last)
<ipython-input-8-b8abd6a7071a> in <module>()
     43     z, *_ = model.encode(TF.to_tensor(pil_image).to(device).unsqueeze(0) * 2 - 1)
     44 else:
---> 45     one_hot = F.one_hot(torch.randint(n_toks, [toksY * toksX], device=device), n_toks).float()
     46     if is_gumbel:
     47         z = one_hot @ model.quantize.embed.weight

TypeError: randint() received an invalid combination of arguments - got (int, list, device=torch.device), but expected one of:
 * (int high, tuple of ints size, *, torch.Generator generator, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
 * (int low, int high, tuple of ints size, *, torch.Generator generator, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
```
Solution: There is no solution yet, since it only appears with certain configurations (not by default). The temporary solution is to change the parameters until it works.
ValueError: could not convert string to float
```
ValueError                                Traceback (most recent call last)
<ipython-input-8-f0ccea6d731d> in <module>()
     45 
     46 for prompt in args.prompts:
---> 47     txt, weight, stop = parse_prompt(prompt)
     48     embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
     49     pMs.append(Prompt(embed, weight, stop).to(device))

<ipython-input-5-32991545ebb9> in parse_prompt(prompt)
    129     vals = prompt.rsplit(':', 2)
    130     vals = vals + ['', '1', '-inf'][len(vals):]
--> 131     return vals[0], float(vals[1]), float(vals[2])
    132 
    133 

ValueError: could not convert string to float: ' los leones la levantan y[…]'
```
Solution: The text contained a colon (:), which is an illegal character when it is not part of a weight expression (red:-1, for example).
WARNING:root:kernel restarted
Even if you try to reconnect or re-run the cells, the machine seems to be dead, giving (for example) the error WARNING:root:kernel […] restarted.
Solution: Force the session to end by going to the top tab next to Conectar (Connect) → Gestionar sesiones (Manage sessions) → Finalizar (Terminate).
Examples
Images
- Without image input:
  - Text: greek temples in space. 300 iterations.
  - Text: Templos griegos en el espacio. 300 iterations.
  - Text: Chinese, bronze, funeral. 300 iterations.
  - Text: Diplodocus, surf. 300 iterations.
- With image input:
  - Image input: (Dussian), by Khang Le.
  - Output after 50 iterations with the text "Modern spaceship insect alien".
  - Output after 1500 iterations with the same text.
  - Input image: Eli, the mascot of Menéame (a Spanish social site), by Meneame.net.
  - Text: orange elephant. 400 iterations.
  - Text: orange elephant, photorealistic. 850 iterations.
  - Text: orange elephant | photorealistic (same seed as the previous one). 300 iterations.
Examples of video
See also
References
References allude to an article's relations with "real life".
- ↑ Note: An alternative to XMP metadata is data entered using steganography. This metadata can be viewed using the stegano python library (Steganography Online).
- ↑ Note 2: Even though the seed is identical the results will still vary a bit due to the "augmentations". Augments are pseudo-random variations introduced in each iteration. They can be disabled, which would make it a bit faster, but at the cost of losing quality. See How can I disable the augmentations?
- ↑ Hinge loss is a kind of specific, independent loss function for artificial intelligences. In machine learning, the "hinge loss" is a loss function used for training classifiers; it is used for "maximum-margin" classification, most notably for support vector machines (SVMs).
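For reference, the standard hinge loss for a classifier score y and intended class t = ±1 is:

```latex
\ell(y) = \max(0,\; 1 - t \cdot y)
```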
- ↑ In Goya and other artists it has been appreciated that the first iterations are better.
- ↑ I.e., the AI has been trained with many images, some of which were labeled "rendered in X" so it mimics those results.
- ↑ Via Aran Komatsuzaki.
- ↑ The section Assign weights shows a way to try to remove the Unreal Engine logo that sometimes appears floating across the image.
- ↑ Although the word "trending" evokes something current, the dataset is not updated every day. That is, that "trending" will be taking the data from the moment in which the dataset was made.
- ↑ That is, the number of elements the model uses to define a single image; see ¡Esta IA crea ARTE con tus TEXTOS! (y tú puedes usarla 👀) [minute 7:30].
- ↑ Reference.
- ↑ As such there is no character limit, only tokens (which are groups of characters), but the tokens vary depending on what you write. As I understand it, there is a maximum of 75 tokens, approximately 350 characters.
- ↑ 1,0 1,1 1,2 Before mastering the program, it is recommended to wait for each cell to be executed (in order to not skip steps). But it is not really necessary. It can be programmed from start to finish (even video) by filling in the correct parameters. Obviously, if there is an error or the machine expires, the video will not be able to download or the images will not be generated correctly, so you have to take a look from time to time.
- ↑ 2,0 2,1 2,2 These shortcuts are activated by pressing the first two keys (joined by the +) together first, and then the additional letter.
- ↑ 3,0 3,1 3,2 3,3 The Frechet Inception Distance score, or FID for short, is a metric that calculates the distance between feature vectors computed for real and generated images. Via: How to Implement the Frechet Inception Distance (FID) for Evaluating GANs.
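The standard definition, with (μ_r, Σ_r) and (μ_g, Σ_g) the mean and covariance of the Inception features of real and generated images:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert^2
             + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```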
External links
External links are not endorsed by this wiki. We are not responsible for links going down or being redirected.
Information
- https://bit.ly/VQGANenwiki. Short link of this article.
Videos
- Taming transformers for high-resolution image synthesis.
- Cogitare: De TEXTO a OBRA DE ARTE | Te enseño a usar esta IA revolucionaria, by Cogitare.
- By dot_CSV:
  - ¡Esta IA crea ARTE con tus TEXTOS! (y tú puedes usarla 👀).
  - ¡Descubre Cómo la IA será MÁS POTENTE! - 👁🗨VISIÓN + LENGUAJE NATURAL💬.
  - ¿Es esta IA el FIN de los DISEÑADORES GRÁFICOS? ¿Puede la IA ser CREATIVA? (DALL-E is an AI different from VQGAN.)
- These Neural Networks Have Superpowers.
Technical information
- Taming transformers for high-resolution image synthesis.
- CLIP project on github.
- Redes Neuronales Generativas Adversarias.
Other guides
- Multimodal.art.
- Cómo crear imágenes artísticas fácilmente con una IA: VQGAN.
- Brutal aplicación de texto a imagen (VQGAN+CLIP).
Local execution
- Docker, in order to run it locally.
- Docker image by xekex#5678.
- Gist on github with local installation.
Other tools
Resources for initial images
- Phylopic. Animal silhouettes. Public domain.
- Collaborative Drive of DotHub.
- Doodles to faces.
- Artbreeder.
Image resizers
- Notebook ESRGAN. (Comparison with waifu2x).
- Waifu2x. Image resizer. Free.
- Installable. See waifu2x, waifu2x extension-gui, waifu2x-caffe, waifu2x-converter-cpp.
- DeepAI Waifu2x. Free.
- bigjpg. Image resizer. Only 20 free
- Upscaler. Image resizer. Only 3 free.
Metadata viewers
- Jeffrey's Image Metadata Viewer. EXIF data viewer.
- Steganography Online. Viewer of data entered by steganography. (Use the tab "Decode").
More tools
- Imgops.com. Multiple imaging operations.
- Ratio calculator.
- Lettercount.
- Download All Images for Chrome.
- Download All Images for Firefox.
Text input generators
- EleutherAI. Powerful AI text generator.
- GPT-J6B Demo.
- GPT-Neo.
- Noemata titlegen. Not AI; it generates "surrealist" names. They can be an interesting text input if you can't think of anything else.
Notebooks
- Lucid Sonic Dreams. Makes an animation from a piece of music.
- BigGAN + CLIP.
- codebook sampling method (original).
- TheBigSleep.
- http://bit.ly/VQGAN. Short link of the notebook.
- DeepDaze (original notebook).
- DeepDaze (simplified notebook).
Social media
Thanks
Special thanks to Eleiber#8347 for answering my questions and providing corrections; to Abulafia#3734 for explaining techniques; and to elchampi#0893 for sharing his doubts. And to many users of the DotHub discord who have shared their techniques or doubts. Also to the Reddit users who have helped me.
⚜️
Article written by Jakeukalane. To propose any change or addition, consult the editors.
Article written by Avengium. To propose any change or addition, consult the editors.