Introduction
The purpose of this quick test is to briefly check the capabilities of a generative neural network trained to synthesize images. This article focuses on using the network’s output and leaves out installation, hardware requirements, and process automation (you can find this information on the tool’s GitHub page, link below).
Image generation tools
In today’s test, we will use Fooocus, a piece of software publicly available on GitHub. It provides a simple GUI that is nevertheless packed with options, and installation is straightforward (basically, you clone the repository and run the appropriate script). After the dependencies are installed and the trained models are downloaded (about 25 GB), the GUI opens in the browser.
First run with Fooocus (shoes)
Let’s try something simple first. We’ll try to generate an item that our entire civilization uses: shoes. For simplicity, we’ll go with women’s shoes (I assume there is more training data for them). Let’s try the following prompt:
female shoes, futuristic, energetic colors
After a few minutes of generating, we got 9 images (original resolution of each is 1024×1024 pixels):
I have to admit that the tool did a really good job. Of course, it’s hard to verify how much the images differ from the training data, but the overall quality seems high. There are no visible artifacts in any of the images. The shapes, proportions and colors are realistic. This was the first prompt after starting the program, with no deeper “digging” in the settings apart from selecting the “Ads Fashion Editorial” style and the resolution mentioned above (the default resolution is 1152×896).
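To put the two resolutions mentioned above side by side, here is a minimal Python sketch that computes the pixel count and aspect ratio of each. The helper name `describe` is made up for this illustration and is not part of Fooocus; only the two resolutions come from the test itself.

```python
from fractions import Fraction


def describe(width, height):
    """Return the pixel count, megapixels and reduced aspect ratio of a resolution."""
    ratio = Fraction(width, height)  # Fraction reduces the ratio automatically
    pixels = width * height
    return {
        "pixels": pixels,
        "megapixels": round(pixels / 1_000_000, 2),
        "aspect": (ratio.numerator, ratio.denominator),
    }


# Resolution chosen for this test vs. the default resolution mentioned in the text.
chosen = describe(1024, 1024)
default = describe(1152, 896)

print("chosen :", chosen)
print("default:", default)
```

Both settings come out at roughly one megapixel, so switching from the default to the square format changes the shape of the image (9:7 versus 1:1) rather than the amount of detail.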
Below is a close-up of one of the images:
Second run with Fooocus (logo)
Now for something more difficult: a logo. Networks of this type usually struggle to render text. Additionally, the prompt will be longer and more abstract, and its last phrase is loosely worded:
Logo for blog about technology. Blog domain is "bwrite.pl" and the main tech stack is Apache server, SEO, tools for developers and AI.
Here are the results:
The shapes are interesting, but there’s something strange going on with the text. It doesn’t look good, but this was only the second prompt after starting the program. I chose the “Logo Design” style and a resolution of 1024×1024.
Third run with Fooocus (people)
Okay, now something very popular – people. Let’s try a simple, not too detailed prompt:
people in park, autumn, sunny weather
These are the results:
The weather and location match, but the “people” appear only in the background, with a single person in the foreground. This may also be an effect of the training data, although it seems more likely that the prompt is too simple (not detailed enough).
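One way to make a prompt more detailed is to assemble it from a subject plus a list of detail phrases. The sketch below is only an illustration: `build_prompt` is a hypothetical helper, the extra phrases in the second prompt are my own guesses, and that more detailed prompt was not run in Fooocus as part of this test.

```python
def build_prompt(subject, *details):
    """Join a subject and optional detail phrases into one comma-separated prompt."""
    # Skip empty phrases so the prompt never contains doubled commas.
    parts = [subject.strip()] + [d.strip() for d in details if d and d.strip()]
    return ", ".join(parts)


# The prompt used in the test above.
simple = build_prompt("people in park", "autumn", "sunny weather")

# A more detailed variant (hypothetical phrases, not verified against the tool).
detailed = build_prompt(
    "group of five people walking together in a park",
    "everyone in the foreground",
    "full body",
    "autumn",
    "sunny weather",
)

print(simple)
print(detailed)
```

Keeping the subject and the details separate makes it easy to change one phrase at a time and see which of them actually affects the result.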
How to run Fooocus without your own hardware
If you don’t have a sufficiently powerful computer, you can run Fooocus in the cloud. The project’s website contains instructions for running it in Google Colab. Keep in mind, however, that the free resources are limited both in size and in time. If you plan to “play” with this or similar solutions for longer, it is worth considering an appropriate subscription.
Summary
For a free tool available to anyone with suitable (mid-range) hardware or access to cloud computing, Fooocus seems very capable. In literally a few minutes, we can generate photorealistic images at a good resolution. The tool also has an upscaling option, so the resolution can be increased even further. No expert knowledge is required, and the GUI itself provides the right amount of settings.
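To get a feel for what upscaling means in numbers, here is a small Python sketch that computes the output size for a given scale factor. The factors 1.5 and 2 are assumptions chosen for illustration, and `upscaled_size` is a hypothetical helper, not a Fooocus function.

```python
def upscaled_size(width, height, factor):
    """Return the (width, height) of an image after scaling both sides by factor."""
    return round(width * factor), round(height * factor)


base = (1024, 1024)  # resolution used in the tests above

for factor in (1.5, 2):  # assumed example factors
    w, h = upscaled_size(*base, factor)
    growth = (w * h) / (base[0] * base[1])
    print(f"x{factor}: {w}x{h} pixels, {growth:.2f} times as many pixels")
```

Note that the pixel count grows with the square of the factor, so doubling each side of a 1024×1024 image gives four times as many pixels.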