Tuesday, April 02, 2024

stable diffusion experiments

Now that I have access to a GPU, I tried out some AI-based image resizing, video upscaling and image generation using Stable Diffusion, as noted in a previous post. This post notes some of the pros and cons of the technology as it stands in April 2024.

1. There doesn't seem to be a way to incrementally correct an image, the way one can with ChatGPT-generated code. To get better results, we have to refine the prompt and run the generation again.

2. On this machine - 1050TX GPU, quite slow - Stable Diffusion with the webui takes around a minute to generate a default 512x512 image with an img2img prompt and the default 20 steps.

3. Following this guide on fddb, my generation results were not great. Changing "little girl" to "businessman with briefcase" did not produce a briefcase in the 4-5 iterations I tried. Additionally, scaling up the image showed that the skyscrapers were not realistic at all. Perhaps this can be fixed by generating at a higher resolution directly, instead of first generating at 512x512 and then scaling up, but I can't do that, since I run into 'CUDA out of memory' errors. Edit - further experiments in this post seem to indicate that 1024x1024 is the upper limit for most models.

Example - part of the fddb example image, upscaled to 4k using Lanczos resizing,

and using R-ESRGAN-4x+, where the cartoonish quality is visible,


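A rough back-of-the-envelope sketch of why generating directly at a higher resolution runs out of memory, as mentioned in point 3. It assumes a Stable Diffusion 1.x-style model, where the VAE shrinks each side by a factor of 8 and the largest self-attention layers work on that latent grid. The function names are made up for illustration, and real memory use also depends on the model, the precision and any memory-efficient attention in use, so treat the ratios as an indication only.

```python
# Assumption: SD 1.x-style latent diffusion with a VAE downsampling factor of 8.
VAE_FACTOR = 8


def latent_tokens(width, height, factor=VAE_FACTOR):
    """Number of latent positions seen by the highest-resolution attention layers."""
    return (width // factor) * (height // factor)


def attention_entries(width, height):
    """Entries in one full self-attention matrix (tokens x tokens), per head."""
    n = latent_tokens(width, height)
    return n * n


def relative_attention_cost(width, height, base=(512, 512)):
    """How much larger the attention matrix is than at the base resolution."""
    return attention_entries(width, height) / attention_entries(*base)


if __name__ == "__main__":
    for w, h in [(512, 512), (768, 768), (1024, 1024), (2048, 2048)]:
        print(f"{w}x{h}: {latent_tokens(w, h)} tokens, "
              f"{relative_attention_cost(w, h):g}x the 512x512 attention size")
```

Under these assumptions, doubling the side length from 512 to 1024 quadruples the number of latent positions and makes a naive attention matrix 16 times larger, which is consistent with a small GPU giving up well before 4k.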

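For comparison with the AI upscaler, Lanczos resizing is a fixed filter: each output pixel is a weighted average of nearby input pixels, so it can blur or ring a little but cannot invent new detail. Below is a minimal pure-Python sketch on a single row of pixel values, assuming the common window size a = 3. The names are illustrative only; real tools apply the same idea in two dimensions with many optimisations, and widen the kernel when downscaling, which this sketch does not do.

```python
import math


def lanczos_kernel(x, a=3):
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, zero outside."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resize_row(row, new_len, a=3):
    """Resample a 1D list of samples to new_len samples (upscaling sketch)."""
    old_len = len(row)
    scale = old_len / new_len
    out = []
    for i in range(new_len):
        # Centre of output sample i, expressed in input coordinates.
        centre = (i + 0.5) * scale - 0.5
        first = math.floor(centre) - a + 1
        last = math.floor(centre) + a
        total = 0.0
        weight_sum = 0.0
        for j in range(first, last + 1):
            w = lanczos_kernel(centre - j, a)
            # Repeat the border sample when the window runs past the edge.
            sample = row[min(max(j, 0), old_len - 1)]
            total += w * sample
            weight_sum += w
        # Normalise so that flat regions keep their value exactly.
        out.append(total / weight_sum)
    return out
```

Calling `resize_row(row, 2 * len(row))` doubles the length of one row; applying it to every row and then every column gives a 2x upscale of a grayscale image.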
4. Trying to upscale a video which had a series of stills - something like a slideshow - resulted in lots of image flicker in the upscaled video. The flicker is caused by some horrendous hallucination; close-ups from a couple of frames are shown below.
Original - 

Upscaled - 


and the next frame has the shading which causes the flicker,


I decided to use Lanczos instead of AI in that case. Single-image upscaling can still give good results, especially if the image is generic and not an exact likeness of someone. With an exact likeness, some hallucination of features is seen. In the example close-up, something like spectacles appears on the bridge of the nose, which is not present in the original.
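One way to put a number on the flicker from point 4 is to compare consecutive frames. In a slideshow, frames belonging to the same still should be nearly identical, so a large frame-to-frame difference that does not coincide with a slide change suggests the upscaler is inventing different detail in each frame. This is only a minimal sketch: it assumes the frames are already decoded into equally sized, non-empty lists of grayscale rows, and the function names and the threshold value are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def flicker_scores(frames):
    """One score per transition between consecutive frames."""
    return [mean_abs_diff(a, b) for a, b in zip(frames, frames[1:])]


def flickering_transitions(frames, threshold=2.0):
    """Indices i where frame i and frame i + 1 differ by more than threshold."""
    return [i for i, score in enumerate(flicker_scores(frames)) if score > threshold]


if __name__ == "__main__":
    # Toy 2x2 frames: one still repeated, with extra shading in the middle frame.
    still = [[10, 10], [10, 10]]
    shaded = [[10, 30], [10, 10]]
    print(flicker_scores([still, still, still]))
    print(flickering_transitions([still, shaded, still]))
```

Running the same check on the original and the upscaled video would show whether the differences were introduced by the upscaler or were already in the source.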





 
