May 15, 2024 acecloud
Artificial Intelligence (AI) is now intersecting human lives in myriad previously unimaginable ways. Tremendous advancements are being made every day in the fields of Deep Learning (DL), Generative Artificial Intelligence (GAI) and Convolutional Neural Networks (CNNs).
Wondering what this jargon means and how it affects you? We’re here to answer that. AI is being used not only to automate and extrapolate, but also to create. Image recognition and object detection have been key areas of AI research for decades. And now, visual art itself has taken a substantial leap thanks to the emergence of Generative AI and synthetic image generation.
Prominent text-to-image GAI applications include DALL-E, Parti and Midjourney, which have gained explosive popularity by letting users quickly create highly realistic, yet completely imaginary, images.
But even among the array of unbelievably creative text-to-image generative AI tools, Stable Diffusion holds a special place. Unlike its predecessors, which are licensed and available at a cost only via the cloud, Stable Diffusion is open source and can be run by absolutely anyone on their own personal computer! If you don’t believe us, try it here for free!
This article will provide an overview of Stable Diffusion and its uses. We will also compare the performance of Stable Diffusion AI on CPU and GPU and examine various GPU benchmarks for running SDAI.
What are Generative Adversarial Networks?
Invented in 2014, Generative Adversarial Networks (GANs) are AI algorithms that enable applications to synthesize new data from existing data samples. Using various Deep Learning techniques, GAN algorithms learn the underlying distribution of the data they’re fed during the model training phase. This learned distribution is then used to generate various categories of synthetic data in response to user prompts.
Every GAN has two components which work in tandem to generate artificial data. The first is a Deep Neural Network-based Generator, which creates the new data (image, text, voice, etc.). The second is a Discriminator, tasked with differentiating between real and generated data samples.
Both generator and discriminator are trained in a zero-sum game. The primary objective of the generator is to dupe the discriminator into believing that the generated data sample is actually real. The discriminator’s role, on the other hand, is to determine and classify given data as original or AI-generated.
The discriminator is penalized for misclassifying a real instance as fake or vice versa. The generator is penalized for failing to fool the discriminator. In either case, the error is backpropagated so that each network can update its parameters.
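These penalties are usually formalized as the standard minimax (zero-sum) GAN losses. Here is a minimal numeric sketch, assuming the discriminator D outputs the probability that a sample is real (the particular probability values are illustrative, not from any trained model):

```python
import math

def discriminator_loss(d_real: float, d_fake: float) -> float:
    # Penalized for calling real data fake (low d_real)
    # or for calling fake data real (high d_fake).
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake: float) -> float:
    # Penalized when the discriminator confidently spots the fake (low d_fake).
    return -math.log(d_fake)

# Early in training the discriminator easily spots fakes, so the generator's loss is high;
# once the generator fools it half the time, the loss shrinks.
early = generator_loss(0.05)
late = generator_loss(0.5)
assert early > late
```

As training progresses, each network’s loss pushes the other to improve, which is exactly the zero-sum dynamic described above.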
As the training progresses, the generator and the discriminator improve in creating realistic data and discerning generated data respectively.
Outline of GAN training framework (Source)
Training a GAN model is inherently difficult. Generating high-quality images from nothing but noise often culminates in glaring errors in the final output which the model cannot overcome, such as five-legged animals or odd distortions in landscapes.
Nonetheless, the global Generative AI market is likely to cross USD 50 billion by 2028.
Enterprises have been using GANs for creating highly realistic marketing content, promotional media and even book covers. Marvel Studios, in fact, went all the way and used AI to develop the entire intro sequence of its Disney+ series Secret Invasion.
Students are using freely available GAN applications to write school essays, employees worldwide are increasingly becoming dependent on these to draft official emails/ letters/ legalese, and content creators are deriving more than just inspiration from AI-generated content. The use of AI to create spectacularly realistic images and voice-overs has unfortunately also gained notoriety for being used in deepfakes and morphed images of celebrities, businesspeople and politicians.
GANs can even generate realistic images of individuals who do not exist, such as historical or fictional characters. Fascinating, isn’t it?
Screengrab from the AI-generated intro of Marvel’s Secret Invasion series (Source)
What is Stable Diffusion Artificial Intelligence?
SDAI is an open-source Deep Learning application which falls under the category of Diffusion Probabilistic Models (specifically, latent diffusion models) rather than GANs. Essentially, it is a text-to-image generator trained to output new data (mostly images and visuals) learned from its training datasets in response to a text-based input. It can also undertake image-to-image refinements guided by text prompts.
It was developed by researchers at Ludwig Maximilian University of Munich together with Stability AI and several other collaborating organizations.
The text input is received via the CLIP ViT-L/14 text encoder, a neural network trained on numerous image-text pairs, which guides the denoising process that transforms text prompts into the desired images. In the case of SDAI, the model was trained on over 5 billion image-text pairs contained in the publicly available LAION-5B dataset. A U-Net Convolutional Neural Network, an architecture originally developed for image segmentation, serves as the denoising backbone.
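To make the text-encoding step concrete, here is a hedged sketch of embedding a prompt with the CLIP ViT-L/14 text encoder via the Hugging Face `transformers` library. It assumes `transformers` and `torch` are installed; the checkpoint name is the public OpenAI release, used here for illustration (Stable Diffusion bundles its own copy of this encoder):

```python
def encode_prompt(prompt: str):
    """Turn a text prompt into the (1, 77, 768) embedding tensor that guides denoising."""
    import torch
    from transformers import CLIPTokenizer, CLIPTextModel

    name = "openai/clip-vit-large-patch14"  # the public CLIP ViT-L/14 checkpoint
    tokenizer = CLIPTokenizer.from_pretrained(name)
    encoder = CLIPTextModel.from_pretrained(name)

    # Prompts are padded/truncated to 77 tokens, CLIP's fixed context length.
    tokens = tokenizer(prompt, padding="max_length", max_length=77,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        return encoder(**tokens).last_hidden_state
```

The resulting embedding is what the U-Net conditions on at every denoising step.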
How was SDAI trained?
SDAI, like all latent diffusion models, is trained by feeding it noisy images and instructing it to remove the noise. Each supplied image is also captioned so that the algorithm learns both the input image and the expected denoised output. Repeated denoising iterations recover a perfectly sharp image.
Once the model is trained, the denoising process is applied to nothing but noise in order to achieve text-to-image output! Confusing, right? The model now comprehends image-text pairs and, given a specific text prompt, begins to “see” images underneath layers of noise. It knows images only in relation to their textual associations, i.e., unlike humans, it cannot judge whether two different entities could or should co-exist. Denoising pure noise, it creates extremely realistic images from scratch, images which would never have existed without the AI “seeing” them.
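The “adding noise” half of this process has a simple closed form: at step t the noisy image is a weighted mix of the original and fresh Gaussian noise, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1−ᾱ_t)·ε. A minimal sketch for a single pixel value (the schedule values below are illustrative assumptions, not Stable Diffusion’s actual schedule):

```python
import math
import random

def noisy_sample(x0: float, alpha_bar: float, rng: random.Random) -> float:
    """Forward diffusion step: x_t = sqrt(alpha_bar)*x0 + sqrt(1 - alpha_bar)*eps."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

rng = random.Random(0)
x0 = 0.8  # a clean pixel value, scaled to [-1, 1]
# alpha_bar shrinks from ~1 (almost clean) toward 0 (pure noise) over the schedule.
for alpha_bar in (0.99, 0.5, 0.01):
    xt = noisy_sample(x0, alpha_bar, rng)
```

Training teaches the network to reverse this mixing; at generation time, the reverse process simply starts from alpha_bar ≈ 0, i.e. pure noise.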
And that folks is how you end up with the frog-version of Darth Vader hopping around in green shrubbery!
Created by yours truly using Huggingface’s SDAI prototype
The training was undertaken on 256 Nvidia A100 Graphics Processing Units (GPUs) over 150,000 GPU-hours, roughly 24 days of wall-clock time across the whole cluster. Now that the model has been developed, Stable Diffusion can be installed on off-the-shelf computers equipped with a graphics card with 4 GB or more of VRAM. Around 12 GB of disk space is required for installation, depending on the application version.
Can Stable Diffusion be used with CPU?
Yes. As outlined above, SDAI requires a minimum set of system resources to run. These can be supplied by most high-end CPUs or ordinary GPUs. However, note that with only the bare minimum processing resources, SDAI cannot generate high-quality images larger than 512×512 pixels. Creating quality images with more accuracy and clarity demands the tremendous parallel processing power of a GPU.
Nonetheless, here’s how Stable Diffusion can be used with various available CPUs –
- Intel CPUs: Intel processors are not officially supported for high-end Stable Diffusion output. However, Intel’s OpenVINO toolkit can be used to run SDAI on Intel processors and integrated graphics.
- Apple M1 CPUs: These can generate slightly better images via SDAI, but they too require a fork called InvokeAI for generating high-quality images. InvokeAI needs a minimum of 12 GB RAM and 12 GB of SSD/HDD space for installation.
- Ryzen CPUs: Only a limited number of Ryzen processors can handle SDAI. The AMD Ryzen 5 5600G with integrated GPU (iGPU) is one of these. Forks and packages must be pre-installed here as well for seamless integration, and OpenVINO also works on these CPUs.
Using Stable Diffusion with GPUs
Stable Diffusion works best with GPUs. No surprise there given that GPUs were designed to handle image processing tasks. They support high-quality image rendering, generate results faster, and facilitate quicker turnaround times when manipulating images further or inserting negative prompts.
SDAI can be used with GPUs from Nvidia, AMD and Intel. Nvidia GPUs excel here, creating the highest-resolution, detail-rich images and outperforming AMD and Intel GPUs by wide margins. They’re also the fastest choice.
Intel Arc GPUs do not support seamless installation and take substantial time to generate outputs of similar quality to AMD GPUs. Furthermore, as with Intel CPUs, the OpenVINO toolkit is a prerequisite for enabling SDAI on Intel GPUs.
Though AMD GPUs support SDAI, the image rendering performance is inferior vis-a-vis Nvidia GPUs. The latest generations of AMD GPUs with at least 8GB VRAM provide ideal performance and can match Nvidia’s performance to some extent with additional steps.
SDAI performance on GPU and CPU hardware has been evaluated against multiple benchmarks covering speed, memory consumption, throughput, and output image quality. These evaluations show that not a whole lot of GPU processing power is required to run SDAI, with flagship GPUs differing by mere seconds per image. It is against CPUs, however, that GPUs demonstrate their unparalleled processing caliber.
What are some ways to Accelerate SDAI?
SD users must run the diffusion pipeline several times to develop a satisfactory image. Generating a full-fledged, highly detailed image is computationally intensive even for the most advanced GPU, especially when said image must come from nothingness, or blank noise.
Thus, engineers have come up with various methodologies to increase the performance and speed of SDAI. These include:
- Instructing the diffusion pipeline to run at float16 precision instead of the default float32 precision
- Reducing the number of inference steps involved in SDAI image generation (default = 50 steps)
- Executing the AI model on better-optimized schedulers such as DPMSolverMultistepScheduler. SDAI normally runs on the PNDM scheduler, which requires 50 inference steps; DPMSolverMultistepScheduler, however, can generate satisfactory image output in only 20-25 inference steps.
- Tweaking the SD model batch sizes to consume less memory can also improve pipeline performance.
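The first three speed-ups can be combined in a few lines with the Hugging Face `diffusers` library. This is a hedged sketch, assuming `diffusers` and `torch` are installed and a CUDA GPU is available; the model id below is one public Stable Diffusion 1.5 checkpoint, used purely for illustration:

```python
def build_fast_pipeline(model_id: str = "runwayml/stable-diffusion-v1-5"):
    """Load an SD pipeline at float16 precision with the faster DPM-Solver scheduler."""
    import torch
    from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

    # float16 halves memory traffic versus the default float32 weights.
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    # Swap the default PNDM scheduler for the multistep DPM-Solver,
    # which needs far fewer inference steps for comparable output.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    return pipe.to("cuda")

# Usage (not run here): ~25 steps instead of the default 50.
# pipe = build_fast_pipeline()
# image = pipe("an astronaut riding a horse", num_inference_steps=25).images[0]
```

Together, the lower precision, the swapped scheduler and the reduced step count typically cut per-image generation time severalfold compared to the defaults.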
What are some uses of Stable Diffusion?
Stable Diffusion AI is a publicly available tool. One can generate images and sell them for various purposes. These include –
- Creating marketable images for product catalogs, thereby eliminating the need for manual photography of products
- Professional logo design, reducing time and financial investment in creating multiple logo iterations
- Upscaling low-quality images by removing Gaussian noise, developing highly detailed visuals/ landscapes ground-up from basic designs/ mock-ups
- Detail-rich image-to-image translations using textual prompts as input
- Image segmentation applications involved in satellite imagery, remote sensing and medical imaging
- AI-based urban planning, public works improvement and traffic management
Conclusion
Diffusion models and AI-based image generation are becoming mainstream given their versatility and diverse applications across industries. Although these might be new and bewildering to some, many enterprises and enthusiasts have already figured out how to leverage their many capabilities for content generation and commercial ventures.
We’ve demonstrated that using these tools is significantly easier on GPU rather than CPU. GPUs deliver seamless performance, substantial acceleration and an all-around pleasant ease of use. Subscribing to a GPU over the Cloud can help unlock the potential of SDAI for your business. Reach out to our AI enthusiasts and see for yourself how we can assist you.