Kuwa v0.3.1 adds Kuwa Painter, which is based on the Stable Diffusion image generation model.
You can generate an image from a text prompt, or upload an image and generate a new image guided by a text prompt.
Known issues and limitations
Hardware requirements
The default model is stable-diffusion-2. The approximate VRAM consumed by each supported model when running on a GPU is shown in the following table.
Model name | VRAM requirement |
---|---|
stable-diffusion-2 | ~3 GB |
stable-diffusion-xl-base-1.0 | ~8 GB |
sdxl-turbo | ~8 GB |
stable-diffusion-3-medium-diffusers | ~18 GB |
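As a rough illustration of how the table above might be used, the following sketch lists the models whose approximate VRAM requirement fits a given GPU. The dictionary values are copied from the table; the function name `models_that_fit` is hypothetical and is not part of Kuwa.

```python
from typing import List

# Approximate VRAM requirements in GB, copied from the table above.
VRAM_GB = {
    "stable-diffusion-2": 3,
    "stable-diffusion-xl-base-1.0": 8,
    "sdxl-turbo": 8,
    "stable-diffusion-3-medium-diffusers": 18,
}


def models_that_fit(available_gb: float) -> List[str]:
    """Return the model names whose approximate VRAM need fits in available_gb.

    Hypothetical helper for illustration only; Kuwa does not ship this function.
    """
    return [name for name, need in VRAM_GB.items() if need <= available_gb]


if __name__ == "__main__":
    # Example: see which models are candidates for a GPU with 12 GB of VRAM.
    print(models_that_fit(12))
```

Because the figures are approximate, it is probably wise to leave some headroom rather than pick a model that exactly matches the available VRAM.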
Known limitations
- sdxl-turbo throws an error while performing img2img
Build Painter Executor
Windows version startup steps
The Windows version should start the Painter Executor automatically by default. If it does not start, follow the steps below:
- Double-click `C:\kuwa\GenAI OS\windows\executors\painter\init.bat` to generate the related execution settings.
- Restart Kuwa, or reload the Executor by entering `reload` in the Kuwa terminal window.
- An Executor named Painter should be added to your Kuwa system.
Docker version startup steps
The Docker Compose configuration file for Kuwa Painter is located at `docker/compose/painter.yaml`. You can start it with the following steps:
- Add `"painter"` to the confs array in `docker/run.sh` (copy it from `run.sh.sample` if the file does not exist).
- Execute `docker/run.sh up --build --remove-orphans --force-recreate`.
- An Executor named Painter should be added to your Kuwa system.
Using Painter
Text to Image
You can enter a text prompt and let Kuwa Painter generate an image for you. Note that the original Stable Diffusion model has a poor understanding of Chinese.
In that case, you can use Kuwa's group chat and quoting functions to have another language model translate the user's prompt first, and then ask the Stable Diffusion model to generate the image, which usually gives better results.
In the figure below, the first image was generated from the original Chinese user prompt (電影風格畫面。擁有雄偉鹿角的雄鹿,在翠綠的森林裡安靜地低頭吃草。), and the second was generated from the prompt translated by TAIDE (Film-inspired scene. A majestic stag with impressive antlers grazing serenely amidst a verdant forest.). The quality difference between the two images is significant.
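The translate-then-generate flow described above can be sketched as a simple two-step pipeline. This is only a conceptual illustration: in Kuwa the chaining is done through the group chat and quoting functions, not through code, and the names `generate_with_translation`, `translate`, and `generate` are hypothetical placeholders for a translation model and an image generator.

```python
from typing import Callable, TypeVar

Image = TypeVar("Image")


def generate_with_translation(
    user_prompt: str,
    translate: Callable[[str], str],
    generate: Callable[[str], Image],
) -> Image:
    """Translate the prompt first, then hand the translation to the image model.

    Both callables are assumptions supplied by the caller: `translate` stands in
    for a language model such as TAIDE, and `generate` for Stable Diffusion.
    """
    english_prompt = translate(user_prompt)
    return generate(english_prompt)


if __name__ == "__main__":
    # Stub functions so the sketch runs without any model installed.
    def fake_translate(text: str) -> str:
        return "A majestic stag grazing in a verdant forest."

    def fake_generate(prompt: str) -> str:
        return f"<image for: {prompt}>"

    print(generate_with_translation("雄鹿在森林裡吃草", fake_translate, fake_generate))
```

Passing the two steps in as callables keeps the sketch independent of any particular model or API.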
Image to Image
You can also upload a sketch and describe what you want to draw, and Kuwa Painter will draw it for you.
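The choice between the two modes, together with the sdxl-turbo limitation listed under Known limitations, can be summarized in a small sketch. The function `choose_mode` and the mode labels `"txt2img"` and `"img2img"` are hypothetical names used for illustration; they do not describe how the Painter Executor is actually implemented.

```python
# Models reported under "Known limitations" as failing in img2img mode.
IMG2IMG_UNSUPPORTED = {"sdxl-turbo"}


def choose_mode(model: str, has_image: bool) -> str:
    """Return "img2img" when an image is uploaded, otherwise "txt2img".

    Raises ValueError for models listed as failing in img2img mode.
    Hypothetical helper; not part of Kuwa.
    """
    if not has_image:
        return "txt2img"
    if model in IMG2IMG_UNSUPPORTED:
        raise ValueError(f"{model} is listed as failing in img2img mode")
    return "img2img"


if __name__ == "__main__":
    print(choose_mode("stable-diffusion-2", has_image=False))
    print(choose_mode("stable-diffusion-2", has_image=True))
```

If you plan to use Image to Image, a check like this suggests avoiding sdxl-turbo until the limitation is resolved.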