Hi everyone, Kuwa v0.3.1 is out. This update focuses on multimodal input and output, with new support for both speech and images. Combined with the previously launched Bot system and group chat, this enables practical features such as meeting summaries, speech summaries, simple image generation, and image editing:
- Supports the Whisper speech-to-text model, which can output transcripts from uploaded audio files, and features multi-speaker recognition and timestamps.
- Supports the Stable Diffusion image generation model, which can generate images from text input or modify uploaded images based on user instructions.
- The Huggingface executor supports integration with vision-language models such as Phi-3-Vision and LLaVA.
- RAG supports direct parameter adjustment through the Web UI and Modelfile, simplifying the calibration process.
- RAG supports displaying original documents and cited passages, making it easier to review search results and identify hallucinations.
- Supports importing pre-built RAG vector databases, facilitating knowledge sharing across different systems.
- Simplified selection of various open models during installation.
- Multi-chat Web UI supports direct export of chat records in PDF and Doc/ODT formats.
- Multi-chat Web UI supports Modelfile syntax highlighting, making it easy to edit Modelfiles.
- Kernel API supports passing the website language, allowing the Executor to customize its behavior based on the user's language.
- The Executor no longer adds a default System prompt, to avoid compromising model performance.
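To give a rough idea of how a multi-speaker, timestamped transcript could be turned into text that is easy to feed into a meeting summary, here is a minimal Python sketch. It is not Kuwa's implementation: the `Segment` structure, its field names, and the `render_transcript` helper are all hypothetical, and the actual output format of the Whisper executor may differ.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed utterance (hypothetical structure, not Kuwa's actual format)."""
    start: float   # seconds from the beginning of the audio
    end: float     # seconds from the beginning of the audio
    speaker: str   # label assigned by speaker recognition, e.g. "Speaker 1"
    text: str


def format_timestamp(seconds: float) -> str:
    """Render a number of seconds as HH:MM:SS, dropping fractions of a second."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments: list[Segment]) -> str:
    """Merge consecutive segments from the same speaker and emit one line per turn."""
    turns: list[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            # Same speaker keeps talking: extend the previous turn.
            last = turns[-1]
            turns[-1] = Segment(
                last.start, max(last.end, seg.end), last.speaker, f"{last.text} {seg.text}"
            )
        else:
            turns.append(seg)
    return "\n".join(
        f"[{format_timestamp(t.start)} - {format_timestamp(t.end)}] {t.speaker}: {t.text}"
        for t in turns
    )


if __name__ == "__main__":
    demo = [
        Segment(0.0, 4.2, "Speaker 1", "Welcome."),
        Segment(4.2, 9.0, "Speaker 1", "Let's begin."),
        Segment(65.5, 70.0, "Speaker 2", "Thanks."),
    ]
    print(render_transcript(demo))
```

Merging consecutive segments from the same speaker keeps the text compact, which can help when the transcript is passed to a language model with a limited context window.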
info
kuwa-v0.3.1 download information: https://github.com/kuwaai/genai-os/releases/tag/v0.3.1
kuwa-v0.3.1 single executable download link: https://github.com/kuwaai/genai-os/releases/download/v0.3.1/Kuwa-GenAI-OS-v0.3.1.exe