Hi everyone, Kuwa v0.3.1 is out. This update focuses on multimodal input and output, adding support for both speech and images. Combined with the previously released Bot system and group-chat features, this enables practical use cases such as meeting summaries, speech summaries, simple image generation, and image editing:
- Supports the Whisper speech-to-text model, which can produce transcripts from uploaded audio files, with multi-speaker recognition and timestamps (see the sketch after this list).
- Supports the Stable Diffusion image generation model, which can generate images from text input or modify uploaded images based on user instructions.
- Huggingface executor supports integration with vision-language models such as Phi-3-Vision and LLaVA.
- RAG supports direct parameter adjustment through the Web UI and Modelfile, simplifying the tuning process.
- RAG supports displaying original documents and cited passages, making it easier to review search results and identify hallucinations.
- Supports importing pre-built RAG vector databases, facilitating knowledge sharing across different systems.
- Simplified selection of various open models during installation.
- Multi-chat Web UI supports direct export of chat records in PDF, Doc/ODT formats.
- Multi-chat Web UI supports Modelfile syntax highlighting, making it easy to edit Modelfiles.
- The Kernel API now passes the website's language setting, allowing Executors to tailor responses to the user's language.
- Executors no longer apply a default system prompt, avoiding degraded model performance.
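For reference, here is a minimal sketch of what the speech-to-text step looks like with the upstream openai-whisper package. The model size and audio file name are placeholders, Kuwa's executor may wrap the model differently, and multi-speaker recognition generally requires an additional diarization tool (such as pyannote) layered on top of Whisper itself.

```python
# Minimal transcription sketch using the upstream openai-whisper package.
# "meeting.wav" and the model size are placeholders; adjust to your hardware.
import whisper

model = whisper.load_model("medium")        # smaller sizes also work on CPU
result = model.transcribe("meeting.wav")    # language is auto-detected

# Each segment carries start/end timestamps, which show up in the transcript.
for seg in result["segments"]:
    print(f"[{seg['start']:7.1f}s - {seg['end']:7.1f}s] {seg['text'].strip()}")
```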
kuwa-v0.3.1 download information: https://github.com/kuwaai/genai-os/releases/tag/v0.3.1
kuwa-v0.3.1 single-executable download link: https://github.com/kuwaai/genai-os/releases/download/v0.3.1/Kuwa-GenAI-OS-v0.3.1.exe
Here are the detailed user guide documents:
- Whisper Speech-to-Text Model User Guide (including speaker recognition): https://kuwaai.tw/blog/whisper-tutorial
- Stable Diffusion Image Generation Model User Guide: https://kuwaai.tw/blog/diffusion-tutorial
- Vision and Language Model Integration Tutorial: https://kuwaai.tw/blog/vlm-tutorial
We welcome your feedback after trying out the new version, and please feel free to contact us through the community or other channels if you encounter any difficulties.
Official Kuwa website: https://kuwaai.tw/
Introduction to Kuwa GenAI OS
Kuwa GenAI OS is a free, open, secure, and privacy-focused open-source system that provides a user-friendly interface for generative AI, together with a new-generation generative AI orchestration layer that supports rapid development of LLM applications. Kuwa offers an end-to-end solution for multilingual, multi-model development and deployment, empowering individuals and industries to use generative AI on local laptops, servers, or the cloud, to develop applications, or to open stores and provide services externally. A brief description of version v0.3.1 follows:
Usage Environment
- Supports multiple operating systems including Windows, Linux, and macOS, and provides easy installation and software update tools, such as a single installation executable for Windows, an automatic installation script for Linux, a Docker startup script, and a pre-installed VM image.
- Supports a variety of hardware environments, from Raspberry Pi, laptops, personal computers, and on-premises servers to virtual hosts, public and private clouds, with or without GPU accelerators.
User Interface
- The integrated interface lets you select any model, knowledge base, or GenAI application and combine them into single or group chat rooms.
- Within a chat room you can direct the conversation yourself: quote earlier dialogue, address the whole group or a single participant privately, and switch between continuous Q&A mode and single-question Q&A mode.
- You can take control at any time, import prompt scripts, or upload files; complete chat-room transcripts can also be exported, output directly as PDF, Doc/ODT, or plain-text files, or shared as web pages.
- Supports multimodal language models covering text, image generation, speech, and visual recognition; provides syntax highlighting for code and Markdown; and offers quick access to built-in system gadgets.
Development Interface
- Users can build personalized or more powerful GenAI applications without writing code by connecting existing models, knowledge bases, or Bot applications, adjusting system prompts and parameters, presetting scenarios, or creating prompt templates.
- Users can build their own knowledge base by simple drag and drop or import existing vector databases, and a GenAI application can draw on multiple knowledge bases at the same time (see the sketch after this list).
- Users can create and maintain their own shared app store, and share Bot applications with other users.
- Kuwa's extended model settings and advanced RAG functions can be adjusted and enabled through the Ollama Modelfile format.
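To give a concrete picture of what a pre-built vector database can look like, below is a minimal sketch that builds one with Faiss and LangChain, both part of the development environment listed later. The input file, embedding model, and output folder are hypothetical, and the exact on-disk layout Kuwa expects when importing a database may differ.

```python
# Minimal sketch: build a Faiss vector store from a text file with LangChain.
# File names, the embedding model, and the output folder are placeholders.
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = TextLoader("knowledge.txt").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=512, chunk_overlap=64
).split_documents(docs)

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
db = FAISS.from_documents(chunks, embeddings)
db.save_local("my_vector_db")   # folder you could later import or point a Bot at
```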
Deployment Interface
- Supports multiple languages; the interface and messages can be customized, and the system can be deployed directly to provide services externally.
- Existing accounts can be linked, or new accounts registered with an invitation code; forgotten passwords can be reset via email.
- System settings let administrators edit system announcements, terms of service, and warnings, and manage group permissions, users, and models.
- The dashboard supports feedback management, system log management, security and privacy management, message query, etc.
Development Environment
- Integrates a variety of open-source generative AI tools, including Faiss, HuggingFace, Langchain, llama.cpp, Ollama, vLLM, and various Embedding and Transformer-related packages. Developers can download, connect, and develop various multimodal LLMs and applications.
- The RAG toolchain includes multiple retrieval-augmented generation application tools such as DBQA, DocumentQA, WebQA, and SearchQA, which can connect to search engines and automatic crawlers or integrate with existing corporate databases and systems, facilitating the development of advanced customized applications (see the sketch at the end of this section).
- Being open source, Kuwa allows developers to build their own customized systems to fit their needs.
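As a rough illustration of what a DocumentQA/WebQA-style flow does under the hood, the sketch below reloads the Faiss index from the earlier example, retrieves the passages most relevant to a question, and assembles a prompt for whichever LLM backend you run (llama.cpp, Ollama, vLLM, and so on). The paths, question, and prompt wording are hypothetical and do not reflect Kuwa's internal pipeline.

```python
# Minimal retrieval-augmented QA sketch: retrieve context, then build a prompt.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
# allow_dangerous_deserialization is required by recent LangChain versions
# when loading a locally saved index.
db = FAISS.load_local("my_vector_db", embeddings,
                      allow_dangerous_deserialization=True)

question = "What does the maintenance contract cover?"   # placeholder question
hits = db.similarity_search(question, k=3)               # top-3 passages

context = "\n\n".join(doc.page_content for doc in hits)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}"
)
print(prompt)   # send this to the chat model of your choice
```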