General Introduction Minima is an open source RAG (Retrieval-Augmented Generation) solution that supports local deployment and integration with ChatGPT. The project is maintained by dmayboroda and aims ...
Comprehensive Introduction HealthGPT is a state-of-the-art medical large vision-language model designed to unify medical visual understanding and generation through heterogeneous knowledge adaptation. The goal of the project is to integrate both capabilities into a single autoregressive framework that significantly improves the medical graph...
Comprehensive Introduction Step-Video-T2V is an advanced text-to-video model from StepFun AI. The model has 30 billion parameters and can generate videos up to 204 frames long. With a deep-compression Variational Auto-Encoder (VAE), the model...
Comprehensive Introduction Step-Audio is an open source intelligent speech interaction framework designed to provide out-of-the-box speech understanding and generation capabilities for production environments. The framework supports multilingual dialog (e.g., Chinese, English, Japanese), emotional speech (e.g., happy, sad), regional dialects (e.g., Cantonese, Sichuanese ...
General Introduction Watermark Removal is an open source project that utilizes machine learning and deep learning techniques for image restoration, specifically for removing watermarks from images. The project was developed by Chimzuruoke Okafor and is inspired by Con...
General Description Whisper Input is an open source voice transcription tool that lets users start recording by holding down the Option key and stop recording by releasing it. The tool calls Groq Whisper Large V3 Turbo ...
Comprehensive Introduction TTS Importer is an open source project designed to make it easy to import the Azure TTS (Text-to-Speech) speech synthesis service into a variety of reading apps. The tool supports several popular reading apps, including Read (legado...