Comprehensive Introduction: FunASR is an open source speech recognition toolkit developed by Alibaba's DAMO Academy to bridge academic research and industrial applications. It supports a wide range of speech processing features, including automatic speech recognition (ASR), voice activity detection (VAD), punctuation restoration, language modeling, speaking...
Comprehensive Introduction: MaskGCT (Masked Generative Codec Transformer) is a fully non-autoregressive text-to-speech (TTS) model jointly introduced by Quwan Technology and The Chinese University of Hong Kong. The model does not require explicit text-to-speech ...
General Introduction: Wav2Lip is an open source, high-precision lip-sync tool designed to accurately synchronize the lip movements in a video with arbitrary audio. The tool was presented by Rudrabha Mukhopadhyay et al. at ACM Multimedia 20...
General Introduction: Deep Live Cam is an open source AI tool for real-time face replacement and deepfake video generation from a single photo. It uses deep learning algorithms to swap faces in live streams or video calls, protecting user privacy and adding fun...
Comprehensive Introduction: Dify is an open source generative AI application development platform designed to help developers rapidly build and operate native AI applications based on Large Language Models (LLMs). The platform provides everything from Agent building to AI workflow orchestration, RAG retrieval...