Multimodal Real-Time Interactive Products

RealtimeVoiceChat: low-latency natural spoken conversation with AI

General Introduction RealtimeVoiceChat is an open source project focused on real-time, natural voice conversations with AI. Users speak into a microphone; the system captures the audio through the browser, quickly converts it to text, and a large language model (LLM) generates back...

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

2mos ago

Stepsailor: Integrating AI Command Bars in Existing SaaS Offerings

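General Introduction Stepsailor is a tool built for developers, centered on an AI command bar. Developers can use it to make their software understand what users say: for example, when a user says "add a new task", the software carries it out automatically. It integrates through a simple SDK into...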

Latest AI tools # Professional Productivity Tools # Multimodal Real-Time Interactive Products

3mos ago

OpenAvatarChat: a modularly designed digital human conversation tool

General Introduction OpenAvatarChat is an open source project developed by the HumanAIGC-Engineering team and hosted on GitHub. It is a modular digital human conversation tool that allows users to run on a single PC...

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

3mos ago

VideoMind: an open source project for timestamp-based video content localization and Q&A

General Introduction VideoMind is an open source multimodal AI tool focused on reasoning, Q&A, and summary generation for long videos. It was developed by Ye Liu of the Hong Kong Polytechnic University and a team from Show Lab at the National University of Singapore. The tool mimics human understanding of video...

Latest AI tools # AI Java Open Source Project # AI Text and Audio/Video Summarization Tool # AI audio/video editor

1mos ago

MoshiVis: an open source model for real-time speech dialog and image understanding

General Introduction MoshiVis is an open source project developed by Kyutai Labs and hosted on GitHub. It is based on the Moshi speech-text model (7B parameters), with about 206 million new adaptation parameters and frozen Pal...

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

4mos ago

Qwen2.5-Omni: an end-to-end model for multimodal input and real-time speech interaction

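General Introduction Qwen2.5-Omni is an open source multimodal AI model developed by Alibaba Cloud's Qwen team. It can process text, image, audio, and video inputs and generate text or natural speech responses in real time. This model, in 2025, 3 ...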

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

4mos ago

xiaozhi-esp32-server: Xiaozhi AI chatbot open source back-end services

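General Introduction xiaozhi-esp32-server is a tool that provides back-end services for the Xiaozhi AI chatbot (xiaozhi-esp32). It is written in Python, is based on the WebSocket protocol, and helps users quickly...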

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

4mos ago

Baichuan-Audio: an end-to-end audio model supporting real-time voice interaction

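General Introduction Baichuan-Audio is an open source project developed by Baichuan Intelligence (baichuan-inc) and hosted on GitHub, focused on end-to-end voice interaction technology. The project provides a complete audio processing framework that can take speech...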

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

5mos ago

PowerAgents: an AI agent platform for scheduled web tasks

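General Introduction PowerAgents is an AI agent platform focused on web automation tasks. Users can use it to create and deploy AI agents that click, type, and extract data. The platform supports scheduling tasks to run automatically every hour, day, or week, and users can also watch in real time...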

Latest AI tools # Multimodal Real-Time Interactive Products

5mos ago

Step-Audio: a multimodal voice interaction framework that recognizes speech and communicates using cloned speech, among other features

General Introduction Step-Audio is an open source intelligent speech interaction framework designed to provide out-of-the-box speech understanding and generation capabilities for production environments. The framework supports multilingual dialog (e.g., Chinese, English, Japanese), emotional speech (e.g., happy, sad), regional dialects (e.g., Cantonese, Szechuan ...

Latest AI tools # AI Java Open Source Project # AI voice cloning # Multimodal Real-Time Interactive Products

5mos ago

Gemini Cursor: an AI desktop smart assistant built on Gemini that can see, hear and speak

General Introduction Gemini Cursor is a desktop intelligent assistant based on Google's Gemini 2.0 Flash (experimental) model. It enables visual, auditory, and voice interactions through a multimodal API, providing real-time low-latency use...

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

5mos ago

DeepSeek-VL2: a Mixture-of-Experts vision-language model for advanced multimodal understanding

General Introduction DeepSeek-VL2 is a series of advanced Mixture-of-Experts (MoE) vision-language models that significantly improve on the performance of their predecessor, DeepSeek-VL. The models are useful in visual question answering, optical character recognition, text...

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

5mos ago

AI Web Operator: Browser Automation, an Open Source Implementation of OpenAI Operator

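General Introduction AI Web Operator is an open source AI browser operation tool that aims to simplify what users do in the browser by integrating multiple AI technologies and SDKs. The tool is based on Browserbase and Vercel...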

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

6mos ago

SpeechGPT 2.0-preview: an end-to-end, human-like large spoken dialogue model for real-time interaction

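General Introduction SpeechGPT 2.0-preview is the first human-like real-time interaction system released by OpenMOSS, trained on millions of hours of speech data. The system offers human-like colloquial expression and low-latency responses on the order of hundreds of milliseconds, supporting natural, fluent...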

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

6mos ago

OpenAI Realtime Agents: a multi-agent voice interaction application (OpenAI example)

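General Introduction OpenAI Realtime Agents is an open source project that demonstrates how to build multi-agent voice applications with OpenAI's Realtime API. It provides an advanced agent pattern (borrowed from OpenAI Swarm) that allows...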

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

6mos ago

Bailing: a low-latency open source voice dialog assistant that easily realizes natural conversational exchanges

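General Introduction Bailing (百聆) is an open source voice dialogue assistant designed to hold natural conversations with users by voice. The project combines automatic speech recognition (ASR), voice activity detection (VAD), a large language model (LLM), and text-to-speech (TTS) technologies to achieve...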

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

6mos ago

Weebo: a real-time voice chatbot that provides a natural language conversational experience

General Introduction Weebo is an open source real-time voice chatbot that utilizes Whisper Small for speech recognition, Llama 3.2 for natural language generation, and Kokoro-82M for speech synthesis. The project was developed by Aman...

Latest AI tools # AI Java Open Source Project # Multimodal Real-Time Interactive Products

6mos ago