SegAnyMo:从视频中自动分割任意运动物体的开源工具

SegAnyMo: open source tool to automatically segment arbitrary moving objects from video

General Introduction SegAnyMo is an open source project developed by a team of researchers at UC Berkeley and Peking University, including members such as Nan Huang. This tool focuses on video processing and can automatically recognize and segment arbitrary moving objects in a video, such as people, animals or...
2mos ago
04810
MakeSense:免费使用的图像标注工具,提升计算机视觉项目效率

MakeSense: a free-to-use image annotation tool to improve computer vision project efficiency

General Introduction Make Sense is a free online image annotation tool designed to help users quickly prepare datasets for computer vision projects. It requires no complicated installation, just open a browser access to use it, supports multiple operating systems, and is perfect for small deep learning projects. Users can...
3mos ago
06870
HealthGPT:支持医学图像分析与诊断问答的医疗大模型

HealthGPT: A Medical Big Model to Support Medical Image Analysis and Diagnostic Q&A

Comprehensive Introduction HealthGPT is a state-of-the-art medical grand visual language model designed to enable unified medical visual understanding and generation capabilities through heterogeneous knowledge adaptation. The goal of the project is to integrate medical visual understanding and generation capabilities into a unified autoregressive framework that significantly improves the medical graph...
3mos ago
06210
MedRAX: 利用多模态大模型进行胸部X光片分析的智能体

MedRAX: A Smart Body for Chest X-ray Analysis Using Multimodal Large Models

Comprehensive Introduction MedRAX is a state-of-the-art AI intelligence designed for chest radiograph (CXR) analysis. It integrates state-of-the-art CXR analysis tools and multimodal large language models to dynamically process complex medical queries without additional training.MedRAX, through its modular design...
3mos ago
07790
CogVLM2:开源多模态模型,支持视频理解与多轮对话

CogVLM2: Open Source Multimodal Modeling with Support for Video Comprehension and Multi-Round Dialogue

Comprehensive Introduction CogVLM2 is an open source multimodal model developed by the Tsinghua University Data Mining Research Group (THUDM), based on the Llama3-8B architecture, and designed to provide performance comparable to or even better than GPT-4V. The model supports image understanding, multi-round dialogs, and visual ...
4mos ago
06960
Twelve Labs:理解视频内容的多模态AI解决方案,视频搜索、生成、嵌入API服务

Twelve Labs: multimodal AI solution for understanding video content, video search, generation, embedding API services

General Introduction Twelve Labs is a multimodal AI company focused on video understanding, dedicated to helping users understand and process large amounts of video content through advanced AI technologies. Its core technologies include video search, generation, and embedding, which are able to extract key features from video such as actions, objects...
4mos ago
08550