DeepSeek released the first open-source version of its V3 model, now with the strongest coding capabilities (in China)
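DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated for each token. The model adopts an innovative multi-head latent attention (Mu...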
CogAgent: Zhipu AI's open-source intelligent visual language model for automating graphical interfaces
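General Introduction CogAgent is an open-source visual language model developed by the Data Mining Research Group at Tsinghua University (THUDM), designed to automate graphical user interface (GUI) operations across platforms. The model is based on CogVLM (GLM-4V-9B) and supports both Chinese and English...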
DAMO Academy's "Searchlight" (寻光) Video Creation Platform: A Full Review
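Earlier today I was notified that my application for the "Searchlight" (寻光) closed beta had been approved, so here is a quick review before bed. The platform is positioned as the "visual technology capability application platform" of DAMO Academy (达摩院). It currently offers few applications (compared with what was shown at the launch event), and I look forward to more vision applications being opened up over time. Searchlight is split across two addresses: https...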
DisPose: generating videos with precise control of human posture, creating dancing ladies
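General Introduction DisPose is an innovative open-source AI project focused on controllable human image animation. Developed by a research team and open-sourced on GitHub, it uses advanced deep learning techniques and achieves precise control of character animation by decomposing skeletal pose information. D...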
Smolagents: an open-source project for rapid, lightweight development of AI agents
General Introduction Smolagents is a lightweight agent library developed by Hugging Face that focuses on simplifying the development of AI agent systems. The project is known for its clean design philosophy: its core is only about 1000 lines of code, yet it provides powerful feature integration capabilities. It is most ...
Combined prompt instructions for visually extracting documents into Markdown format
These instructions come from the Vision Parse project and extract the Markdown document in two steps. Image analysis prompt (img_analysis.prompt): Analyze this image and retur...
Napkin AI Chinese Beginner's Guide
How to start generating visual content with Napkin AI? (Account creation, visual generation, export to PDF or image file...) Welcome to Napkin AI, a tool that makes it easy to convert your text into beautiful visuals. This guide will walk...
Vision Parse: Intelligent Conversion of PDF Documents to Markdown Format Using Visual Language Models
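General Introduction Vision Parse is a revolutionary document processing tool that cleverly combines state-of-the-art Vision Language Models technology to intelligently convert PDF documents into high-quality Markdown...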
InvSR: Open source image super-resolution project to improve the quality of image resolution
General Introduction InvSR is an innovative open-source image super-resolution project based on diffusion inversion, capable of converting low-resolution images into high-quality, high-resolution images. The project draws on the rich image priors embedded in pre-trained large-scale diffusion models to support, through a flexible sampling mechanism, the...
Infinity: bitwise autoregressive modeling for unlimited high-resolution image generation
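General Introduction Infinity is a pioneering high-resolution image generation framework developed by the FoundationVision team. Through an innovative bitwise visual autoregressive modeling approach, the project breaks through the limitations of traditional image generation models. Infinity's core...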