loading…
Search for a command to run...
loading…
Provides advanced image analysis capabilities including object recognition, OCR text extraction, and multi-turn visual dialogues using OpenAI-compatible APIs. I
Provides advanced image analysis capabilities including object recognition, OCR text extraction, and multi-turn visual dialogues using OpenAI-compatible APIs. It supports both local files and Base64 inputs with additional features for session persistence and web-based configuration management.
提供图像分析能力的 MCP 服务器,支持图像识别、文字提取、多轮对话等功能。
# 克隆仓库
git clone https://github.com/YOUR_USERNAME/mcp-vision-server.git
cd mcp-vision-server
# 创建虚拟环境
python -m venv venv
source venv/Scripts/activate # Windows Git Bash
# 安装依赖
pip install -e .
cp .env.example .env
.env 文件,填入您的 API 配置:# 必填配置
VISION_API_KEY=your-api-key-here
VISION_BASE_URL=https://open.bigmodel.cn/api/paas/v4/
VISION_MODEL=glm-4v
mcp-vision-server
或直接运行:
python -m mcp_vision.server
启动 Web 配置界面,支持热加载配置:
mcp-vision-config
或指定端口:
mcp-vision-config --host 127.0.0.1 --port 8080
访问 http://127.0.0.1:7860 即可打开配置界面。
功能特性:
分析图像内容并返回详细描述。
# 基础用法
analyze_image(
image="C:/path/to/image.png",
prompt="详细描述这张图片"
)
# OCR 文字提取
analyze_image(
image="C:/docs/scan.png",
prompt="提取图片中的所有文字"
)
# 代码识别
analyze_image(
image="C:/code/snippet.png",
prompt="识别并转录图片中的代码,保持格式"
)
基于图像进行两轮问答。
# 第一轮对话
result1 = chat_vision(
image="C:/chart.png",
question="这个图表显示什么数据?"
)
session_id = result1["session_id"]
# remaining_turns = 1, can_continue = True
# 第二轮对话(追问细节,对话结束后无法继续)
if result1["remaining_turns"] > 0:
result2 = chat_vision(
image="C:/chart.png",
question="数据有什么趋势?",
session_id=session_id
)
# remaining_turns = 0, can_continue = False
# 开始新对话
result3 = chat_vision(
image="C:/another.png",
question="描述这张图",
is_new_conversation=True
)
获取服务器运行状态。
status = get_status()
# 返回: 服务器名称、模型信息、会话状态等
支持两种图像输入格式:
image="C:/Users/name/Pictures/screenshot.png"
image="/home/user/images/photo.jpg"
# 纯 Base64
image="iVBORw0KGgoAAAANSUhEUgAA..."
# Data URL 格式
image="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAA..."
| 变量名 | 说明 | 默认值 |
|---|---|---|
VISION_API_KEY |
API 密钥 | - |
VISION_BASE_URL |
API 基础 URL | - |
VISION_MODEL |
模型名称 | glm-4v |
VISION_MAX_IMAGE_SIZE |
最大图像大小(字节) | 20971520 (20MB) |
VISION_TIMEOUT |
请求超时(秒) | 120 |
VISION_TEMPERATURE |
温度参数 | 0.7 |
VISION_MAX_TOKENS |
最大输出 tokens | 4096 |
VISION_LOG_LEVEL |
日志级别 | INFO |
VISION_MAX_HISTORY |
对话历史最大保存数 | 50 |
VISION_ENABLE_PERSISTENCE |
启用持久化 | true |
VISION_HISTORY_PATH |
历史文件路径 | ~/.mcp-vision/history.json |
mcp-vision-server/
├── src/mcp_vision/
│ ├── __init__.py # 包初始化
│ ├── server.py # MCP 服务器主文件
│ ├── config.py # 配置管理
│ ├── vision_client.py # 视觉 API 客户端
│ ├── image_processor.py # 图像处理
│ ├── chat_manager.py # 对话管理器
│ ├── web_config.py # Web 配置工具
│ └── utils.py # 工具函数
├── tests/
├── .env.example
├── pyproject.toml
└── README.md
编辑 Claude Code 配置文件,添加 MCP 服务器:
{
"mcpServers": {
"vision": {
"command": "mcp-vision-server",
"env": {
"VISION_API_KEY": "your-api-key",
"VISION_BASE_URL": "https://open.bigmodel.cn/api/paas/v4/",
"VISION_MODEL": "glm-4v"
}
}
}
}
MIT License
Run in your terminal:
claude mcp add mcp-vision-server -- npx Yes, Vision Server MCP is free — one-click install via Unyly at no cost.
No, Vision Server runs without API keys or environment variables.
Self-hosted: the server runs locally on your machine via the install command above.
Open Vision Server on unyly.org, pick your client tab (Claude Desktop, Claude Code, Cursor) and press Install — the config is generated automatically, no JSON editing.
Transcripts, channel stats, search
by YouTubeAI image generation using various models.
by modelcontextprotocolUnified GPU inference API with 30 AI services (LLM, image gen, video, TTS, whisper, embeddings, reranking, OCR) as MCP tools. Pay-per-use via x402 USDC or API k
by gpu-bridgeA powerful image generation tool using Google's Imagen 3.0 API through MCP. Generate high-quality images from text prompts with advanced photography, artistic,
by hamflxNot sure what to pick?
Find your stack in 60 seconds
Author?
Embed badge for your README
Browse similar
All media MCPs