Web scraping MCP server for AI agents. 6 tools: clean content extraction, structured scraping with CSS selectors, full-page screenshots via Playwright, link extraction, metadata extraction (OG/Twitter cards), and Google search. Free tier with x402 micropayments.
The #1 most requested utility for AI agents — professional web scraping, screenshots, and content extraction via MCP + REST API.
🌐 Clean Content Extraction — Extract readable text/markdown from any webpage (like Readability)
🎯 Structured Scraping — Extract specific data using CSS selectors
📸 Screenshots — Capture full-page or viewport screenshots with Playwright
🔗 Link Extraction — Get all links from a page with optional regex filtering
📋 Metadata Extraction — Extract title, description, Open Graph tags, favicon, and more
🔍 Google Search — Search Google and get results programmatically
Add to your MCP settings file (cline_mcp_settings.json or similar):
{
  "mcpServers": {
    "agent-scraper-mcp": {
      "url": "https://agent-scraper-mcp.onrender.com/mcp"
    }
  }
}
Base URL: https://agent-scraper-mcp.onrender.com
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/scrape_url \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "format": "markdown"
  }'
Response:
{
  "success": true,
  "url": "https://example.com/article",
  "title": "Article Title",
  "content": "# Article Title\n\nClean markdown content...",
  "format": "markdown"
}
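The same request can be made from Python with only the standard library. A minimal sketch — the endpoint and fields come from the example above; error handling and timeouts are omitted:

```python
import json
import urllib.request

API_BASE = "https://agent-scraper-mcp.onrender.com/api/v1"

def build_payload(url: str, fmt: str = "markdown") -> dict:
    """Request body for the scrape_url endpoint."""
    return {"url": url, "format": fmt}

def scrape_url(url: str, fmt: str = "markdown") -> dict:
    """POST to /api/v1/scrape_url and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/scrape_url",
        data=json.dumps(build_payload(url, fmt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (makes a live request):
# result = scrape_url("https://example.com/article")
# print(result["title"])
```

The other five endpoints accept the same POST-JSON shape, so the helper generalizes by swapping the path and payload.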
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/scrape_structured \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/product",
    "selectors": {
      "title": "h1.product-title",
      "price": ".price",
      "reviews": ".review-text"
    }
  }'
Response:
{
  "success": true,
  "url": "https://example.com/product",
  "data": {
    "title": "Product Name",
    "price": "$29.99",
    "reviews": ["Great product!", "Worth the money"]
  }
}
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/screenshot_url \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "width": 1280,
    "height": 720,
    "full_page": false
  }'
Response:
{
  "success": true,
  "url": "https://example.com",
  "image": "iVBORw0KGgoAAAANSUhEUgAA...",
  "width": 1280,
  "height": 720,
  "full_page": false
}
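Since the `image` field is a base64-encoded PNG, saving it to disk takes only the standard library. A minimal sketch:

```python
import base64

def save_screenshot(response: dict, path: str) -> int:
    """Decode the base64 `image` field of a screenshot_url response
    and write the raw PNG bytes to `path`. Returns bytes written."""
    png_bytes = base64.b64decode(response["image"])
    with open(path, "wb") as fh:
        fh.write(png_bytes)
    return len(png_bytes)
```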
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/extract_links \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "filter": "https://example.com/blog/.*"
  }'
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/extract_meta \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'
curl -X POST https://agent-scraper-mcp.onrender.com/api/v1/search_google \
  -H "Content-Type: application/json" \
  -d '{
    "query": "python web scraping",
    "num_results": 10
  }'
After the free tier is exhausted:
Payment via HTTP 402 with crypto wallet: 0x8E844a7De89d7CfBFe9B4453E65935A22F146aBB
X-Payment header with payment proof

scrape_url
Extract clean, readable content from any webpage (like Readability).
Parameters:
url (string, required): URL to scrape
format (string, optional): Output format — text, markdown, or html (default: markdown)
Returns: {success, url, title, content, format}
scrape_structured
Extract specific data using CSS selectors.
Parameters:
url (string, required): URL to scrape
selectors (object, required): Dict of name → CSS selector
Returns: {success, url, data}
Example selectors:
{
  "title": "h1.post-title",
  "author": ".author-name",
  "price": "span.price",
  "images": "img.product-image"
}
screenshot_url
Capture a screenshot of any webpage.
Parameters:
url (string, required): URL to screenshot
width (int, optional): Viewport width (default: 1280)
height (int, optional): Viewport height (default: 720)
full_page (bool, optional): Capture full scrollable page (default: false)
Returns: {success, url, image, width, height, full_page}
Image is base64-encoded PNG.
extract_links
Extract all links from a webpage.
Parameters:
url (string, required): URL to scrape
filter (string, optional): Regex pattern to filter URLs
Returns: {success, url, links, count}
Links array contains {text, href} objects.
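The same filtering can also be applied (or refined) client-side on the returned links array. A sketch that assumes Python regex search semantics, which may differ from the server's `filter` implementation:

```python
import re

def filter_links(links: list, pattern: str) -> list:
    """Keep only {text, href} entries whose href matches the regex."""
    rx = re.compile(pattern)
    return [link for link in links if rx.search(link["href"])]

# Example against a hypothetical extract_links result:
links = [
    {"text": "Blog post", "href": "https://example.com/blog/hello"},
    {"text": "About", "href": "https://example.com/about"},
]
blog_links = filter_links(links, r"https://example\.com/blog/.*")
```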
extract_meta
Extract metadata from a webpage.
Parameters:
url (string, required): URL to scrape
Returns: {success, url, meta}
Meta object includes:
title: Page title
description: Meta description
canonical: Canonical URL
favicon: Favicon URL
og: Open Graph tags
twitter: Twitter Card tags

search_google
Search Google and get results.
Parameters:
query (string, required): Search query
num_results (int, optional): Number of results (default: 10)
Returns: {success, query, results, count}
Results array contains {title, url, snippet} objects.
# Clone repo
git clone https://github.com/aparajithn/agent-scraper-mcp.git
cd agent-scraper-mcp
# Install dependencies
pip install -e ".[dev]"
# Install Playwright browsers
playwright install chromium --with-deps
# Run server
uvicorn src.main:app --reload --port 8080
# Initialize
curl -X POST http://localhost:8080/mcp -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"1.0.0"}}}'
# List tools
curl -X POST http://localhost:8080/mcp -d '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
# Call scrape_url
curl -X POST http://localhost:8080/mcp -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"scrape_url","arguments":{"url":"https://example.com","format":"markdown"}}}'
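The three JSON-RPC bodies above follow one template, so a small helper can generate them. A sketch — the method names and `protocolVersion` are copied from the curl calls; nothing else is assumed:

```python
import itertools
import json

_next_id = itertools.count(1)

def jsonrpc(method: str, params: dict = None) -> str:
    """Serialize a JSON-RPC 2.0 request body for the /mcp endpoint."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": method,
        "params": params or {},
    })

initialize = jsonrpc("initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "test", "version": "1.0.0"},
})
list_tools = jsonrpc("tools/list")
call_scrape = jsonrpc("tools/call", {
    "name": "scrape_url",
    "arguments": {"url": "https://example.com", "format": "markdown"},
})
```

Each string can be POSTed to http://localhost:8080/mcp exactly as the curl commands do.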
# Build
docker build -t agent-scraper-mcp .
# Run
docker run -p 8080:8080 -e PUBLIC_HOST=localhost agent-scraper-mcp
Deployed on Render (free tier):
Service: agent-scraper-mcp
Repository: aparajithn/agent-scraper-mcp
Environment variables:
PUBLIC_HOST: agent-scraper-mcp.onrender.com
X402_WALLET_ADDRESS: 0x8E844a7De89d7CfBFe9B4453E65935A22F146aBB

MIT License — see LICENSE for details.
Built for AI agents by AI engineers 🤖