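Quick Start
XCrawl quickly converts search results and website content into LLM-friendly formats.
Welcome to XCrawl
XCrawl is an API service that accepts URLs and keywords, scrapes and searches on your behalf, and delivers the results in several LLM-friendly formats, including Markdown. The API supports single-page scraping, whole-site crawling, and keyword-based discovery of relevant pages.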
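- Scrape: scrape a single URL and output Markdown / HTML / Links / Summary / Screenshot / JSON (structured extraction)
- Crawl: scrape site content in bulk according to rules (asynchronous job)
- Map: quickly get the list of URLs on a site
- Search: get search results by keyword with region/language parameters (advanced Google SERP parameters are supported)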
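Usage
You can try XCrawl in the Playground in your account dashboard first, then use the documentation to integrate it into your production workflow.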
API Key
To use the XCrawl API, you first need an XCrawl API key. After you register, you can get one from the dashboard.
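The curl examples on this page send the key as a Bearer token taken from the `XCRAWL_API_KEY` environment variable. A minimal Python sketch of the same convention follows; the helper name `auth_headers` is illustrative and not part of any XCrawl SDK.

```python
import os


def auth_headers(env=None):
    """Build the headers used by the curl examples on this page.

    `env` defaults to the process environment; passing a plain dict makes
    the helper easy to exercise without touching real credentials.
    """
    env = os.environ if env is None else env
    api_key = env.get("XCRAWL_API_KEY", "").strip()
    if not api_key:
        # Fail early instead of sending an empty Bearer token.
        raise RuntimeError("XCRAWL_API_KEY is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early on a missing key tends to be easier to debug than an authentication error returned by the server.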
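Features
- Scrape: scrape a URL and get its content in LLM-friendly formats (Markdown, HTML, screenshot, link list, summary, extracted JSON)
- Crawl: scrape all URLs within a website and get their content in LLM-friendly formats
- Map: get a website's URL inventory to quickly understand its structure
- Search: get search results for a specific region and language based on keywords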
Scrape
The Scrape endpoint scrapes the content of a single URL and returns it in the formats you request.
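For readers calling the API from code rather than the shell, here is a rough Python equivalent of the Scrape request shown in this section, using only the standard library. The endpoint URL, header, and payload shape are taken from the curl example; the function names and the 60-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

SCRAPE_URL = "https://run.xcrawl.com/v1/scrape"


def build_scrape_request(page_url, api_key, formats=("markdown",)):
    """Prepare (but do not send) a POST request for the Scrape endpoint."""
    body = json.dumps({"url": page_url, "output": {"formats": list(formats)}})
    return urllib.request.Request(
        SCRAPE_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(page_url, api_key, formats=("markdown",), timeout=60):
    """Send the request and return the decoded JSON response."""
    request = build_scrape_request(page_url, api_key, formats)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the payload without spending credits. With a valid key, the Markdown should be available under `data.markdown` in the result of `scrape(...)`, as in the example response.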
curl -s -X POST 'https://run.xcrawl.com/v1/scrape' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com",
    "output": {
      "formats": ["markdown"]
    }
  }'

Example response:
{
"scrape_id": "01KKE88ETDN4RE9J7EPC5HR89B",
"endpoint": "scrape",
"version": "dca0d4b3bff035e4",
"status": "completed",
"url": "https://example.com",
"data": {
"markdown": "# Example Domain\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n[Learn more](https://iana.org/domains/example)",
"metadata": {
"content_type": "text/html",
"final_url": "https://example.com/",
"status_code": 200,
"title": "Example Domain"
},
"traffic_bytes": 1410,
"credits_used": 1
},
"started_at": "2026-03-11T10:49:39Z",
"ended_at": "2026-03-11T10:49:44Z",
"total_credits_used": 1
}

Crawl
The Crawl endpoint automatically discovers and scrapes the URLs within a website. Pass an entry URL and crawl rules to retrieve the content of the site's pages.
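Because a crawl runs as an asynchronous job, the first response carries only a `crawl_id` and a status. The sketch below submits a job and extracts that ID. The payload fields mirror the example request in this section; `post` is a hypothetical caller-supplied function that sends an authenticated POST and returns the decoded JSON.

```python
def start_crawl(entry_url, post, limit=1, max_depth=1, formats=("markdown",)):
    """Submit a crawl job and return its crawl_id.

    `post(path, payload)` is supplied by the caller (hypothetical here) and
    is expected to return the decoded JSON response as a dict.
    """
    payload = {
        "url": entry_url,
        "crawler": {"limit": limit, "max_depth": max_depth},
        "output": {"formats": list(formats)},
    }
    response = post("/v1/crawl", payload)
    crawl_id = response.get("crawl_id")
    if not crawl_id:
        # Without an ID there is no way to look the job up later.
        raise ValueError(f"no crawl_id in response: {response!r}")
    return crawl_id
```

Injecting `post` keeps the sketch independent of any particular HTTP library and lets it be tried against a stub.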
curl -s -X POST 'https://run.xcrawl.com/v1/crawl' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://docs.xcrawl.com/doc/",
    "crawler": {
      "limit": 1,
      "max_depth": 1
    },
    "output": {
      "formats": ["markdown"]
    }
  }'

Example response:
{
"crawl_id": "01KKE8BNNVQH9PCYEEKJGXKE07",
"endpoint": "crawl",
"version": "dca0d4b3bff035e4",
"status": "crawling"
}

Get Crawl Progress and Results
A crawl request returns a crawl_id, which you can use to query the status and results of the crawl.
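In practice the status endpoint is usually polled until the job finishes. The following is a minimal polling sketch, assuming only the two status values that appear on this page (`crawling` and `completed`); failure states are not documented here, so anything other than `completed` is simply polled again until the timeout. `fetch_status` is a hypothetical callable that performs the GET request and returns the decoded JSON.

```python
import time


def wait_for_crawl(crawl_id, fetch_status, interval=2.0, timeout=300.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll a crawl job until its status is "completed" or time runs out.

    `fetch_status(crawl_id)` is caller-supplied (hypothetical) and returns
    the decoded JSON of GET /v1/crawl/<crawl_id>.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status(crawl_id)
        if job.get("status") == "completed":
            return job
        if clock() >= deadline:
            raise TimeoutError(
                f"crawl {crawl_id} still {job.get('status')!r} after {timeout}s"
            )
        # Wait before asking again to avoid hammering the API.
        sleep(interval)
```

The `interval` and `timeout` defaults are arbitrary; choose values that suit the size of the site being crawled.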
curl -s -X GET 'https://run.xcrawl.com/v1/crawl/01KKE8BNNVQH9PCYEEKJGXKE07' \
  -H "Authorization: Bearer $XCRAWL_API_KEY"

Example response:
{
"crawl_id": "01KKE8BNNVQH9PCYEEKJGXKE07",
"endpoint": "crawl",
"version": "dca0d4b3bff035e4",
"status": "completed",
"completed": 1,
"total": 1,
"url": "https://docs.xcrawl.com/doc/",
"data": [
{
"markdown": "[ Skip to content ](https://docs.xcrawl.com/doc/developer-guides/proxies/#VPContent)\n# Proxy Setup\nXCrawl supports proxy configuration to choose an exit region or reuse sticky sessions.\n...",
"metadata": {
"statusCode": 200,
"title": "Proxy Setup",
"url": "https://docs.xcrawl.com/doc/developer-guides/proxies/"
},
"traffic_bytes": 1129,
"credits_used": 1
}
],
"started_at": "2026-03-11T10:51:24Z",
"ended_at": "2026-03-11T10:51:37Z",
"total_credits_used": 1
}

Search
The Search endpoint returns search results for a keyword in a specific region and language, and supports advanced Google SERP parameters.
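The hits in a Search response are nested one level down, under `data.data`, and `title` may be null, as the example response in this section shows. A small sketch for flattening them into tuples, assuming that layout holds in general (the function name is illustrative):

```python
def search_results(response):
    """Return (position, url, title) tuples from a Search response.

    Assumes the layout of the example response on this page: hits live
    under data.data, and title may be null.
    """
    if response.get("status") != "completed":
        raise ValueError(f"search not completed: {response.get('status')!r}")
    hits = response.get("data", {}).get("data", [])
    ordered = sorted(hits, key=lambda hit: hit["position"])
    # Replace a null title with an empty string so callers can format it safely.
    return [(hit["position"], hit["url"], hit.get("title") or "") for hit in ordered]
```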
curl -s -X POST 'https://run.xcrawl.com/v1/search' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "site:docs.xcrawl.com XCrawl API",
    "location": "US",
    "language": "en",
    "limit": 2
  }'

Example response:
{
"search_id": "01KKE8BNMEKRHJB9GEWXPYQ8E1",
"endpoint": "search",
"version": "dca0d4b3bff035e4",
"status": "completed",
"query": "site:docs.xcrawl.com XCrawl API",
"data": {
"credits_used": 2,
"data": [
{
"position": 1,
"title": null,
"url": "https://docs.xcrawl.com/"
},
{
"position": 2,
"title": null,
"url": "https://docs.xcrawl.com/doc/developer-guides/authentication/"
}
],
"status": "success"
},
"started_at": "",
"ended_at": "",
"total_credits_used": 2
}

AI Scraping
The Scrape endpoint can extract JSON directly from a web page without a JSON Schema. Describe what you need in natural language in the request.
curl -s -X POST 'https://run.xcrawl.com/v1/scrape' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com",
    "output": {
      "formats": ["json"]
    },
    "json": {
      "prompt": "Extract the page title and main CTA link."
    }
  }'

Example response:
{
"scrape_id": "01KKE8JTFD20YP1KXPVJHGCS28",
"endpoint": "scrape",
"version": "dca0d4b3bff035e4",
"status": "completed",
"url": "https://example.com",
"data": {
"json": {
"description": "This domain is for use in documentation examples without needing permission. Avoid use in operations.",
"domain": {
"name": "example",
"permissions_required": false,
"purpose": "documentation examples"
},
"links": [
{
"text": "Learn more",
"url": "https://iana.org/domains/example"
}
],
"title": "Example Domain"
},
"credits_used": 5,
"credits_detail": {
"base_cost": 1,
"traffic_cost": 0,
"json_extract_cost": 4
}
},
"started_at": "2026-03-11T10:55:19Z",
"ended_at": "2026-03-11T10:55:26Z",
"total_credits_used": 5
}
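
The response above itemises its cost in `credits_detail`: 1 base credit, 0 traffic credits, and 4 JSON extraction credits, for 5 in total. The sketch below reads that breakdown and cross-checks it against `credits_used`. It assumes, as in this sample, that the items add up to the total; the page does not state that this always holds.

```python
def credit_breakdown(response):
    """Summarise credit usage from a Scrape response.

    Returns (credits_used, detail, consistent), where `consistent` is True
    when the credits_detail items add up to credits_used.
    """
    data = response["data"]
    used = data["credits_used"]
    detail = dict(data.get("credits_detail", {}))
    # A response without credits_detail (as in the Markdown scrape example)
    # has nothing to cross-check, so it is reported as consistent.
    consistent = not detail or sum(detail.values()) == used
    return used, detail, consistent
```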