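Quick Start

XCrawl quickly converts search results and website content into LLM-friendly formats.

Welcome to XCrawl

XCrawl is an API service that accepts URLs and keywords for scraping and searching, and delivers the results in several LLM-friendly formats, including Markdown. The API supports single-page scraping, whole-site crawling, and keyword-based discovery of relevant pages.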

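Scrape: scrape a single URL and output Markdown / HTML / Links / Summary / Screenshot / JSON (structured extraction)
Crawl: crawl site content in bulk according to rules (asynchronous task)
Map: quickly get the list of URLs within a site
Search: get search results based on keywords and region/language parameters (supports advanced Google SERP parameters)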

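How to Use

You can try XCrawl quickly in the Playground in your account center, then use the documentation to integrate it into your production workflow.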

API Key

To use the XCrawl API, you first need an XCrawl API Key. After signing up, you can get it from the dashboard.
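
As a minimal sketch of how the key might be supplied from code: the helper below reads it from the XCRAWL_API_KEY environment variable (the name used in the curl examples in this guide) and builds the Bearer header those examples send. The function name auth_headers is hypothetical and not part of any XCrawl SDK.

```python
import os
from typing import Dict, Optional


def auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Build the request headers used by the curl examples in this guide.

    auth_headers is an illustrative helper, not an official XCrawl function.
    """
    # Prefer an explicit key; otherwise fall back to the environment variable.
    key = api_key or os.environ.get("XCRAWL_API_KEY")
    if not key:
        raise RuntimeError("Set XCRAWL_API_KEY or pass api_key explicitly")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Keeping the key in an environment variable rather than in source code avoids committing it to version control by accident.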

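Features

Scrape: scrape a URL and get its content in LLM-friendly formats (Markdown, HTML, screenshot, link list, summary, extracted JSON)
Crawl: crawl all URLs within a website and get their content in LLM-friendly formats
Map: get a website's URL list to quickly understand the site structure
Search: get search results for a specific region and language based on keywords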

Scrape

With the Scrape endpoint, you can scrape the content of a single URL and get the result back in the formats you request.

curl -s -X POST 'https://run.xcrawl.com/v1/scrape' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com",
    "output": {
      "formats": ["markdown"]
    }
  }'

Example response:

{
  "scrape_id": "01KKE88ETDN4RE9J7EPC5HR89B",
  "endpoint": "scrape",
  "version": "dca0d4b3bff035e4",
  "status": "completed",
  "url": "https://example.com",
  "data": {
    "markdown": "# Example Domain\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n[Learn more](https://iana.org/domains/example)",
    "metadata": {
      "content_type": "text/html",
      "final_url": "https://example.com/",
      "status_code": 200,
      "title": "Example Domain"
    },
    "traffic_bytes": 1410,
    "credits_used": 1
  },
  "started_at": "2026-03-11T10:49:39Z",
  "ended_at": "2026-03-11T10:49:44Z",
  "total_credits_used": 1
}
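
The same scrape call can be sketched in Python using only the standard library. The endpoint, headers, and payload shape are taken from the curl example; the function names build_scrape_request and scrape_markdown are hypothetical, and error handling is left out for brevity.

```python
import json
import urllib.request

API_BASE = "https://run.xcrawl.com/v1"  # base URL taken from the curl example


def build_scrape_request(url, api_key, formats=("markdown",)):
    """Return a POST request mirroring the curl scrape call (not yet sent)."""
    payload = {"url": url, "output": {"formats": list(formats)}}
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape_markdown(url, api_key, timeout=60):
    """Send the request and return data.markdown from the JSON response."""
    request = build_scrape_request(url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["data"]["markdown"]
```

Calling scrape_markdown("https://example.com", api_key) would perform the network request; build_scrape_request on its own is useful for inspecting what will be sent.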

Crawl

The Crawl endpoint automatically discovers and scrapes the URLs within a website. Pass an entry URL and crawl rules to get the content of the site's pages.

curl -s -X POST 'https://run.xcrawl.com/v1/crawl' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://docs.xcrawl.com/doc/",
    "crawler": {
      "limit": 1,
      "max_depth": 1
    },
    "output": {
      "formats": ["markdown"]
    }
  }'

Example response:

{
  "crawl_id": "01KKE8BNNVQH9PCYEEKJGXKE07",
  "endpoint": "crawl",
  "version": "dca0d4b3bff035e4",
  "status": "crawling"
}

Check Crawl Progress and Results

After you submit a crawl request, you receive a crawl_id, which you can use to query the status and results of the crawl.

curl -s -X GET 'https://run.xcrawl.com/v1/crawl/01KKE8BNNVQH9PCYEEKJGXKE07' \
  -H "Authorization: Bearer $XCRAWL_API_KEY"

Example response:

{
  "crawl_id": "01KKE8BNNVQH9PCYEEKJGXKE07",
  "endpoint": "crawl",
  "version": "dca0d4b3bff035e4",
  "status": "completed",
  "completed": 1,
  "total": 1,
  "url": "https://docs.xcrawl.com/doc/",
  "data": [
    {
      "markdown": "[ Skip to content ](https://docs.xcrawl.com/doc/developer-guides/proxies/#VPContent)\n# Proxy Setup\nXCrawl supports proxy configuration to choose an exit region or reuse sticky sessions.\n...",
      "metadata": {
        "statusCode": 200,
        "title": "Proxy Setup",
        "url": "https://docs.xcrawl.com/doc/developer-guides/proxies/"
      },
      "traffic_bytes": 1129,
      "credits_used": 1
    }
  ],
  "started_at": "2026-03-11T10:51:24Z",
  "ended_at": "2026-03-11T10:51:37Z",
  "total_credits_used": 1
}
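
Because a crawl is asynchronous, a client usually polls the status endpoint until the job leaves the "crawling" state. The sketch below shows one way to do that. Only the "crawling" and "completed" statuses appear in this guide, so the loop treats any status other than "crawling" as final; that is an assumption, as are the helper names fetch_crawl and wait_for_crawl and the default interval and timeout.

```python
import json
import time
import urllib.request

API_BASE = "https://run.xcrawl.com/v1"  # base URL taken from the curl example


def fetch_crawl(crawl_id, api_key, timeout=30):
    """GET /v1/crawl/{crawl_id} and return the parsed JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/crawl/{crawl_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def wait_for_crawl(fetch, interval=5.0, max_wait=300.0, sleep=time.sleep):
    """Call fetch() until the reported status is no longer "crawling".

    fetch is any zero-argument callable that returns the status response
    as a dict. Raises TimeoutError once max_wait seconds have been slept.
    """
    waited = 0.0
    while True:
        result = fetch()
        # Assumption: every status other than "crawling" is a final state.
        if result.get("status") != "crawling":
            return result
        if waited >= max_wait:
            raise TimeoutError(f"crawl still running after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

A typical call would be wait_for_crawl(lambda: fetch_crawl(crawl_id, api_key)). Passing fetch as a callable keeps the loop independent of the HTTP layer, so it can be exercised with canned responses.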

Search

The Search endpoint lets you get search results for a specific region and language based on keywords, and supports advanced Google SERP parameters.

curl -s -X POST 'https://run.xcrawl.com/v1/search' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "query": "site:docs.xcrawl.com XCrawl API",
    "location": "US",
    "language": "en",
    "limit": 2
  }'

Example response:

{
  "search_id": "01KKE8BNMEKRHJB9GEWXPYQ8E1",
  "endpoint": "search",
  "version": "dca0d4b3bff035e4",
  "status": "completed",
  "query": "site:docs.xcrawl.com XCrawl API",
  "data": {
    "credits_used": 2,
    "data": [
      {
        "position": 1,
        "title": null,
        "url": "https://docs.xcrawl.com/"
      },
      {
        "position": 2,
        "title": null,
        "url": "https://docs.xcrawl.com/doc/developer-guides/authentication/"
      }
    ],
    "status": "success"
  },
  "started_at": "",
  "ended_at": "",
  "total_credits_used": 2
}
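
Note that the result list sits under data.data in the sample response, and title can be null. The following sketch builds the request body from the curl example and pulls the ranked URLs out of a response with that shape. The function names are hypothetical, and the response layout is assumed to match the sample above.

```python
def build_search_payload(query, location="US", language="en", limit=2):
    """Return the JSON body used by the curl search example."""
    return {
        "query": query,
        "location": location,
        "language": language,
        "limit": limit,
    }


def search_hits(response):
    """Return (position, url) pairs from a Search response, ordered by position.

    Assumes the layout of the sample response: results live under data.data.
    """
    hits = response.get("data", {}).get("data", [])
    return sorted((hit["position"], hit["url"]) for hit in hits)


# Trimmed copy of the sample response, used here for illustration only.
sample = {
    "status": "completed",
    "data": {
        "data": [
            {"position": 2, "title": None,
             "url": "https://docs.xcrawl.com/doc/developer-guides/authentication/"},
            {"position": 1, "title": None, "url": "https://docs.xcrawl.com/"},
        ],
    },
}
ranked = search_hits(sample)
```

Sorting by position makes the output independent of the order in which the results arrive.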

AI Scraping

XCrawl's Scrape endpoint lets you extract JSON directly from a web page without providing a JSON Schema. Just describe what you need in natural language in the request.

curl -s -X POST 'https://run.xcrawl.com/v1/scrape' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com",
    "output": {
      "formats": ["json"]
    },
    "json": {
      "prompt": "Extract the page title and main CTA link."
    }
  }'

Example response:

{
  "scrape_id": "01KKE8JTFD20YP1KXPVJHGCS28",
  "endpoint": "scrape",
  "version": "dca0d4b3bff035e4",
  "status": "completed",
  "url": "https://example.com",
  "data": {
    "json": {
      "description": "This domain is for use in documentation examples without needing permission. Avoid use in operations.",
      "domain": {
        "name": "example",
        "permissions_required": false,
        "purpose": "documentation examples"
      },
      "links": [
        {
          "text": "Learn more",
          "url": "https://iana.org/domains/example"
        }
      ],
      "title": "Example Domain"
    },
    "credits_used": 5,
    "credits_detail": {
      "base_cost": 1,
      "traffic_cost": 0,
      "json_extract_cost": 4
    }
  },
  "started_at": "2026-03-11T10:55:19Z",
  "ended_at": "2026-03-11T10:55:26Z",
  "total_credits_used": 5
}
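
In the sample response, the three entries in credits_detail add up to credits_used (1 + 0 + 4 = 5). The sketch below builds the prompt-based request body and checks that relationship on a response. Whether the sum always matches is an assumption drawn from this one sample, and both function names are hypothetical.

```python
def build_extract_payload(url, prompt):
    """Return the JSON body for prompt-based JSON extraction via Scrape."""
    return {
        "url": url,
        "output": {"formats": ["json"]},
        "json": {"prompt": prompt},
    }


def credit_breakdown(data):
    """Return (itemized_total, credits_used) for the data object of a response.

    Assumption: credits_detail lists every cost component, as in the sample.
    """
    detail = data.get("credits_detail", {})
    return sum(detail.values()), data.get("credits_used")


# Credit fields copied from the sample response, for illustration only.
sample_data = {
    "credits_used": 5,
    "credits_detail": {"base_cost": 1, "traffic_cost": 0, "json_extract_cost": 4},
}
itemized, used = credit_breakdown(sample_data)
```

Comparing the two values is a cheap way to notice a cost component that the itemized breakdown does not cover.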