Skill+CLI

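This page currently applies to @xcrawl/cli version 0.2.5.

XCrawl Skill+CLI lets you run scrape, search, map, and crawl operations, check account information, and manage local configuration directly from the terminal.

Besides being used directly by developers in a terminal, it also works well as a callable tool layer that AI agents can plug into automated workflows.

This page is based on the current local CLI source code, package version information, and command help output.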

Installation

Node.js requirement:

>=18

Run directly with npx:

npx -y @xcrawl/cli@0.2.5 doctor

Install globally:

npm install -g @xcrawl/cli
xcrawl --help

Authentication

Save the API key locally

xcrawl login --api-key <your_api_key>

The CLI saves the API key to:

~/.xcrawl/config.json

Use an environment variable

export XCRAWL_API_KEY=<your_api_key>

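The CLI resolves runtime configuration in the following order of precedence, highest first:

CLI flags
Environment variables
Local config file ~/.xcrawl/config.json
Built-in defaults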
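The following minimal TypeScript sketch makes this order concrete. It assumes each source has already been read and parsed. `resolveSetting` and the `Sources` shape are hypothetical illustrations, not part of @xcrawl/cli.

```typescript
// Hypothetical sketch of the documented precedence:
// CLI flag > environment variable > ~/.xcrawl/config.json > built-in default.
// Reading argv, the environment, and the JSON file is out of scope here.
interface Sources<T> {
  flag?: T; // e.g. a parsed --timeout value
  env?: T; // e.g. a parsed XCRAWL_TIMEOUT_MS value
  configFile?: T; // e.g. timeout-ms from ~/.xcrawl/config.json
}

function resolveSetting<T>(sources: Sources<T>, builtInDefault: T): T {
  // The first source that is actually set wins.
  if (sources.flag !== undefined) return sources.flag;
  if (sources.env !== undefined) return sources.env;
  if (sources.configFile !== undefined) return sources.configFile;
  return builtInDefault;
}

// The env var beats the config file; a --timeout flag would beat both.
const timeoutMs = resolveSetting({ env: 10000, configFile: 45000 }, 30000);
console.log(timeoutMs); // 10000
```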

Shortcuts

xcrawl https://example.com is treated as xcrawl scrape https://example.com
xcrawl crawl https://example.com is treated as xcrawl crawl start https://example.com
xcrawl -V or xcrawl --version prints the current CLI version
xcrawl help <command> shows help for the specified command
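A minimal sketch of how the two URL shortcuts can be expanded before normal command parsing. `expandShortcuts` is a hypothetical helper. The CLI's real argument handling is not documented here and may differ. This sketch leaves `-V` and `help` untouched.

```typescript
// Hypothetical sketch of the documented URL shortcuts; not the CLI's real parser.
const isHttpUrl = (arg: string): boolean => /^https?:\/\//.test(arg);

function expandShortcuts(argv: string[]): string[] {
  const [first, second, ...rest] = argv;
  // `xcrawl <url> ...` is treated as `xcrawl scrape <url> ...`
  if (first !== undefined && isHttpUrl(first)) {
    return ["scrape", ...argv];
  }
  // `xcrawl crawl <url> ...` is treated as `xcrawl crawl start <url> ...`
  if (first === "crawl" && second !== undefined && isHttpUrl(second)) {
    return ["crawl", "start", second, ...rest];
  }
  // Everything else is passed through unchanged.
  return argv;
}

console.log(expandShortcuts(["https://example.com", "--json"]).join(" "));
// scrape https://example.com --json
```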

Defaults

Setting	Default
API Base URL	`https://run.xcrawl.com`
Default output format	`markdown`
Default output directory for batch scrapes	`.xcrawl`
Request timeout	`30000` ms
Debug	`false`

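Environment variables

Variable	Description
`XCRAWL_API_KEY`	API key
`XCRAWL_API_BASE_URL`	Overrides the API Base URL
`XCRAWL_DEFAULT_FORMAT`	Default scrape format
`XCRAWL_OUTPUT_DIR`	Default batch output directory
`XCRAWL_TIMEOUT_MS`	Request timeout in milliseconds
`XCRAWL_DEBUG`	Debug mode; accepts `1`, `true`, `yes`, `on`, `0`, `false`, `no`, `off`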
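A sketch of turning the string values of these variables into typed settings. The helper names are hypothetical. Two details are assumptions: values are trimmed and matched case-insensitively, and an unrecognized value throws. This page does not say whether the real CLI rejects or ignores invalid values.

```typescript
// Hypothetical sketch: parse XCRAWL_DEBUG and XCRAWL_TIMEOUT_MS into typed values.
// Assumptions: values are trimmed, matched case-insensitively, and invalid values throw.
const TRUE_VALUES = new Set(["1", "true", "yes", "on"]);
const FALSE_VALUES = new Set(["0", "false", "no", "off"]);

function parseDebug(raw: string | undefined): boolean | undefined {
  if (raw === undefined) return undefined; // unset: fall back to config file or default
  const value = raw.trim().toLowerCase();
  if (TRUE_VALUES.has(value)) return true;
  if (FALSE_VALUES.has(value)) return false;
  throw new Error(`Invalid XCRAWL_DEBUG value: ${raw}`);
}

function parseTimeoutMs(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const n = Number(raw.trim());
  // Like --timeout, the value must be an integer greater than 0 (an empty string becomes 0).
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`Invalid XCRAWL_TIMEOUT_MS value: ${raw}`);
  }
  return n;
}

// The environment is passed in as a plain object to keep the sketch runtime-agnostic.
const env: Record<string, string | undefined> = { XCRAWL_DEBUG: "on", XCRAWL_TIMEOUT_MS: "60000" };
console.log(parseDebug(env.XCRAWL_DEBUG), parseTimeoutMs(env.XCRAWL_TIMEOUT_MS)); // true 60000
```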

Command reference

`login`

Save an XCrawl API key to the local config.

xcrawl login --api-key <your_api_key>

Options:

Option	Description
`--api-key <key>`	The XCrawl API key to save
`--json`	Output machine-readable JSON

`logout`

Remove the locally saved API key.

xcrawl logout

Options:

Option	Description
`--json`	Output machine-readable JSON

`status`

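Show account information and credit plan status.

xcrawl status

Options:

Option	Description
`--api-key <key>`	Override the API key
`--timeout <ms>`	Request timeout in milliseconds
`--debug`	Enable debug output
`--json`	Output machine-readable JSON
`--output <path>`	Save output to a file

Current behavior in 0.2.5:

status always calls https://api.xcrawl.com/web_v1/user/credit-user-info
The API key is passed as the query parameter app_key=<your_api_key>
The command deliberately does not offer --api-base-url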
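This sketch reproduces that URL construction with the standard `URL` API. Only the endpoint and the `app_key` parameter come from this page. `buildStatusUrl` is a hypothetical name.

```typescript
// Hypothetical sketch of the 0.2.5 status request URL (endpoint and app_key as documented).
const STATUS_ENDPOINT = "https://api.xcrawl.com/web_v1/user/credit-user-info";

function buildStatusUrl(apiKey: string): string {
  const url = new URL(STATUS_ENDPOINT);
  // searchParams.set percent-encodes reserved characters in the key.
  url.searchParams.set("app_key", apiKey);
  return url.toString();
}

console.log(buildStatusUrl("demo_key"));
// https://api.xcrawl.com/web_v1/user/credit-user-info?app_key=demo_key
```

The key travels in the query string. As a result, anything that records full request URLs, such as proxy or server access logs, also records the key.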

`doctor`

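Run local diagnostics and connectivity checks.

xcrawl doctor

Options:

Option	Description
`--api-key <key>`	Override the API key
`--api-base-url <url>`	Override the API Base URL
`--timeout <ms>`	Request timeout in milliseconds
`--debug`	Enable debug output
`--json`	Output machine-readable JSON
`--output <path>`	Save output to a file

Checks include:

Node.js version
Read/write access to ~/.xcrawl/config.json
API connectivity, when an API key is available

Current behavior in 0.2.5:

When the Base URL is https://run.xcrawl.com, the CLI treats a 404 from the account status endpoint as a sign that the public API is reachable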
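A sketch of that special case as a predicate. `isApiReachable` is hypothetical, and two parts of it are assumptions. First, a 2xx response otherwise counts as reachable. Second, a trailing slash on the Base URL is ignored.

```typescript
// Hypothetical sketch of the doctor connectivity verdict in 0.2.5.
const PUBLIC_BASE_URL = "https://run.xcrawl.com";

function isApiReachable(baseUrl: string, statusCode: number): boolean {
  // Assumption: any 2xx answer from the account status endpoint counts as reachable.
  if (statusCode >= 200 && statusCode < 300) return true;
  // Documented special case: on the public Base URL, a 404 still proves the API answered.
  // Assumption: trailing slashes on the Base URL are ignored.
  const normalized = baseUrl.replace(/\/+$/, "");
  return statusCode === 404 && normalized === PUBLIC_BASE_URL;
}

console.log(isApiReachable("https://run.xcrawl.com", 404)); // true
console.log(isApiReachable("https://api.example.test", 404)); // false
```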

`scrape`

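Scrape one or more URLs.

xcrawl scrape [url...] [options]

Positional arguments:

Argument	Description
`[url...]`	One or more `http` / `https` URLs

Command options:

Option	Description
`--api-key <key>`	Override the API key
`--api-base-url <url>`	Override the API Base URL
`--timeout <ms>`	Request timeout in milliseconds
`--debug`	Enable debug output
`--json`	Output machine-readable JSON
`--output <path>`	Save output; treated as a directory when multiple URLs are given
`--format <format>`	Formats listed in the help output: `markdown`, `json`, `html`, `screenshot`
`--wait-for <selector>`	Wait-for selector accepted by the CLI; not currently forwarded to the API
`--headers <k:v,k2:v2>`	Extra request headers
`--cookies <cookies>`	Cookie string, e.g. `a=1; b=2`
`--proxy <proxy>`	Proxy value; currently forwarded as a region such as `US`
`--input <path>`	Read URLs from a newline-separated file
`--concurrency <n>`	Number of concurrent scrapes; defaults to `3` in batch mode

Examples: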

xcrawl scrape https://example.com --format markdown
xcrawl https://example.com
xcrawl scrape --input ./urls.txt --concurrency 3 --json
xcrawl scrape https://example.com --headers "Accept-Language:en-US,X-Test:1"

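Batch behavior:

At least one URL is required, supplied as positional arguments or via --input
The input file must be a newline-separated list of URLs; blank lines and lines starting with # are ignored
With multiple URLs, --json, and no --output, a JSON array is printed directly to stdout
With multiple URLs and no --json, one file is written per URL
With multiple URLs and no --output, files are written to .xcrawl/ by default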
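The batch rules above can be sketched as two small functions. The names are hypothetical. Two points are assumptions that this page does not spell out: each line is trimmed, and `--json` combined with `--output` means "write files".

```typescript
// Hypothetical sketch of the documented batch rules.
function parseUrlList(fileContents: string): string[] {
  return fileContents
    .split(/\r?\n/)
    .map((line) => line.trim()) // assumption: surrounding whitespace is ignored
    .filter((line) => line !== "" && !line.startsWith("#")); // skip blank and # lines
}

type BatchOutput =
  | { kind: "stdout-json-array" }
  | { kind: "files"; dir: string };

// Routing for the multi-URL case; defaultDir can be changed via
// XCRAWL_OUTPUT_DIR or the output-dir config key.
function routeBatchOutput(
  opts: { json: boolean; output?: string },
  defaultDir = ".xcrawl",
): BatchOutput {
  // --json without --output: print a single JSON array to stdout.
  if (opts.json && opts.output === undefined) return { kind: "stdout-json-array" };
  // Otherwise write one file per URL into --output (used as a directory) or the default.
  return { kind: "files", dir: opts.output ?? defaultDir };
}

const urls = parseUrlList("https://a.example\n\n# comment\nhttps://b.example\n");
console.log(urls, routeBatchOutput({ json: false }));
```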

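Current behavior in 0.2.5:

The implementation also accepts text as a --format value, although the help output does not list it
text falls back to text-oriented markdown content
--wait-for is accepted by the command but is not currently passed to the XCrawl scrape request
--proxy is currently forwarded to proxy.location, so a region value such as US fits better than a literal proxy URL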
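A sketch of how these 0.2.5 quirks play out when flags are mapped to request options. Only `proxy.location` is documented. The `ScrapeFlags` shape and the helper names are hypothetical. The sketch does not model how `text` output is produced.

```typescript
// Hypothetical sketch of the 0.2.5 scrape flag handling described above.
// Only proxy.location is documented; the other names are illustrative.
const HELP_FORMATS = ["markdown", "json", "html", "screenshot"];
const ACCEPTED_FORMATS = [...HELP_FORMATS, "text"]; // text works but is not listed in help

interface ScrapeFlags {
  format: string;
  waitFor?: string;
  proxy?: string;
}

function isAcceptedFormat(format: string): boolean {
  return ACCEPTED_FORMATS.includes(format);
}

function toScrapeOptions(flags: ScrapeFlags): { proxy?: { location: string } } {
  if (!isAcceptedFormat(flags.format)) {
    throw new Error(`Unsupported --format: ${flags.format}`);
  }
  const options: { proxy?: { location: string } } = {};
  // --proxy becomes a region such as "US", not a proxy URL.
  if (flags.proxy !== undefined) options.proxy = { location: flags.proxy };
  // flags.waitFor is accepted but intentionally not copied, mirroring 0.2.5.
  return options;
}

console.log(JSON.stringify(toScrapeOptions({ format: "markdown", waitFor: "#main", proxy: "US" })));
// {"proxy":{"location":"US"}}
```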

`search`

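Run a web search.

xcrawl search <query...> [options]

Positional arguments:

Argument	Description
`<query...>`	Search query; multiple positional arguments are joined with spaces

Command options:

Option	Description
`--api-key <key>`	Override the API key
`--api-base-url <url>`	Override the API Base URL
`--timeout <ms>`	Request timeout in milliseconds
`--debug`	Enable debug output
`--json`	Output machine-readable JSON
`--output <path>`	Save output to a file
`--limit <n>`	Number of results; defaults to `10`
`--country <country>`	Country code, e.g. `US`
`--language <language>`	Language code, e.g. `en`

Examples: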

xcrawl search "xcrawl cli" --limit 10
xcrawl search "site:docs.xcrawl.com CLI" --country US --language en

Current behavior in 0.2.5:

--country is forwarded to the location field of the XCrawl Search API
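A sketch of how the positional query and flags might map onto a request body. Only the `location` field, the space-joining, and the default limit come from this page. The `query`, `limit`, and `language` field names are assumptions.

```typescript
// Hypothetical sketch: apart from `location`, the body field names are assumptions.
interface SearchFlags {
  limit?: number;
  country?: string;
  language?: string;
}

function buildSearchBody(queryParts: string[], flags: SearchFlags) {
  return {
    // Multiple positional arguments are joined with single spaces.
    query: queryParts.join(" "),
    limit: flags.limit ?? 10, // documented default
    // --country is sent as the Search API's `location` field.
    ...(flags.country !== undefined ? { location: flags.country } : {}),
    ...(flags.language !== undefined ? { language: flags.language } : {}),
  };
}

console.log(JSON.stringify(buildSearchBody(["xcrawl", "cli"], { country: "US" })));
// {"query":"xcrawl cli","limit":10,"location":"US"}
```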

`map`

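Generate a list of links for a site.

xcrawl map <url> [options]

Positional arguments:

Argument	Description
`<url>`	Target `http` / `https` URL

Command options:

Option	Description
`--api-key <key>`	Override the API key
`--api-base-url <url>`	Override the API Base URL
`--timeout <ms>`	Request timeout in milliseconds
`--debug`	Enable debug output
`--json`	Output machine-readable JSON
`--output <path>`	Save output to a file
`--max-depth <n>`	Maximum traversal depth
`--limit <n>`	Maximum number of links

Example: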

xcrawl map https://example.com --limit 100

Current behavior in 0.2.5:

The CLI exposes --max-depth, but the request body currently forwards only url and limit
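A sketch of the resulting request body. The `url` and `limit` field names come from the note above. `buildMapBody` is hypothetical, and omitting `limit` when it is not set is an assumption.

```typescript
// Hypothetical sketch of the 0.2.5 map request body: only url and limit are sent.
interface MapFlags {
  maxDepth?: number;
  limit?: number;
}

function buildMapBody(url: string, flags: MapFlags): { url: string; limit?: number } {
  const body: { url: string; limit?: number } = { url };
  if (flags.limit !== undefined) body.limit = flags.limit;
  // flags.maxDepth is accepted by the CLI but not forwarded in 0.2.5.
  return body;
}

console.log(JSON.stringify(buildMapBody("https://example.com", { maxDepth: 3, limit: 100 })));
// {"url":"https://example.com","limit":100}
```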

`crawl`

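Manage crawl jobs.

Root command:

xcrawl crawl [command]

Available subcommands:

xcrawl crawl start <url>
xcrawl crawl status <job-id>

`crawl start`

Start a crawl job.

xcrawl crawl start <url> [options]
xcrawl crawl <url> [options]

Positional arguments:

Argument	Description
`<url>`	Target `http` / `https` URL

Command options:

Option	Description
`--wait`	Poll until the job reaches `completed` or `failed`
`--interval <ms>`	Polling interval in milliseconds; defaults to `2000`
`--wait-timeout <ms>`	Polling timeout in milliseconds; defaults to `60000`
`--max-pages <n>`	Maximum number of pages to crawl
`--api-key <key>`	Override the API key
`--api-base-url <url>`	Override the API Base URL
`--timeout <ms>`	Request timeout in milliseconds
`--debug`	Enable debug output
`--json`	Output machine-readable JSON
`--output <path>`	Save output to a file

Example: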

xcrawl crawl https://example.com --wait --interval 2000 --wait-timeout 60000
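A minimal sketch of the polling that `--wait` performs, using the documented defaults. `fetchStatus` is a hypothetical stand-in for the status request. Giving up with an error once `--wait-timeout` elapses is an assumption; the real CLI may report the last status instead.

```typescript
// Hypothetical sketch of --wait polling; fetchStatus is not a CLI export.
type CrawlState = "pending" | "crawling" | "completed" | "failed";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForCrawl(
  fetchStatus: () => Promise<CrawlState>,
  intervalMs = 2000, // --interval default
  waitTimeoutMs = 60000, // --wait-timeout default
): Promise<CrawlState> {
  const deadline = Date.now() + waitTimeoutMs;
  for (;;) {
    const state = await fetchStatus();
    // Stop as soon as the job reaches a terminal state.
    if (state === "completed" || state === "failed") return state;
    // Assumption: give up if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Crawl still ${state} after ${waitTimeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}

// Demo with a fake status source that completes on the third poll.
const demoStates: CrawlState[] = ["pending", "crawling", "completed"];
let demoPolls = 0;
waitForCrawl(async () => demoStates[Math.min(demoPolls++, demoStates.length - 1)], 10, 1000)
  .then((state) => console.log(state)); // completed
```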

`crawl status`

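Query the status of a crawl job by its ID.

xcrawl crawl status <job-id> [options]

Positional arguments:

Argument	Description
`<job-id>`	Crawl job ID

Command options:

Option	Description
`--api-key <key>`	Override the API key
`--api-base-url <url>`	Override the API Base URL
`--timeout <ms>`	Request timeout in milliseconds
`--debug`	Enable debug output
`--json`	Output machine-readable JSON
`--output <path>`	Save output to a file

Current behavior in 0.2.5:

The crawl statuses queued and running are normalized to pending and crawling
completedPages is derived from the length of the returned pages array
failedPages is currently always 0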
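A sketch of that normalization. The raw field names `status` and `pages` are assumptions about the API response. It is also an assumption that statuses other than `queued` and `running` pass through unchanged. Only the mapping and the derived counters come from this page.

```typescript
// Hypothetical sketch of the 0.2.5 crawl status normalization.
// The raw field names `status` and `pages` are assumptions.
interface RawCrawlStatus {
  status: string;
  pages?: unknown[];
}

interface CrawlStatusView {
  status: string;
  completedPages: number;
  failedPages: number;
}

const STATUS_ALIASES = new Map<string, string>([
  ["queued", "pending"],
  ["running", "crawling"],
]);

function normalizeCrawlStatus(raw: RawCrawlStatus): CrawlStatusView {
  return {
    status: STATUS_ALIASES.get(raw.status) ?? raw.status, // assumption: others pass through
    completedPages: raw.pages?.length ?? 0, // derived from the returned page array
    failedPages: 0, // always 0 in 0.2.5
  };
}

console.log(normalizeCrawlStatus({ status: "running", pages: [{}, {}] }));
```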

`config`

Read and update the local CLI configuration.

xcrawl config [command]

Available subcommands:

xcrawl config get <key>
xcrawl config set <key> <value>
xcrawl config keys

Supported configuration keys:

Key	Type	Description
`api-key`	string	Saved API key
`api-base-url`	string	Default Base URL for API commands
`default-format`	string	Allowed values: `markdown`, `json`, `html`, `screenshot`, `text`
`output-dir`	string	Default directory for batch scrapes
`timeout-ms`	integer	Must be greater than `0`
`debug`	boolean	`config set` accepts `true` / `false` or `1` / `0`
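A sketch of how `config set` values could be checked against this table. `parseConfigValue` and its error messages are hypothetical. Rejecting any `debug` value other than `true` / `false` / `1` / `0` is an assumption.

```typescript
// Hypothetical sketch of validating `config set <key> <value>` against the key table.
type ConfigValue = string | number | boolean;

const FORMATS = ["markdown", "json", "html", "screenshot", "text"];

function parseConfigValue(key: string, value: string): ConfigValue {
  switch (key) {
    case "api-key":
    case "api-base-url":
    case "output-dir":
      return value; // plain strings
    case "default-format":
      if (!FORMATS.includes(value)) {
        throw new Error(`default-format must be one of: ${FORMATS.join(", ")}`);
      }
      return value;
    case "timeout-ms": {
      const n = Number(value);
      if (!Number.isInteger(n) || n <= 0) {
        throw new Error("timeout-ms must be an integer greater than 0");
      }
      return n;
    }
    case "debug":
      if (value === "true" || value === "1") return true;
      if (value === "false" || value === "0") return false;
      throw new Error("debug must be true/false or 1/0"); // assumption: anything else is rejected
    default:
      throw new Error(`Unknown config key: ${key}`);
  }
}

console.log(parseConfigValue("timeout-ms", "45000"), parseConfigValue("debug", "0")); // 45000 false
```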

`config get`

xcrawl config get <key> [options]

Options:

Option	Description
`--json`	Output machine-readable JSON

`config set`

xcrawl config set <key> <value> [options]

Options:

Option	Description
`--json`	Output machine-readable JSON

`config keys`

xcrawl config keys [options]

Options:

Option	Description
`--json`	Output machine-readable JSON

`init`

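Placeholder command for project initialization.

xcrawl init

Current behavior in 0.2.5:

The command exists, but currently only prints a message that init is planned for a later phase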

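Output handling

Output is human-readable text by default
--json returns machine-readable JSON
--output <path> writes the result to a file and prints Saved output: <path>
In batch scrape mode, output filenames are generated from the sanitized URL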
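The exact sanitization rule is not documented here, so the following shows only one plausible scheme. `outputFileName` and the `md` default extension are hypothetical.

```typescript
// Hypothetical sketch: the real 0.2.5 sanitization rule is not documented.
function outputFileName(url: string, extension = "md"): string {
  const cleaned = url
    .replace(/^https?:\/\//, "") // drop the scheme
    .replace(/[^a-zA-Z0-9.-]+/g, "_") // collapse path separators and other unsafe characters
    .replace(/^_+|_+$/g, ""); // trim leading/trailing underscores
  return `${cleaned || "index"}.${extension}`;
}

console.log(outputFileName("https://example.com/docs/cli?x=1"));
// example.com_docs_cli_x_1.md
```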

Validation rules

URLs must start with http:// or https://
Positive-integer options such as --timeout, --limit, --concurrency, --interval, --wait-timeout, and --max-pages must be greater than 0
Header strings must use the format Key:Value,Key2:Value2
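These rules can be sketched as three small validators. The function names are hypothetical. Splitting each header pair on its first colon is an assumption; it lets values such as URLs keep their own colons.

```typescript
// Hypothetical sketch of the three validation rules; not the CLI's own code.
function assertHttpUrl(value: string): string {
  if (!/^https?:\/\//.test(value)) {
    throw new Error(`URL must start with http:// or https://: ${value}`);
  }
  return value;
}

function parsePositiveInt(flag: string, value: string): number {
  const n = Number(value);
  if (!Number.isInteger(n) || n <= 0) throw new Error(`${flag} must be a positive integer`);
  return n;
}

function parseHeaders(value: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const pair of value.split(",")) {
    // Split on the first colon only, so header values may contain further colons.
    const idx = pair.indexOf(":");
    if (idx <= 0) throw new Error(`Header must look like Key:Value, got: ${pair}`);
    headers[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
  }
  return headers;
}

console.log(parseHeaders("Accept-Language:en-US,X-Test:1"));
```

Under this parsing, a header value that itself contains a comma cannot be expressed, because commas separate the pairs.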
