Scrape API
Overview
POST /v1/scrape fetches a URL and supports sync and async modes:
- Sync:
mode=sync(default). Completes within the request and returns results. - Async:
mode=async. Returnsscrape_idimmediately. Query via Scrape Result API, or configurewebhookcallbacks.
Request
Endpoint
POST https://run.xcrawl.com/v1/scrape
Headers
Content-Type: application/jsonAuthorization: Bearer <api_key>
Request body
Top-level fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | Yes | - | Target URL |
mode | string | No | sync | sync or async |
proxy | object | No | - | Proxy config |
request | object | No | - | Request config |
js_render | object | No | - | JS rendering config |
output | object | No | - | Output config |
webhook | object | No | - | Async webhook config (async only) |
proxy
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
location | string | No | US | ISO-3166-1 alpha-2 country code, e.g. US / JP / SG |
sticky_session | string | No | Auto-generated | Sticky session ID; same ID attempts to reuse exit |
request
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
locale | string | No | en-US,en;q=0.9 | Affects Accept-Language |
device | string | No | desktop | desktop / mobile, affects UA & viewport |
cookies | object map | No | - | Cookie key/value pairs |
headers | object map | No | - | Header key/value pairs |
only_main_content | boolean | No | true | Return main content only |
block_ads | boolean | No | true | Attempt to block ad resources |
skip_tls_verification | boolean | No | true | Skip TLS verification |
js_render
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
enabled | boolean | No | true | Enable browser rendering |
wait_until | string | No | load | load / domcontentloaded / networkidle |
viewport.width | integer | No | - | Viewport width (desktop 1920, mobile 402) |
viewport.height | integer | No | - | Viewport height (desktop 1080, mobile 874) |
output
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
formats | string[] | No | ["markdown"] | Output formats, see enum below |
screenshot | string | No | viewport | full_page / viewport; only when formats includes screenshot |
json.prompt | string | No | - | Extraction prompt |
json.json_schema | object | No | - | JSON Schema (https://json-schema.org/docs) |
output.formats enum:
htmlraw_htmlmarkdownlinkssummaryscreenshotjson
webhook
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
url | string | No | - | Callback URL |
headers | object map | No | - | Custom callback headers |
events | string[] | No | ["started","completed","failed"] | Events: started / completed / failed |
webhook.events meanings:
started: task startedcompleted: task completed successfullyfailed: task failed
Response
Sync response (mode=sync)
| Field | Type | Description |
|---|---|---|
scrape_id | string | Task ID |
endpoint | string | Always scrape |
version | string | Version |
status | string | completed / failed |
url | string | Target URL |
data | object | Result data |
started_at | string | Start time (ISO 8601) |
ended_at | string | End time (ISO 8601) |
total_credits_used | integer | Total credits used |
data fields (based on output.formats):
html: cleaned body HTMLraw_html: raw HTMLmarkdown: Markdown conversionlinks: extracted linksmetadata: page metadata (title,status_code,content_type,proxy_location,proxy_sticky_session, etc; may expand)screenshot: screenshot URLsummary: AI summaryjson: structured extraction resulttraffic_bytes: traffic usedcredits_used: credits usedcredits_detail: credit breakdown
credits_detail fields:
| Field | Type | Description |
|---|---|---|
base_cost | integer | Base scrape cost |
traffic_cost | integer | Traffic cost |
json_extract_cost | integer | JSON extraction cost |
Async response (mode=async)
| Field | Type | Description |
|---|---|---|
scrape_id | string | Task ID |
endpoint | string | Always scrape |
version | string | Version |
status | string | Typically pending immediately after task creation |
Examples
Sync request example
{
"url": "https://example.com",
"output": {
"formats": ["markdown"]
}
}Sync response example
{
"scrape_id": "01KKE88ETDN4RE9J7EPC5HR89B",
"endpoint": "scrape",
"version": "dca0d4b3bff035e4",
"status": "completed",
"url": "https://example.com",
"data": {
"markdown": "# Example Domain\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n[Learn more](https://iana.org/domains/example)",
"metadata": {
"content_type": "text/html",
"final_url": "https://example.com/",
"proxy_location": "US",
"proxy_sticky_session": "sticky-da8ab47a",
"requested_url": "https://example.com",
"status_code": 200,
"title": "Example Domain",
"viewport": "width=device-width, initial-scale=1"
},
"traffic_bytes": 1410,
"credits_used": 1,
"credits_detail": {
"base_cost": 1,
"traffic_cost": 0,
"json_extract_cost": 0
}
},
"started_at": "2026-03-11T10:49:39Z",
"ended_at": "2026-03-11T10:49:44Z",
"total_credits_used": 1
}Async request example
{
"url": "https://docs.xcrawl.com/doc/introduction/",
"mode": "async",
"output": {
"formats": ["markdown"]
}
}Async response example
{
"scrape_id": "01KKE89TSDTP063R5PEX6XWSE1",
"endpoint": "scrape",
"version": "dca0d4b3bff035e4",
"status": "pending"
}