Scrape API

Overview

POST /v1/scrape fetches a URL and supports sync and async modes:

Sync: mode=sync (default). Completes within the request and returns results.
Async: mode=async. Returns scrape_id immediately. Query via Scrape Result API, or configure webhook callbacks.

Request

Endpoint

POST https://run.xcrawl.com/v1/scrape

Headers

Content-Type: application/json
Authorization: Bearer <api_key>

Request body

Top-level fields

Field	Type	Required	Default	Description
`url`	string	Yes	-	Target URL
`mode`	string	No	`sync`	`sync` or `async`
`proxy`	object	No	-	Proxy config
`request`	object	No	-	Request config
`js_render`	object	No	-	JS rendering config
`output`	object	No	-	Output config
`webhook`	object	No	-	Async webhook config (async only)

`proxy`

Field	Type	Required	Default	Description
`location`	string	No	`US`	ISO-3166-1 alpha-2 country code, e.g. `US` / `JP` / `SG`
`sticky_session`	string	No	Auto-generated	Sticky session ID; same ID attempts to reuse exit

`request`

Field	Type	Required	Default	Description
`locale`	string	No	`en-US,en;q=0.9`	Affects `Accept-Language`
`device`	string	No	`desktop`	`desktop` / `mobile`, affects UA & viewport
`cookies`	object map	No	-	Cookie key/value pairs
`headers`	object map	No	-	Header key/value pairs
`only_main_content`	boolean	No	`true`	Return main content only
`block_ads`	boolean	No	`true`	Attempt to block ad resources
`skip_tls_verification`	boolean	No	`true`	Skip TLS verification

`js_render`

Field	Type	Required	Default	Description
`enabled`	boolean	No	`true`	Enable browser rendering
`wait_until`	string	No	`load`	`load` / `domcontentloaded` / `networkidle`
`viewport.width`	integer	No	-	Viewport width (desktop `1920`, mobile `402`)
`viewport.height`	integer	No	-	Viewport height (desktop `1080`, mobile `874`)

`output`

Field	Type	Required	Default	Description
`formats`	string[]	No	`["markdown"]`	Output formats, see enum below
`screenshot`	string	No	`viewport`	`full_page` / `viewport`; only when `formats` includes `screenshot`
`json.prompt`	string	No	-	Extraction prompt
`json.json_schema`	object	No	-	JSON Schema (https://json-schema.org/docs)

output.formats enum:

html
raw_html
markdown
links
summary
screenshot
json

`webhook`

Field	Type	Required	Default	Description
`url`	string	No	-	Callback URL
`headers`	object map	No	-	Custom callback headers
`events`	string[]	No	`["started","completed","failed"]`	Events: `started` / `completed` / `failed`

webhook.events meanings:

started: task started
completed: task completed successfully
failed: task failed

Response

Sync response (`mode=sync`)

Field	Type	Description
`scrape_id`	string	Task ID
`endpoint`	string	Always `scrape`
`version`	string	Version
`status`	string	`completed` / `failed`
`url`	string	Target URL
`data`	object	Result data
`started_at`	string	Start time (ISO 8601)
`ended_at`	string	End time (ISO 8601)
`total_credits_used`	integer	Total credits used

data fields (based on output.formats):

html: cleaned body HTML
raw_html: raw HTML
markdown: Markdown conversion
links: extracted links
metadata: page metadata (title, status_code, content_type, proxy_location, proxy_sticky_session, etc; may expand)
screenshot: screenshot URL
summary: AI summary
json: structured extraction result
traffic_bytes: traffic used
credits_used: credits used
credits_detail: credit breakdown

credits_detail fields:

Field	Type	Description
`base_cost`	integer	Base scrape cost
`traffic_cost`	integer	Traffic cost
`json_extract_cost`	integer	JSON extraction cost

Async response (`mode=async`)

Field	Type	Description
`scrape_id`	string	Task ID
`endpoint`	string	Always `scrape`
`version`	string	Version
`status`	string	Typically `pending` immediately after task creation

Examples

Sync request example

{
  "url": "https://example.com",
  "output": {
    "formats": ["markdown"]
  }
}

Sync response example

{
  "scrape_id": "01KKE88ETDN4RE9J7EPC5HR89B",
  "endpoint": "scrape",
  "version": "dca0d4b3bff035e4",
  "status": "completed",
  "url": "https://example.com",
  "data": {
    "markdown": "# Example Domain\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n[Learn more](https://iana.org/domains/example)",
    "metadata": {
      "content_type": "text/html",
      "final_url": "https://example.com/",
      "proxy_location": "US",
      "proxy_sticky_session": "sticky-da8ab47a",
      "requested_url": "https://example.com",
      "status_code": 200,
      "title": "Example Domain",
      "viewport": "width=device-width, initial-scale=1"
    },
    "traffic_bytes": 1410,
    "credits_used": 1,
    "credits_detail": {
      "base_cost": 1,
      "traffic_cost": 0,
      "json_extract_cost": 0
    }
  },
  "started_at": "2026-03-11T10:49:39Z",
  "ended_at": "2026-03-11T10:49:44Z",
  "total_credits_used": 1
}

Async request example

{
  "url": "https://docs.xcrawl.com/doc/introduction/",
  "mode": "async",
  "output": {
    "formats": ["markdown"]
  }
}

Async response example

{
  "scrape_id": "01KKE89TSDTP063R5PEX6XWSE1",
  "endpoint": "scrape",
  "version": "dca0d4b3bff035e4",
  "status": "pending"
}