Scrape
Scrape fetches a single URL and returns LLM-friendly formats (Markdown, summaries, structured JSON, and more).
- Convert main content to Markdown for LLMs
- Extract links for targeted crawling
- Generate AI summaries
- Use
jsonmode for structured extraction (pricing, tables, product info, etc.) - Take screenshots to verify rendering
For details, see Scrape API Reference
Use XCrawl to scrape a URL
/scrape endpoint
Usage
curl -s -X POST 'https://run.xcrawl.com/v1/scrape' -H 'Authorization: Bearer $XCRAWL_API_KEY' -H 'Content-Type: application/json' -d '{
"url": "https://example.com",
"output": {
"formats": ["markdown"]
}
}'Response example
{
"scrape_id": "01KKE88ETDN4RE9J7EPC5HR89B",
"endpoint": "scrape",
"version": "dca0d4b3bff035e4",
"status": "completed",
"url": "https://example.com",
"data": {
"markdown": "# Example Domain\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n[Learn more](https://iana.org/domains/example)",
"metadata": {
"content_type": "text/html",
"final_url": "https://example.com/",
"status_code": 200,
"title": "Example Domain"
},
"traffic_bytes": 1410,
"credits_used": 1
},
"started_at": "2026-03-11T10:49:39Z",
"ended_at": "2026-03-11T10:49:44Z",
"total_credits_used": 1
}For parameter details, see Scrape API Reference
Output formats
Use output.formats to control what gets returned:
markdown- Extract content as Markdownhtml- Return cleaned HTMLraw_html- Return raw HTMLlinks- Extract links from the pagesummary- AI-generated summaryscreenshot- Page screenshotjson- Structured data extraction with prompt/json_schema
See: Output Formats
JS rendering
JS rendering is enabled by default. You can tune viewport size or set enabled to false to speed up static pages.
curl -s -X POST 'https://run.xcrawl.com/v1/scrape' -H 'Authorization: Bearer $XCRAWL_API_KEY' -H 'Content-Type: application/json' -d '{
"url": "https://xcrawl.com",
"js_render": {
"enabled": true,
"wait_until": "load",
"viewport": {"width": 1920, "height": 1080}
},
"output": {
"formats": ["markdown", "html"]
}
}'See: JS Rendering
Structured data extraction
With json output format and a prompt (optionally json_schema), you can extract structured data from a page.
curl -s -X POST 'https://run.xcrawl.com/v1/scrape' -H 'Authorization: Bearer $XCRAWL_API_KEY' -H 'Content-Type: application/json' -d '{
"url": "https://example.com",
"output": {
"formats": ["json"]
},
"json": {
"prompt": "Extract the page title and main CTA link."
}
}'Response example:
{
"scrape_id": "01KKE8JTFD20YP1KXPVJHGCS28",
"endpoint": "scrape",
"version": "dca0d4b3bff035e4",
"status": "completed",
"url": "https://example.com",
"data": {
"json": {
"description": "This domain is for use in documentation examples without needing permission. Avoid use in operations.",
"domain": {
"name": "example",
"permissions_required": false,
"purpose": "documentation examples"
},
"links": [
{
"text": "Learn more",
"url": "https://iana.org/domains/example"
}
],
"title": "Example Domain"
},
"credits_used": 5,
"credits_detail": {
"base_cost": 1,
"traffic_cost": 0,
"json_extract_cost": 4
}
},
"started_at": "2026-03-11T10:55:19Z",
"ended_at": "2026-03-11T10:55:26Z",
"total_credits_used": 5
}When you need a strict schema, provide json_schema to constrain the output.
curl -s -X POST 'https://run.xcrawl.com/v1/scrape' -H 'Authorization: Bearer $XCRAWL_API_KEY' -H 'Content-Type: application/json' -d '{
"url": "https://xcrawl.com",
"output": {
"formats": ["json"]
},
"json": {
"prompt": "Extract product name and price from the page.",
"json_schema": {
"type": "object",
"properties": {
"product_name": {"type": "string"},
"price": {"type": "string"}
},
"required": ["product_name", "price"]
}
}
}'See Scrape API Reference for json response fields and schema options.
