Scrape

Scrape fetches a single URL and returns LLM-friendly formats (Markdown, summaries, structured JSON, and more).

Convert main content to Markdown for LLMs
Extract links for targeted crawling
Generate AI summaries
Use json mode for structured extraction (pricing, tables, product info, etc.)
Take screenshots to verify rendering

Use XCrawl to scrape a URL

/scrape endpoint

Usage

curl -s -X POST 'https://run.xcrawl.com/v1/scrape'   -H 'Authorization: Bearer $XCRAWL_API_KEY'  -H 'Content-Type: application/json'   -d '{
    "url": "https://example.com",
    "output": {
      "formats": ["markdown"]
    }
  }'

Response example

{
  "scrape_id": "01KKE88ETDN4RE9J7EPC5HR89B",
  "endpoint": "scrape",
  "version": "dca0d4b3bff035e4",
  "status": "completed",
  "url": "https://example.com",
  "data": {
    "markdown": "# Example Domain\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n[Learn more](https://iana.org/domains/example)",
    "metadata": {
      "content_type": "text/html",
      "final_url": "https://example.com/",
      "status_code": 200,
      "title": "Example Domain"
    },
    "traffic_bytes": 1410,
    "credits_used": 1
  },
  "started_at": "2026-03-11T10:49:39Z",
  "ended_at": "2026-03-11T10:49:44Z",
  "total_credits_used": 1
}

For parameter details, see Scrape API Reference

Output formats

Use output.formats to control what gets returned:

markdown - Extract content as Markdown
html - Return cleaned HTML
raw_html - Return raw HTML
links - Extract links from the page
summary - AI-generated summary
screenshot - Page screenshot
json - Structured data extraction with prompt/json_schema

See: Output Formats

JS rendering

JS rendering is enabled by default. You can tune viewport size or set enabled to false to speed up static pages.

curl -s -X POST 'https://run.xcrawl.com/v1/scrape'   -H 'Authorization: Bearer $XCRAWL_API_KEY'  -H 'Content-Type: application/json'   -d '{
    "url": "https://xcrawl.com",
    "js_render": {
      "enabled": true,
      "wait_until": "load",
      "viewport": {"width": 1920, "height": 1080}
    },
    "output": {
      "formats": ["markdown", "html"]
    }
  }'

See: JS Rendering

Structured data extraction

With json output format and a prompt (optionally json_schema), you can extract structured data from a page.

curl -s -X POST 'https://run.xcrawl.com/v1/scrape'   -H 'Authorization: Bearer $XCRAWL_API_KEY'  -H 'Content-Type: application/json'   -d '{
    "url": "https://example.com",
    "output": {
      "formats": ["json"]
    },
    "json": {
      "prompt": "Extract the page title and main CTA link."
    }
  }'

Response example:

{
  "scrape_id": "01KKE8JTFD20YP1KXPVJHGCS28",
  "endpoint": "scrape",
  "version": "dca0d4b3bff035e4",
  "status": "completed",
  "url": "https://example.com",
  "data": {
    "json": {
      "description": "This domain is for use in documentation examples without needing permission. Avoid use in operations.",
      "domain": {
        "name": "example",
        "permissions_required": false,
        "purpose": "documentation examples"
      },
      "links": [
        {
          "text": "Learn more",
          "url": "https://iana.org/domains/example"
        }
      ],
      "title": "Example Domain"
    },
    "credits_used": 5,
    "credits_detail": {
      "base_cost": 1,
      "traffic_cost": 0,
      "json_extract_cost": 4
    }
  },
  "started_at": "2026-03-11T10:55:19Z",
  "ended_at": "2026-03-11T10:55:26Z",
  "total_credits_used": 5
}

When you need a strict schema, provide json_schema to constrain the output.

curl -s -X POST 'https://run.xcrawl.com/v1/scrape'   -H 'Authorization: Bearer $XCRAWL_API_KEY'  -H 'Content-Type: application/json'   -d '{
    "url": "https://xcrawl.com",
    "output": {
      "formats": ["json"]
    },
    "json": {
        "prompt": "Extract product name and price from the page.",
        "json_schema": {
            "type": "object",
            "properties": {
            "product_name": {"type": "string"},
            "price": {"type": "string"}
            },
            "required": ["product_name", "price"]
        }
      }
    }'

See Scrape API Reference for json response fields and schema options.