Crawl
Crawl fetches a site's pages in bulk according to rules you set. Starting from the given URL, XCrawl discovers links within the site and crawls them.
Common uses:
- Build a knowledge base for a site
- Targeted data collection
- Generate a site map
For details, see the Crawl API Reference.
Crawl a site with XCrawl
/crawl endpoint
Usage
curl -s -X POST 'https://run.xcrawl.com/v1/crawl' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
"url": "https://docs.xcrawl.com/doc/",
"crawler": {
"limit": 1,
"max_depth": 1
},
"output": {
"formats": ["markdown"]
}
}'
Response example
{
"crawl_id": "01KKE8BNNVQH9PCYEEKJGXKE07",
"endpoint": "crawl",
"version": "dca0d4b3bff035e4",
"status": "crawling"
}Crawl controls
Use the crawler field to scope what gets crawled:
- include: only crawl URLs matching these patterns (regex supported)
- exclude: skip URLs matching these patterns (regex supported)
- max_depth: maximum crawl depth
- limit: maximum number of pages
- include_entire_domain: crawl the full site instead of only subpaths of the start URL
- include_subdomains: include subdomains
- include_external_links: follow links to external sites
- sitemaps: whether to use the site's sitemap
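When driving the API from a script, these scoping options go in the `crawler` object of the request body. A minimal sketch of building such a body in Python; the helper name, pattern values, depth, and limit here are illustrative assumptions, not values from the docs:

```python
import json

def build_crawl_body(start_url, include=None, exclude=None, max_depth=2, limit=50):
    """Assemble a /crawl request body with optional scoping rules.

    `include`/`exclude` are lists of regex patterns (per the crawler
    options above); the defaults here are arbitrary examples.
    """
    crawler = {"max_depth": max_depth, "limit": limit}
    if include:
        crawler["include"] = include   # only crawl URLs matching these patterns
    if exclude:
        crawler["exclude"] = exclude   # skip URLs matching these patterns
    return {
        "url": start_url,
        "crawler": crawler,
        "output": {"formats": ["markdown"]},
    }

body = build_crawl_body(
    "https://docs.xcrawl.com/doc/",
    include=[r"/doc/.*"],       # hypothetical: stay under the docs subpath
    exclude=[r".*\.pdf$"],      # hypothetical: skip PDF links
)
print(json.dumps(body, indent=2))
# POST this as JSON to https://run.xcrawl.com/v1/crawl with the
# Authorization: Bearer header, as in the curl example above.
```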
