Crawl
Crawl fetches a site's pages in bulk according to rules you set. Starting from the given URL, XCrawl discovers links within the site and crawls them.
Common uses:
- Build a knowledge base for a site
- Targeted data collection
- Generate a site map
For details, see the Crawl API Reference.
Crawl a site with XCrawl
/crawl endpoint
Usage
curl -s -X POST 'https://run.xcrawl.com/v1/crawl' \
  -H "Authorization: Bearer $XCRAWL_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
"url": "https://docs.xcrawl.com/doc/",
"crawler": {
"limit": 1,
"max_depth": 1
},
"output": {
"formats": ["markdown"]
}
}'
Response example
{
"crawl_id": "01KKE8BNNVQH9PCYEEKJGXKE07",
"endpoint": "crawl",
"version": "dca0d4b3bff035e4",
"status": "crawling"
}Crawl controls
Use the crawler field to scope what gets crawled:
- include: only crawl URLs matching these patterns (regex supported)
- exclude: skip URLs matching these patterns (regex supported)
- max_depth: maximum crawl depth
- limit: maximum number of pages
- include_entire_domain: crawl the full site instead of only subpaths of the start URL
- include_subdomains: include subdomains
- include_external_links: follow links to external sites
- sitemaps: whether to use the site's sitemap
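When driving the API from a script, these scoping options go in the `crawler` object of the request body. A minimal sketch of building such a body in Python; the helper name, pattern values, depth, and limit here are illustrative assumptions, not values from the docs:

```python
import json

def build_crawl_body(start_url, include=None, exclude=None, max_depth=2, limit=50):
    """Assemble a /crawl request body with optional scoping rules.

    `include`/`exclude` are lists of regex patterns (per the crawler
    options above); the defaults here are arbitrary examples.
    """
    crawler = {"max_depth": max_depth, "limit": limit}
    if include:
        crawler["include"] = include   # only crawl URLs matching these patterns
    if exclude:
        crawler["exclude"] = exclude   # skip URLs matching these patterns
    return {
        "url": start_url,
        "crawler": crawler,
        "output": {"formats": ["markdown"]},
    }

body = build_crawl_body(
    "https://docs.xcrawl.com/doc/",
    include=[r"/doc/.*"],       # hypothetical: stay under the docs subpath
    exclude=[r".*\.pdf$"],      # hypothetical: skip PDF links
)
print(json.dumps(body, indent=2))
# POST this as JSON to https://run.xcrawl.com/v1/crawl with the
# Authorization: Bearer header, as in the curl example above.
```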
