Crawler Options
The crawl screen contains an expanding set of crawl options, revealed by clicking the cog icon near the crawl controls. Some of these options were global in previous versions; as of version 2.4, they are controlled independently per project.
Crawl Delay. The delay, in seconds, before resuming the crawl after processing a page. A lower value allows for a faster crawl, at the expense of placing more stress on the server. Default is 0.
Network Timeout. The number of seconds to wait for a network response before giving up on a request. Increasing the timeout allows slow responses to be captured, but can significantly slow the crawler. Default is 15.
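The interaction between crawl delay and timeout can be sketched as follows. This is an illustrative model, not InterroBot's actual implementation; the class and method names are hypothetical.

```python
import time


class PoliteFetcher:
    """Illustrative sketch of crawl-delay throttling: track when the last
    request finished and compute how long to wait before the next one."""

    def __init__(self, crawl_delay=0.0, timeout=15.0, clock=time.monotonic):
        self.crawl_delay = crawl_delay  # seconds to pause between pages
        self.timeout = timeout          # seconds before giving up on a response
        self._clock = clock             # injectable for testing
        self._last_request = None

    def seconds_until_next_request(self):
        """How long to sleep before the next request is allowed."""
        if self._last_request is None:
            return 0.0
        elapsed = self._clock() - self._last_request
        return max(0.0, self.crawl_delay - elapsed)

    def mark_request(self):
        """Record that a page has just been processed."""
        self._last_request = self._clock()
```

With a delay of 0 (the default), `seconds_until_next_request` always returns 0 and the crawler proceeds as fast as the server responds.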
JavaScript Rendering. Beta (Windows-only). Loads pages with JavaScript, providing a comprehensive index of dynamic content at the expense of a slower, more resource-hungry crawl. Webpages external to the project URL continue to use the classic crawler. This option is necessary for websites built on JavaScript frameworks that don't provide server-side rendering. Default is off.
Download Assets Without Content-Length. Whether to GET media BLOBs (video, audio, and other non-image binaries) when the server supplies no Content-Length header. Enabling this allows for accurate resource size data, but can significantly slow the crawler. Default is off.
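The decision this option controls can be sketched as a simple check on the response headers. This is an illustrative sketch, assuming headers from a preliminary HEAD request; it is not InterroBot's actual logic.

```python
def should_download_asset(headers, allow_missing_length):
    """Decide whether to GET a non-image binary (video, audio, etc.).

    headers: dict of response headers, e.g. from a HEAD request.
    allow_missing_length: the "Download Assets Without Content-Length" setting.
    """
    # Header names are case-insensitive in HTTP, so normalize before lookup.
    normalized = {k.lower(): v for k, v in headers.items()}
    if "content-length" in normalized:
        return True  # size is known up front; safe to fetch and account for
    # Unknown size: fetch only if the user has opted in (off by default).
    return allow_missing_length
```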
HTTP Request Headers. Override options for a handful of authentication and identity HTTP headers. Authentication headers apply only to requests within the project domain, and changing the User-Agent header will not override robots.txt as it applies to InterroBot.
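The scoping rule for authentication headers can be sketched as below. The function, header values, and default User-Agent string are hypothetical; this only illustrates the behavior described above.

```python
from urllib.parse import urlparse

# Hypothetical user-supplied overrides.
custom_headers = {
    "User-Agent": "ExampleBot/1.0",          # identity; does not bypass robots.txt
    "Authorization": "Basic dXNlcjpwYXNz",   # auth; sent only within the project domain
}


def headers_for(url, project_host, overrides):
    """Build request headers, sending Authorization only to the project host."""
    headers = {"User-Agent": overrides.get("User-Agent", "ExampleBot/1.0")}
    if urlparse(url).hostname == project_host and "Authorization" in overrides:
        headers["Authorization"] = overrides["Authorization"]
    return headers
```

Requests to external hosts (assets on a CDN, for example) would carry the User-Agent override but never the credentials.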
Directory/Path Exclusions. You can treat certain URLs as robots-excluded paths; that is, they will not be downloaded. Useful for avoiding known time-sink directories. The limit is 10 additional rules.
InterroBot is a web crawler and developer tool for Windows, macOS, and Android.