InterroBot is a desktop application for Windows 10/11. You'll need to download and run the InterroBot installer for Windows x64. Once you've got that up and running, it's time to dive in.
Understanding the Navigation Bar
At all times within the application, five navigation buttons appear in the left margin. Some can be inactive, as is the case with Search and Crawl on the home screen; these buttons become active once a project is loaded.
|Back button. Active when a back state exists.|
|Home button. Navigates to the projects listing.|
|Crawl button. Navigates to the crawler. Active when a project is loaded.|
|Search button. Navigates to the search. Active when a project is loaded.|
|Options button. Navigates to app options/preferences.|
Creating a New Project
At the home screen, you'll want to add your website (or website path) so it can be indexed. Just make sure your project URL contains, at minimum, http(s) and a domain. For example,
https://interro.bot/aberdeen/ is acceptable. Once you've added your site, the crawler will get to work immediately.
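To illustrate the minimum URL requirement, here is a sketch of acceptable and unacceptable project URLs (the example.com addresses are hypothetical, shown only to demonstrate the rule):

```
https://example.com/           acceptable: protocol and domain
https://example.com/blog/      acceptable: crawl scoped to a path
example.com                    not acceptable: missing http(s)
/blog/                         not acceptable: missing domain
```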
Crawling Your Project
The InterroBot crawler is like a curious explorer, reading pages and following links to discover new territory. If you need to take a break, no problem! You can pause the crawl at any time and even search the partial index. Allowing the crawl to finish, however, will yield the best search results.
The time required to crawl a website varies. Small sites can index in minutes, while larger (10,000+ page) sites can take hours. The search index generated from the crawl remains stable and reusable for as long as it is useful, and you can reindex at any time.
The crawl screen places the pause and continue controls at the top left, with the status of the crawl organized below and a crawl log to the right.
|Play button. Starts or continues an active crawl.|
|Pause button. Pauses an active crawl.|
|Network status. Turns red when the internet connection is unavailable.|
|Robots.txt status. Turns red when the project URL is uncrawlable.|
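If the robots.txt status turns red, the site's robots.txt rules may be blocking the crawl at your project URL. As an illustrative sketch (the rules shown are hypothetical, not from any real site), a robots.txt that disallows everything would make the project URL uncrawlable:

```
# Blocks all crawlers from the entire site
User-agent: *
Disallow: /
```

Narrower Disallow paths block only the matching sections of the site, so a red indicator is worth checking against the specific path in your project URL.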
Searching Your Indexed Website
Full-text search against the source HTML lets you filter on details such as CSS class usage, element types, and attribute usage. The inclusion of *.pdf and *.docx files allows wide-band search coverage for the people, places, and things you can't afford to miss. Of course, field search is available to filter a variety of useful data.
- HTTP Headers, e.g. headers: application/pdf
- URL, e.g. url: /path/of/interest/
- HTTP Status, e.g. status: 500
- Size (bytes), e.g. size: >1000
- Download Time (ms), e.g. time: >5000
- Redirect, e.g. redirect: true
- Norobots, e.g. norobots: true
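As an illustrative sketch of the field syntax above (the keyword and path are hypothetical, and this assumes field filters can be combined with full-text terms in a single query), a search for large PDF responses under a given path might look like:

```
aberdeen url: /reports/ headers: application/pdf size: >100000
```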
The following search help topics delve into more detail: