Advanced Search
Full-text search covers a lot of ground, but often you need to go deeper. InterroBot captures a variety of field data that can be queried independently. The syntax to search against a particular field is as follows:
fieldname: query
Full-text search always matches strings. A field search uses one of three data types (string, number, or boolean), depending on the field.
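As a rough illustration of the fieldname: query shape, the Python sketch below splits a single term into its field, data type, and value. The FIELD_TYPES table and the parse_term helper are hypothetical, inferred only from the field names used in this guide's examples; this is not InterroBot's parser.

```python
import re

# Hypothetical field-to-type table, inferred from the examples in this guide.
# It is an assumption, not InterroBot's actual schema.
FIELD_TYPES = {
    "headers": "string",
    "url": "string",
    "status": "number",
    "size": "number",
    "redirect": "boolean",
    "norobots": "boolean",
}

# A lowercase field name, a colon, then the query value.
TERM = re.compile(r"^\s*([a-z]+):\s*(.+?)\s*$")


def parse_term(term):
    """Split 'fieldname: query' into a (field, data_type, value) tuple."""
    match = TERM.match(term)
    if match is None or match.group(1) not in FIELD_TYPES:
        # No recognised field prefix: treat the whole term as full-text.
        return ("fulltext", "string", term.strip())
    field, value = match.group(1), match.group(2)
    return (field, FIELD_TYPES[field], value)


print(parse_term("headers: application/pdf"))
print(parse_term("status: >=400"))
print(parse_term("annual report"))
```

Anything without a known field prefix falls through to full-text in this sketch, which mirrors the split between full-text and field search described above.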
HTTP headers and URL Filtering
HTTP headers contain useful information about caching, content types, security flags, and more.
If you wanted to find all PDFs hosted on a website, full-text search would lead you astray: it returns every page that merely mentions PDFs alongside the files themselves, which costs you precision. A more precise approach is to query the headers for a content-type of application/pdf. Because headers are name/value pairs, they are less prone to false positives than full-text.
headers: application/pdf
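To see why a header match tends to be more precise, here is a small Python sketch over two made-up crawl records. The record layout and both helper functions are assumptions for illustration only and say nothing about how InterroBot stores or matches data.

```python
# Hypothetical crawl records: one real PDF, one HTML page that talks about PDFs.
pages = [
    {
        "url": "/files/report.pdf",
        "headers": {"content-type": "application/pdf"},
        "text": "",
    },
    {
        "url": "/blog/pdf-tips",
        "headers": {"content-type": "text/html"},
        "text": "How to export a PDF from your editor",
    },
]


def fulltext_hits(records, needle):
    """URLs whose body text contains the needle (case-insensitive)."""
    return [r["url"] for r in records if needle.lower() in r["text"].lower()]


def header_hits(records, needle):
    """URLs where any 'name: value' header line contains the needle."""
    needle = needle.lower()
    return [
        r["url"]
        for r in records
        if any(needle in f"{name}: {value}".lower() for name, value in r["headers"].items())
    ]


print(fulltext_hits(pages, "pdf"))             # matches the blog post, not the file
print(header_hits(pages, "application/pdf"))   # matches only the actual PDF
```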
URL filtering is ostensibly about finding all pages/assets within a particular HTTP path. This, in itself, can be a practical tool. More useful, however, is the ability to use URL filtering with other queries (including full-text!). Field queries can be combined with an uppercase AND.
headers: application/pdf AND url: /archive/
Please note that while the boolean AND is supported, the OR and NOT operators and parenthetical grouping are currently unsupported.
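The combination rule can be sketched in Python as follows. The split_query helper is hypothetical; in particular, raising an error on OR, NOT, or parentheses is an assumption made for this example, since this guide does not say how InterroBot reacts to unsupported syntax.

```python
import re


def split_query(query):
    """Split a query into terms on uppercase AND.

    Assumption for this sketch: unsupported syntax raises ValueError.
    """
    if "(" in query or ")" in query:
        raise ValueError("parenthetical blocks are unsupported")
    tokens = query.split()
    for operator_name in ("OR", "NOT"):
        if operator_name in tokens:
            raise ValueError(f"{operator_name} is unsupported")
    # Only a standalone, uppercase AND separates terms.
    return [term.strip() for term in re.split(r"\s+AND\s+", query.strip())]


print(split_query("headers: application/pdf AND url: /archive/"))
print(split_query("cats and dogs"))  # lowercase 'and' stays part of the term
```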
HTTP Status, Download Size, and Response Time
Where searching against headers and full-text requires strings, status and download size values are captured as numbers. This means you can search using greater-than, less-than, and equality operators.
If you've used the Client Errors and Server Errors buttons, you are already familiar with searching status. You can search for either a particular status code, or a range. If you wanted all errors, both client and server, you could search the following:
status: >=400
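A numeric comparison such as >=400 can be modelled with a few lines of Python. The numeric_matcher helper is hypothetical, and two details are assumptions of this sketch rather than documented InterroBot behaviour: the exact set of operator symbols, and treating a bare number as an equality test.

```python
import operator
import re

# Operator symbols assumed for this sketch.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}

# Optional comparison operator followed by a whole number.
NUMERIC = re.compile(r"^\s*(>=|<=|>|<|=)?\s*(\d+)\s*$")


def numeric_matcher(expression):
    """Turn an expression like '>=400' into a predicate over integers."""
    match = NUMERIC.match(expression)
    if match is None:
        raise ValueError(f"not a numeric expression: {expression!r}")
    compare = OPS[match.group(1) or "="]  # bare number: assume equality
    threshold = int(match.group(2))
    return lambda value: compare(value, threshold)


is_error = numeric_matcher(">=400")
statuses = [200, 301, 404, 500]
print([s for s in statuses if is_error(s)])
```

Because client errors are 4xx and server errors are 5xx, a single lower bound of 400 covers both ranges.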
Filtering on the download size is an easy way to identify bloated and uncompressed images, HTML, or other assets. The value is in bytes (1,048,576 bytes per megabyte). If you wanted to find all images over roughly half a megabyte (500,000 bytes), you could search:
size: >500000 AND headers: image
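The query above rounds half a megabyte to 500,000 bytes. If an exact threshold matters, the conversion is simple arithmetic; the size_query helper below is a hypothetical convenience for building the query string, not part of InterroBot.

```python
BYTES_PER_MEGABYTE = 1_048_576  # figure quoted in this guide


def size_query(megabytes, content_hint=None):
    """Build a 'size: >N' query for a threshold given in megabytes."""
    threshold = int(megabytes * BYTES_PER_MEGABYTE)
    query = f"size: >{threshold}"
    if content_hint:
        # Combine with a headers filter using uppercase AND.
        query += f" AND headers: {content_hint}"
    return query


print(size_query(0.5, "image"))
print(size_query(2))
```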
Redirects and Robots Exclusion
Redirect status and robots exclusion are boolean fields, queried as true or false. Say you wanted to clean up all redirects that were the result of links to HTTP (presumably forwarding to HTTPS). This could be achieved with the following query:
redirect: true AND url: http
Finally, the norobots field captures robots exclusion status. Robots-excluded pages and assets are still added to the index (as connected resources), but their contents are not downloaded.
norobots: true
InterroBot is a web crawler and developer tool for Windows, macOS, and Android.
Want to learn more? Check out our help section or download the latest build.