Advanced Search


Full-text search covers a lot of ground, but often you need to go deeper. InterroBot captures a variety of field data that can be queried independently. The syntax to search against a particular field is as follows:

fieldname: query

While full-text search always matches text strings, each field accepts one of three data types: string, number, or boolean, depending on the field used.
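As a rough illustration of the fieldname: query form, the Python sketch below splits a search term into a field, its data type, and the query text. The field names and types are taken from the examples later in this article; the FIELD_TYPES table, the split_field_query function, and the "fulltext" label are hypothetical names used for illustration, not InterroBot's internals.

```python
# Field names and types as they appear in this article's examples.
# The table itself is illustrative, not InterroBot's actual schema.
FIELD_TYPES = {
    "headers": "string",
    "url": "string",
    "status": "number",
    "size": "number",
    "redirect": "boolean",
    "norobots": "boolean",
}


def split_field_query(term: str) -> tuple[str, str, str]:
    """Split 'fieldname: query' into (field, data type, query text)."""
    field, sep, query = term.partition(":")
    field = field.strip().lower()
    if not sep or field not in FIELD_TYPES:
        # No recognized field prefix: treat the whole term as full-text.
        return ("fulltext", "string", term.strip())
    return (field, FIELD_TYPES[field], query.strip())
```

Under these assumptions, split_field_query("status: >=400") yields the field status with the number type, while a term without a known field prefix falls back to full-text.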

Figure: Filtering to a path (aka directory).

HTTP Headers and URL Filtering

The HTTP headers contain useful information about caching, content types, security flags, and more.

If you wanted to find all PDFs hosted on a website, full-text search would lead you astray. Full-text returns useful and useless results alike, which undermines search precision. A more precise approach is to query the headers for a content-type of application/pdf. Because headers are name/value pairs, they are less prone to false positives than full-text.

headers: application/pdf

URL filtering finds all pages and assets within a particular URL path, which is a practical tool in itself. More useful, however, is the ability to combine URL filtering with other queries (including full-text!). Field queries are combined with an uppercase AND.

headers: application/pdf AND url: /archive/

Please note that while the boolean AND is supported, other boolean operators such as OR, NOT, and parenthetical blocks are currently unsupported.
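To make the AND rule concrete, here is a minimal Python sketch of splitting a combined query into its individual terms. The split_and function is a hypothetical helper, and it assumes AND must be uppercase and surrounded by whitespace; it is not a description of how InterroBot parses queries.

```python
import re


def split_and(query: str) -> list[str]:
    """Split a query on an uppercase AND surrounded by whitespace."""
    # re.split is case-sensitive by default, so a lowercase "and"
    # stays part of the term it appears in.
    parts = re.split(r"\s+AND\s+", query)
    return [part.strip() for part in parts if part.strip()]
```

With this sketch, "headers: application/pdf AND url: /archive/" splits into two field terms, while "bread and butter" remains a single full-text term.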

HTTP Status, Download Size, and Response Time

Where headers and full-text searches take strings, status and download size values are captured as numbers. This means you can search using greater-than, less-than, and equality operators.

If you've used the Client Errors and Server Errors buttons, you are already familiar with searching status. You can search for either a particular status code, or a range. If you wanted all errors, both client and server, you could search the following:

status: >=400
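The way a numeric expression such as >=400 selects values can be sketched in Python as follows. Only >= and > appear in this article's examples; the other operator spellings, the treatment of a bare number as equality, and the numeric_match function are assumptions made for illustration.

```python
import operator
import re

# Operator spellings other than ">=" and ">" are assumed, not documented here.
OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
    "": operator.eq,  # assumption: a bare number means equality
}


def numeric_match(expr: str, value: int) -> bool:
    """Return True if value satisfies an expression like '>=400'."""
    m = re.fullmatch(r"\s*(>=|<=|>|<|=)?\s*(\d+)\s*", expr)
    if m is None:
        raise ValueError(f"not a numeric expression: {expr!r}")
    compare = OPS[m.group(1) or ""]
    return compare(value, int(m.group(2)))
```

Because client errors are 4xx codes and server errors are 5xx codes, numeric_match(">=400", status) is true for both groups and false for a 200 or 301 response.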

Filtering on download size is an easy way to identify bloated and uncompressed images, HTML, or other assets. The size is given in bytes (1,048,576 bytes per megabyte). If you wanted to find all images over roughly half a megabyte, you could search:

size: >500000 AND headers: image

Figure: Narrowing down to large images using field search.
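Since size is expressed in bytes, it can help to compute the threshold rather than guess it. The Python sketch below builds a size query from a megabyte figure, using the 1,048,576 bytes per megabyte stated above; size_query is a hypothetical helper, not part of InterroBot.

```python
BYTES_PER_MEGABYTE = 1_048_576


def size_query(megabytes: float, header_filter: str = "") -> str:
    """Build a 'size: >N' query, optionally combined with a headers term."""
    threshold = int(megabytes * BYTES_PER_MEGABYTE)
    query = f"size: >{threshold}"
    if header_filter:
        query += f" AND headers: {header_filter}"
    return query
```

At this conversion rate, half a megabyte is exactly 524,288 bytes, so size_query(0.5, "image") produces size: >524288 AND headers: image. The 500000 used in the example query is a convenient round approximation of the same threshold.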

Redirects and Robots Exclusion

Redirected resources and robots exclusion are queried as booleans (true or false). Say you wanted to clean up all redirects that were the result of links to HTTP (presumably forwarding to HTTPS). This could be achieved with the following query:

redirect: true AND url: http

Finally, the norobots field records whether a page or asset is robots-excluded. Robots-excluded pages and assets are still added to the index (as connected resources), but their contents are not downloaded.

norobots: true
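The boolean fields can be pictured as simple true/false flags on each crawled resource. In the Python sketch below, the records list, its URLs, and the parse_bool and filter_by_flag helpers are all hypothetical, and accepting mixed-case values such as "True" is an assumption rather than documented behavior.

```python
def parse_bool(value: str) -> bool:
    """Convert 'true' or 'false' to a bool (case-insensitivity is assumed)."""
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"expected true or false, got {value!r}")


def filter_by_flag(records: list[dict], field: str, value: str) -> list[dict]:
    """Keep the records whose boolean field equals the requested value."""
    wanted = parse_bool(value)
    return [r for r in records if bool(r.get(field, False)) == wanted]


# Hypothetical crawl records, for illustration only.
records = [
    {"url": "http://example.com/old", "redirect": True, "norobots": False},
    {"url": "https://example.com/", "redirect": False, "norobots": False},
    {"url": "https://example.com/private/", "redirect": False, "norobots": True},
]
```

Here filter_by_flag(records, "redirect", "true") keeps only the first record, and filter_by_flag(records, "norobots", "true") keeps only the last.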

InterroBot is a web crawler and developer tool for Windows, macOS, and Android.