Website Full-text Search


By some estimates, knowledge workers spend up to 30% of their time searching for information. If you're going to spend that much time searching, you should at least have solid tools.

InterroBot's full-text index captures the entire contents of each HTML page: elements, attributes, text, and meta tags. This is useful when querying for usage of a particular file, CSS class name, or other HTML pattern.

You can filter by:

  • page text
  • image filenames and alt-text
  • linked filenames (CSS, JS, etc.)
  • HTML element names and attribute names/values (class names!)
  • SEO keywords

For the developer or CMS administrator, full-text search turns a vaguely documented issue into a clear target. Developers can retire a CSS or JavaScript file with confidence, knowing that no link or script element still loads it.

For the CMS admin, it offers quick validation that every instance of someone's name has been changed, or that outdated product descriptions have been fully updated across the site.

Query the full-text index, and insights abound.

Common Search Patterns

Finding asset usage. Search for logo.png or analytics.js to see every page that references a specific file. This is invaluable before removing deprecated assets or updating linked references.
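To make the idea concrete, here is a minimal Python sketch of what an asset-usage check looks like when done by hand. It is not InterroBot's implementation or API; the `pages` dictionary, the `AssetRefParser` class, and the `pages_referencing` function are hypothetical names invented for illustration, and the sketch only inspects `src` and `href` attributes.

```python
from html.parser import HTMLParser


class AssetRefParser(HTMLParser):
    """Collects the values of src and href attributes from one HTML document."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name in ("src", "href") and value:
                self.refs.append(value)


def pages_referencing(pages, filename):
    """Return the URLs whose src/href attributes mention filename.

    pages is a hypothetical {url: html_source} mapping standing in for crawled pages.
    """
    hits = []
    for url, html in pages.items():
        parser = AssetRefParser()
        parser.feed(html)
        parser.close()
        if any(filename in ref for ref in parser.refs):
            hits.append(url)
    return hits


# Made-up sample data for illustration only.
pages = {
    "https://example.com/": '<img src="/img/logo.png" alt="Logo">',
    "https://example.com/about": '<script src="/js/analytics.js"></script>',
    "https://example.com/contact": "<p>No assets here.</p>",
}
print(pages_referencing(pages, "logo.png"))  # ['https://example.com/']
```

An empty result for a filename is the signal you want before deleting the file: nothing in the sampled pages still points at it.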

Tracking content changes. After updating terminology across your site, search for the old term to verify you caught everything. Especially useful for brand name changes or product renamings.

Identifying HTML patterns. Search for markup such as class="deprecated" or the data-tracking attribute to locate pages that use a specific pattern. Developers can use this to audit technical debt or prepare for framework migrations.
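As a rough illustration of a markup audit, the Python sketch below counts elements that carry a given class token or a given attribute. It is an assumption-laden example, not a description of how InterroBot matches queries; `MarkupAuditor` and `audit` are hypothetical names. It compares whole class tokens, so `deprecated` does not match `deprecated-link`.

```python
from html.parser import HTMLParser


class MarkupAuditor(HTMLParser):
    """Counts elements carrying a given class token or a given attribute name."""

    def __init__(self, class_name, attr_name):
        super().__init__()
        self.class_name = class_name
        self.attr_name = attr_name
        self.class_hits = 0
        self.attr_hits = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # The class attribute is a space-separated token list, so compare tokens.
            if name == "class" and self.class_name in (value or "").split():
                self.class_hits += 1
            # Bare attributes (no value) are still reported, with value None.
            if name == self.attr_name:
                self.attr_hits += 1


def audit(html, class_name, attr_name):
    """Return (class_hits, attr_hits) for one HTML document."""
    auditor = MarkupAuditor(class_name, attr_name)
    auditor.feed(html)
    auditor.close()
    return auditor.class_hits, auditor.attr_hits


# Made-up sample markup for illustration only.
sample = (
    '<div class="card deprecated" data-tracking="hero">'
    '<a class="deprecated-link" data-tracking>More</a>'
    "</div>"
)
print(audit(sample, "deprecated", "data-tracking"))  # (1, 2)
```

Token-level matching is a design choice: a plain substring search for `deprecated` would also flag `deprecated-link`, which may or may not be what an audit wants.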

Locating SEO elements. Find pages missing meta descriptions, or search for specific keywords to verify content coverage. Pair this with Advanced Search to filter by HTTP status codes or URL patterns for deeper SEO audits.
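Finding something that is absent is less obvious than finding something that is present, so a short Python sketch may help. It flags pages with no meta description, or with an empty one. This is an illustrative stand-in rather than InterroBot's own logic; `MetaDescriptionFinder`, `pages_missing_description`, and the sample `pages` mapping are hypothetical.

```python
from html.parser import HTMLParser


class MetaDescriptionFinder(HTMLParser):
    """Records the content of <meta name="description"> if the page has one."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Compare the name case-insensitively; treat a missing content as empty.
        if (attr_map.get("name") or "").lower() == "description":
            self.description = (attr_map.get("content") or "").strip()


def pages_missing_description(pages):
    """Return URLs whose HTML has no meta description or a blank one.

    pages is a hypothetical {url: html_source} mapping standing in for crawled pages.
    """
    missing = []
    for url, html in pages.items():
        finder = MetaDescriptionFinder()
        finder.feed(html)
        finder.close()
        if not finder.description:
            missing.append(url)
    return missing


# Made-up sample data for illustration only.
pages = {
    "/": '<head><meta name="description" content="Widgets for sale."></head>',
    "/blog": "<head><title>Blog</title></head>",
    "/empty": '<head><meta name="Description" content="  "></head>',
}
print(pages_missing_description(pages))  # ['/blog', '/empty']
```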

Beyond HTML: PDF and DOCX Support

InterroBot indexes not just HTML, but also the full text content of PDF and DOCX files linked from your site. You can search across whitepapers, documentation, and downloadable resources just as easily as webpage content. If you're managing a content-heavy site with lots of documents, this feature turns InterroBot into a unified search engine for your entire digital presence.

For integration with AI tools like ChatGPT or Claude, see AI Data Access to learn how to export filtered search results. New to search? Start with Getting Started to crawl your first site, or explore Crawler Options to fine-tune what gets indexed.


InterroBot is a web crawler and developer tool for Windows, macOS, Linux, iOS, and Android.