Command Line Interface (CLI)
InterroBot introduced the command line interface in v2.12, bringing crawling, search, and core reporting to the terminal. The command line interface is available for Windows, Linux, and macOS. Due to restrictions on unlisted programs (CLI apps are not listed in the Windows Start menu), the CLI is not currently available for the Windows Store version.
The CLI shares a database with the GUI application. Projects created in one are visible in the other, crawl data is shared, and the reports all run against the same search index. You can set up projects in the GUI, run scheduled crawls from a cron job, and review results in whichever interface suits the moment.
Every command supports -h for detailed help. When in doubt, start there.
Platform Variation
The executable is invoked differently depending on platform. For Windows and standard Linux packages, simply use interrobot -h.
macOS users will need to point to InterroBot, nested inside the app bundle in the Applications directory. If you use the CLI regularly, you can add InterroBot to your PATH by symlinking to /usr/local/bin/interrobot.
/Applications/InterroBot.app/Contents/MacOS/InterroBot -h
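One way to set up that symlink, assuming the default install location (may require sudo):
sudo ln -s /Applications/InterroBot.app/Contents/MacOS/InterroBot /usr/local/bin/interrobot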
Linux Flatpak users can use the CLI by wrapping the command to accommodate the sandboxing. It's a little clunky, but it works reliably.
flatpak run --command=sh com.pragmar.interrobot -c "interrobot -h"
Project Management
Projects are the fundamental unit of organization in InterroBot. Each project corresponds to a website (or set of URLs) and its crawl data. The CLI provides four commands for managing them: list, create, update, and delete.
Listing projects gives you an overview of the projects in your database, and is often the first step toward finding the project you want to work with.
~$ interrobot list
ID  Type        Name         URL                   Pages  Assets
1   CrawledUrl  example.com  https://example.com/  2      0
2   CrawledUrl  interro.bot  https://interro.bot/  311    92
Creating a project requires a name and a URL. InterroBot supports two project types: CrawledUrl is the classic site crawler that follows links recursively from a seed URL, while CrawledList takes a newline-separated list of specific URLs to crawl, or optionally a file containing the URLs, one per line.
~$ interrobot create -n example.com -u https://example.com
Project (ID=1) created in 00:00.423.
~$ interrobot create -n MyList -t CrawledList -u "@urls.txt"
Project (ID=2) created in 00:00.388.
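As a sketch, a small list can also be passed inline using newline separation (the URLs here are hypothetical):
~$ interrobot create -n MyList -t CrawledList -u "https://example.com/pricing
https://interro.bot/help/"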
The CrawledList type is particularly useful when managing a curated set of pages across different domains, or when you want to crawl specific sections without following every link. For more on how project types affect crawl behavior, see Crawler Options.
Updating a project lets you rename it, change its URL list (for CrawledList projects), and configure crawler settings.
~$ interrobot update -p 1 --crawler-thread-model gentle --crawler-crawl-delay 1
The update command displays the full settings table after each change, so you can verify the current configuration at a glance. You can reference projects by either ID or name throughout the CLI. For details on what each crawler setting controls, see Crawler Options.
Deleting a project removes all project data, including crawl history.
~$ interrobot delete -p blogs
Deleted Project (ID=3) in 00:00.089.
Crawling
The crawl command runs a crawl to completion and reports the results. The CLI crawl is a blocking operation that finishes before returning, but you can cancel at any time with Ctrl+C. This is the equivalent of pausing via the GUI. The optional -l flag writes a crawl log to disk.
The crawl respects all project settings, including thread model, crawl delay, and path exclusions. If you need to tune these before crawling, use update first. For a deeper look at how crawl timing and settings interact, see Crawler Options.
~$ interrobot crawl -p example.com
Crawl (ID=1) completed in 00:01.743.
ID  Project ID  Modified           Completed  Time   Pages  Assets
42  1           2026-04-22T03:26Z  true       00:01  2      0
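To keep a record of the run, the optional -l flag writes the crawl log to a path of your choosing (the path below is just an example):
~$ interrobot crawl -p example.com -l ~/crawl.log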
Querying the API
The api command provides access to InterroBot's data API, the same API behind the plugin system. Every API method has its own set of options and its own -h help page with full examples. The API output defaults to JSON, with several methods also supporting CSV and WARC output for use with classic scripting or AI tooling.
GetProjects and GetCrawls
These methods retrieve metadata about your projects and their crawl history. GetProjects returns project details, optionally filtered by ID. GetCrawls returns the crawl history for a given project, with filters for completion status and sorting.
~$ interrobot api -m GetProjects --fields "urls|created"
~$ interrobot api -m GetCrawls -p example.com --complete true
GetResources
GetResources is the CLI equivalent of InterroBot's full-text search. It returns crawled resources for a project, with support for search queries, type filtering, field selection, and sorting.
This method supports the same advanced search syntax available in the GUI, including field queries (status:, headers:, url:, size:), boolean operators (AND, OR, NOT), and parenthetical grouping. External and norobots resources are excluded by default; add --external or --norobots flags to include them. Results are capped at 100 per page, but you can use --offset and --perpage to paginate through larger result sets.
Pipe-separated field names control which data comes back, beyond the basic fields (id, url, status).
~$ interrobot api -m GetResources -p 2 --query "headers: application/pdf AND size: >500000" --fields "links|status|size"
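For larger result sets, the pagination and inclusion flags described above combine with a query; as a sketch (the query values are illustrative):
~$ interrobot api -m GetResources -p 2 --query "status: 404 OR status: 500" --external --perpage 100 --offset 100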
GetExporterReport
The exporter packages crawled content into JSON, CSV, or WARC format, ready for scripting or for processing with ChatGPT, Claude, or any other LLM. For more on AI integration, see AI Data Access.
The --warc-type option controls how page content is encoded in the archive. Markdown encoding strips away HTML complexity and uses significantly fewer tokens, allowing you to fit more content into an AI's context window. HTML encoding preserves full page structure for technical analysis. For most content review and AI workflows, markdown is the better choice.
~$ interrobot api -m GetExporterReport -p 1 -f csv --fields "time|status|size" -o ~/export.csv
~$ interrobot api -m GetExporterReport -p 1 -f warc --warc-type markdown -o ~/crawl.warc
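If you need full page structure instead, HTML encoding would presumably be requested the same way (html as the --warc-type value is an assumption here):
~$ interrobot api -m GetExporterReport -p 1 -f warc --warc-type html -o ~/crawl-html.warc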
GetLinkCheckReport
This method generates the same broken link report available in the GUI. It returns all internal and external URLs with their HTTP status codes, making it easy to identify 404s, missing media, and related errors.
The --distinct flag collapses duplicate destination URLs, so a single broken page referenced from ten different places shows up once rather than ten times. This is usually what you want when building a fix list.
~$ interrobot api -m GetLinkCheckReport -p 1 --distinct
~$ interrobot api -m GetLinkCheckReport -p 1 -f csv -o ~/links.csv
GetScraperReport
The scraper extracts structured data from crawled pages using CSS selectors, XPath expressions, or regular expressions. Selectors follow the pattern type:result:selector or type:selector, where type is css, xpath, or regex, and result is text or html.
You can pass multiple selectors at once using newline separation. This is a fast way to pull structured data from across an entire site—scraping every H1, every meta description, or every instance of a particular pattern in one pass.
~$ interrobot api -m GetScraperReport -p 1 --selector "xpath://h1"
~$ interrobot api -m GetScraperReport -p 1 --selector "css:text:h1"
~$ interrobot api -m GetScraperReport -p 1 --selector "xpath://h1
regex:2\d\d\d"
GetSpellCheckReport
Spell checking runs against crawled page content and returns flagged terms with context snippets and occurrence counts. The dictionary is conservative and will flag trade names, acronyms, and jargon as potential errors. Use --ignore-numbers and --ignore-punctuation to cut down on noise. Only en_US is supported via the CLI; the Internationalized Spell Check GUI plugin covers many additional languages.
~$ interrobot api -m GetSpellCheckReport -p 2 --ignore-numbers --ignore-punctuation
GetCannibalizationReport
Keyword cannibalization occurs when multiple pages on the same site compete for the same search terms, diluting ranking. This report identifies pages that share keyword focus, ranked by TF-IDF relevance. For more on the SEO implications, see the Keyword Cannibalization plugin. You can check multiple keyword phrases in a single call using newline separation. Each phrase runs independently, and results are merged into one response.
~$ interrobot api -m GetCannibalizationReport -p 1 --keywords "pricing"
~$ interrobot api -m GetCannibalizationReport -p 1 -f csv --keywords "app store
macOS"
Output Formats
Most commands default to human-readable text output. Adding -f json switches to JSON, which includes a __meta__ object with request metadata alongside the results array. API methods default to JSON, while project management commands (list, create, update, delete) default to text.
Reports that support CSV (-f csv) can be written to disk with -o for a file you can open in Excel, Google Sheets, or feed into other tools. WARC output (available via GetExporterReport) requires an explicit output path.
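For example, assuming list accepts the same -f flag as the other commands, a JSON listing of projects looks like this:
~$ interrobot list -f json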
Paid Features
Some crawler options require an active trial or license. These are marked with $ in the CLI help and include HTTP request header overrides (Authorization, X-Auth-Token, User-Agent), path exclusions, URL rewrites, and the follow-all directive. For licensing details, see Licensing & Terms.
The API methods, project management commands, and core crawl functionality are all available in Community Edition.
Scripting Recipes
The CLI is designed to compose with standard Unix tools. A few patterns to get you started:
Scheduled crawl with cron, re-crawling a project nightly and logging results:
0 2 * * * /path/to/interrobot crawl -p mysite -l /var/log/interrobot/crawl.log
Export broken links to CSV after a fresh crawl:
~$ interrobot crawl -p mysite && interrobot api -m GetLinkCheckReport -p mysite -f csv --distinct -o ~/broken-links.csv
Pipe resource data to jq for custom filtering:
~$ interrobot api -m GetResources -p 1 --fields "status|size" --type page | jq '.results[] | select(.status != 200)'
Export site content for AI analysis:
~$ interrobot api -m GetExporterReport -p 1 -f warc --warc-type markdown -o ~/site.warc
The WARC file can then be loaded directly into Claude, ChatGPT, or any LLM that accepts file attachments. For more on integrating crawl data with AI tools, see AI Data Access.
For developers building more complex integrations around the CLI, the API and Plugin Development guide covers the full data model and plugin architecture.