Finding Broken Links
Link rot is a fact of life, but all hope is not lost. Broken links and missing media are both easy to locate with the help of a web crawler such as InterroBot.
There are two ways to identify broken links. The Link Checker report is the recommended way to generate a spreadsheet of issues fast, while search is better suited to self-styled approaches to link cleanup. Once you've identified the trouble spots, there are various strategies for fixing links.
Using the Link Checker Report
The Link Checker is one of InterroBot's Core Reports. It is available out of the box. The report is run after crawling, and a spreadsheet of broken links is generated. From there, you would generally work off the spreadsheet in your CMS admin.
Locating Broken Resources with Search
The project's search page offers two canned searches, the Client Errors and Server Errors buttons located under the search form. They are the most direct way to filter results down to missing and broken web content.
Client Errors. Filters to client HTTP errors (400-451). While 404 Not Found errors tend to dominate the results, 403 Forbidden and 400 Bad Request (often the result of mispasted or mistyped links) are regulars.
Server Errors. Filters to server HTTP errors (500-511+). These include the dreaded 500 Internal Server Error, along with canonical and custom error codes dealing with timeouts, SSL certificate failures, and more.
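The split between the two filters follows the standard HTTP status classes. As a minimal sketch (not InterroBot's implementation), the bucketing can be expressed in a few lines of Python. The function names are hypothetical, and the ranges assume the full 4xx and 5xx classes rather than the exact bounds quoted above.

```python
from http import HTTPStatus


def classify_status(code: int) -> str:
    """Bucket a status code by HTTP class (assumed ranges: 4xx and 5xx)."""
    if 400 <= code <= 499:
        return "client_error"
    if 500 <= code <= 599:
        return "server_error"
    return "other"


def describe(code: int) -> str:
    """Return a readable label, tolerating custom codes the stdlib doesn't know."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Non-standard codes (for example, crawler- or CDN-specific ones)
        phrase = "Unknown or custom status"
    return f"{code} {phrase} ({classify_status(code)})"
```

Calling `describe(404)` labels the response as a client error, while a non-standard code such as 599 falls back to the generic phrase but is still bucketed as a server error.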
Locating References to the Broken Resource
Client and server errors occur not only on linked webpages but on linked assets as well. Images, JavaScript, and CSS files are all capable of producing HTTP errors.
The easiest way to identify pages that contain inbound links to the problematic resource is to click on the result and look at the Inlinks panel from the context of the broken resource.
The pages under Inlinks will contain the href/src attributes that need fixing, assuming the linked reference isn't coming back.
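If you have the HTML of a page listed under Inlinks, a short script can point at the exact attribute to edit. The following is a rough sketch using only Python's standard library. The class and function names are hypothetical, and it assumes a reference matches when its resolved absolute URL equals the broken URL exactly (no normalization of trailing slashes, fragments, or query strings).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ReferenceFinder(HTMLParser):
    """Collect href/src attributes that resolve to a known broken URL."""

    def __init__(self, page_url: str, broken_url: str):
        super().__init__()
        self.page_url = page_url
        self.broken_url = broken_url
        self.matches = []  # list of (tag, attribute, raw value)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the linking page
                if urljoin(self.page_url, value) == self.broken_url:
                    self.matches.append((tag, name, value))


def find_references(html: str, page_url: str, broken_url: str):
    finder = ReferenceFinder(page_url, broken_url)
    finder.feed(html)
    finder.close()
    return finder.matches
```

The raw attribute value is kept in each match because that is the string you would search for in your CMS or template.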
How to Fix the Link
In all but a handful of cases, these HTTP error responses are not what your users expected. On their end, the experience is that of a "bad link." The fix? Well, that's up to you. Here are some common approaches.
- Fix the web server/application
  - If a server error, and it's on your site
  - If the problem is widespread due to a webpage template/theme
- Bring the content back online
  - If it was misplaced/unpublished/what-have-you
- Add an HTTP redirect, forwarding to similar content
  - If you are an SEO maximalist
- Unlink the webpages generating errors on the inbound links side
  - If the content shouldn't be linked in the first place
  - If the page is intentionally retired (also consider 410 Gone)
  - If the source of the error is external/outside of your control
- Find a suitable replacement URL on the inbound links side
  - If link preservation is important
  - If page content was shuffled in a reorganization
  - Check archive.org's Wayback Machine for an old snapshot
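The redirect and 410 Gone options above usually come down to a lookup table in your web server or application. The sketch below illustrates the decision logic in plain Python. The paths, table names, and `resolve` function are all hypothetical, and a real site would express the same rules in its server configuration or CMS redirect settings.

```python
from typing import Optional, Tuple

# Hypothetical examples: old paths forwarded to similar, live content
REDIRECTS = {
    "/old-pricing": "/pricing",
    "/blog/2019/launch": "/blog/launch",
}

# Hypothetical examples: pages retired on purpose, with no replacement
RETIRED = {"/2015-promo"}


def resolve(path: str) -> Optional[Tuple[int, Optional[str]]]:
    """Return (status, location) for a handled path, or None to fall through."""
    if path in REDIRECTS:
        # 301 Moved Permanently: forward visitors and crawlers to the new URL
        return 301, REDIRECTS[path]
    if path in RETIRED:
        # 410 Gone: the page was removed deliberately and is not coming back
        return 410, None
    # Not a known broken path; let the application handle it normally
    return None
```

Returning `None` for unknown paths keeps the table limited to links you have deliberately decided to redirect or retire.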
InterroBot is a web crawler and developer tool for Windows, macOS, and Android.
Want to learn more? Check out our help section or download the latest build.