Today I wanted to check whether my blog contains broken links, and it actually did. I used muffet, a Go-based tool that crawls a website and checks the links it follows for HTTP errors. To speed up the process, I started a local Hugo server and ran muffet (after installing it) with the following command:
muffet -e ".*jlelse\.blog.*" -e ".*indieauth\.com.*" -e ".*aperture.*" -e ".*quill\.p3k\.io.*" -e ".*addtoany\.com.*" -t 60 -c 10 -f -x -s http://localhost:1313/
With -e you can exclude sites using regular expressions, -t sets the timeout, and -c sets the number of concurrent connections. With -f I disabled the check for URL fragments, because that somehow produced errors for some sites that work fine in the browser. -x skips TLS verification (which probably increases the crawl speed), and -s makes muffet only check pages listed in the sitemap file.
Sometimes muffet reports HTTP 403 errors for pages that work fine in the browser; that's probably because those sites block crawlers. The errors you should focus on are the HTTP 404s.
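When going through the report, this triage rule can be sketched as a small helper. The classify function is purely illustrative, not part of muffet:

```go
package main

import "fmt"

// classify triages a checker's results: a 404 is a genuinely broken
// link, while a 403 often just means the target site blocks crawlers
// and should be verified by hand in a browser.
func classify(status int) string {
	switch {
	case status == 404:
		return "broken - fix this link"
	case status == 403:
		return "check manually (crawler blocked?)"
	case status >= 400:
		return "error"
	default:
		return "ok"
	}
}

func main() {
	for _, s := range []int{200, 403, 404} {
		fmt.Printf("%d: %s\n", s, classify(s))
	}
}
```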
Of course, you can point muffet at the online site too, but I used the local Hugo server because it's a lot faster.