jlelse's Blog

Thoughts, stories and ideas

Find broken links on your website with muffet

in 👨‍💻 Dev

Today I wanted to check whether my blog contained any broken links, and it actually did. I used muffet, a Go-based tool that crawls a website and reports HTTP errors for the links it follows. To speed things up, I started a local Hugo server and, after installing muffet, ran the following command:

muffet -e ".*jlelse\.blog.*" -e ".*indieauth\.com.*" -e ".*aperture.*" -e ".*quill\.p3k\.io.*" -e ".*addtoany\.com.*" -t 60 -c 10 -f -x -s http://localhost:1313/

-e excludes URLs that match a regular expression, -t sets the timeout for HTTP requests in seconds and -c sets the number of concurrent connections. I used -f to ignore URL fragments, because the fragment check reported errors for some sites that worked fine in the browser. -x skips TLS certificate verification (which probably speeds up the crawl) and -s makes muffet check only the pages listed in the sitemap file.

Sometimes muffet reports HTTP 403 errors for pages that work fine in the browser, probably because those sites block crawlers. The errors to focus on are the HTTP 404s.
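
To concentrate on the 404s, the report can be piped through a small filter. This is only a sketch: it assumes muffet prints each page URL on its own line, followed by indented lines that start with the status code of the failing link, so check that against the output of your version first. The name only_404 is a hypothetical helper, not part of muffet.

```shell
# Hypothetical helper: reduce a link report to its 404 entries.
# Assumed input format (verify with your muffet version):
#   <page URL>
#   <TAB><status><TAB><link URL>
only_404() {
  awk '
    /^[^ \t]/ { page = $0; next }   # unindented line: remember the page URL
    $1 == "404" {                   # indented line whose first field is 404
      if (page != "") { print page; page = "" }   # print each page once
      print
    }
  '
}

# Usage, with flags taken from the command above:
# muffet -t 60 -c 10 -f -s http://localhost:1313/ 2>&1 | only_404
```

Pages that only have 403 entries disappear from the filtered output, so what remains is the list of links that are most likely really dead.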

Of course, you can also run muffet against a live site, but I used the local Hugo server because that’s a lot faster.
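
The local workflow can also be scripted. The following is a rough sketch, not the exact setup I used: it assumes hugo, curl and muffet are on the PATH, that it runs from the site's root directory, and that port 1313 is free. The function names wait_until and check_local_site are made up for this example.

```shell
# Retry a command once per second until it succeeds or attempts run out.
wait_until() {
  attempts=$1; shift
  while [ "$attempts" -gt 0 ]; do
    "$@" && return 0
    attempts=$((attempts - 1))
    [ "$attempts" -gt 0 ] && sleep 1
  done
  return 1
}

# Start a local Hugo server, crawl it with muffet, then stop the server.
check_local_site() {
  hugo server --port 1313 >/dev/null 2>&1 &
  server_pid=$!

  # Give the server up to 30 attempts to answer before giving up.
  if ! wait_until 30 curl -s -o /dev/null http://localhost:1313/; then
    kill "$server_pid"
    return 1
  fi

  muffet -t 60 -c 10 -f -s http://localhost:1313/
  status=$?

  kill "$server_pid"
  return "$status"
}

# Usage:
# check_local_site
```

Waiting for the server before starting the crawl avoids connection errors that come from muffet running before Hugo has finished building the site.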


Jan-Lukas Else