Miniflux scraper rules
Speaking of web comics that I follow via RSS, Atom or JSON feeds: ideally I would like to see the comics directly in my feed reader (Miniflux). Some feeds already include the images in their content, others do not.
For the feeds that don’t show the content directly, Miniflux provides an option to extract the content from the corresponding website, combined with the ability to configure extraction (scraper) rules.
One feed I had to create an extraction rule for the other day is The Joy of Tech.
So in the extraction rule, all you really need to do is set a CSS selector, which then selects the HTML elements to be extracted.
CSS selectors are a powerful tool, although like regular expressions, it sometimes takes a bit of research or training to find the appropriate selector. For The Joy of Tech, the following rule works:
p.Maintext a[href="support.html"] img[src$=".png"]
This selector matches all image tags whose src attribute ends in .png and that sit inside an anchor element referencing support.html, which in turn sits inside a paragraph with the class Maintext. Complicated, but it works, and with the rule in place Miniflux now shows me only the actual comic.
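To make the three conditions of the selector a bit more tangible, here is a minimal sketch in Python that checks them by hand using only the standard library. It is not how Miniflux evaluates scraper rules; the class ComicImageFinder, the function find_comic_images and the SAMPLE markup are all made up for illustration, and the sample is not the real markup of The Joy of Tech.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not kept on the stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class ComicImageFinder(HTMLParser):
    """Hand-written stand-in for the selector
    p.Maintext a[href="support.html"] img[src$=".png"]
    (illustration only, not a general CSS engine)."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open elements, outermost first
        self.matches = []  # src values of matching images

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        src = attrs.get("src") or ""
        if tag == "img" and src.endswith(".png") and self._in_link_in_paragraph():
            self.matches.append(src)
        if tag not in VOID_TAGS:
            self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Close the innermost open element with this name, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def _in_link_in_paragraph(self):
        # True if some open <a href="support.html"> lies inside
        # some open <p> carrying the class Maintext.
        seen_paragraph = False
        for tag, attrs in self.stack:
            if tag == "p" and "Maintext" in (attrs.get("class") or "").split():
                seen_paragraph = True
            elif seen_paragraph and tag == "a" and attrs.get("href") == "support.html":
                return True
        return False


def find_comic_images(html):
    finder = ComicImageFinder()
    finder.feed(html)
    return finder.matches


# Hypothetical markup: only the second image satisfies all three conditions.
SAMPLE = """
<p class="Maintext"><img src="banner.png"></p>
<p class="Maintext"><a href="support.html"><img src="comic.png"></a></p>
<p class="Maintext"><a href="support.html"><img src="photo.jpg"></a></p>
<p><a href="support.html"><img src="other.png"></a></p>
"""

print(find_comic_images(SAMPLE))  # ['comic.png']
```

The first image is skipped because it is not inside the link, the third because it is not a .png, and the fourth because its paragraph lacks the Maintext class.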
Tags: Comics CSS Miniflux