Wednesday, March 16, 2022

webpage with links to ebook using RSS and Calibre

Calibre has a "news feed" feature that fetches the links listed in an RSS feed and creates an ebook out of them. Although "custom recipes" can be written to scrape pages and collect links from them, I made things easier for myself by using an automated tool to create an RSS feed from the HTML page, and then using Calibre's "add new recipe" option to "create a recipe from scratch" from just that feed. The Hindu's Young World archives did not have an RSS feed, but I could create one quite easily with feed43 ("Feed for free") using this simple item search pattern, in which each {%} captures a value and {*} skips over arbitrary text:
<h3><a href="{%}"{*}class="">{%}</a></h3>
with
Item link template = {%1} and 
Item Title template = {%2}
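
To make the pattern syntax concrete, here is a minimal Python sketch of how such a pattern could be turned into a regular expression. It assumes {%} captures a value and {*} skips arbitrary text; the helper name pattern_to_regex and the sample markup are made up for illustration, and feed43's real matching rules (for example around whitespace) may differ.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a feed43-style pattern into a compiled regular expression.

    {%} becomes a capturing group, {*} a non-capturing skip, and everything
    else is matched literally. This only approximates feed43's behaviour.
    """
    parts = re.split(r"(\{%\}|\{\*\})", pattern)
    pieces = []
    for part in parts:
        if part == "{%}":
            pieces.append("(.*?)")   # value to extract
        elif part == "{*}":
            pieces.append(".*?")     # text to skip
        else:
            pieces.append(re.escape(part))
    return re.compile("".join(pieces), re.DOTALL)

ITEM_PATTERN = '<h3><a href="{%}"{*}class="">{%}</a></h3>'

# Hypothetical markup, shaped like the headings the pattern targets.
SAMPLE_HTML = """
<h3><a href="https://example.com/a1" target="_blank" class="">First story</a></h3>
<h3><a href="https://example.com/a2" class="">Second story</a></h3>
"""

# Each match yields (link, title), i.e. ({%1}, {%2}).
items = [
    (m.group(1), m.group(2).strip())
    for m in pattern_to_regex(ITEM_PATTERN).finditer(SAMPLE_HTML)
]
```

Here the first capture plays the role of {%1} (the item link) and the second that of {%2} (the item title).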

feed43's free feed is limited to 20 items, so for all the older links I needed something else. I found mkfeed, which uses the same pattern syntax as feed43, so I just needed to install mkfeed and slightly tweak the provided example to create an RSS file locally:
URL="https://www.thehindu.com/topic/The_Hindu_Young_World/"
wget -q -O - "$URL" | mkfeed \
    --pattern-item '<h3><a href="{%}"{*}class="">{%}</a>' \
    --feed-title 'YW Archive1' \
    --feed-link "$URL" \
    --feed-desc 'Older Young World articles' \
    --item-title '{%2}' \
    --item-link '{%1}' \
    --item-desc '{%2}' > yw1.rss
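
Before handing the file to Calibre, it can help to check that the generated feed really contains items. A minimal Python sketch, assuming the file is plain RSS 2.0 whose item elements hold a title and a link; the function name list_items and the sample feed are hypothetical, and mkfeed's actual output may carry extra elements.

```python
import xml.etree.ElementTree as ET

def list_items(rss_data):
    """Return (title, link) pairs for every <item> in an RSS document.

    rss_data may be str or bytes, e.g. the raw contents of yw1.rss.
    """
    root = ET.fromstring(rss_data)
    pairs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        pairs.append((title, link))
    return pairs

# Hypothetical feed, standing in for the generated file.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>YW Archive1</title>
    <item><title>First story</title><link>https://example.com/a1</link></item>
    <item><title>Second story</title><link>https://example.com/a2</link></item>
  </channel>
</rss>"""

feed_items = list_items(SAMPLE_RSS)

# For the real file, something like:
# feed_items = list_items(open("yw1.rss", "rb").read())
```

An empty list would suggest that the item pattern did not match anything on the page.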

Calibre didn't seem to be able to fetch a local file as an RSS feed when I gave the local path /local/path/to/yw1.rss as the feed URL. So I ran a simple web server using Python as described here, but changed the port to the unprivileged 8000, with
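import os
# SimpleHTTPRequestHandler is enough for static files; the CGIHTTPRequestHandler
# used in the original example is deprecated since Python 3.13
from http.server import HTTPServer, SimpleHTTPRequestHandler
# Serve files from the current directory, so run this from the folder containing yw1.rss
# (chdir to '.' changes nothing; replace '.' with another path to serve a different folder)
os.chdir('.')
# Create a server object listening on port 8000 on all interfaces
server_object = HTTPServer(server_address=('', 8000), RequestHandlerClass=SimpleHTTPRequestHandler)
# Start the web server; stop it with Ctrl+C
server_object.serve_forever()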
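
Since Calibre runs on the same machine, a variant of this server could listen on localhost only and serve a chosen folder without changing the working directory. A sketch using the standard library's SimpleHTTPRequestHandler with its directory argument (Python 3.7 or later); make_server is a hypothetical helper name, not something from the original example.

```python
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

def make_server(directory: str, port: int = 8000) -> HTTPServer:
    """Return a server for `directory`, reachable only from this machine."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    # Binding to 127.0.0.1 keeps the feed invisible to the rest of the network.
    return HTTPServer(("127.0.0.1", port), handler)

# Example (blocks until interrupted with Ctrl+C):
# make_server("/local/path/to", 8000).serve_forever()
```

With this variant the feed URL would be http://127.0.0.1:8000/yw1.rss, which avoids any ambiguity about whether localhost resolves to an IPv4 or IPv6 address.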

Then it was just a matter of pointing a new recipe in Calibre to http://localhost:8000/yw1.rss, and Calibre did the rest. This may be an easier set of steps than my earlier manual method using HtmlAsText.
