Monday, January 07, 2008

SQLite table to read Atom feeds

Ah, Christmas Holidays! Time to take a break from the daily chores... to spend some time with family... but also time to catch up on reading and spend some hours on some fun hacking.

When catching up on my reading of Dr. Dobb's Journal, I came across an interesting article by Michael Owens about writing virtual tables for SQLite. It got me thinking about a small hack I've wanted to do for a while: a table that reads an RSS/Atom feed and presents the data to the query engine. Originally I was planning to implement this as a MySQL storage engine, but since I was reading this article anyway and the interface seemed easy enough to work with, I decided to just whip together a simple prototype for SQLite instead. Since I currently don't have a good place to publish the repository, I have a distro available at http://www.kindahl.net/pub/sqlite-feedme-0.01.tar.gz.

After building and installing, the table can be created as simply as this:

mats@romeo:~/proj/feedme$ sqlite3
SQLite version 3.4.2
Enter ".help" for instructions
sqlite> .load libfeedme.so
sqlite> create virtual table onlamp
   ...> using feedme('http://www.oreillynet.com/pub/feed/8');
sqlite> select title from onlamp;
PyMOTW: weakref
What the Perl 6 and Parrot Hackers Did on their Christmas Vacation
Least Appropriate Uses of Perl You've Seen
YAP6 Operator: Filetests?
WILFZ (What I Learned From Zope):  Buildout
TPT(Tiny Python Tip):  Watch Jeff Rush's Videos
PyCon 2008 Talks and Tutorials Finalized
TPT(Tiny Python Tip):  Python for Bash Scripters
What the X-Files Taught Us about Real Aliens
Python Web Framework Comparison:  Documentation and Marketing
Python Web Framework Comparison:  Documentation and Marketing
PyMOTW: mmap
Improving Test Performance
YAP6 Operator: Reduce Operators - Part II
WSGI:  Python Web Development's Howard Roark

Note that it is still a prototype. My plans are to at least:
  • Read the entire feed into memory and parse it from there, instead of writing the feed to disk before parsing it. Writing it to disk was the default for cURL, so I just stuck with that for the prototype (yeah, yeah. I know I'm lazy.)
  • Detect the feed format automatically and set the parser accordingly. Right now, it can only handle Atom feeds, and does not do a great job of that either.
  • Figure out a way to present an entry's multiple data items in a useful way. For example, an entry can hold several links, but which one is really the interesting one?