Monday, January 26, 2009

Feeding The Monster

I have many different Web 2.0 personas and I'd love to pull together all the various postings and updates I do into one stream and display that. Of course, this all has a name and it is Lifestreaming. Personally, I think it is a dumb name. I like the web, and do a lot of things on the web, but trust me, it ain't my life. I prefer to call it "Webstreaming".


For the most part, Webstreaming is based on RSS feeds. But it's a complicated thing to do, as nearly every feed needs some customization to display correctly. There are links, text, microblogs, audio, video, etc. There are a few sites that try to do it for you, but of course for us geeks, it has to be self-hosted.


So I downloaded and installed Sweetcron, a PHP-based blog software. There are some really interesting implementations of this, most notably Tom Beardshaw's, and he does a lot of work on it. Unfortunately, it does require a lot of customization, all done in PHP, which is no longer a favorite language of mine. I used to like it, but now find it too idiosyncratic and prone to code breakage. I do have a rough version of it running on my new domain, IHieronym.us (I use the nom de plume Hieronymus on a few Web 2.0 sites).


As I really would rather do something in Python or, even better, Django, I went on a quest. As I said, most social media sites will report activity via an RSS feed, so I needed to find a Python-based RSS parser. The most common one seems to be the Universal Feed Parser, which can deal with nearly any kind of RSS / Atom feed.
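To give a feel for what an RSS parser has to do, here is a minimal sketch that pulls the titles and links out of an RSS 2.0 document using only Python's standard library. To be clear, this is not the Universal Feed Parser, which also copes with Atom and badly formed feeds. The sample feed and the parse_rss name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 document standing in for a real feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example microblog</title>
    <link>http://example.com/</link>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>
"""


def parse_rss(xml_text):
    """Return (feed_title, [(item_title, item_link), ...]) for an RSS 2.0 string."""
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    feed_title = channel.findtext("title", default="")
    # Each <item> is one post; keep just its title and link.
    items = [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in channel.findall("item")
    ]
    return feed_title, items


feed_title, items = parse_rss(SAMPLE_RSS)
for title, link in items:
    print("%s -> %s" % (title, link))
```

Real feeds stuff audio, video and HTML into all sorts of extra elements, which is exactly why a dedicated parser is worth having.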


But on top of that I found Feedjack, a full-blown feed "aggregator" built using both Django and Feedparser! Pretty cool stuff, although installation is a very strange thing. I'm not intimately familiar with Python or Django installation issues, and found it weird how the Feedjack page talks about "your Django", as Django is a library, not an installation. What they really mean is "your Django project". So, after getting Feedparser and Feedjack installed (on openSUSE, I installed Feedparser via YaST and Feedjack via the python setup.py install process, while on my FreeBSD box I was able to find both of them in the ports), I did the following basic steps to get a Feedjack site up and running:


$ django-admin.py startproject ihieronymus
$ cd ihieronymus
$ ls
__init__.py manage.py settings.py urls.py
__init__.pyo manage.pyo settings.pyo urls.pyo

[ after editing settings.py and urls.py as suggested here to add the admin site ]

$ python manage.py syncdb
Creating table auth_permission
Creating table auth_group
Creating table auth_user
Creating table auth_message
Creating table django_content_type
Creating table django_session
Creating table django_site
Creating table django_admin_log

You just installed Django's auth system, which means you don't have any superusers defined.
Would you like to create one now? (yes/no): yes
Username (Leave blank to use 'jdarnold'):
E-mail address: jdarnold@buddydog.org
Password:
Password (again):
Superuser created successfully.
Installing index for auth.Permission model
Installing index for auth.Message model
Installing index for admin.LogEntry model
$ sudo python manage.py runserver 207.22.41.217:80
Password:
Validating models...
0 errors found

Django version 1.1 pre-alpha, using settings 'ihieronymus.settings'
Development server is running at http://207.22.41.217:80/
Quit the server with CONTROL-C.
[26/Jan/2009 13:05:31] "GET / HTTP/1.1" 404 1921
^C
$


So at this point, we have a very basic Django site with the admin part all set up. Now we need to add the Feedjack values to the settings.py file:



MEDIA_ROOT = '/www/data/'
MEDIA_URL = 'http://www.myserver.com'
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.admin',
    'feedjack',
)


Now I need to create a link to Feedjack's static images on my 'regular' web server. Remember, the built-in Django debug server doesn't serve up static images and pages, so you need to host them somewhere else. Also, many Apache setups won't serve pages from outside the Apache folders, so you may have to copy the folder into the Apache data folder rather than just using a symbolic link.



 $ sudo ln -s /usr/local/lib/python2.5/site-packages/Feedjack-0.9.16-py2.5.egg/feedjack/static/feedjack /www/data/feedjack
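If you would rather script the choice between linking and copying, here is a rough Python sketch. The publish_static helper and the demo paths are hypothetical, and it assumes the destination doesn't exist yet.

```python
import os
import shutil
import tempfile


def publish_static(src_dir, dest_dir, copy=False):
    """Expose src_dir at dest_dir, by symlink (default) or by a full copy."""
    if os.path.lexists(dest_dir):
        raise OSError("destination already exists: %s" % dest_dir)
    if copy:
        # Use this when the web server refuses to follow symlinks.
        shutil.copytree(src_dir, dest_dir)
    else:
        os.symlink(src_dir, dest_dir)
    return dest_dir


# Demo in a throwaway directory, standing in for the real Feedjack
# static folder and the web server's data folder.
demo_root = tempfile.mkdtemp()
src = os.path.join(demo_root, "static_src")
os.mkdir(src)
open(os.path.join(src, "logo.png"), "wb").close()
copied = publish_static(src, os.path.join(demo_root, "copied"), copy=True)
```

The catch with copying is that you have to redo it whenever you upgrade Feedjack, whereas a symlink to an unversioned path would keep pointing at the current files.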


Now we should be able to run syncdb to add in Feedjack's database tables:



$ python manage.py syncdb
Traceback (most recent call last):
  File "manage.py", line 11, in <module>
    execute_manager(settings)
  File "/usr/local/lib/python2.5/site-packages/django/core/management/__init__.py", line 340, in execute_manager
    utility.execute()
  File "/usr/local/lib/python2.5/site-packages/django/core/management/__init__.py", line 295, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/local/lib/python2.5/site-packages/django/core/management/base.py", line 195, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/usr/local/lib/python2.5/site-packages/django/core/management/base.py", line 221, in execute
    self.validate()
  File "/usr/local/lib/python2.5/site-packages/django/core/management/base.py", line 249, in validate
    num_errors = get_validation_errors(s, app)
  File "/usr/local/lib/python2.5/site-packages/django/core/management/validation.py", line 28, in get_validation_errors
    for (app_name, error) in get_app_errors().items():
  File "/usr/local/lib/python2.5/site-packages/django/db/models/loading.py", line 128, in get_app_errors
    self._populate()
  File "/usr/local/lib/python2.5/site-packages/django/db/models/loading.py", line 57, in _populate
    self.load_app(app_name, True)
  File "/usr/local/lib/python2.5/site-packages/django/db/models/loading.py", line 72, in load_app
    mod = __import__(app_name, {}, {}, ['models'])
  File "/usr/local/lib/python2.5/site-packages/Feedjack-0.9.7-py2.5.egg/feedjack/models.py", line 19, in <module>
    class Link(models.Model):
  File "/usr/local/lib/python2.5/site-packages/Feedjack-0.9.7-py2.5.egg/feedjack/models.py", line 20, in Link
    name = models.CharField(maxlength=100, unique=True)
TypeError: __init__() got an unexpected keyword argument 'maxlength'
$


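Darn, not quite there. This 'maxlength' problem is usually indicative of a Django version mismatch: Django renamed the field argument from maxlength to max_length before the 1.0 release, so code written against the old name breaks on newer versions. And yup, the FreeBSD ports version of Feedjack is 0.9.7, while the Django version is 1.1 pre-alpha. So I grabbed and installed Feedjack v0.9.16 and now: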



$ python manage.py syncdb
Creating table feedjack_link
Creating table feedjack_site
Creating table feedjack_feed
Creating table feedjack_tag
Creating table feedjack_post
Creating table feedjack_subscriber
Installing index for feedjack.Post model
Installing index for feedjack.Subscriber model
$
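Looking back at the failed run, that TypeError is just what Python raises when a constructor is handed a keyword it doesn't know. Here is a small sketch of the mechanism. The CharField class below is a stand-in I made up for illustration, not Django's real field class.

```python
class CharField(object):
    """Stand-in for a model field that only accepts the newer keyword name."""

    def __init__(self, max_length=None, unique=False):
        self.max_length = max_length
        self.unique = unique


# The old spelling, as used in the Feedjack 0.9.7 models.py line in the traceback.
try:
    CharField(maxlength=100, unique=True)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The newer spelling is accepted.
name = CharField(max_length=100, unique=True)

print(error_message)
```

So whenever you see 'maxlength' in a traceback, the app was almost certainly written for an older Django than the one you have installed.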


Spot on! Now when I run the server and go to the admin web page, I see Feedjack's new entries. Next is to try and decipher the obscure references for how to grab new feeds, as Feedjack is pretty tied to its paradigm of grabbing feeds from other people. First you add a "Site", which basically defines what your site is going to look like. Then you add some feeds, giving each one its RSS URL. Unfortunately, Feedjack isn't nearly as good as Sweetcron at figuring out the correct RSS URL, and many Web 2.0 sites make it far too hard to track down. Then you need to add a "Subscriber", which links the Site to the Feed. All of this is because you can have multiple "Planets" hosted with a single Feedjack installation, but that seems like overkill to me, as most sites, including all the ones I looked at in their links, only have one.
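The Site / Feed / Subscriber arrangement is easier to see in code. This is only a plain-Python sketch of the relationship as I understand it, not Feedjack's actual model code. The class fields, the feeds_for helper and the second site name are assumptions for illustration.

```python
class Site(object):
    """One 'Planet': a named front page that shows posts from some feeds."""

    def __init__(self, name):
        self.name = name


class Feed(object):
    """One RSS/Atom source, identified by its URL."""

    def __init__(self, feed_url):
        self.feed_url = feed_url


class Subscriber(object):
    """Join record saying 'this Site shows this Feed'."""

    def __init__(self, site, feed):
        self.site = site
        self.feed = feed


def feeds_for(site, subscribers):
    """Return the URLs of every feed the given site subscribes to."""
    return [s.feed.feed_url for s in subscribers if s.site is site]


mine = Site("ihieronymus")
other = Site("another-planet")  # hypothetical second Planet on the same install

anaze = Feed("http://anaze.tumblr.com/rss")
linuxlove = Feed("http://linuxlove.tumblr.com/rss")

subscribers = [
    Subscriber(mine, anaze),
    Subscriber(mine, linuxlove),
    Subscriber(other, linuxlove),  # the same feed can appear on two sites
]

my_feeds = feeds_for(mine, subscribers)
other_feeds = feeds_for(other, subscribers)
```

The extra Subscriber layer only pays off once a feed is shared between Planets, which is why it feels like overkill for a one-site setup.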



Now that you have a few feeds defined, you run the feedjack_update.py script to go get the RSS feeds and drag them in. This requires a little bit of environment dancing around, as otherwise it can't find the Django info it needs:



$ pwd
/home/jdarnold/django/ihieronymus
$ export PYTHONPATH=/home/jdarnold/django
$ export DJANGO_SETTINGS_MODULE=ihieronymus.settings
$ feedjack_update.py
* BEGIN: 2009-01-26 14:43:49.014353
[2] Processing feed http://anaze.tumblr.com/rss
[2] Processed http://anaze.tumblr.com/rss in 0:00:00.341290 [ok] [new=0 updated=0 same=20 error=0]
[1] Processing feed http://linuxlove.tumblr.com/rss
[1] Processed http://linuxlove.tumblr.com/rss in 0:00:00.528515 [ok] [new=0 updated=0 same=20 error=0]
* END: 2009-01-26 14:43:49.941449 (no threadpool module available, no parallel fetching)
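The same environment dance can be done from inside Python, say at the top of a small wrapper script. Here is a minimal sketch, assuming the same paths as in the transcript above. The configure_django_env helper is hypothetical, and the demo uses a throwaway dict and list so nothing real gets modified.

```python
import os
import sys


def configure_django_env(project_parent, settings_module, environ=None, path=None):
    """Point Django at a settings module and make the project importable.

    Equivalent in spirit to exporting DJANGO_SETTINGS_MODULE and PYTHONPATH
    in the shell before running a script.
    """
    environ = os.environ if environ is None else environ
    path = sys.path if path is None else path
    environ["DJANGO_SETTINGS_MODULE"] = settings_module
    # The directory that CONTAINS the project package goes on the path,
    # so that "ihieronymus.settings" can be imported.
    if project_parent not in path:
        path.insert(0, project_parent)
    return environ, path


# Demo against stand-ins for os.environ and sys.path.
demo_env, demo_path = configure_django_env(
    "/home/jdarnold/django", "ihieronymus.settings", environ={}, path=[]
)
```

Note that it is the parent directory that goes on the path, not the project directory itself, which is the same reason PYTHONPATH above points at /home/jdarnold/django.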


Oh, and you should probably get memcached running, as it really helps speed up database access. Feedjack does a good job of using the cache. You can also install threadpool, as discussed on the web site, but I haven't tried that yet.
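For Django of this vintage, pointing the cache at memcached is a one-line addition to settings.py. This fragment assumes memcached is listening on the local machine at its default port, and that a Python memcached binding such as python-memcached is installed; adjust the address for your own setup.

```python
# settings.py fragment (Django 1.0 / 1.1-era setting name).
# Assumption: memcached runs on localhost at its default port, 11211.
CACHE_BACKEND = 'memcached://127.0.0.1:11211/'
```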



Next, of course, is the never-ending customization battle. First though, I need to decide if this is the path I want to go down. Feedjack comes with two "themes", neither of which is as expressive as I want, especially for my music feeds like blip.fm and last.fm. That goes double now that I've come across soup.io, which looks like it might already do all I need. Check out my page here: hieronymus.soup.io.