Django, Memcached and the EZTV twitter feed

Back in the day I wrote an HTML page that used JavaScript, AJAX and JSONP to parse the eztv-it Twitter timeline into something useful (it can be found here: http://eztv-mirror.appspot.com/). Unfortunately, it does not work anymore.

As a fun project I considered implementing a replacement in Django, using the Django caching framework and some "nice to know" libraries (e.g. requests).

The main problem with the old version is that it does not receive a response from the Twitter REST API. When you hit the URL directly, however, you do get a response. I suspect the problem is that the server no longer handles the callback correctly (for example, it does not wrap the JSON response in the callback function).
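To make the suspicion concrete: a JSONP response is the JSON payload wrapped in a call to the callback function named by the client, so a browser script that expects that wrapper cannot use a bare JSON reply. The following is a minimal sketch of telling the two apart; the function name `unwrap_jsonp` and the callback name `handleTweets` are made up for illustration and are not taken from the old page.

```python
import json
import re

# Matches "callbackName( ...payload... )" with an optional trailing semicolon.
JSONP_PATTERN = re.compile(
    r'^\s*([A-Za-z_$][\w$.]*)\s*\((.*)\)\s*;?\s*$', re.DOTALL)


def unwrap_jsonp(body):
    """Return (callback, data); callback is None when body is plain JSON."""
    match = JSONP_PATTERN.match(body)
    if match is None:
        # No wrapper found: treat the body as plain JSON.
        return None, json.loads(body)
    return match.group(1), json.loads(match.group(2))


# What a JSONP client expects versus what a plain JSON endpoint returns.
wrapped = unwrap_jsonp('handleTweets([{"text": "hi"}]);')
plain = unwrap_jsonp('[{"text": "hi"}]')
```

If the server returns the second form to a page that asked for the first, the callback never runs, which would match the symptom described above.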

Using the Python requests library I was able to get the data I needed. The data can be found at the following URL:
https://api.twitter.com/1/statuses/user_timeline.json?screen_name=eztv_it
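Fetching that URL with requests could look roughly like the sketch below. It is not the code from the repository: the function name `fetch_timeline`, the `session` parameter and the 10 second timeout are assumptions made for illustration, and the version 1 endpoint may no longer respond.

```python
TIMELINE_URL = 'https://api.twitter.com/1/statuses/user_timeline.json'


def fetch_timeline(screen_name, session=None):
    """Return the decoded JSON timeline for screen_name.

    session is any object with a requests-style get() method; it
    defaults to the requests module itself (hypothetical parameter,
    added so the function can be exercised without network access).
    """
    if session is None:
        import requests  # third-party, listed in requirements.txt
        session = requests
    response = session.get(
        TIMELINE_URL,
        params={'screen_name': screen_name},  # becomes ?screen_name=...
        timeout=10,  # assumed value; avoids hanging forever
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()


# Usage (performs a real HTTP request):
# tweets = fetch_timeline('eztv_it')
```

Passing the query string through `params` leaves the URL encoding to requests, and `raise_for_status()` makes a failed request visible before anything ends up in the cache.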

Since this project is about familiarising myself with a lot of the commonly used "nice to know" libraries, here is the list (from requirements.txt):

Django==1.5.1 - Obviously :)
python-dateutil==2.1 - Dateutil has a nice parser method that attempts to parse the date using common formats. This way I most likely won't notice if Twitter changes their datetime format.
pytz==2013b - required by python-dateutil, but Django uses it internally as well (if available).
requests==1.2.0 - used to get the data from Twitter, it's a wrapper around the urllib/urllib2/httplib mess.
python-memcached==1.51 - One of the two popular memcached bindings for Python.
django-memcached==0.1.2 - Django bindings to the python memcached module (above).

Most of the above is convenience: it reduces the number of lines required, which makes the code easier to read and reduces the chance of bugs.
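As an illustration of the python-dateutil entry in the list: the sketch below parses a timestamp with dateutil when it is installed and falls back to the standard library otherwise, which shows the explicit format string that dateutil lets you avoid. The sample timestamp and the format string are assumptions about what Twitter's created_at field looks like, and `parse_created_at` is a made-up name.

```python
from datetime import datetime, timezone

try:
    # python-dateutil guesses the format on its own.
    from dateutil import parser as date_parser
except ImportError:
    date_parser = None

# Assumed layout of the created_at field; it only matters for the
# standard-library fallback and relies on English day/month names.
TWITTER_FORMAT = '%a %b %d %H:%M:%S %z %Y'


def parse_created_at(value):
    """Return a timezone-aware datetime for a Twitter-style timestamp."""
    if date_parser is not None:
        return date_parser.parse(value)
    return datetime.strptime(value, TWITTER_FORMAT)


parsed = parse_created_at('Wed Aug 27 13:08:45 +0000 2008')
```

With the fallback, any change to the timestamp layout means editing `TWITTER_FORMAT`; with dateutil, many such changes would go unnoticed.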

In Django you define the caching used in the settings module, using something similar to this (the example is a local memcached, over TCP):

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

The backend can be switched to anything else that implements the Django cache interface without much fuss. You can also use multiple caching backends simultaneously.

The Django caching framework is extremely easy to use. Here's my take on it (from app/eztv.py):

from django.core.cache import cache
...
# Key of the cache.
CACHE_ENTRY_NAME = 'eztv_cache'
# Timeout of the cache (in seconds)
CACHE_ENTRY_TIMEOUT = 600
...
def update_cache():
    ...
    data = list(yield_data())

    cache.set(CACHE_ENTRY_NAME, data, CACHE_ENTRY_TIMEOUT)
    return data

def get_cache():
    ...
    _cache = cache.get(CACHE_ENTRY_NAME)

    # If cache is empty or outdated, update it.
    if _cache is None:
        _cache = update_cache()

    return _cache

As can be seen, the django cache framework is a simple import (after defining the backend in settings) and works along the lines of a key/value store, where the elements can have a timeout.

The backend can even be memory-only (normally only for development) or simply use files (as keys) on the filesystem.
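To make the backend options mentioned above concrete, here is a sketch of a settings fragment with several backends side by side. The aliases 'local' and 'files' and the directory path are made up for illustration. The backend paths are the ones shipped with Django at the time of writing, and newer releases may name the memcached backend differently.

```python
# settings.py (fragment): three cache backends configured at once.
CACHES = {
    # Memcached over TCP, as in the example earlier in this post.
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    },
    # Per-process memory cache, normally only useful for development.
    'local': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
    },
    # One file per cache entry, stored below LOCATION (assumed path).
    'files': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/var/tmp/django_cache',
    },
}
```

`from django.core.cache import cache` always refers to the 'default' entry. The other aliases are looked up by name, with `get_cache('local')` in Django 1.5 and `caches['local']` in later versions.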

One of the caveats of the method I use is, of course, what happens when the cache is outdated and needs to be refreshed: the refresh takes several seconds, and the request that triggers it has to wait. One way to solve this is to use a cronjob for updating the cache. This, however, brings more complexity and more dependencies to the system as a whole.

It could be interesting to try sending 100 requests in quick succession right after the cache has become outdated or gone missing. I do not know if this is a problematic case that Django solves for me by keeping track of get/set calls to the cache per request, or if it's a problem I have to solve myself. It is, however, easy to test.
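A rough way to try this without a running server is to simulate the requests with threads. The sketch below uses `FakeCache`, a tiny in-memory stand-in written for this example that only mimics the get/set/add/delete calls of Django's cache API. `LOCK_NAME`, `naive_get`, `guarded_get` and `count_updates` are likewise made-up names. The guarded variant uses `add` (store only if the key is absent) as a simple lock, which is one common approach and not something Django does on its own as far as I know.

```python
import threading
import time


class FakeCache:
    """In-memory stand-in mimicking get/set/add/delete of Django's cache."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            entry = self._data.get(key)
            if entry is None or entry[1] < time.monotonic():
                self._data.pop(key, None)  # missing or expired
                return default
            return entry[0]

    def set(self, key, value, timeout):
        with self._lock:
            self._data[key] = (value, time.monotonic() + timeout)

    def add(self, key, value, timeout):
        # Store only if the key is absent; return True when stored.
        with self._lock:
            entry = self._data.get(key)
            if entry is not None and entry[1] >= time.monotonic():
                return False
            self._data[key] = (value, time.monotonic() + timeout)
            return True

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)


CACHE_ENTRY_NAME = 'eztv_cache'
CACHE_ENTRY_TIMEOUT = 600
LOCK_NAME = 'eztv_cache_lock'  # hypothetical key used as a lock


def naive_get(cache, update):
    # Same shape as get_cache() above: every miss triggers an update.
    data = cache.get(CACHE_ENTRY_NAME)
    if data is None:
        data = update()
    return data


def guarded_get(cache, update):
    # Only the request that wins the add() refreshes; the rest wait.
    while True:
        data = cache.get(CACHE_ENTRY_NAME)
        if data is not None:
            return data
        if cache.add(LOCK_NAME, 1, 30):
            try:
                # Re-check: another request may have just finished.
                data = cache.get(CACHE_ENTRY_NAME)
                return data if data is not None else update()
            finally:
                cache.delete(LOCK_NAME)
        time.sleep(0.01)


def count_updates(getter, requests=100):
    """Run `requests` simultaneous lookups; return how many updates ran."""
    cache = FakeCache()
    calls = []

    def update():
        calls.append(1)
        time.sleep(0.05)  # pretend the Twitter request is slow
        data = ['episode']
        cache.set(CACHE_ENTRY_NAME, data, CACHE_ENTRY_TIMEOUT)
        return data

    threads = [threading.Thread(target=getter, args=(cache, update))
               for _ in range(requests)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return len(calls)


naive_updates = count_updates(naive_get)      # typically well above 1
guarded_updates = count_updates(guarded_get)  # exactly 1
```

Against the real site, the equivalent test would be to clear the cache entry and fire the requests with a load-testing tool while counting how often update_cache() runs.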

Git repository can be found here: https://bitbucket.org/dennishedegaard/eztv/
A running site can be found here: http://ez.dhedegaard.dk/