Saturday 16 August 2008

Caching static web content with Seaside



Given that it looks as though ircbrowse has gone away permanently, I decided to write my own little client for viewing the logs of the #squeak IRC channel. Fortunately most of the hard work had already been done at http://tunes.org/~nef/logs/squeak/, but as the site itself notes: "these logs are purposely "raw" and are intended to be parsed/reformated/wrapped before viewing."

So I put together a client in Squeak and Seaside, currently viewable at http://mykdavies.seasidehosting.st/seaside/my/irclog to do that for me. While I was doing so, I got a bit carried away and added some extra features:
  • All lines can be colourised based on the author's nick
  • It attempts to recognise HTTP URLs and converts them to links
  • Each message timestamp is an anchor, so you can link to a given message
  • At certain times there's a lot of chat in Spanish on the channel, so I added in-page translation capabilities using the Google Ajax Translate API.
  • Key session parameters are added to the URL, so that they are maintained even when the session has expired.
Now, once you've added all these great features, you end up with a massive page that's complex to generate, but once it is generated most of the content remains static. There's not a great deal that I can do about the initial generation (except get rid of some of the functionality), but surely I can cache that content?
I searched the Seaside mailing lists, but generally responses tended to point out that caching of static content should be handed off to the front-end web server. There were two problems here -- first, the page wasn't going to be totally static, just the content from a single component. Second, I wanted to put this application on seasidehosting.st, so I wouldn't have this option available to me anyway.
They say that once you get a hammer, every problem looks like a nail, so I wrote a Seaside solution: a simple caching component using the decorator pattern that checks to see if the content requested is already in the cache, and if not it asks the owner component to generate it. This proved to be very effective; the profiler showed that a non-cached page was taking around 2-3 seconds to generate, but with the HTML cached, this dropped to a handful of milliseconds. 
Given that this approach can substantially reduce the server load even when the page as a whole isn't static, I was surprised not to find something like this in Seaside already, is it too trivial? -- please let me know in the comments if I missed something obvious.
There's nothing particularly clever about the class once you've worked out that all the html that has been generated at any point in time is accessible via html context document stream. However, this approach requires A WARNING -- no session information must be used within the cached components -- no forms, no dynamic links, no Scriptaculous functions. Also, be careful about use of html nextId in the cached component - there is no guarantee that such IDs will be unique when incorporated into a document from the cache.
Anyway here's how it works. The component in question is wrapped in an application-specific subclass of a caching decorator component that:
  1. Maintains a cache dictionary as a class instance variable.
  2. Requires the owner class to implement a #key method that provides a unique key for each page, and also an inst-var called shouldCache (with associated accessors).
  3. Implements #renderContentOn: that 
    1. Checks if the requested page is in the dictionary (using the #key method).
    2. If so, stream the cached content to the current renderer and return.
    3. If not, notes the current html context document stream position.
    4. Sends #renderContentOn: to the owner.
    5. In this method, the owner then renders as normal, and resets shouldCache if it doesn't want the current page to be cached (ie if it is still subject to change).
  4. Control then returns to the decorator. If shouldCache is still set, this checks the html context document stream to find all the content that the child added, and copies it into its own cache, and exits.
And that's it!