Dreams of a Scorpion: 2008

Saturday 16 August 2008

Caching static web content with Seaside

Given that it looks as though ircbrowse has gone away permanently, I decided to write my own little client for viewing the logs of the #squeak IRC channel. Fortunately most of the hard work had already been done at http://tunes.org/~nef/logs/squeak/, but as the site itself notes: "these logs are purposely "raw" and are intended to be parsed/reformated/wrapped before viewing."

So I put together a client in Squeak and Seaside, currently viewable at http://mykdavies.seasidehosting.st/seaside/my/irclog to do that for me. While I was doing so, I got a bit carried away and added some extra features:

All lines can be colourised based on the author's nick
It attempts to recognise HTTP URLs and converts them to links
Each message timestamp is an anchor, so you can link to a given message
At certain times there's a lot of chat in Spanish on the channel, so I added in-page translation capabilities using the Google Ajax Translate API.
Key session parameters are added to the URL, so that they are maintained even when the session has expired.
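
For anyone wondering what the nick colouring and URL recognition amount to, here is a minimal sketch in Python rather than Smalltalk. It is not the code behind the log viewer: the palette, the URL pattern and the function names are all assumptions made up for illustration, and the pattern is deliberately naive (it will happily swallow a trailing full stop).

```python
import html
import re
import zlib

# Hypothetical pattern and palette, chosen only for this sketch.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")
PALETTE = ["#c00", "#080", "#00c", "#a60", "#808", "#088"]


def colour_for(nick):
    """Pick a stable colour for a nick by hashing it into the palette."""
    return PALETTE[zlib.crc32(nick.encode("utf-8")) % len(PALETTE)]


def linkify(message):
    """Escape a raw log message, wrapping anything that looks like an HTTP URL in a link."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(message):
        # Escape the plain text before the URL, then the URL itself.
        parts.append(html.escape(message[last:match.start()]))
        url = html.escape(match.group(0))
        parts.append('<a href="%s">%s</a>' % (url, url))
        last = match.end()
    parts.append(html.escape(message[last:]))
    return "".join(parts)
```

Hashing the nick (rather than handing out colours in order of appearance) means the same person gets the same colour on every page, which matters when each day's log is rendered separately.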

Now, once you've added all these great features, you end up with a massive page that's complex to generate, but once it is generated most of the content remains static. There's not a great deal that I can do about the initial generation (except get rid of some of the functionality), but surely I can cache that content?

I searched the Seaside mailing lists, but generally responses tended to point out that caching of static content should be handed off to the front-end web server. There were two problems here -- first, the page wasn't going to be totally static, just the content from a single component. Second, I wanted to put this application on seasidehosting.st, so I wouldn't have this option available to me anyway.

They say that once you get a hammer, every problem looks like a nail, so I wrote a Seaside solution: a simple caching component using the decorator pattern that checks to see if the content requested is already in the cache, and if not it asks the owner component to generate it. This proved to be very effective; the profiler showed that a non-cached page was taking around 2-3 seconds to generate, but with the HTML cached, this dropped to a handful of milliseconds.

Given that this approach can substantially reduce the server load even when the page as a whole isn't static, I was surprised not to find something like this in Seaside already. Is it too trivial? Please let me know in the comments if I missed something obvious.

There's nothing particularly clever about the class once you've worked out that all the HTML that has been generated at any point in time is accessible via html context document stream. However, this approach comes with A WARNING -- no session information may be used within the cached components: no forms, no dynamic links, no Scriptaculous functions. Also, be careful about using html nextId in the cached component - there is no guarantee that such IDs will be unique when incorporated into a document from the cache.

Anyway here's how it works. The component in question is wrapped in an application-specific subclass of a caching decorator component that:

Maintains a cache dictionary as a class instance variable.
Requires the owner class to implement a #key method that provides a unique key for each page, and also an inst-var called shouldCache (with associated accessors).
Implements #renderContentOn: that

Checks if the requested page is in the dictionary (using the #key method).
If so, streams the cached content to the current renderer and returns.
If not, notes the current html context document stream position.
Sends #renderContentOn: to the owner.
In this method, the owner then renders as normal, and resets shouldCache if it doesn't want the current page to be cached (ie if it is still subject to change).

Control then returns to the decorator. If shouldCache is still set, this checks the html context document stream to find all the content that the child added, and copies it into its own cache, and exits.
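
The steps above can be sketched outside Seaside. What follows is a minimal Python illustration of the same logic, not the actual Smalltalk class: the class names, the LogPage owner and the use of an in-memory text stream in place of html context document stream are all assumptions for the sake of the example.

```python
import io


class CachingDecorator:
    """Wraps an owner component and caches the HTML it writes to a shared stream."""

    cache = {}  # shared by all instances, standing in for the class-side dictionary

    def __init__(self, owner):
        self.owner = owner

    def render_content_on(self, stream):
        key = self.owner.key()
        if key in self.cache:
            # Cache hit: stream the stored HTML and skip the expensive render.
            stream.write(self.cache[key])
            return
        start = len(stream.getvalue())   # note how much has been written so far
        self.owner.should_cache = True   # the owner may reset this while rendering
        self.owner.render_content_on(stream)
        if self.owner.should_cache:
            # Copy only what the owner added after the noted position.
            self.cache[key] = stream.getvalue()[start:]


class LogPage:
    """Hypothetical owner component: one page of log lines per day."""

    def __init__(self, day, lines, still_changing=False):
        self.day = day
        self.lines = lines
        self.still_changing = still_changing
        self.should_cache = True
        self.render_count = 0

    def key(self):
        return self.day

    def render_content_on(self, stream):
        self.render_count += 1
        for line in self.lines:
            stream.write("<p>" + line + "</p>")
        if self.still_changing:
            self.should_cache = False    # e.g. today's log is still growing


page = LogPage("2008-08-15", ["hello", "world"])
for _ in range(3):
    CachingDecorator(page).render_content_on(io.StringIO())
print(page.render_count)  # the owner only had to render once
```

Note that the cache holds plain strings keyed on whatever #key returns, which is exactly why the warning about session-dependent content applies: whatever was rendered for the first visitor is replayed verbatim for everyone else.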

And that's it!

Thursday 17 July 2008

My first video: creating a Hello World class in Squeak

Given the recent push to get Squeak video tutorials available, I decided to have a go myself. I took as my starting point my post from a few months back, intended to act as a quick introduction to developers coming to Squeak for the first time.

My first problem was to find a good screen capture utility. Unfortunately, Wink isn't available for OS X, but a bit of searching uncovered Snapz Pro X. Despite the terrible name, it's a very nice piece of software that makes recording video and screenshots very easy. It comes with a 14-day fully-functional trial, so if I get a taste for this, I may end up having to cough up the $69 soon. If anyone knows of other software I could use, please let me know.

You can view the video at vimeo.com, or find it in the new Squeak Smalltalk group that Randal Schwartz has set up. Have a look at the video and let me know what you think: too fast, too slow, too much like Ricky Gervais, whatever.

Monday 14 July 2008

Using Apache as a front-end for Seaside

I'll admit it, configuring Apache scares the bejeezus out of me. The documentation seems to be so focused on the trees that the wood becomes an impenetrable, gloomy forest. I guess I'm not alone in this, which makes Ramon Leon's posts on configuring Apache with Seaside (1, 2, 3) so useful.

Despite this, I've still steered clear of going near Apache, until Ramon posted a sample extract of configuration text. Now, cut-and-paste is something I can do, so I decided to give it a go.

I'm on Mac OS X, so Apache is installed and running by default. Despite my earlier protestations, I have played with Apache before, so I knew that httpd.conf was the key file to control how Apache runs. A bit of poking about in man files uncovered the location of the file I needed: /private/etc/apache2/httpd.conf.

I opened a Terminal and cd'ed to /private/etc/apache2/ where I could execute sudo vi httpd.conf which prompted me to enter my password to edit this administrator file, and then allowed me to start hacking. I wanted to leave my existing configuration as untouched as possible, so below the line:

Listen 80

I added a new line:

Listen 81

which would start Apache listening on port 81 as well as port 80. I could then use port 81 for my experimentation.

Typing sudo /usr/sbin/apachectl restart caused Apache to restart, hopefully loading my change.

I then tried browsing http://localhost:81/

Success (so far) - Apache is trying to deal with my request, and using its default handler.

Now, I'd noticed an interesting line at the bottom of httpd.conf:

Include /private/etc/apache2/other/*.conf

which meant that I could make any other changes in a separate file. So sudo vi other/seaside.conf opened a new .conf file, into which I typed:

virtualhost>  
    #ServerName yoursite.com
    #DocumentRoot /var/www/myExamplePath
    RewriteEngine On
    ProxyRequests Off
    ProxyPreserveHost On
    UseCanonicalName Off
    # proxy requests for Seaside's own files straight through
    RewriteRule ^/seaside/files(.*)$ http://localhost:8080/seaside/files$1 [P,L]

    # if the path doesn't exist as a physical file, redirect the request
    # to seaside - configure this as required
    RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
    RewriteRule ^/(.*)$ http://localhost:8080/seaside/$1 [P,L]

/virtualhost>

Which (when you put back the angle brackets that Blogger eats on the first and last lines) is taken directly from Ramon's post. It does two useful things. It takes any requests coming in to port 81, and first checks to see if you're requesting a physical file, and if so, serves it up (useful for the static parts of your site). If not, it passes them on to Seaside on the appropriate port, adding the standard /seaside prefix (so this is hidden from the outside world, which is a nice touch). Restarting Apache again and this time browsing http://localhost:81/examples/counter got me:

Again, good news. However, when I tried to navigate around, or use applications that made use of built-in #style and #script methods, things started to go wrong. The reason here is that Seaside prefixes its generated relative URLs with /seaside. So I simply added the following:

    # re-write all requests starting with seaside - these links are exposed
    # by the application internally. Alternatively, you can
    # re-configure Seaside to change the urls it generates.
    RewriteRule ^/seaside/(.*)$ http://localhost:8080/seaside/$1 [P,L]

into the virtualhost directive (above the catch-all rule and its RewriteCond, since rules are tried in order), and re-started Apache once more. This time it all worked like clockwork.
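
If the order in which the rules fire is hard to picture, here is a rough Python model of it. This is only a sketch of the decision logic, not of mod_rewrite itself: the route function, the static_files set (a stand-in for Apache's "-f" file test) and the "static:" return value are hypothetical, and I'm assuming the extra /seaside/ rule sits above the catch-all.

```python
import re

BACKEND = "http://localhost:8080"  # where Seaside listens in this setup


def route(path, static_files=frozenset()):
    """Return where a request path ends up when the three rules are tried in order."""
    # Rule 1: Seaside's own file handler is always proxied.
    match = re.match(r"^/seaside/files(.*)$", path)
    if match:
        return BACKEND + "/seaside/files" + match.group(1)
    # Added rule: links that Seaside generated itself already carry the prefix.
    match = re.match(r"^/seaside/(.*)$", path)
    if match:
        return BACKEND + "/seaside/" + match.group(1)
    # Catch-all, guarded by the RewriteCond: only when no physical file exists.
    match = re.match(r"^/(.*)$", path)
    if match and path not in static_files:
        return BACKEND + "/seaside/" + match.group(1)
    return "static:" + path  # left for Apache to serve from the DocumentRoot


print(route("/examples/counter"))
print(route("/seaside/examples/counter"))
print(route("/logo.png", static_files={"/logo.png"}))
```

The first two calls land on the same Seaside URL, which is the whole point of the extra rule: without it, a link that already starts with /seaside would fall through to the catch-all and pick up a second prefix.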

Twenty minutes of not very challenging experimentation, and I now have Apache working as my front-end webserver. That means all the handling of static files can be done outside of Squeak, I can configure my responses if the Squeak image crashes, and I get access to all of Apache's features such as logging, SSL etc. Not a bad half-hour's work really.

Tuesday 8 July 2008

Digging into the functionality behind morphs

Someone on the Seaside mailing list asked how to find out the meanings of all the icons against each method in the browser. The answer is quite interesting, as it helps you understand the importance of the "live environment" that Squeak gives you.

Looking at a typical class, you'll see that many method definitions have a little icon by the name:

(nb alt- and cmd- prefixes used below may need to be changed depending on your image preferences).

In order to work out why things happen in morphs, you can easily access the underlying code. In this case, alt-click on the offending item repeatedly to bring up halo menus for more targeted morphs. The closest in you get is OBLazyListMorph - if you go too far, you'll notice that you cycle round to the original morph again. Notice in this case that this morph is much bigger than its containing morph, and so much of its content is hidden from the user:

When you're on the morph you're after, you can then find out more about it by clicking on the "spanner" (=debug) icon, and selecting "browse morph class", which will open a browser on OBLazyListMorph.

Look in the "drawing" methods of this class, and you'll get your first clue in #display:atRow:on: which selects and draws the icons with the help of OBMorphicIcons class.

Selecting the name OBMorphicIcons and pressing Cmd-b will bring up a browser on that class, and you'll see that each icon is defined by an instance method. Click on one of these (I chose #arrowUpAndDown because that looks pretty likely to be unique to this usage), and press Cmd-n to find its senders.

Bingo! OBInheritanceFilter method #icon:forNode: decides which icon to display based on a number of tests, including a test for #halt: messages that puts a red flag against a method.

You'll also see that this code accidentally red-flags itself, because it fails to distinguish #halt: used as a symbol from #halt: sent as a message.

Friday 23 May 2008

Ideal language for the JVM?

Charles Nutter (JRuby specialist at Sun) said recently:

"The CLR kind of grew up on the static language side of the world, from the C++ folks, whereas Java grew up on the Smalltalk side of the world. It was actually grown out of a Smalltalk VM. So what we have on the JVM, oddly enough, is a dynamic language runtime under the covers powering a statically typed language. The work that we are doing in future versions of the JVM and the work that we're doing with JRuby and the current set of dynamic languages is essentially exposing that capability to run dynamic languages extremely well to all of the different language implementations."

Makes you wonder what would be the ideal dynamic language to run on the JVM, doesn't it?

Thursday 22 May 2008

Scripting with Smalltalk - updated

My post yesterday attracted more attention than I expected, with Paolo Bonzini and Randal Schwartz both being able to make out the code well enough to comment on it. Paolo was able to identify a number of improvements to the code, both in terms of identifying more appropriate approaches, and in identifying where the code was spending its time. As a result, here's a much faster version. It's also noticeably shorter at 31 lines excluding spaces and comments, but it's still very wide.

This version also uses a few more methods not found in core Squeak, including #fold:, #gather: and #copyReplaceFrom:to:with:.

As Randal pointed out, I'm monkey-patching core classes with gay abandon, and subclassing would probably be safer, though the over-ride of #at: that (rightly) alarmed him is now gone.


"Inspired by http://norvig.com/spell-correct.html"
"s := SpellCheck new. s initialize. s correct: 'misplet'"
Collection extend [
    ifEmpty: block [ self isEmpty ifTrue: [ ^ block value].  ^ self ] "not in gst by default" 
    maxUsing: block [ ^ self fold: [ :a :b | ((block value: a) > (block value: b)) ifTrue: [ a ] ifFalse: [ b ] ] ] ]

String extend [
    swapAt: i [ ^ self copyReplaceFrom: i+1 to: i+2 with: {self at: i+2. self at: i+1} ]
    removeAt: i [ ^ self copyReplaceFrom: i+1 to: i+1 with: #() ]
    insert: l at: i [ ^ self copyReplaceFrom: i+1 to:i with: {l} ]
    replace: l at: i [ ^ (self copy) at: i put: l; yourself ] 
    findWords [ ^ (self asString tokenize: '[^a-zA-Z]+') collect: [ :each | each asLowercase ] ] ]

Object subclass: SpellCheck [ 
    | nwords alphabet |
    initialize [ | lines |
        lines := (File name: 'westminster.txt') contents.
        nwords := lines findWords asBag.
        alphabet := 'abcdefghijklmnopqrstuvwxyz' asArray ]

    edits1: word [ | n |
        n := word size.
        ^ Array join: {
            0 to: (n - 2) collect: [ :i | word swapAt: i ].
            0 to: (n - 1) collect: [ :i | word removeAt: i ]. "removeAt: i drops character i+1, so start at 0"
            alphabet gather: [ :letter | 1 to: n collect: [ :i | word replace: letter at: i ] ].
            alphabet gather: [ :letter | 0 to: n collect: [ :i | word insert: letter at: i ] ] } ]

    knownEdits2: word [ 
        ^ (self edits1: word) gather: [ :e1 |
            self known: (self edits1: e1) ] ]
 
    known: words [ ^ ( words select: [ :each | nwords includes: each ] ) ]
 
    correct: word [ | candidates |
        candidates := (self known: {word}) ifEmpty: [ 
            (self known: (self edits1: word)) ifEmpty: [
                (self knownEdits2: word) ifEmpty: [ {word} ] ] ].
        ^ candidates maxUsing: [ :d | nwords occurrencesOf: d ] ]
]
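
For comparison, the same candidate-selection logic can be sketched in Python, along the lines of Norvig's original. This is my own paraphrase rather than his code: the names are assumptions, and counts stands in for the nwords bag. It shows the fallback chain that #correct: builds with nested ifEmpty: blocks, and that deletions should be tried at every position in the word.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one edit away: swaps, deletions, replacements and insertions."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    swaps = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(swaps + deletes + replaces + inserts)


def correct(word, counts):
    """Try the word itself, then one edit, then two edits; fall back to the word."""
    def known(words):
        return {w for w in words if w in counts}

    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or {word})
    # Prefer the candidate seen most often in the training text.
    return max(candidates, key=lambda w: counts.get(w, 0))


counts = Counter("the quick brown fox jumps over the lazy dog".split())
print(correct("teh", counts))
```

Python's "or" on empty sets plays the role of ifEmpty: here: each stage is only evaluated when the previous one found nothing, so the expensive two-edit search is skipped for most words.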

Wednesday 21 May 2008

Scripting with Smalltalk

When I saw this post by Peter Norvig, and especially the lines-of-code comparison towards the end of the page, I thought it would be interesting to see how Smalltalk compared. Given the line-of-code metric, this was an ideal chance for me to play with GNU Smalltalk, which positions itself as a scripting-friendly Smalltalk. It proved to be quite a straightforward exercise, and the code came out surprisingly compactly (but quite wide!):


"Inspired by http://norvig.com/spell-correct.html"
"s := SpellCheck new. s initialize. s correct: 'misplet'"
Dictionary extend [
   at: key [ ^ self at: key ifAbsent: [ 1 ] ] "make it a defaulting at:"
   incrAt: key [ self at: key put: ((self at: key) + 1) ] "increments value" ]

Collection extend [
   ifEmpty: block [ self isEmpty ifTrue: [ ^ block value].  ^ self ] "not in gst by default"
   bestUsing: d [ ^ self inject: 'xxx' into: [ :sofar :this | ((d at: sofar) > (d at: this)) ifTrue: [ sofar ] ifFalse: [ this ] ] ] ]

String extend [
   swapAt: i [ ^ (self copyFrom: 1 to: i), (self at: (i+2)) asString, (self at: (i+1)) asString, (self copyFrom: (i+3) to: self size) ]
   removeAt: i [ ^ (self copyFrom: 1 to: i), (self copyFrom: (i+2) to: self size) ]
   insert: l at: i [ ^ (self copyFrom: 1 to: i), l asString, (self copyFrom: (i+1) to: self size) ]
   replace: l at: i [ ^ (self copy) at: i put: l; yourself ]
   findWords [ ^ (self asString tokenize: '[^a-zA-Z]+') collect: [ :each | each asLowercase ] ] ]

Object subclass: SpellCheck [
   | nwords alphabet |
   initialize [ | lines |
        lines := (File name: 'bigtext.txt') contents.
        nwords := self train: lines findWords.
        alphabet := 'abcdefghijklmnopqrstuvwxyz' ]

   train: features [ | model |
        model := Dictionary new.
        features do: [ :each | model incrAt: each asLowercase ].
        ^ model ]

   edits1: word [ | s n |
        s := Set new.
        n := word size.
        0 to: (n - 2) do: [ :i | s add: (word swapAt: i) ].
        0 to: (n - 1) do: [ :i | s add: (word removeAt: i) ]. "removeAt: i drops character i+1, so start at 0"
        alphabet do: [ :letter |
             1 to: n do: [ :i | s add: (word replace: letter at: i) ].
             0 to: n do: [ :i | s add: (word insert: letter at: i) ] ].
        ^ s asArray ]

   known_edits2: word [ | s |
        s := Set new.
        (self edits1: word) do: [ :e1 |
           (self edits1: e1) do: [ :e2 |
              (nwords keys includes: e2) ifTrue: [ s add: e2 ] ] ].
        ^ s asArray ]

   known: words [ ^ ( words select: [ :each | nwords keys includes: each ] ) asSet asArray ]

   correct: word [ | candidates |
     candidates := (self known: {word}) ifEmpty: [
         (self known: (self edits1: word)) ifEmpty: [
              (self known_edits2: word) ifEmpty: [ {word} ] ] ].
     ^ candidates bestUsing: nwords ]
]

54 lines of not very idiomatic Smalltalk including blank lines and comments (such as they are). More than twice as many lines as Peter Norvig's Python solution, but 1/6th of the size of the Java solution!

It felt very strange - and frustrating - to be developing in Smalltalk without the full support of the traditional environment, especially as I was trying to pick up the subtle differences from Squeak Smalltalk, but the ability to 'extend' the core classes from within the scripts means that it will be very easy to be able to hack out scripts using this tool.

Sunday 11 May 2008

Using OpenDBX with Squeak

A team of students from UTN (National Technological University in Argentina) co-ordinated by Esteban Lorenzano has just been doing some work on SqueakDBX, a package to allow Squeak to access OpenDBX functionality, which gives a lighter-weight alternative to ODBC for connecting to databases including Firebird, Interbase, MS SQL Server, MySQL, Oracle, PostgreSQL, SQLite, SQLite3 and Sybase.

This uses the FFI (Foreign Function Interface) package (available through Package Universe), and requires access to the libopendbx library. Now I had a bit of fun and games working out how to do this on Mac OS X, so here's the magic that worked for me.

Firstly, I was interested in using PostgreSQL, to compare with the Squeak postgres support which is already installed on my machine. I downloaded the OpenDBX source (it's a UNIX tool, so it's distributed as source that needs compiling). You open a Terminal, change into the newly downloaded directory, and execute the configure command as directed by the README: ./configure --with-backends="sqlite3 pgsql". In my case this barfed because it couldn't find my postgres libraries. This was solved by manually pointing to the postgres library and include directories, as follows:

CPPFLAGS="-I/usr/local/pgsql/include/" LDFLAGS="-L/usr/local/pgsql/lib/" ./configure --with-backends="sqlite3 pgsql"

You'll see that I also set things up for SQLite3 as it's installed by default on OS X, and is increasingly being used to hold configuration files in OS X and Firefox. I know that there's a SQLite3 package for Squeak already, but I thought it would be interesting to compare the two.

Anyway, the configure stage now worked perfectly, and sudo make install installed the libraries for me. All I needed to do now was make Squeak FFI see the libraries. Ha! After a lot of poking about, I found this email from John McIntosh, which advised that creating a symbolic link to the library "in the right place is helpful". Thanks for the pointer John, but where's the right place???

Well, after a bit of experimentation, I found that I needed to create the link in the Resources directory of the SqueakVM package:

cd "/Applications/Squeak/Squeak 3.8.18beta1U.app/Contents/Resources"
ln -s /usr/local/lib/libopendbx.dylib opendbx

And that's it. I'm now able to access my PostgreSQL database with code like this:

conn := DBXConnection new
  backend: DBXBackend postgresql;
  connect: '127.0.0.1' port: 'x';
  open: 'x' name: 'x' password: 'x' method: 0.
resultSet := conn query: 'select * from airline'.
DBXTranscript show: resultSet.
conn disconnect.
Smalltalk garbageCollect.

It's still very early days for the project, and it is not yet production-ready, so I've not tried doing anything much with it, but I notice that Esteban and colleagues appear to be building in GLORP support from the outset, so it's worth keeping an eye on this project.

Thursday 13 March 2008

Snippet: how to trigger events on a checkbox in Seaside

Richard Eng was having trouble using a checkbox to trigger a change in the contents of a textarea.

Lukas responded that:

To trigger a callback of a form element, you need to specify this form element with #triggerFormElement:. As the comment of this method says, this does not work for multi-select lists and checkboxes, as those two form elements internally depend on another hidden form element. So for your checkbox you need to trigger the whole form. Give it an (unique) id and use #triggerForm: with this id.

(my italics)

Gerhard Obermann kindly provided a working sample:


renderContentOn: html
 "requires 'set' to be defined as an instvar"
 | formId |
 formId := html nextId.
 html form id: formId;
     with: [
         html checkbox value: false; callback: [ :v | set := v ];
             onClick: (
                 html updater id: 'text';
                     triggerForm: formId;
                     callback: [ :r |
                         set
                             ifTrue: [ r text: 'My address' ]
                             ifFalse: [ r text: '' ]]).

         html textArea id: 'text';
             with: [ html text: 'Empty' ]].

This was a great help to Richard, prompting him to ask why there isn't more information like this out there - much of what he found while teaching himself Squeak and Seaside was out-of-date and wrong. Maybe a snippets library would be a good starting point?
