Dreams of a Scorpion

Friday, 15 May 2009

Nice quote about Smalltalk

"The thing that I really hate about Smalltalk though, is the fact that every time I wish C++ or Java did something differently it turns out that Smalltalk does it the way I want it to. I've never even used Smalltalk on a real project. I just learned it so that I could read source code, now I keep running into things that would be easier if I were using it. It's really annoying."

Attributed to Phil Goodwin, but I've no clue as to the origin.

Saturday, 16 August 2008

Caching static web content with Seaside

Given that it looks as though ircbrowse has gone away permanently, I decided to write my own little client for viewing the logs of the #squeak IRC channel. Fortunately most of the hard work had already been done at http://tunes.org/~nef/logs/squeak/, but as the site itself notes: "these logs are purposely "raw" and are intended to be parsed/reformated/wrapped before viewing."

So I put together a client in Squeak and Seaside, currently viewable at http://mykdavies.seasidehosting.st/seaside/my/irclog to do that for me. While I was doing so, I got a bit carried away and added some extra features:

All lines can be colourised based on the author's nick
It attempts to recognise HTTP URLs and converts them to links
Each message timestamp is an anchor, so you can link to a given message
At certain times there's a lot of chat in Spanish on the channel, so I added in-page translation capabilities using the Google Ajax Translate API.
Key session parameters are added to the URL, so that they are maintained even when the session has expired.

Now, once you've added all these great features, you end up with a massive page that's complex to generate, but once it is generated most of the content remains static. There's not a great deal that I can do about the initial generation (except get rid of some of the functionality), but surely I can cache that content?

I searched the Seaside mailing lists, but generally responses tended to point out that caching of static content should be handed off to the front-end web server. There were two problems here -- first, the page wasn't going to be totally static, just the content from a single component. Second, I wanted to put this application on seasidehosting.st, so I wouldn't have this option available to me anyway.

They say that once you get a hammer, every problem looks like a nail, so I wrote a Seaside solution: a simple caching component using the decorator pattern that checks to see if the content requested is already in the cache, and if not it asks the owner component to generate it. This proved to be very effective; the profiler showed that a non-cached page was taking around 2-3 seconds to generate, but with the HTML cached, this dropped to a handful of milliseconds.

Given that this approach can substantially reduce the server load even when the page as a whole isn't static, I was surprised not to find something like this in Seaside already, is it too trivial? -- please let me know in the comments if I missed something obvious.

There's nothing particularly clever about the class once you've worked out that all the html that has been generated at any point in time is accessible via html context document stream. However, this approach requires A WARNING -- no session information must be used within the cached components -- no forms, no dynamic links, no Scriptaculous functions. Also, be careful about use of html nextId in the cached component - there is no guarantee that such IDs will be unique when incorporated into a document from the cache.

Anyway here's how it works. The component in question is wrapped in an application-specific subclass of a caching decorator component that:

Maintains a cache dictionary as a class instance variable.
Requires the owner class to implement a #key method that provides a unique key for each page, and also an inst-var called shouldCache (with associated accessors).
Implements #renderContentOn: that

Checks if the requested page is in the dictionary (using the #key method).
If so, stream the cached content to the current renderer and return.
If not, notes the current html context document stream position.
Sends #renderContentOn: to the owner.
In this method, the owner then renders as normal, and resets shouldCache if it doesn't want the current page to be cached (ie if it is still subject to change).

Control then returns to the decorator. If shouldCache is still set, this checks the html context document stream to find all the content that the child added, and copies it into its own cache, and exits.

And that's it!

Thursday, 17 July 2008

My first video: creating a Hello World class in Squeak

Given the recent push to get Squeak video tutorials available, I decided to have a go myself. I took as my starting point my post from a few months back, intended to act as a quick introduction to developers coming to Squeak for the first time.

My first problem was to find a good screen capture utility. Unfortunately, Wink isn't available for OS X, but a bit of searching uncovered Snapz Pro X. Despite the terrible name, it's a very nice piece of software that makes recording video and screenshots very easy. It comes with a 14-day fully-functional trial, so if I get a taste for this, I may end up having to cough up the $69 soon. If anyone knows of other software I could use, please let me know.

You can view the video at vimeo.com, or find it in the new Squeak Smalltalk group that Randal Schwartz has set up. Have a look at the video and let me know what you think: too fast, too slow, too much like Ricky Gervais, whatever.

Monday, 14 July 2008

Using Apache as a front-end for Seaside

I'll admit it, configuring Apache scares the bejeezus out of me. The documentation seems to be so focused on the trees, that the wood becomes an impenetrable, gloomy forest. I guess I'm not alone in this, which makes Ramon Leon's posts on configuring Apache with Seaside(1, 2, 3) so useful.

Despite this, I've still steered clear of going near Apache, until Ramon posted a sample extract of configuration text. Now, cut-and-paste is something I can do, so I decided to give it a go.

I'm on Mac OS X, so Apache is installed and running by default. Despite my earlier protestations, I have played with Apache before, so I knew that httpd.conf was the key file to control how Apache runs. A bit of poking about in man files uncovered the location of the file I needed: /private/etc/apache2/http.conf.

I opened a Terminal and cd'ed to /private/etc/apache2/ where I could execute sudo vi httpd.conf which prompted me to enter my password to edit this administrator file, and then allowed me to start hacking. I wanted to leave my existing configuration as untouched as possible, so below the line:

Listen 80

I added a new line:

Listen 81

which would start Apache listening on port 81 as well as port 80. I could then use port 81 for my experimentation.

Typing sudo /usr/sbin/apachectl restart caused Apache to restart, hopefully loading my change.

I then tried browsing http://localhost:81/

Success (so far) - Apache is trying to deal with my request, and using its default handler.

Now, I'd noticed an interesting line at the bottom of httpd.conf:

Include /private/etc/apache2/other/*.conf

which meant that I could make any other changes in a separate file. So sudo vi other/seaside.conf opened a new .conf file, into which I typed:

virtualhost>  
    #ServerName yoursite.com
    #DocumentRoot /var/www/myExamplePath
    RewriteEngine On
    ProxyRequests Off
    ProxyPreserveHost On
    UseCanonicalName Off
    # if the path doesn't exist, rewrite it to be a Seaside file ref
    RewriteRule ^/seaside/files(.*)$ http://localhost:8080/seaside/files$1 [P,L]
    RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f

    # redirect all requests to seaside - configure this as required
    RewriteRule ^/(.*)$ http://localhost:8080/seaside/$1 [P,L]

/virtualhost>

Which (when you put back the angle brackets that Blogger eats on the first and last lines) is taken directly from Ramon's post. It does two useful things. It takes any requests coming in to port 81, and first checks to see if you're requesting a physical file, and if so, serves it up (useful for the static parts of your site). If not, it passes them on to Seaside on the appropriate port, adding the standard /seaside prefix (so this is hidden from the outside world, which is a nice touch). Restarting Apache again and this time browsing http://localhost:81/examples/counter got me:

Again, good news. However, when I tried to navigate around, or use applications that made use of built-in #style and #script methods, things started to go wrong. The reason here is that Seaside prefixes its generated relative URLs with /seaside. So I simply added the following:

    # re-write all requests starting with seaside - these links are exposed
# by the application internally. Alternatively, you can
# re-configure Seaside to change the urls it generates.
RewriteRule ^/seaside/(.*)$ http://localhost:8080/seaside/$1 [P,L]

into the virtualhost directive, and re-started Apache once more. This time it all worked like clockwork.

Twenty minutes of not very challenging experimentation, and I now have Apache working as my front-end webserver. That means, all the handling of static files can be done outside of Squeak, I can configure my responses if the Squeak images crashes, I get access to all of Apache's features such as logging, SSL etc. Not a bad half-hour's work really.

Tuesday, 8 July 2008

Digging into the functionality behind morphs

Someone on the Seaside mailing list asked how to find out the meanings of all the icons against each method in the browser. The answer is quite interesting, as it helps you understand the importance of the "live environment" that Squeaks gives you.

Looking at a typical class, you'll see that many method definitions have a little icon by the name:

(nb alt- and cmd- prefixes used below may need to be changed depending on your image preferences).

In order to work out why things happen in morphs, you can easily access the underlying code. In this case, alt-click on the offending item repeatedly to bring up halo menus for more targeted morphs. The closest in you get is OBLazyList - if you go too far, you'll notice that you cycle round to the original morph again. Notice in this case that this morph is much bigger than its containing morph, and so much of its content is hidden from the user:

When you're on the morph you're after, you can then find out more about it by clicking on the "spanner" (=debug) icon, and selecting "browse morph class", which will open a browser on OBLazyListMorph.

Look in the "drawing" methods of this class, and you'll get your first clue in #display:atRow:on: which selects and draws the icons with the help of OBMorphicIcons class.

Selecting the name OBMorphicIcons and pressing Cmd-B will bring up a browser on that class, and you'll see that each icon is defined by an instance method. Click on one of these (I chose #arrowUpAndDown because that looks pretty likely to be unique to this usage), and press Cmd-n to find its senders.

Bingo! OBInheritanceFilter method #icon:forNode: decides which icon to display based on a number of tests, including a test for #halt: messages that puts a red flag against a method.

You'll also see that this code accidentally red-flags itself because it fails to distinguish between #halt: used as a symbol rather than as a message.

Friday, 23 May 2008

Ideal language for the JVM?

Charles Nutter (JRuby specialist at Sun) said recently:

"The CLR kind of grew up on the static language side of the world, from the C++ folks, whereas Java grew up on the Smalltalk side of the world. It was actually grown out of a Smalltalk VM. So what we have on the JVM, oddly enough, is a dynamic language runtime under the covers powering a statically typed language. The work that we are doing in future versions of the JVM and the work that we're doing with JRuby and the current set of dynamic languages is essentially exposing that capability to run dynamic languages extremely well to all of the different language implementations."

Makes you wonder what would be the ideal dynamic language to run on the JVM, doesn't it?

Thursday, 22 May 2008

Scripting with Smalltalk - updated

My post yesterday attracted more attention than I expected, with Paolo Bonzini and Randal Schwartz both being able to make out the code well enough to comment on it. Paolo was able to identify a number of improvements to the code, both in terms of identifying more appropriate approaches, and in identifying where the code was spending its time. As a result, here's a much faster version. It's also noticeably shorter at 31 lines excluding spaces and comments, but it's still very wide.

This version also uses a few more methods not found in core Squeak including #fold: #gather: #copyReplaceFrom:to: .

As Randal pointed out, I'm monkey-patching core classes with gay abandon, and subclassing would probably be safer, though the over-ride of #at: that (rightly) alarmed him is now gone.


"Inspired by http://norvig.com/spell-correct.html"
"s = SpellCheck new. s initialize. s correct: 'misplet'"
Collection extend [
    ifEmpty: block [ self isEmpty ifTrue: [ ^ block value].  ^ self ] "not in gst by default" 
    maxUsing: block [ ^ self fold: [ :a :b | ((block value: a) > (block value: b)) ifTrue: [ a ] ifFalse: [ b ] ] ] ]

String extend [
    swapAt: i [ ^ self copyReplaceFrom: i+1 to: i+2 with: {self at: i+2. self at: i+1} ]
    removeAt: i [ ^ self copyReplaceFrom: i+1 to: i+1 with: #() ]
    insert: l at: i [ ^ self copyReplaceFrom: i+1 to:i with: {l} ]
    replace: l at: i [ ^ (self copy) at: i put: l; yourself ] 
    findWords [ ^ (self asString tokenize: '[^a-zA-Z]+') collect: [ :each | each asLowercase ] ] ]

Object subclass: SpellCheck [ 
    | nwords alphabet |
    initialize [ | lines |
        lines := (File name: 'westminster.txt') contents.
        nwords := lines findWords asBag.
        alphabet := 'abcdefhgijklmnopqrstuvwxyz' asArray ]

    edits1: word [ | n |
        n := word size.
        ^ Array join: {
            0 to: (n - 2) collect: [ :i | word swapAt: i ].
            1 to: (n - 1) collect: [ :i | word removeAt: i ].
            alphabet gather: [ :letter | 1 to: n collect: [ :i | word replace: letter at: i ] ].
            alphabet gather: [ :letter | 0 to: n collect: [ :i | word insert: letter at: i ] ] } ]

    knownEdits2: word [ 
        ^ (self edits1: word) gather: [ :e1 |
            self known: (self edits1: e1) ] ]
 
    known: words [ ^ ( words select: [ :each | nwords includes: each ] ) ]
 
    correct: word [ | candidates |
        candidates := (self known: {word}) ifEmpty: [ 
            (self known: (self edits1: word)) ifEmpty: [
                (self knownEdits2: word) ifEmpty: [ {word} ] ] ].
        ^ candidates maxUsing: [ :d | nwords occurrencesOf: d ] ]
]