Friday 23 May 2008

Ideal language for the JVM?

Charles Nutter (JRuby specialist at Sun) said recently:

"The CLR kind of grew up on the static language side of the world, from the C++ folks, whereas Java grew up on the Smalltalk side of the world. It was actually grown out of a Smalltalk VM. So what we have on the JVM, oddly enough, is a dynamic language runtime under the covers powering a statically typed language. The work that we are doing in future versions of the JVM and the work that we're doing with JRuby and the current set of dynamic languages is essentially exposing that capability to run dynamic languages extremely well to all of the different language implementations."

Makes you wonder what would be the ideal dynamic language to run on the JVM, doesn't it?

Thursday 22 May 2008

Scripting with Smalltalk - updated

My post yesterday attracted more attention than I expected, with Paolo Bonzini and Randal Schwartz both being able to make out the code well enough to comment on it. Paolo was able to identify a number of improvements to the code, both in terms of identifying more appropriate approaches, and in identifying where the code was spending its time. As a result, here's a much faster version. It's also noticeably shorter at 31 lines excluding spaces and comments, but it's still very wide.

This version also uses a few more methods not found in core Squeak including #fold: #gather: #copyReplaceFrom:to: .

As Randal pointed out, I'm monkey-patching core classes with gay abandon, and subclassing would probably be safer, though the over-ride of #at: that (rightly) alarmed him is now gone.


"Inspired by http://norvig.com/spell-correct.html"
"s = SpellCheck new. s initialize. s correct: 'misplet'"
Collection extend [
ifEmpty: block [ self isEmpty ifTrue: [ ^ block value]. ^ self ] "not in gst by default"
maxUsing: block [ ^ self fold: [ :a :b | ((block value: a) > (block value: b)) ifTrue: [ a ] ifFalse: [ b ] ] ] ]

String extend [
swapAt: i [ ^ self copyReplaceFrom: i+1 to: i+2 with: {self at: i+2. self at: i+1} ]
removeAt: i [ ^ self copyReplaceFrom: i+1 to: i+1 with: #() ]
insert: l at: i [ ^ self copyReplaceFrom: i+1 to:i with: {l} ]
replace: l at: i [ ^ (self copy) at: i put: l; yourself ]
findWords [ ^ (self asString tokenize: '[^a-zA-Z]+') collect: [ :each | each asLowercase ] ] ]

Object subclass: SpellCheck [
| nwords alphabet |
initialize [ | lines |
lines := (File name: 'westminster.txt') contents.
nwords := lines findWords asBag.
alphabet := 'abcdefhgijklmnopqrstuvwxyz' asArray ]

edits1: word [ | n |
n := word size.
^ Array join: {
0 to: (n - 2) collect: [ :i | word swapAt: i ].
1 to: (n - 1) collect: [ :i | word removeAt: i ].
alphabet gather: [ :letter | 1 to: n collect: [ :i | word replace: letter at: i ] ].
alphabet gather: [ :letter | 0 to: n collect: [ :i | word insert: letter at: i ] ] } ]

knownEdits2: word [
^ (self edits1: word) gather: [ :e1 |
self known: (self edits1: e1) ] ]

known: words [ ^ ( words select: [ :each | nwords includes: each ] ) ]

correct: word [ | candidates |
candidates := (self known: {word}) ifEmpty: [
(self known: (self edits1: word)) ifEmpty: [
(self knownEdits2: word) ifEmpty: [ {word} ] ] ].
^ candidates maxUsing: [ :d | nwords occurrencesOf: d ] ]
]

Wednesday 21 May 2008

Scripting with Smalltalk

When I saw this post by Peter Norvig, and especially the lines-of-code comparison towards the end of the page, I thought it would be interesting to see how Smalltalk compared. Given the line-of-code metric, this was an ideal chance for me to play with GNU Smalltalk, which positions itself as a scripting-friendly Smalltalk. It proved to be quite a straightforward exercise, and the code came out surprisingly compactly (but quite wide!):

"Inspired by http://norvig.com/spell-correct.html"
"s = SpellCheck new. s initialize. s correct: 'misplet'"
Dictionary extend [
at: key [ ^ self at: key ifAbsent: [ 1 ] ] "make it a defaulting at:"
incrAt: key [ self at: key put: ((self at: key) + 1) ] "increments value" ]

Collection extend [
ifEmpty: block [ self isEmpty ifTrue: [ ^ block value]. ^ self ] "not in gst by default"
bestUsing: d [ ^ self inject: 'xxx' into: [ :sofar :this | ((d at: sofar) > (d at: this)) ifTrue: [ sofar ] ifFalse: [ this ] ] ] ]

String extend [
swapAt: i [ ^ (self copyFrom: 1 to: i), (self at: (i+2)) asString, (self at: (i+1)) asString, (self copyFrom: (i+3) to: self size) ]
removeAt: i [ ^ (self copyFrom: 1 to: i), (self copyFrom: (i+2) to: self size) ]
insert: l at: i [ ^ (self copyFrom: 1 to: i), l asString, (self copyFrom: (i+1) to: self size) ]
replace: l at: i [ ^ (self copy) at: i put: l; yourself ]
findWords [ ^ (self asString tokenize: '[^a-zA-Z]+') collect: [ :each | each asLowercase ] ] ]

Object subclass: SpellCheck [
| nwords alphabet |
initialize [ | lines |
lines := (File name: 'bigtext.txt') contents.
nwords := self train: lines findWords.
alphabet := 'abcdefhgijklmnopqrstuvwxyz' ]

train: features [ | model |
model := Dictionary new.
features do: [ :each | model incrAt: each asLowercase ].
^ model ]

edits1: word [ | s n |
s := Set new.
n := word size.
0 to: (n - 2) do: [ :i | s add: (word swapAt: i) ].
1 to: (n - 1) do: [ :i | s add: (word removeAt: i) ].
alphabet do: [ :letter |
1 to: n do: [ :i | s add: (word replace: letter at: i) ].
0 to: n do: [ :i | s add: (word insert: letter at: i) ] ].
^ s asArray ]

known_edits2: word [ | s |
s := Set new.
(self edits1: word) do: [ :e1 |
(self edits1: e1) do: [ :e2 |
(nwords keys includes: e2) ifTrue: [ s add: e2 ] ] ].
^ s asArray ]

known: words [ ^ ( words select: [ :each | nwords keys includes: each ] ) asSet asArray ]

correct: word [ | candidates |
candidates := (self known: {word}) ifEmpty: [
(self known: (self edits1: word)) ifEmpty: [
(self known_edits2: word) ifEmpty: [ {word} ] ] ].
^ candidates bestUsing: nwords ]
]

54 lines of not very idiomatic Smalltalk including blank lines and comments (such as they are). More than twice as many lines as Peter Norvig's Python solution, but 1/6th of the size of the Java solution!

It felt very strange - and frustrating - to be developing in Smalltalk without the full support of the traditional environment, especially as I was trying to pick up the subtle differences from Squeak Smalltalk, but the ability to 'extend' the core classes from within the scripts means that it will be very easy to be able to hack out scripts using this tool.

Sunday 11 May 2008

Using OpenDBX with Squeak

A team of students from UTN (National Technological University in Argentina) co-ordinated by Estaban Lorenzano has just been doing some work on SqueakDBX, a package to allow Squeak to access OpenDBX functionality, which gives a lighter-weight alternative to ODBC for connecting to databases including Firebird, Interbase, MS SQL Server, MySQL, Oracle, PostgreSQL, SQLite, SQLite3 and Sybase.

This uses the FFI (Foreign Function Interface) package (available through Package Universe), and requires access to the libopendbx library. Now I had a bit of fun and games working out how to do this on Mac OS X, so here's the magic that worked for me.

Firstly, I was interested in using PostgreSQL to compare with the Squeak postgres support which is already installed on my machine. I downloaded the OpenDBX source -- it's a UNIX tool, so distributed as source needing compilation -- you open a Terminal and change into the newly downloaded directory, and execute configure command as directed by the README: ./configure --with-backends="sqlite3 pgsql", but in my case this barfed because it couldn't find my postgres libraries. This was solved by manually pointing to the postgres library and include directory, as follows:

CPPFLAGS="-I/usr/local/pgsql/include/" LDFLAGS="-L/usr/local/pgsql/lib/" ./configure --with-backends="sqlite3 pgsql"

You'll see that I also set things up for SQLite3 as it's installed by default on OS X, and is increasingly being used to hold configuration files in OS X and Firefox. I know that there's a SQLite3 package for Squeak already, but I thought it would be interesting to compare the two.

Anyway, the configure stage now worked perfectly, and sudo make install installed the libraries for me. All I needed to do now was make Squeak FFI see the libraries. Ha! After a lot of poking about, I found this email from Jon McIntosh, which advised that creating a symbolic link to the library "in the right place is helpful". Thanks for the pointer John, but where's the right place???

Well, after a bit of experimentation, I found that I needed to create the link in the Resources directory of  the SqueakVM package:

cd /Applications/Squeak/Squeak 3.8.18beta1U.app/Contents/Resources
ln -s /usr/local/lib/libopendbx.dylib opendbx

And that's it. I'm now able to access my PostgreSQL database with code like this:

conn := DBXConnection new
backend: DBXBackend postgresql;
  connect: '127.0.0.1' port: 'x';
  open: 'x' name: 'x' password: 'x' method: 0.
  resultSet := conn query: 'select * from airline'.
DBXTranscript show: resultSet.
conn disconnect.
Smalltalk garbageCollect.

It's still very early days for the project, and it is not yet production-ready, so I've not tried doing anything much with it, but I notice that Esteban and colleagues appear to be building in GLORP support from the outset, so it's worth keeping an eye on this project.