Dreams of a Scorpion: Scripting with Smalltalk

My post yesterday attracted more attention than I expected, with Paolo Bonzini and Randal Schwartz both being able to make out the code well enough to comment on it. Paolo was able to identify a number of improvements to the code, both in terms of identifying more appropriate approaches, and in identifying where the code was spending its time. As a result, here's a much faster version. It's also noticeably shorter at 31 lines excluding spaces and comments, but it's still very wide.

This version also uses a few more methods not found in core Squeak including #fold: #gather: #copyReplaceFrom:to: .

As Randal pointed out, I'm monkey-patching core classes with gay abandon, and subclassing would probably be safer, though the over-ride of #at: that (rightly) alarmed him is now gone.


"Inspired by http://norvig.com/spell-correct.html"
"s = SpellCheck new. s initialize. s correct: 'misplet'"
Collection extend [
    ifEmpty: block [ self isEmpty ifTrue: [ ^ block value].  ^ self ] "not in gst by default" 
    maxUsing: block [ ^ self fold: [ :a :b | ((block value: a) > (block value: b)) ifTrue: [ a ] ifFalse: [ b ] ] ] ]

String extend [
    swapAt: i [ ^ self copyReplaceFrom: i+1 to: i+2 with: {self at: i+2. self at: i+1} ]
    removeAt: i [ ^ self copyReplaceFrom: i+1 to: i+1 with: #() ]
    insert: l at: i [ ^ self copyReplaceFrom: i+1 to:i with: {l} ]
    replace: l at: i [ ^ (self copy) at: i put: l; yourself ] 
    findWords [ ^ (self asString tokenize: '[^a-zA-Z]+') collect: [ :each | each asLowercase ] ] ]

Object subclass: SpellCheck [ 
    | nwords alphabet |
    initialize [ | lines |
        lines := (File name: 'westminster.txt') contents.
        nwords := lines findWords asBag.
        alphabet := 'abcdefhgijklmnopqrstuvwxyz' asArray ]

    edits1: word [ | n |
        n := word size.
        ^ Array join: {
            0 to: (n - 2) collect: [ :i | word swapAt: i ].
            1 to: (n - 1) collect: [ :i | word removeAt: i ].
            alphabet gather: [ :letter | 1 to: n collect: [ :i | word replace: letter at: i ] ].
            alphabet gather: [ :letter | 0 to: n collect: [ :i | word insert: letter at: i ] ] } ]

    knownEdits2: word [ 
        ^ (self edits1: word) gather: [ :e1 |
            self known: (self edits1: e1) ] ]
 
    known: words [ ^ ( words select: [ :each | nwords includes: each ] ) ]
 
    correct: word [ | candidates |
        candidates := (self known: {word}) ifEmpty: [ 
            (self known: (self edits1: word)) ifEmpty: [
                (self knownEdits2: word) ifEmpty: [ {word} ] ] ].
        ^ candidates maxUsing: [ :d | nwords occurrencesOf: d ] ]
]

Dreams of a Scorpion

Thursday, 22 May 2008

Scripting with Smalltalk - updated

No comments:

Squeak & Smalltalk blogs

Other Blogs

Blog Archive

About Me