Fatvat

Monday, 15 June 2009

The General Problem Solver

Paradigms of Artificial Intelligence Programming has been sitting on my desk for a few months since I worked through the Eliza chapter, so time for me to brush it off and (finally) work my way through it.

The General Problem Solver is early example of computer AI. Herbert Simon made a speech at the 12th National Meeting of the Operations Research Society where he stated:

It is not my aim to surprise or shock you- if indeed that were possible in an age of nuclear fission and prospective-interplanetary travel. But the simplest way I can summarize the situation is to say that there are now in the world machines that think, that learn, and that create. Moreover, their ability to do these things is going to increase rapidly until - in a visible future -the range of problems they can handle will be coextensive with the range to which the human mind has been applied. (Heuristic Problem Solving: The Next Advance in Operations Research [PDF])

That's a pretty big claim that proves (unsurprisingly) to be slightly wide of the mark!

The GPS solution solves problems by gathering the following information:

The current state of the world

The available operations that transfer from one-state to the next

A goal that we desire to reach

States and goals are represented using symbols. We represent the initial state of the world as a set of symbols, for example, #{'Bored 'At Home'}, indicates that the current state is bored and at home and the goal condition might be #{'Happy 'Inebriated}. Given a series of operators, GPS will attempt to find the way there.

An operator alters the current states. An operator consists of an action (a friendly name for what is taking place), a set of preconditions and an add/remove list which affect the current state of the world. For example, the operation drink might be defined as {:action drink :preconditions #{'have-money} :add-list 'drunk :remove-list 'money}.

The complete code is shown below. Note that it has mutable state and that references are used to hold this. This is enforced by Clojure and is one of the key differences between Lisp and Clojure.

One thing that caught me out was contains? which checks the presence of an INDEX not a value. Hence, the contains-value? definition.


;;; Implementation of a simplified General Problem Solver
;;; http://en.wikipedia.org/wiki/General_Problem_Solver
;;; Based on code in "Paradigms of Artificial Intelligence Programming"
(ns uk.co.fatvat.gps
  (:use [clojure.set]))

(defstruct operation :action :preconditions :add-list :delete-list)

(defn make-op
  [action preconditions add-list delete-list]
  (struct operation action preconditions add-list delete-list))

(def *state* (ref nil))

(def *ops* (ref nil))

(defn contains-value?
  [coll val]
  (not (nil? (some (partial = val) coll))))

(defn appropriate?
  [goal operation]
  "An op is appropriate to a goal if it is in its add list"
  (contains-value? (:add-list operation) goal))

(defn apply-op
 [op]
 "Print a message and update state if op is applicable"
 (when (every? achieve (:preconditions op))
   (println "Executing: " (:action op))
   (dosync
    (alter *state* 
           (fn [s]
             (union 
              (difference s (:delete-list op))
              (:add-list op)))))))

(defn achieve
  [goal]
  "A goal is achieved if it already holds.  Or if there is an appropriate
   operation for it that is applicable"
  (or (contains-value? @*state* goal)
      (some apply-op (filter (fn [x] (appropriate? goal x))  @*ops*))))

(defn gps
  [state goals ops]
  "General Problem Solver: Achieve all goals using the operations available."
  (when (every? achieve goals) 'solved))

So having written all this code, how'd we use it? All we have to do is define the problem in terms that GPS can understand. The example below shows a common problem. I'm at home, with money and Facebook, however I'd much rather be at a bar and dancing the night away. However, I need to get to the bar and before I can dance I really need a drink.



(def example-operations
     [(make-op 'message-friends
               #{'have-facebook}
               #{'friends-emailed}
               #{'at-home})

      (make-op 'arrange-party
               #{'friends-emailed}
               #{'party-arranged}
               #{})

      (make-op 'get-on-bus
               #{'party-arranged}
               #{'at-club}
               #{})
      
      (make-op 'drink
               #{'have-drink}
               #{'had-drink}
               #{'have-drink})

      (make-op 'dance
               #{'had-drink}
               #{'dancing}
               #{})

      (make-op 'give-bar-money
               #{'have-money 'at-club}
               #{'bar-has-money 'have-drink}
               #{'have-money})])

(defn example []
  (dosync
   (ref-set *state* #{'at-home 'have-money 'have-facebook}) 
   (ref-set *ops* example-operations))
  (gps @*state*
       #{'dancing 'at-club}
       example-operations))

Running this at the REPL shows me the way to go:


  uk.co.fatvat.gps> (example)
  Executing:  message-friends
  Executing:  arrange-party
  Executing:  get-on-bus
  Executing:  give-bar-money
  Executing:  drink
  Executing:  dance
  solved

This isn't very general, but we'll save that till next time!

Labels: ai, clojure, paradigms of artificial intelligence programming

# posted by Jeff Foster : 23:55 0 Comments Links to this post

Sunday, 14 June 2009

The Shunting Yard Algorithm (2)

In my previous post, I looked at the Shunting Yard algorithm and implemented a *really* bad version which required you to escape parenthesis. Not very Lispy!

The code below makes a very small change which converts nested expressions (via lists) into RPN.



(defn shunting-yard
  ([expr] 
     (shunting-yard expr []))
  ([expr stack]
     (if (empty? expr)
       stack
       (let [token (first expr)
             remainder (rest expr)]
         (cond
           (number? token) (lazy-seq
                             (cons token (shunting-yard remainder stack)))

           (operator? token) (if (operator? (first stack))
                               (let [stackfn (partial op-compare token)]
                                 (lazy-seq
                                   (concat (take-while stackfn stack)
                                           (shunting-yard remainder
                                                          (cons token
                                                                (drop-while stackfn stack))))))
                               (shunting-yard remainder (cons token stack)))
           
           (sequential? token) (lazy-seq
                                 (concat (shunting-yard token) (shunting-yard remainder stack)))
           
           :else (assert false))))))

This is much nicer as now, as the tests below show, you can write expressions in a more natural way.



(deftest test-shuntingyard

  ;; Basic operators defined
  (is (= (pretty-print (shunting-yard [100 + 200])) '(100 200 +)))
  (is (= (pretty-print (shunting-yard [100 * 200])) '(100 200 *)))
  (is (= (pretty-print (shunting-yard [100 / 200])) '(100 200 /)))
  (is (= (pretty-print (shunting-yard [100 - 200])) '(100 200 -)))

  ;; Test some precedence rules
  (is (= (pretty-print (shunting-yard [100 * 200 + 300])) '(100 200 * 300 +)))
  (is (= (pretty-print (shunting-yard [100 + 200 * 300])) '(100 200 300 * +)))

  ;; Redundant Parenthesis
  (is (= (pretty-print (shunting-yard [[[[1 + 1]]]] '(1 1 +)))))
  (is (= (pretty-print (shunting-yard [1 + [1]] '(1 1 +)))))
  (is (= (pretty-print (shunting-yard [[1] + [1]] '(1 1 +)))))

  ;; Bracketing of expressions
  (is (= (pretty-print (shunting-yard [[1 + 2 + 3] + 4])) '(1 2 + 3 + 4 +))) 
  (is (= (pretty-print (shunting-yard [[1 + 2 * 3] + 4])) '(1 2 3 * + 4 +)))
  (is (= (pretty-print (shunting-yard [[4 + 5] / [7 - 8]])) '(4 5 + 7 8 - /))))

I was thinking (as was Mark) that I could write a macro in order to simplify writing expressions. The idea would be to write something to convert infix to prefix (rather than postfix as here).

On a slight tangent, in other Lisps, you can write a reader macro which would allow you to annotate an expression with a symbol that your reader would use to convert into the appropriate form. Clojure doesn't allow reader macros (there's a discussion here [see around 19:00] as to the reasons why).

Before we write this as a macro, we'll first need an evaluator loop. This is dead simple and just involves (again) a stack. This is a very basic version that assumes all operators have arity 2.



(defn rpn-evaluator
  ([expr]
     (rpn-evaluator expr []))
  ([expr stack]
     (if (empty? expr)
       (if (= 1 (count stack))
         (first stack)
         (assert false)) ;; unbalanced
       (let [f (first expr)]
         (cond
   
           (number? f) (recur (rest expr) (cons f stack))
           
           ;; Should look up arity of operator!
           (operator? f) (recur (rest expr) 
                                (cons
                                 (f (second stack) (first stack))
                                 (drop 2 stack))))))))

Having written this code, what does it buy me to write it as a macro? Not much! A macro is simply a translation of some source code to another lump of source code. In this case I'm better off writing a simple function, such as (def infix (comp rpn-evaluator shunting-yard)) to get the job done. The main point is that the macro doesn't have access to the values in the expression (unless they are constant). If the code was (infix [a + b]) then the macro can not evaluation a and b because the values will not be known at macro expansion time.

The one use of the macro that I can see would be for constant folding which I'm sure already occurs as part of the compilation process!

Programming Clojure gives a taxonomy of macros as:

Conditional Evaluation
Defining vars
Java interop
Postponing evaluation
Wrapping evaluation
Avoiding a lambda

None of these seem appropriate here to me, so I *think* writing it as a function is the right way to do it. I need to read On Lisp again since I think my macro understanding is a little off.

Labels: clojure, macro

# posted by Jeff Foster : 23:11 0 Comments Links to this post

Saturday, 13 June 2009

The Shunting Yard Algorithm

The Shunting Yard Algorithm converts expressions written in infix notation (i.e. with the operator between the operands) to Reverse Polish Notation (RPN). The algorithm was published by Dijkstra in November 1961 and the original paper can be found here.

RPN is often used in stack-based languages, such as Joy. In this respect, Clojure is to prefix notation as Joy is to stack-based notation.

The algorithm is simple and the description on Wikipedia is very complete; but it still was a great learning exercise for lazy sequences. The code implements a very simple expression parser based on a sequence of elements. Currently nesting expressions isn't supported directly, you have to use $ and $ to escape parenthesis.


;;; Shunting Yard Algorithm
(ns uk.co.fatvat.algorithms.shuntingyard
  (:use clojure.contrib.test-is))

(defstruct operator :name :precedence :associativity)

(def operators
     #{(struct operator + '1 'left)
       (struct operator - '1 'left)
       (struct operator * '2 'left)
       (struct operator / '2 'left)
       (struct operator '^ 3 'right)})

(defn lookup-operator
  [symb]
  (first (filter (fn [x] (= (:name x) symb)) operators)))

(defn operator?
  [op]
  (not (nil? (lookup-operator op))))

(defn op-compare
  [op1 op2]
  (let [operand1 (lookup-operator op1)
        operand2 (lookup-operator op2)]
    (or
     (and (= 'left (:associativity operand1)) (<= (:precedence operand1) (:precedence operand2)))
     (and (= 'right (:associativity operand1)) (<= (:precedence operand1) (:precedence operand2))))))

(defn- open-bracket? [op]
  (= op \())

(defn- close-bracket? [op]
  (= op \)))

(defn shunting-yard
  ([expr] 
     (shunting-yard expr []))
  ([expr stack]
     (if (empty? expr)
       (if-not (some (partial = \() stack)
         stack
         (assert false))
       (let [token (first expr)
             remainder (rest expr)]
         (cond

           (number? token) (lazy-seq
                             (cons token (shunting-yard remainder stack)))
           
           (operator? token) (if (operator? (first stack))
                               (lazy-seq
                                 (concat (take-while (partial op-compare token) stack)
                                         (shunting-yard remainder 
                                                        (cons token 
                                                              (drop-while (partial op-compare token) stack)))))
                               (shunting-yard remainder (cons token stack)))

           (open-bracket? token) (shunting-yard remainder (cons token stack))
           
           (close-bracket? token) (let [ops (take-while (comp not open-bracket?) stack)
                                        ret (drop-while (comp not open-bracket?) stack)]
                                    (assert (= (first ret) \())
                                    (lazy-seq
                                      (concat ops (shunting-yard remainder (rest ret)))))
                                       
           :else (assert false))))))

(defn pretty-print 
  [expr]
  (map (fn [x] 
         (cond 
           (= x +) '+
           (= x -) '-
           (= x /) '/
           (= x *) '*
           :else x)) expr))
     
(deftest test-shuntingyard
  (is (= (pretty-print (shunting-yard [100 + 200])) '(100 200 +)))
  (is (= (pretty-print (shunting-yard [100 * 200])) '(100 200 *)))
  (is (= (pretty-print (shunting-yard [100 / 200])) '(100 200 /)))
  (is (= (pretty-print (shunting-yard [100 - 200])) '(100 200 -)))
  (is (= (pretty-print (shunting-yard [4 + 5 + \( 6 * 7 \)] '(4 5 + 6 7 * +)))))
  (is (= (pretty-print (shunting-yard [\( \( \( 6 * 7 \) \) \)] '(6 7 *)))))
  (is (= (pretty-print (shunting-yard [3 + 4 * 5])) '(3 4 5 * +))))

Most of the entries on my blog could be considered Coding Dojo type tasks. I'm basically after problems I can solve in a few hours that I can learn something from!

Labels: clojure, coding dojo

# posted by Jeff Foster : 11:08 2 Comments Links to this post

Sunday, 7 June 2009

Learning some lessons from Programming Clojure

My copy of Programming Clojure by Stuart Halloway arrived over the weekend. It's a thoroughly enjoyable read so far and I've definitely learned a few things (for example, I hadn't realized that keywords can act as functions for maps, nor did I realize that a set is a function for determining membership. D'oh


  ;; A keyword is a function
  user> (:houses {:biscuits 4 :houses 7})
  7

  ;; A map is a function
  user> ({:biscuits 4 :houses 7}) :houses})
  7

  ;; A set is a function on membership
  user> (#{4 5 6} 7)
  nil

  ;; On success returns the element
  user> (#{4 5 6} 4)
  4

The section on Functional Programming was very interesting, especially with respect to laziness. Stuart presents "Six Rules of Clojure FP", which have a heavy emphasis on laziness. Item 5 is:

Know the sequence library. You can often write code without using recur or the lazy APIs at all

Looking back at my neural network code made me realize I'd reinvented several wheels there and the code could be significantly tidied by just using the standard libraries. Revisiting the sequence documentation helped remind me the various tools in the armoury, particularly iterate and reduce.

As a reminder, the two lumps of code in question are run-patterns and train-network.



(defn run-patterns
  [network samples expecteds]
  (if (empty? samples)
    network
    (let [expected (first expecteds)
          sample (first samples)
          [ah ao] (run-network sample network)
          updated-network (back-propagate expected sample [ah ao] network)]
      (recur updated-network (rest samples) (rest expecteds)))))

(defn train-network
  ([samples expected iterations]
     (train-network (create-network (count (first samples)) 
                                    num-hidden (count (first expected))) 
                    samples expected iterations))
  ([network samples expected iterations]
     (if (zero? iterations)
       network
       (recur (run-patterns network samples expected) samples expected (dec iterations)))))

Starting with run-patterns. This takes a sequence of values (the samples and expected values) and reduces it down to a single output (the network) using the supplied function. This can be refactored to be a simple reduce operation (sequence in, value out) and this simplifies the code significantly.


(defn run-patterns
  [network samples expecteds]
  (reduce 
   (fn [n expectations] 
     (let [[sample expected] expectations
           [ah ao] (run-network sample n)]
       (back-propagate expected sample [ah ao] n)))   
   network ;; initial value
   (map list samples expecteds)))

Next is train-network. At the moment the implementation is burdened by a hand-rolled looping construct for a specified number of iterations. What if instead of calculating a fixed number, we just calculated an infinite amount of trained neural networks (with laziness obviously!) and let the caller decide what value they'd like?

The new version of train-network drops the number of iterations and returns an infinite sequence of neural networks, each trained one more time than the last. The Clojure function, iterate does the job here:

Returns a lazy sequence of x, (f x), (f (f x)) etc. f must be free of side-effects

Once we refactor to use iterate we get the following. The definition of train-network is considerably simpler, and the only change to the example function is specifying the number of iterations to use.



(defn train-network
  ([samples expected]
     (train-network (create-network (count (first samples)) 
                                    num-hidden (count (first expected))) 
                    samples expected))
  ([network samples expected]
     (iterate (fn [n] (run-patterns n samples expected)) network)))

(defn example[]
  (let [x (nth (apply train-network xor-test-data) 100)]
    (println (first (run-network [0 0] x)) "-->" 0)
    (println (first (run-network [0 1] x)) "-->" 1)
    (println (first (run-network [1 0] x)) "-->" 1)
    (println (first (run-network [1 1] x)) "-->" 0)))

I've still got to read the remaining chapters of the book, but I'm looking forward to learning more. Definitely recommended.

Labels: clojure, programming clojure

# posted by Jeff Foster : 13:09 0 Comments Links to this post

Wednesday, 3 June 2009

Back propagation algorithm in Clojure

In my previous post I looked at the most basic type of neural network, the single layer perceptron. Next, we'll look at the multi-layer perceptron network. This is a more powerful class of neural network than the single layer because it can handle non linearly separable data (such as the XOR test case which failed last time).

The main difference in a multi-layer perceptron is that each neuron is only activated based on the results of an activation function. This makes use of the kernel trick that maps a non-linear problem into a higher dimensional problem which is linearly separable (see also support vector machines). The mathematics behind this is better explained here [PDF].

A typical activation function is tanh, shown below between -10 and 10 and generated using Wolfram Alpha

Graph of tanh(x) from -10 to 10 from Wolfram Alpha

So how do we model this network in a functional programming language like Clojure? A typical solution using an imperative/OO language (see here for a good Python implementation) would use arrays representing the weights and modify things in situ, but that's not very functional.

We start by defining some constants (including the activation-function and its derivation). bp-nn represents the network itself.


(def activation-function (fn [x] (Math/tanh x)))
(def activation-function-derivation (fn [y] (- 1.0 (* y y))))
(def num-hidden 2)

(def learning-rate 0.5)
(def momentum 0.1)

(defstruct bp-nn :weight-input :weight-output :change-input :change-output)

Next we create some simple initialization functions to create an initial neural network, together with some helper functions for iterating over matrices (which we'll model as lists of lists). Usually you'd use random weights to initialize things, but allowing fixed values makes testing possible.



(defn make-matrix
  [width height]
  "Create a matrix (list of lists)"
  (repeat width (repeat height 0)))

(defn matrix-map
  [m func]
  "Apply a function to every element in a matrix"
  (map (fn [x] (map func x)) m))

(defn rand-range 
  [l h]
  "Return a real number within the given range"
  (+ (rand (- h l)) l))

(defn create-network
  ([input hidden output]
     (create-network input hidden output true))
  ([input hidden output use-random-weights]
  "Create a network with the given number of input, hidden and output nodes"
  (let [i (inc input)
        w-func (if use-random-weights (fn [_] (rand-range -0.2 0.2)) (fn [_] 0.2))
        o-func (if use-random-weights (fn [_] (rand-range -2.0 2.0)) (fn [_] 2.0))]
    (struct bp-nn
            (matrix-map (make-matrix i hidden) w-func)
            (matrix-map (make-matrix hidden output) o-func)
            (make-matrix i hidden)
            (make-matrix hidden output)))))

The first thing we should do is run a pattern through the network and see what comes out the other end. We're not just interested in the output result, we want to know what happened at the hidden layer, so we return a vector of ao (activation output) and ah (activation hidden).

comp is functional composition. ((comp x y) 5) is the equivalent of (x (y 5)) so in the example below we add the numbers together and then apply the activation function. The nested map calls allow us to iterate over the elements in a matrix.



(defn apply-activation-function
  [w i]
  "Calculate the hidden activations"
  (apply map (comp activation-function +) (map (fn [col p] (map (partial * p) col)) w i)))

(defn run-network
  [pattern network]
  "Run the network with the given pattern and return the output and the hidden values"
  (assert (= (count pattern) (dec (count (network :weight-input)))))
  (let [p (cons 1 pattern)] ;; ensure bias term added
    (let [wi (network :weight-input)
          wo (network :weight-output)
          ah (apply-activation-function wi p)
          ao (apply-activation-function wo ah)]
      [ao ah])))

In order to perform backwards-propagation we need a couple of helper functions that work on matrices and vectors to calculate changes in the output that will be used to update the weights.

These helper functions are pretty sick - (no-one wants to read (map (partial reduce +) ...). A better design would probably be to introduce a proper matrix abstraction. There's the beginnings of one here but this is a bit too "Java" syntax heavy for more liking.



(defn calculate-hidden-deltas
  [wo ah od]
  "Calculate the error terms for the hidden"
  (let [errors (map (partial reduce +) (map (fn [x] (map * x od)) wo))] ;; Sick.
    (map (fn [h e] (* e (activation-function-derivation h))) ah errors)))
    
(defn update-weights
  [w deltas co ah]
  (let [x (map 
           (fn [wcol ccol h] 
             (map (fn [wrow crow od] 
                    (let [change (* od h)]
                      [(+ wrow (* learning-rate change) (* momentum crow)) change]))
                  wcol ccol deltas))
           w co ah)]
    [(matrix-map x first) (matrix-map x second)]))

I did warn you...

The next thing we need to implement is the back-propagation algorithm itself. This takes in more parameters than it needs to so that it can be tested standalone (it could be implemented as a local function using a closure to capture some of them). It returns an updated version of the neural network.



(defn back-propagate
  [target p results network]
  "Back propagate the results to adjust the rates"
  (assert (= (count target) (count (first (network :weight-output)))))
  (let [pattern (cons 1 p) ;; ensure bias term added
        ao (first results)
        ah (second results)
        error (map - target ao)
        wi (network :weight-input)
        wo (network :weight-output)
        ci (network :change-input)
        co (network :change-output)
        output-deltas (map (fn [o e] (* e (activation-function-derivation o))) ao error)
        hidden-deltas (calculate-hidden-deltas wo ah output-deltas)
        updated-output-weights (update-weights wo output-deltas co ah)
        updated-input-weights (update-weights wi hidden-deltas ci pattern)]
    (struct bp-nn
            (first  updated-input-weights)
            (first  updated-output-weights)
            (second updated-input-weights)
            (second updated-output-weights))
  ))

All that remains is to train the network. We need a set of samples with know results, together with a number of iterations to try. I've split these into run-patterns which runs through the patterns once, and train-network> which creates the initial network and runs it through the patterns the specified number of times.



(defn run-patterns
  [network samples expecteds]
  (if (empty? samples)
    network
    (let [expected (first expecteds)
          sample (first samples)
          [ah ao] (run-network sample network)
          updated-network (back-propagate expected sample [ah ao] network)]
      (recur updated-network (rest samples) (rest expecteds)))))

(defn train-network
  ([samples expected iterations]
     (train-network (create-network (count (first samples)) 
                                    num-hidden (count (first expected))) 
                    samples expected iterations))
  ([network samples expected iterations]
     (if (zero? iterations)
       network
       (recur (run-patterns network samples expected) samples expected (dec iterations)))))

So how well does it work in practise? Pretty damn good. It correctly converges after a few iterations (this time 100) and consistently gets the XOR test data set correct.


(defn example[]
  (let [x (apply train-network (conj xor-test-data 100))]
    (println (first (run-network [0 0] x)) "-->" 0)
    (println (first (run-network [0 1] x)) "-->" 1)
    (println (first (run-network [1 0] x)) "-->" 1)
    (println (first (run-network [1 1] x)) "-->" 0)))

  ;;; Example run through the REPL
  uk.co.fatvat.mlperceptron> (example)
  (0.10717792758953508) --> 0
  (0.993502708495955) --> 1
  (0.9930515903590437) --> 1
  (0.00883530181467182) --> 0
  nil

Total code weighs in at ~100 of so lines of purely functional code. This made it a doddle to test. Coding this was an exercise in map masochism. Never have I had to construct so many convoluted map expressions. At least it works in the end. It feels like it would be considerably simpler to implement this using mutability, I might try that for comparison purposes. Any suggestions on improving the code much appreciated! Full code is on GitHub.

(update - read some of Programming Clojure and applied some of the lessons learnt - see here.)

Neural networks are pretty interesting and in the next post I'll look at how to implement basic OCR using them.

Labels: clojure, neural networks

# posted by Jeff Foster : 23:52 0 Comments Links to this post

Friday, 29 May 2009

JVisualVM and Clojure

JVisualVM is a Java Virtual machine monitoring tool that allows you to see a live view of the Java memory, cpu and threads that are currently active. In this post, I'll look at how easy it is to profile Clojure code using JVisualVM.

First step is to get jvisualvm installed. If you've got a recent JDK then it's already available in the bin directory of the JDK install. If not, then upgrade to the latest JDK here. Once installed, fire it up and you should see the following.

The left hand side shows the applications you can connect to. You should see a VisualVM instance that allows you to monitor the VisualVM itself. Very meta. The local applications will include all locally running Java applications. Since Clojure itself is a Java application we can fire up a REPL and use that to profile code.

Firstly we'll look at garbage collection by using the memory profiler. People often associate immutable data structures (such as Clojure's) with poor performance since every time we modifying a data structure we are actually creating a completely new structure (in reality that's not quite true, structure that is unchanged is often shared). Using JVisualVM we can profile how fast the JVM can allocate objects. Let's use a really simple example which sums up the numbers 1 to 100K. Without JVirtualVM this takes 40ms. In those 40 seconds we create at least 100K objects.


  user> (time (reduce + (range 100000))
  "Elapsed time: 40.677795 msecs"
  4999950000

If we instrument this we can see that 165K java.lang.Integer objects are created together with 35K java.lang.Long objects. In this example then we're creating roughly 4 million objects per second! Pretty impressive (I realize this is totally non scientific, I'm just aiming to get a feel for the numbers!).

Another area in which you can apply JVisualVM is looking for optimizations and monitoring threads. In one of my previous posts, I looked at Mandlebrot fractal generation. We can use jvisualvm to understand the performance characteristics of the application. The only interesting part of the code for this experiment is shown below:



(defn calculate-pixels []
  (let [pixels (range 0 (* *width* *height*))]
    (pmap (fn [p] 
            (let [row (rem p *width*) col (int (/ p *height*))]
              (get-color (process-pixel (/ row (double *width*)) (/ col (double *height*))))))
          pixels)))

Running this with a single-threaded map instead of pmap gives us the following CPU information.

The time is dominated by mathematical operations. This is what you'd expect. You'd expect (naively) if you used multiple threads you'd be able to speed things up since the operations on each pixel are independent. What happens if we switch to pmap?

There's a big change. The time now is dominated by ThreadPoolExecutor$Worker which is taking up a huge percentage of the time. My guess is that we're context switching far too much - using jvisualvm we can get a view of what the threads were doing:

The image shows that the threads spent a lot of time waiting and not a lot of time doing processing. Opportunities for simultaneous processing were few because of the time spent spawning off tiny amounts of work.

Let's partition off chunkier bits of works in an effort to make the CPU work harder for its money. In the function below I use partition to break the set of pixels to render into 16 discrete chunks (number chosen at random). We use pmap again but this time it's got chunker bits of work to do. doall is used to force the evaluation of the inner map.



(defn calculate-pixels-2 []
  (let [n (* *width* *height*)
        work (partition (/ n 16) (range 0 n))
        result (pmap (fn [x]
                   (doall (map 
                    (fn [p]
                      (let [row (rem p *width*) col (int (/ p *height*))]
                        (get-color (process-pixel (/ row (double *width*)) (/ col (double *height*))))))
                    x)))
                   work)]
    (doall (apply concat result))))

The difference this makes is very visible. The graphic below shows two runs. Each time we can see that the threads are fully busy, not blocked and able to fully utilize the CPU. The worker threads are indicated at the bottom in the image below.

So using VisualVM helped quantify various improvements. Whilst premature optimization is the root of all evil, if you are going to optimize then it should be guided by a tool rather than your intuition (which if you are anything like me is more often than not crap).

Labels: clojure, jvisualvm

# posted by Jeff Foster : 12:00 0 Comments Links to this post

Monday, 25 May 2009

Neural Networks and Clojure

Neural Networks (ANNs) attempt to "learn" by modelling the behaviour of neurons. Although neural networks sound cool, there is no magic behind them!

Invented in 1957, by Frank Rosenblatt, the single layer perceptron network is the simplest type of neural network. The single layer perceptron network is able to act as a binary classifier for any linearly separable data set.

Single Layer Perceptron graphic from Wikipedia

The SLP is nothing more than a collection of weights and an output value. The Clojure code below allows you to create a network (initially with zero weights) and get a result from the network given some weights and an input. Not very interesting.



(defn create-network 
  [out]
  (repeat in 0))

(defn run-network
  [input weights]
  (if (pos? (reduce + (map * input weights))) 1 0))

The clever bit is adapting the weights so that the neural network learns. This process is known as training and is based on a set of data with known expectations. The learning algorithm for SLPs is shown below. Given an error (either 1 or -1 in this case), adjust the weights based on the size of the inputs. The learning-rate decides how much to vary the weights; too high and the algorithm won't converge, too low and it'll take forever to converge.



(def learning-rate 0.05)

(defn- update-weights
  [weights inputs error]
  (map 
   (fn [weight input] (+ weight (* learning-rate error input)))
   weights inputs))

Finally, we can put this all together with a simple training function. Given a series of samples and the expected values, repeatedly update the weights until the training set is empty.



(defn train
  ([samples expecteds] (train samples expecteds (create-network (count (first samples)))))
  ([samples expecteds weights]
     (if (empty? samples)
       weights
       (let [sample (first samples)
             expected (first expecteds)
             actual (run-network sample weights)
             error (- expected actual)]
         (recur (rest samples) (rest expecteds) (update-weights weights sample error))))))

So we have our network now. How can we use it? Firstly, let's define a couple of data sets both linearly separable and not. jiggle adds some random noise to each sample. Note the cool # syntax for a short function definition (I hadn't seen it before).



(defn jiggle [data]
  (map (fn [x] (+ x (- (rand 0.05) 0.025))) data))

(def linearly-separable-test-data
     [(concat
       (take 100 (repeatedly #(jiggle [0 1 0])))
       (take 100 (repeatedly #(jiggle [1 0 0]))))
      (concat
       (repeat 100 0)
       (repeat 100 1))])

(def xor-test-data
     [(concat
       (take 100 (repeatedly #(jiggle [0 1])))
       (take 100 (repeatedly #(jiggle [1 0])))
       (take 100 (repeatedly #(jiggle [0 0])))
       (take 100 (repeatedly #(jiggle [1 1]))))
      (concat
       (repeat 100 1)
       (repeat 100 1)
       (repeat 100 0)
       (repeat 100 0))])

If we run these in the REPL we can see that the results are perfect for the linearly separable data.


  > (apply train ls-test-data)
  (0.04982859491606148 -0.0011851610388172009 -4.431771581539448E-4)

  > (run-network  [0 1 0] (apply train ls-test-data))
  0

  > (run-network  [1 0 0] (apply train ls-test-data))
  1

However, for the non-linearly separable they are completely wrong:


  > (apply train xor-test-data)
  (-0.02626745010362212 -0.028550312499346104)

  > (run-network [1 1] (apply train xor-test-data))
  0

  > (run-network [0 1] (apply train xor-test-data))
  0

  > (run-network [1 0] (apply train xor-test-data))
  0

  > (run-network [0 0] (apply train xor-test-data))
  0

The neural network algorithm shown here is really just a gradient descent optimization that only works for linearly separable data. Instead of calculating the solution in an iterative manner, we could have just arrived at an optimal solution in one go.

More complicated networks, such as the multi-layer perceptron network have more classification power and can work for non linearly separable data. I'll look at them next time!

Labels: clojure, neural networks

# posted by Jeff Foster : 22:49 0 Comments Links to this post

Monday, 18 May 2009

Clojure and Robocode

Robocode is an educational programming game originally provided by IBM. Users write code to control miniature tanks in battles with other user written robots. Each tank has the same components but different programming. Users write code to control the movement, targeting and radar features of their robot.

Robocode uses Java as the programming language, and robots are shared by packaging them into Jar files. This means we should, with very little effort, be able to use Clojure to write a robot. Start by downloading the latest version of Robocode, available here.

The first fly in the ointment is that Robocode restricts the use of third-party JARs. If you try to just add clojure.jar to the class path you'll get an error message like this:


Preventing uk.co.fatvat.robot.FatRobot (1) from access: (java.io.FilePermission clojure.jar read): 
You may only read files in your own root package directory. 
SYSTEM: An error occurred during initialization of uk.co.fatvat.robot.FatRobot (1)
SYSTEM: java.lang.ExceptionInInitializerError

To fix this, edit the start up script (robocode.bat or robocode.sh depending on whether you are running Windows or not) and make sure you disable the security manager by adding -DNOSECURITY=true to the startup line. This disables the security manager meaning that there are no restrictions on what the robot can do. The security manager is in place in case you are a sentient robot killing machine. Be warned.

Without further ado, here's the most basic robot imaginable, borrowed entirely from My First Robot and converted over to Clojure. This robot moves back and forth, turns the gun around and fires at anything that looks at him slightly funny.


(ns uk.co.fatvat.robot.FatRobot
  (:gen-class :extends robocode.Robot))

(defn -run
  [robot]
  "Infinite loop whilst robot is alive"
  (doto robot
    (.ahead 500)
    (.turnGunRight 360)
    (.back  500))
  (recur robot))

(defn -onScannedRobot
  [robot event]
  (doto robot
    (.fire 1)))

It's not exactly the most exciting robot in the world and even loses against a robot that sits still and turns the turret around! How can we make it a little bit cleverer?

The robot below extends from AdvancedRobot which allows non-blocking calls, writing to the file system and custom events.


(ns uk.co.fatvat.robot.NotQuiteAsBad
  (import (java.awt Color))
  (import (robocode Rules))
  (import (robocode.util Utils))
  (:gen-class :extends robocode.AdvancedRobot :init create-robot :state state))

This shows two bits of the :gen-class directive I haven't used before. :init allows you to specify a constructor routine. The return value for this function is unusual in that it is always a vector with two elements. The first represents any arguments to the superclass (in this case empty) and the second represents any state that the object needs. The :state keyword allows you to name the method of accessing the state.


(defstruct target-details :distance :bearing :energy :velocity)

(defn -create-robot
  []
  "Robot records a list of events and performs actions based on these observations"
  [[] (ref [])])

The constructor function simply returns a reference to the state which is initially empty. We also define a structure representing a sighting of the enemy. These sightings will be logged in our state and will be used to determine the strategy.


(defn -run
  [robot]
  "Infinite loop whilst robot is alive"
  (setup-robot robot)
  (loop [x 1] ;; TODO is there a better idiom for an infinite loop?
   (walk robot)
    (recur 1)))

(defn -onScannedRobot
  [robot event] 
  (let [distance (.getDistance event)
        name (.getName event)
        energy (.getEnergy event)
        velocity (.getVelocity event)
        bearing (.getBearing event)]
    (dosync
     (alter (.state robot) conj (struct target-details distance bearing energy velocity)))
    (attack robot)))

onScannedRobot now records the details of the last observation (dosync sets up the transaction, alter applies the function given (conj) with the given arguments.


(defn- setup-robot
  [robot]
  "Ensure robot looks pretty"
  (doto robot
    (.setAdjustRadarForGunTurn true)
    (.setColors Color/RED Color/BLACK Color/RED)))

(defn- attack
  [robot]
  "Based on the accrued events, hurt robots"
  (let [latest (last @(.state robot))]
    (.turnRight robot (get latest :bearing))
    (when (zero? (get latest :velocity))
      (.fire robot 3))
    (.setTurnRadarRight robot 360)))

(defn- walk
  [robot]
  "Go for a walk around the outside of the building"
  (let [x (mod (.getHeading robot) 90)]
    (.ahead robot 50)
    (when (not (zero? x))
      (.turnLeft robot x))
    (when (zero? (.getVelocity robot))
      (.turnRight robot 90))))

walk simply makes the robot run around the outside of the arena. This is heavily based on the example code here. attack just waits for a stationary robot, turns and fires!

This at least allows me to beat the Fire default robot 10-0 (most of the time!).

There are a load more strategies and patterns available on the Robocode Wiki.

This was quite a useful learning exercise because I found out about init and state. One problem I ran into was that init doesn't allow you to call methods on the object that is about to be constructed. This was fixed recently, and now there is a corresponding post-init hook.

You can find all the code on GitHub.

Labels: clojure, robocode

# posted by Jeff Foster : 13:43 0 Comments Links to this post

Saturday, 16 May 2009

Using Clojure and Ext/JS

Ext/JS is a popular framework for building Ajax interfaces. The model for Ajax applications is dead simple. Build a UI using JavaScript, and populate it using JSON retrieved from the server (I'm pretty sure there's more to it, but I like simple!).

JavaScript and Lisp languages should go together pretty well, Doug Crockford has described JavaScript as Lisp in C's clothing.

I haven't done enough JavaScript programming to be dangerous just yet, so I'll start with something simple from the Ext tutorial. The goal will be to display a data-grid populated by stuff retrieved using the persistence API I looked at previously. Here's the end goal:

Ext Datagrid populated with data from Clojure

.

Firstly we need to write a servlet that will retrieve all the stories. If we were going to do this properly we'd parametrize this with restrictions (find me all stories between X and Y, or retrieve the top 100 stories). I'm confident that no-one will ever use this site, so I'll just go for brute force retrieve everything!

Using the persistence API we prepare a query object to find everything matching the given type and lazily convert this (map) each element across using entity-as-map


(defn get-all
  [type]
  (let [query (Query. (str type))
        service (DatastoreServiceFactory/getDatastoreService)]
    (map entity-to-map (.asIterable (.prepare service query)))))

Now we can retrieve stories, we need to squirt this back as JSON. There's a package, clojure-json that does this for you, but I (foolishly) decided to do it the quick and dirty way and print out a string!


(ns news.liststory                                    
  (:use (news appengine))
  (:gen-class :extends javax.servlet.http.HttpServlet))

(defn story-to-json
  [s]
  (str "{\"title\":\"" (get s "title") "\"" ",\"body\":" "\"" (get s "body") "\"},"))

(defn -doGet
  [_ request response]
  (.setContentType response "text/xml")
  (let [w (.getWriter response) stories (get-all "story")]
    (.println w (str "{\"totalCount\":" (count stories) ",\"stories\":["))
    (doseq [story stories]
      (.println w (str \tab (story-to-json story))))
    (.println w (str "]}"))))

So what were aiming to print out is a JSON representation of the stories that looks a little like this.


{"totalCount":3,"stories":[
 {"title":"Home of the Clojure Language","body":"http://www.clojure.org/"},
 {"title":"Jeff's web page, full of rubbish!","body":"http://www.fatvat.co.uk"},
 {"title":"Wikipedia on Clojure","body":"http://en.wikipedia.org/wiki/Clojure"},
]}

Finally, we need something to render this. I took an example as a starting point and ended up with this. (Note that it's referencing local host because I'm running off a local development environment)



Ext.onReady(function(){
  var store = new Ext.data.JsonStore({
    root: 'stories',
    totalProperty: 'totalCount',
    idProperty: 'storyId',
    fields: [
      'body', 'title'
    ],

    proxy: new Ext.data.HttpProxy({
      url: 'http://localhost:8080/liststory?'
    })
  });

  var grid = new Ext.grid.GridPanel({
    el: 'story-grid',
    title: 'Clojure News!',
    store: store,
    loadMask: true,
    height: 400,
    columns:[{
      header: 'Link',
      dataIndex: 'body'
    },{
      header: 'Description',
      dataIndex: 'title'
    }]
  });

  grid.render();
  store.load({});
});

That was pretty painless to build. The painful bit was writing out JSON as a string. Lesson learnt, use a library (or build a decent abstraction).

I'm not sure I like mixing my languages, I'd really like a way to have Lisp goodness across the whole stack. One potential option for this is to use Google Web Toolkit. GWT compiles Java into cross-platform Javascript. I could (probably) have Clojure compile to Java which in turn is compiled to Javascript. That sounds fun!

Labels: clojure, javascript

# posted by Jeff Foster : 06:30 0 Comments Links to this post

Wednesday, 13 May 2009

Data Persistence in GAE with Clojure

If you want to persist stuff in Java, you've got a bewildering amount of choice

Java Data Objects

Java Persistence API

Hibernate

iBatis

JDBC

There's even an entire book about making the right decision! (Persistence in the Enterprise)

Google App Engine has gone with JDO using the Data Nucleus platform. In GAE this is split again into two APIs, the high-level one for persisting objects, and a lower-level one which allows you to persist raw data.

When using Clojure it makes more sense to go with the lower-level api. The higher-level one would require using annotations on objects which isn't supported at the moment in Clojure (as far as I know!).

So how do we store data in GAE? The example below saves a story to disk (as a learning exercise, I'm writing a quick and dirty reddit clone).


(ns news.savestory                                    
  (:use (news appengine))
  (:gen-class :extends javax.servlet.http.HttpServlet)
  (:import (com.google.appengine.api.datastore DatastoreServiceFactory Entity Key Query)))
                                      
(defn store
  [data type]
  (let [entity (Entity. (.toString type))]
    (doseq [[k v] data]
      (.setProperty entity (.toString k) v))
    (.put (DatastoreServiceFactory/getDatastoreService) entity)
    (.getKey entity)))

(defn -doGet                                                  
  [_ request response]
  (let [body (.getParameter request "storyLink")
        title (.getParameter request "storyTitle")]    
    (let [w (.getWriter response)]
      (.println w (store {:body body :title title} :story)))))

store takes a map and a type and persists it in the database and returns the key associated with this entity. Visiting the URL persists the data in the URL and returns the key.

Retrieving the data is much the same.


(ns news.viewstory                                    
  (:use (news appengine))
  (:gen-class :extends javax.servlet.http.HttpServlet)
  (:import (com.google.appengine.api.datastore DatastoreServiceFactory Entity Key Query KeyFactory)))

(defn entity-to-map 
  [entity]
  (into (hash-map) (.getProperties entity)))
                                      
(defn getEntity
  [id type]
  (let [k (KeyFactory/createKey (.toString type) (Long/valueOf id))]
    (entity-to-map
     (.get (DatastoreServiceFactory/getDatastoreService) k))))

(defn -doGet                                                  
  [_ request response]
  (let [id (.getParameter request "storyId")
        story (getEntity id :story)
        w (.getWriter response)]
    (doseq [[k v] story]
      (.println w (str k "=" v)))))

entity-to-map just converts the properties of the entity into a friendly Clojure type.

So now I know how to authenticate users, next step is to get some basic UI together. There's a number of choices here (JSP, server side HTML generation in Clojure or just go with Ajax). I'm leaning towards the Ajax!

Labels: appengine, clojure

# posted by Jeff Foster : 23:26 3 Comments Links to this post

Sunday, 10 May 2009

User authentication in GAE

Google App Engine provides a set of standard APIs for common tasks, such as user authentication, caching and persistent storage. This post looks at the user authentication API and creates a simple form that is authenticated against a Google login.

We'll use exactly the same structure and build scripts as the previous post, as that just makes life easier.

The servlet below checks whether there is an current user logged in. If not, then redirect to a prompt requiring a user to login, otherwise just display the users nickname.


(ns blogging.login                                                                        
  (:gen-class :extends javax.servlet.http.HttpServlet)
  (:import (com.google.appengine.api.users User UserService UserServiceFactory)))         
                                      
(defn greet 
  [user response]
  (.setContentType response "text/plain")
  (let [w (.getWriter response)]
    (.println w (str "Hello, " (.getNickname user)))))
                                                                                 
(defn -doGet                                                  
  [_ request response]
  (let [userService (UserServiceFactory/getUserService)
        user (.getCurrentUser userService)]
    (cond
      (not (nil? user)) (greet user response)
      :else (.sendRedirect response (.createLoginURL userService (.getRequestURI request))))))

If you deploy on the development server, then you get a screen like that shown below:

Neat! Next on the list, persisting data.

Labels: appengine, clojure

# posted by Jeff Foster : 23:46 0 Comments Links to this post

Clojure on the Google App Engine

The Google App Engine offers a complete stack for deploying applications in the cloud. Initially, support only existed for Python, but recently support was announced for Java.

Although, the main page announces this as support for the Java Language, it's much more than that, it's support for the Java Virtual Machine. The list of languages on the JVM is huge. This means that in theory, any of these languages can now be hosted in the cloud.

So how easy is it to get Clojure going in the cloud?

Firstly, register at http://appengine.google.com and get yourself an account. Download the Java AppEngine SDK too and unpack that and get the development server up and running.

GAE is based on the Servlet 2.5 specification, so the typical directory structure looks very similar to any JSP/Servlet type application:

Root - Top level directory for the project
Root/src - Source code
Root/war - JSP files and HTML artefacts
Root/war/WEB-INF -Application configuration files
Root/war/WEB-INF/classes - Compiled source files
Root/war/WEB-INF/lib - Deployment time dependencies

As GAE is based on servlets, we need to define a simple servlet for the mandatory hello world demo! This code goes in the src directory:


(ns helloclojure.servlet                                                                        
  (:gen-class :extends javax.servlet.http.HttpServlet))             
                                                                                                                              
(defn -doGet                                                  
  [_ request response]                                                                                                        
  (let [w (.getWriter response)]                                                                 
    (.println w "Hello world!")))

:gen-class causes Clojure to emit a class file representing this name space. The - before doGet indicates that this is a member function with three arguments (the first representing "this", unused and therefore a _ symbol). So all we do for this servlet is write "hello world" whenever any request is made.

Next we need to make a standard web.xml descriptor and put that in META-INF. This registers the servlet and specifies the mapping between a URL format and the servlet that deals with the request.



<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE web-app PUBLIC
 "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
 "http://java.sun.com/dtd/web-app_2_3.dtd">

<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
    <servlet>
        <servlet-name>helloclojure</servlet-name>
        <servlet-class>helloclojure.servlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>helloclojure</servlet-name>
        <url-pattern>/helloclojure</url-pattern>
    </servlet-mapping>
    <welcome-file-list>
        <welcome-file>index.html</welcome-file>
    </welcome-file-list>
</web-app>

Also in the META-INF directory we need a descriptor for the GAE.



<?xml version="1.0" encoding="utf-8"?>
<appengine-web-app xmlns="http://appengine.google.com/ns/1.0">
    <application></application>
    <version>1</version>
</appengine-web-app>

That's all the scaffolding you need. Finally, we need some way to build and deploy this. Thankfully, someone has already looked at this, so I took their build script and made a few modifications (remove the test target for example) in the name of simplification.

The net result is the following Ant script.


<project name="helloclojure" basedir="." default="compile">
 
  <property environment="env" />  
  <property name="sdk.dir" location="/home/jfoster/appengine-java-sdk-1.2.0" />
  <property name="classes.dir" value="war/WEB-INF/classes" />
  <property name="lib.dir" value="war/WEB-INF/lib" />
  <property name="src.dir" value="src/" />
  
  <import file="${sdk.dir}/config/user/ant-macros.xml"/>
  
  <path id="project.classpath">
    <pathelement path="${classes.dir}" />
    <pathelement path="${src.dir}/helloworld" />
    <fileset dir="${lib.dir}">
      <include name="**/*.jar" />
    </fileset>
    <fileset dir="${sdk.dir}/lib">
      <include name="**/*.jar" />
    </fileset>
  </path>
  
  <target name="clean">
    <delete dir="${classes.dir}" />
  </target>
  
  <target name="init">
    <mkdir dir="${classes.dir}" />
  </target>
  
  <target name="compile" depends="clean,init">
    <java classname="clojure.lang.Compile" classpathref="project.classpath" failonerror="true">
      <classpath path="${src.dir}" />
      <sysproperty key="clojure.compile.path" value="${classes.dir}" />
      <arg value="helloclojure.servlet" />
    </java>
  </target>
  
  <target name="devserver" description="run local dev appserver" depends="compile">
    <dev_appserver war="war" />
  </target>
  
  <target name="deploy" description="deploy to appspot" depends="compile">
    <appcfg action="update" war="war" />
  </target>
  
</project>

All code is available on my Git Repo

Labels: appengine, clojure

# posted by Jeff Foster : 08:38 2 Comments Links to this post

Wednesday, 6 May 2009

Clojure 1.0 Released!

After only 18 months or so since Clojure came to life, it's now released as v1.0. See here for the official announcement.

Labels: clojure

# posted by Jeff Foster : 23:54 0 Comments Links to this post

Sunday, 3 May 2009

Levenshtein Distance in Clojure (II)

In my last post, I looked at the Levenshtein Distance and showed that the time complexity of the algorithm (when naively implemented) is exponential. In this post I'll show how we can improve this time complexity of this algorithm with a small series of changes.

Dynamic programming is a technique for solving problems which consist of overlapping subproblems. The Levenshtein algorithm is perfect for this kind of problem as the final solution is the sum of the little solutions. Imperative languages will often solve this using a M x N matrix, where each item represents the cost taken to convert an item so far. This is expanded in much more detail on Wikipeda. This solution is nice, and computationally cheap, but it's not faithful to the definition of the problem. It's an example of the impedance mismatch between writing software and writing specifications for a problem (see previous post about Intentional Software for one approach to minimizing this mismatch).

So how can we keep faithful to the original specification without having exponential performance cost? Memoization is a technique for caching the results of function calls. Functional programming languages are inherently suitable for memoization, as every pure function will, for the same arguments, yield the same result. Clojure has direct support for memoization via memoize. If we were dealing with a simple example where the function body didn't refer to the function name, we could just use memoize directly, but Levenshtein distance refers to itself three times. There are a number of options here, we could either convert to a tail recursive function and recur will solve the problem or we could use a forward declaration (thanks for #Clojure for pointing that one out, you don't want to know what my plan C was!).

Transforming the algorithm to be tail recursive is going to be painful and radically change the way the code looks. So let's go for the forward declaration approach. We declare levenshtein-distance-fast up front, write the declaration of levenshtein-distance referring to the fast implementation (as yet unbound) and finally declare levenshtein-distance-fast as the memoized version of levenshtein-distance.



(declare levenshtein-distance-fast)

(defn levenshtein-distance-int
  "Calculates the edit-distance between two sequences"
  [seq1 seq2]
  (cond
    (empty? seq1) (count seq2)
    (empty? seq2) (count seq1)
    :else (min
           (+ (cost (first seq1) (first seq2)) (levenshtein-distance-fast (rest seq1) (rest seq2))) ;; substitution
           (inc (levenshtein-distance-fast (rest seq1) seq2))    ;; insertion
           (inc (levenshtein-distance-fast seq1 (rest seq2)))))) ;; deletion

(def levenshtein-distance-fast (memoize levenshtein-distance))

This makes a huge difference in performance as you can see below:


  user> (time (levenshtein-distance-fast "abcdefghi" "hijklmnop"))
  "Elapsed time: 0.191776 msecs"

  user> (time (levenshtein-distance-fast "abcdefghi" "hijklmnop"))
  "Elapsed time: 0.081372 msecs"

  user> (time (levenshtein-distance-slow "abcdefghi" "hijklmnop"))
  "Elapsed time: 668.713644 msecs"
  9

This (probably!) gives the same complexity class as a implementation using dynamic programming. However, it still has one big problem. It isn't tail recursive and hence it won't work for large very length sequences passed as arguments. Altering the definition to use tail recursion is not something I fancy doing!

Labels: clojure

# posted by Jeff Foster : 12:55 0 Comments Links to this post

Wednesday, 29 April 2009

Levenshtein Distance in Clojure

The Levenshtein Distance is an algorithm for computing the edit distance between two strings. The algorithm gives you a number which represents the number of substitutions, removals and insertions needed to transform one string into another. This has applications in fields as diverse as bioinformatics (distance between two genome sequences) and spell checking (finding out what you meant to type).

The algorithm can be stated very simply for two sequences a and b

If empty(a), then length(b) (length(b) insertions are required)

If empty(b), then length(b) (length(a) insertions are required)

otherwise, calculate the minimum of:
1. the distance between (rest a) (rest b) + 1 if a[0] = b[0] (substitution)
2. the distance between a (rest b) (insertion)
3. the distance between (rest a) a (deletion)

We can state this naively in Clojure.


(defn cost [a b]
  (if (= a b) 0 1))

(defn levenshtein-distance 
  "Calculates the edit-distance between two sequences"
  [seq1 seq2]
  (cond
    (empty? seq1) (count seq2)
    (empty? seq2) (count seq1)
    :else (min
           (+ (cost (first seq1) (first seq2)) (levenshtein-distance (rest seq1) (rest seq2))) ;; substitution
           (inc (levenshtein-distance (rest seq1) seq2))    ;; insertion
           (inc (levenshtein-distance seq1 (rest seq2)))))) ;; deletion

At first glance, this appears OK. The code works:


  user> (levenshtein-distance "kitten" "sitting")
  3
user>

And even better, the code actually looks like the specification of the problem. Looking closer, there are a number of problems. It's not tail recursive so the stack is definitely going to be blown for larger strings. Secondly, it has incredibly bad performance due to the recursion repeating the work.

How do we quantify incredibly badly? Let's assume that the cost of a single calculation (without the recursive calls) is 1. Then we can express the cost for any given pair as:


  T(X,Y) = 1 + T(X-1,Y-1) + T(X-1,Y) + T(X,Y-1)

X,Y represent the length of the string. We can express this too in some Clojure code.


(defn calculate-cost [x y]
  (cond
    (= x 0) 1
    (= y 0) 1
    :else
      (+ 
       (calculate-cost (dec x) (dec y))
       (calculate-cost x (dec y))
       (calculate-cost (dec x) y)
       1)))

Notice that the structure of this function is almost exactly the same as the structure of the function we're trying to measure! So how expensive is this implementation of Levenshtein:


  user> (calculate-cost 3 3)
  94
  user> (calculate-cost 4 4)
  481
  user> (calculate-cost 6 6)
  13483
  user> (calculate-cost 7 7)
  72958
  user> (calculate-cost 8 8)
  398593

This is an exponential cost function. This basically renders this implementation all but useless for anything other than toy size strings.

So how can we improve this algorithm whilst keeping the behaviour that it actually looks like the definition of the problem? That can wait for next time!

Labels: clojure, performance

# posted by Jeff Foster : 23:21 0 Comments Links to this post

Sunday, 26 April 2009

The Mod Operator and Zeller's Congruence

Java's built in mod operator (%) behaves differently than the Clojure mod function with negative operators

Java:System.out.println("5 % 2 = " + (5 % 2)); // 1
Clojure:(mod -7 2) ; 1

Java:System.out.println("-2 % 7" + (-2 % 7)); // -2
Clojure:(mod -2 7) ; 5
Clojure:(rem -2 7) ; -2

The Java behaviour is explained more clearly by the JLS, here. rem should be the Clojure equivalent if you want to support the same behaviour as Java.

Why would the behaviour of negative numbers and the mod operator ever be useful? One algorithm that uses this is Zeller's Congruence which allows you to determine the day of the week if you know the date.

Defining the algorithm in Clojure is simple:



(def days 
     {0 'Saturday
      1 'Sunday
      2 'Monday
      3 'Tuesday
      4 'Wednesday
      5 'Thursday
      6 'Friday})

(defn zeller
  "Zeller's congruence"
  [day-of-month month year century]
  (get days 
       (mod (+ day-of-month 
                    (int (/ (* 26 (inc month)) 10))
                    year 
                    (int (/ year 4))
                    (int (/ century 4))
                    (- (* 2 century))) 7)))

The algorithm is slightly strange in that January and February need to be treated as the 13th and 14th of the previous month. We can encapsulate this ugliness behind a nicer interface.


(def months
     ['January 
      'February 
      'March 
      'April 
      'May 
      'June 
      'July 
      'August 
      'September
      'October
      'November
      'December])

(def month-to-number
     (zipmap months [13 14 3 4 5 6 7 8 9 10 11 12]))


(defn day-of-week 
  [day month year]
  (let [month-num (get month-to-number month)
        year-int (if (or (= month-num 13) (= month-num 14)) (dec year) year)]

Now we can try this out:


  user> (day-of-week 22 'November 1963) ; JFK assassination
  Friday

  user> (day-of-week 20 'July 1969) ; Apollo Moon landing
  Sunday

  user> (day-of-week 27 'April 2009)
  Monday

Of course, any sane person would actually use a built in library function to do this! My rambling point is just that, Java % is not the same as Clojure mod!

Labels: clojure

# posted by Jeff Foster : 01:24 0 Comments Links to this post

Wednesday, 22 April 2009

Understanding the Y Combinator

Like monads, understanding the Y Combinator is a rite of passage for the aspiring functional programmer. So here's my take on it, using Clojure.

A fixed point combinator is a function g which produces such a fixed point p for any function f.


  g(f)=p
  f(p)=p

Why is this important or interesting for a programming language? It's important because it allows you to describe recursion in terms of rewrites, rather than computation steps. This allows for anonymous recursive functions.

So how do we write the Y combinator in Clojure? Taking the lambda-calculus definition and switching the word lambda for fn, gives us this incomprehensible mess!


(defn Y [r]
  ((fn [f] (f f))
   (fn [f]
     (r (fn [x] ((f f) x))))))

But what does it actually mean? Probably the best way to understand is with a real example and to do that, we need a recursive function. Given that everyone else uses factorial, we'll use a simpler recursive function that sums up a sequence.


(defn sum-seq [x]
  (if (empty? x)
    0
    (+ (first x) (sum-seq (rest x)))))

In the definition above, we've written the function with explicit recursion. We'll use the Y combinator to get rid of that!

The first step is to rethink the function in terms of a series of functions. Our goal is to create a way of expressing this computation as a series of function calls. We want to write a function that given a function to compute the sum of a sequence, gives us the next function to compute the sum of a sequence.


(defn sum-seq-fn-gen [func]
  (fn [s]
    (if (empty? s)
      0
      (+ (first s) (func (rest s))))))

So now we have such a function. We can already use it:


  user> ((sum-seq-fn-gen nil) [])
  0

  user> ((sum-seq-fn-gen (sum-seq-fn-generator nil)) [9])
  9

  user> ((sum-seq-fn-gen (sum-seq-fn-gen (sum-seq-fn-gen nil))) [1 9])
  10

It's not that useful, as at the moment we'd have to type in an awful lot of characters to sum a large list. What we really want is some way of calculating the fixed point of such a function. Thankfully we already have that, thanks to the Y combinator.


  user> ((Y sum-seq-fn-gen) [1 2 3 4 5])
  15

  user> ((Y sum-seq-fn-gen) (range 0 1000))
  499500

So the Y Combinator has given us what we need. Given a function and an input, find the fixed point. Note that there is no explicit recursion going on. sum-seq-fn-gen could be an anonymous function.


  user> ((Y 
   (fn [func]
     (fn [s] 
       (if (empty? s)
         0
         (+ (first s) (func (rest s))))))) [1 2 3 4 5])
  15

The best way to understand the Y combinator is to see how it's used and then run through the fun exercise of expanding it and seeing what happens. I found the links below useful:

Sketchy Lisp - derivation of Y Combinator

Clojure April 1.0

Y Combinator in Arc and Java

Labels: clojure

# posted by Jeff Foster : 23:04 0 Comments Links to this post

Saturday, 4 April 2009

Implementing Minilight in Clojure (2)

Any 3D rendering rubbish seems to require a 3d vector class of some description. A quick Google Code Search brings up thousands of results.

The Minilight code is no exception.

For now, my Clojure version will just use the built in vec based sequence. I'm sure that I'll find I'll need raw Java array access, but it's inconvenient to start with the ugly stuff.

ANSI Common Lisp mentions this explicitly,

"Lisp is really two programming languages, a language for writing fast programs and a language for writing programs fast."

I think the same applies to Clojure, I know I can always get the raw speed of Java when I need it, but I can stay nice and high level for most of the time (80/20 rule and all that).

So here's the most basic set of vector like functions I could write for Clojure. The cross product definition is the ugliest out there. Probably because it's only valid in 3 (or 7 according to Wikipedia!) dimensions!



(defn dot 
  "Dot product of two vectors"
  [v1 v2] (reduce + (map * v1 v2)))

(defn cross
  "Cross product of two 3 vectors"
  [v1 v2]
  (vec
   (- (* (get v1 1) (get v2 2)) (get v1 2) (get v2 1))
   (- (* (get v1 2) (get v2 0)) (get v1 0) (get v2 2))
   (- (* (get v1 0) (get v2 1)) (get v1 1) (get v2 0))))

(defn unary-minus
  "Negation of a vector"
  [v] (vec (map - v)))

(defn unitize
  [v]
  (let [len (Math/sqrt (dot v v))
        overlen (if (not (zero? len)) (/ 1 len) 0)]
    (vec (map (fn [x] (* overlen x)) v))))

(defn between
  [mini maxi x]
  (min (max x mini) maxi))

(defn get-clamped
  "Clamp v within the bounds minimum and maximum"
  [v mini maxi]
  (vec (map between mini maxi v)))

Labels: clojure

# posted by Jeff Foster : 12:56 0 Comments Links to this post

Tuesday, 31 March 2009

Implementing Minilight in Clojure (1)

Minilight is a global illumination renderer, designed to act as a minimal example of how to achieve certain rendering techniques. Minilight includes:

Monte-carlo path-tracing transport
Emitter sampling
Progressive refinement
RGB light
Diffuse materials
Triangle modelling primitives
Octree spatial index
Pin-hole ‘lens’
Ward linear tone-mapping

Together, this set of techniques is part of the Global illumination rendering algorithm. This considers not only a single ray (see my previous attempt) but reflections and emissions from all objects in the scene.

Over the next few weeks, I'll attempt to implement this in Clojure.

So for part 1, we'll look at reading in a scene file and converting it into a Clojure representation. The format for a model is simple. There's an example here.

Clojure really simplifies things because of the read function and because it's homioiconic (a Clojure function is a Clojure data structure).

The main gotcha I found was that read, by default, throws an exception when it reaches the end of the file. This made things slightly more complicated than I needed. Further investigation shows that read takes some optional parameters allowing you to control how EOF should be treated.

For now, I've just read the data in raw form. I've no idea if this will be a suitable data structure for actually doing anything with, but we'll see how this evolves...



(defstruct model
  :iterations
  :width
  :height
  :eye-position
  :look-direction
  :view-angle
  :sky-emission
  :ground-reflection
  :triangles)

(defstruct triangle
  :geometry
  :reflectivity
  :emitivity)

(defmacro safe-read [x]
  `(read ~x false nil))

(defn read-triangles
  [r]
  ((fn [triangles]
     (let [geom [(safe-read r) (safe-read r) (safe-read r)] refl (safe-read r) emit (safe-read r)]
       (if (first geom)
         (recur (cons (struct triangle geom refl emit) triangles))
         triangles))) nil))

(defn load-model
  [f]
  (with-open [r (PushbackReader. (reader f))]
    (.skip r (count "#MiniLight"))
    (struct model 
            (read r) ;iterations
            (read r) ;width
            (read r) ;height
            (read r) ;eye-position
            (read r) ;look-direction
            (read r) ;view-angle
            (read r) ;sky-emission
            (read r) ;ground-reflection
            (read-triangles r)))) ;triangles

Labels: clojure

# posted by Jeff Foster : 00:02 3 Comments Links to this post

Tuesday, 24 February 2009

Bloom Filters

Bloom Filters are an efficient data structure for determining whether an item is a member of a set. It's a probabilistic set, it's guaranteed to never return a false negative BUT can sometimes falsely report that an item is in the set.

The Bloom filter doesn't store the data in the structure, instead it uses a bit array. There are two operations on a Bloom filter:

Add - as the name suggests adds a new element to the filter.
Query - returns whether an item is in the set or not

A series of hash functions (ideally independent) are used to calculate a number of indices within the bit-array. When we add something we set the various bit indices, and when we query we check whether all these bit indices are set.


(defstruct bloom-filter :hashfns :value)

(defn make-bloom-filter
  ([n] (struct bloom-filter md5-hashes (bit-array n)))
  ([n fns] (struct bloom-filter fns (bit-array n))))

(defn add!
  [bloom n]
  (let [hashes (map (fn [x] (x n)) (bloom :hashfns))]
    (doseq [x hashes] (set-bit! (bloom :value) x 1))
    bloom))

(defn query
  [bloom n]
  (let [hashes (map (fn [x] (x n)) (bloom :hashfns))]
    (reduce bit-and (map (fn [z] (get-bit (bloom :value) z)) hashes))))

The gotcha for me was remembering to use doseq for side effects. If instead I'd used map I'd have (and was) in trouble because it wasn't evaluated. doseq forces the evaluation.

One simple choice for the hashes is to use MD5 hash values and split it. MessageDigest allows you to calculate various hash functions.


(ns bloom
  (:use bitarray)
  (:use clojure.contrib.test-is)
  (:import (java.security MessageDigest)))

(defn pad [n s]
  (let [padding (- n (count s))]
    (apply str (concat (apply str (repeat padding "0")) s))))

(defn md5-hash [s]
  (let [m (MessageDigest/getInstance "MD5")]
    (.update m (.getBytes (str s)) 0 (count s))
    (let [x (.toString (BigInteger. 1 (.digest m)) 16)]
      (pad 32 x))))

So how well does this work?


(deftest test-bloom
  (let [teststrs (map (fn [x] (str x)) (range 0 1000))
        bloom (make-bloom-filter 0xFFFF)]
    (doseq [x teststrs]
      (is (= 0 (query bloom x)))
      (add! bloom x)
      (is (= 0 (query bloom (str "not" x))))
      (is (query bloom x)))))

In this example, I've used hash functions which break MD5 down into 4 lots of 4 hex characters which gives a range of 65536.

Running this test gives


  bloom> (run-tests 'bloom)
  Ran 1 tests containing 3000 assertions.
  0 failures, 0 errors.

Awesome, no false positives. Taking it down to 0xFFF gives 82 false positives which is inline(ish!) with the figures here Bloom filter error table.

The primary use case is caching - check it's in some storage mechanism before doing something expensive (BigTable and Squid Cache both use bloom filters.).

Rapleaf write about how using a Bloom filter saved some serious time (see here).

Labels: clojure

# posted by Jeff Foster : 23:58 0 Comments Links to this post

Monday, 23 February 2009

Bit Fields using Clojure

A bit array is a way of getting a very compact array of Boolean values with each value being represented as a single bit. Bit arrays are usually associated with low(er)-level like C, but you can do them in Clojure too.

Clojure provides array functions through aget and aset (slightly strange in that it mutates a data structure in place). Using Java arrays isn't very functional, but in my opinion this is about providing the balance between strictness (e.g. Haskell) and leniency (e.g. C).

We define a bit-array as a structure consisting of some array data and the width of each field.


(defstruct bit-field :element-width :array-data)

(defn bit-array
  [n]
  (struct bit-field 31 (int-array (inc (int (/ n 31))))))

Where 31 is the range of an integer in Java (it doesn't support unsigned) (yeah, my thinking is fuzzy here, but I think that's right... Certainly 32 fails all my tests.).

A bit-array only consists of a couple of operations, get-bit and set-bit!. The bang (!) notation at the end of the set-bit function name is an informal way of indicating that the function mutates its data.


(defn set-bit!
  [bitfield bit val]
  (let [r (mod bit (bitfield :element-width))
        n (int (/ bit (bitfield :element-width)))
        x (aget (bitfield :array-data) n)]
    (if (not (zero? val))
      (aset (bitfield :array-data) n (bit-or x (bit-shift-left 1 r)))
      (aset (bitfield :array-data) n (bit-xor x (bit-shift-left 1 r))))
    bitfield))

(defn get-bit
  [bitfield bit]
  (let [r (mod bit (bitfield :element-width))
        x (aget (bitfield :array-data) (int (/ bit (bitfield :element-width))))]
    (if (= 0 (bit-and x (bit-shift-left 1 r))) 0 1)))

We work out the index in the array to change by dividing the bit to set by the width of each element and use bit-shift-left to identify the bit in question to twiddle.

How can we be sure this works? Well, we can never be 100% sure without a proof, but we can at least run it against a reasonable set of data! Clojure Contrib has a very simple library for testing. You define tests using the deftest macro, each test consists of a number of is functions that verify assertions.



(deftest test-bits
  (let [n 3000
        f (bit-array n)]
    (is (= 31 (f :element-width)))
    (doseq [x (range 0 n)]
      (is (= 0 (get-bit f x))))
    (doseq [x (range 0 n)]
      (set-bit! f x 1)
      (is (= 1 (get-bit f x))))
    (doseq [x (range 0 n)]
      (set-bit! f x 0)

Tests are run using run-tests. For example:


  bitarray> (run-tests 'bitarray)

  Testing bitarray

  Ran 1 tests containing 9001 assertions.
  0 failures, 0 errors.
  nil

So it passes the tests therefore it at least vaguely works!

Labels: clojure

# posted by Jeff Foster : 23:43 0 Comments Links to this post

Friday, 20 February 2009

JMusic and Clojure

JMusic is a Java library for music composition. I thought it'd be fun to play with music as with images, so I'm looking at trying some algorithmic compositions.

Since it's a Java library, it's easy to use JMusic with Clojure. It's simple as downloading the JAR file (see here) and then making sure the JAR is on your class path when you start Clojure.

You'll need to import a few classes to get started. Clojure doesn't support importing all members from a namespace, so it can be a little tedious. (as a side note, In Java I'm so used to IntelliJ auto-importing that I'd forgotten how much crud you have to import these days).

Here's my basic imports for a simple program which mirrors the basic one in the tutorial.


(ns jmusic
  (:use [clojure.contrib.import-static :only (import-static)])
  (:import jm.JMC)
  (:import (jm.util Write))
  (:import (jm.music.data Note Score Part Phrase)))

(import-static jm.JMC 
               CROTCHET
               C4
               FLUTE)

import-static is a very handy function - it does exactly what it says on the tin!

JMusic has a simple composite model for music.

A Note is the building block of any composition
A Phrase is 1 or more notes
A Part is a collection of phrases played with a given Instrument
A Score is a collection of parts

Writing functions to composes phrases of notes, and parts of phrases is very tedious, so we'll use a macro to help avoid duplication. Here's a few helper functions to build the various music domain objects in JMusic. Note that this is just for my "hello world" style application, they are incredibly inflexible functions at the moment!


(defmacro jm-add-children
  [m obj parts]
  `(let [obj# ~obj]
      (doseq [p# ~parts]
        (doto obj#
          (~m p#)))
      obj#))

(defn make-score 
  [name parts]
  (let [sc (Score. name)]
    (jm-add-children .addPart sc parts)))

(defn make-phrase
  [name notes]
  (let [p (Phrase. name)]
    (jm-add-children .addNote p notes)))

(defn make-part 
  [name instrument phrases]
  (let [part (Part. name instrument)]
    (jm-add-children .addPhrase part phrases)))

(defn make-note
  [freq rhythm]
  (Note. freq rhythm))

Once you've got a score together, you need to be able to save it.


(defn save [score output]
  (Write/midi score output))

OK, with all those helper functions out of the way we can now write some music. Taking the first example (chromatic scale) from the JMusic tutorial, we get:


(defn make-noise []
  (let [notes (map (fn [y] (make-note (+ C4 y) CROTCHET)) (range 0 12))
        phrase (make-phrase "Phrase1" notes)
        part   (make-part "Part" FLUTE (list phrase))
        score (make-score "Score" (list part))]
    (save score "chromatic-scale.mid")))

Running this at your REPL should get:


  jmusic> (make-noise)
  ----------------------------- Writing MIDI File ------------------------------
  Converting to SMF data structure...
      Part 0 'Part' to SMF Track on Ch. 0:  Phrase 0:............
  MIDI file 'chromatic-scale.mid' written from score 'Score' in 0.001 seconds.
  ------------------------------------------------------------------------------
  nil

And result in a chromatic scale midi file being output. Next, to find something funkier to do!

Labels: clojure, music

# posted by Jeff Foster : 13:50 1 Comments Links to this post

Wednesday, 18 February 2009

Lazier Clojure

Clojure has become a lazier language. See here for a description of the changes.

One of the main changes is the removal of "nil-punning". This was a technique where functions operating on empty lists returned nil which evaluated to false in a conditional statement. All of this is explained in much more detail here..

Labels: clojure

# posted by Jeff Foster : 22:51 0 Comments Links to this post

Monday, 16 February 2009

Countdown

Countdown is a Channel4 game show with standard number and word puzzles. In this post we'll look at the Numbers game. The rules are simple, given 6 numbers (between 1 and 999 inclusive), calculate the target number (between 100 and 999 inclusive). You can use +, -, / and * to get the numbers and you have a 30 second time limit to do so.

To solve this in Clojure we'll start with a brute force search of all the possibilities and see if that's good enough to solve it.

One approach I tried initially was just to build up the Clojure expression tree for all possible examples, using code like this:


(def *operators* ['+ '- '/ '*])

(defn expr
  "A list of expressions for a and b"
  [a b]
  (map (fn [x] (list x a b)) *operators*))

The idea would be that I could then just (map eval (expr 4 5)) and get all the possible results. This turned out to be very slow. Generally you want to avoid calling eval at run-time if you can help it.

To solve this I defined a simple structure to keep track of running expressions and their value. As the expressions are built up, the values are calculated in sync.


(def *operators* {'+ + '- - '/ / '* *})

(defn is-valid [op a b]
  (cond 
   (= + op) true
   (= - op) (> a b)
   (= * op) true
   (= / op) (= (mod a b) 0)))

(defstruct node :expression :value)

(defn value 
  [x]
  (if (map? x)
    (x :value)
    x))

(defn expression
  [x]
  (if (map? x)
    (x :expression)
    x))

(defn expr 
  "A list of expressions for a and b"
  [a b]
  (let [nodea (map? a) nodeb (map? b)]
    (filter (fn [x] (not (nil? x))) 
            (map (fn [x] (when (is-valid (second x) (value a) (value b))
                           (struct node 
                                   (list (first x) (expression a) (expression b)) 
                                   ((second x) (value a) (value b)))))
                 *operators*))))

Why is *operators* a map? That's simply because "+" doesn't print very nicely e.g.


  countdown> +
  #<core$_PLUS___3180 clojure.core$_PLUS___3180@61dd1c39>

I also added a check to prune entries out that results in floating point or negative numbers, that just helps keep the number of combinations down a little.

Armed with a function that calculates all the possible expressions for a pair of expressions, how do we now use that to generate all the possible expressions?


(defn make-expressions-helper 
  "Given a lst, build up all valid Countdown expressions"
  [x]
  (cond
   (< (count x) 2) (list (struct node (first x) (first x)))
   (= 2 (count x)) (apply expr x)
   :else
     (let [exps (apply expr (take 2 x))
           remd (drop 2 x)]
       (mapcat make-expressions-helper (map (fn [x] (cons x remd)) exps)))))

This is a recursive definition with the following logic:

A singleton list (1) can only evaluate to itself so the only possibility is [expr=1 value=1]
A list of size two just uses the expr function to generate all the expressions
Any other list builds all the possible expressions for the first two elements, and then for all of these (mapcat) calls make-expressions-helper on the rest.

Note that this just builds up the possible expressions with the numbers in this particular order. For example.


  countdown>(make-expressions-helper '(1 2 3))
  ({:expression (+ (+ 1 2) 3), :value 6} {:expression (/ (+ 1 2) 3), :value 1}
   {:expression (* (+ 1 2) 3), :value 9} {:expression (+ (* 1 2) 3), :value 5}
   {:expression (* (* 1 2) 3), :value 6})

  countdown> (count (make-expressions-helper '(1 2 3 4 5 6)))
  118

So now we need to apply the helper function to all possible combinations. Thankfully, Clojure Contrib already has a few combinatorics algorithms. permutations returns a lazy list of all possible permutations of the supplied list.


(defn make-expressions [lst]
  (if (nil? lst)
    nil
    (lazy-cat
     (mapcat make-expressions-helper (permutations lst))
     (mapcat make-expressions (drop-one lst)))))

So this algorithm applies the helper function to all permutations of the input, and then applies itself to all combinations of the remainder of the list. drop-one is a helper function which gives a list of all combinations of a list without one element.

So how many valid Countdown expressions are there?


  countdown> (count (make-expressions '(1 2 3 4 5 6)))
  300290

  countdown> (time (count (make-expressions '(1 7 8 25 50 75))))
  "Elapsed time: 2653.618442 msecs"
  268175

Note that the number is different because we rule out cases which result in floating point or negative numbers. The elapsed time is just under three seconds which is pretty fast! Remember that this time includes all the calculation of the results too, not just generating the expressions. So finally, all we need is a solver function.


(defn solve 
  "Solve the countdown problem"
  [numbers target]
  (filter (fn [x] (= (x :value) target)) (make-expressions numbers)))

This will return all the combinations that lead to the right results. Let's try it out with a toy examples:


  countdown> (time (solve '(4 5 6) 15))
  "Elapsed time: 0.281907 msecs"
  ({:expression (+ (+ 4 5) 6), :value 15} {:expression (+ (+ 4 6) 5), :value 15}
   {:expression (+ (+ 5 4) 6), :value 15} {:expression (+ (+ 5 6) 4), :value 15}
   {:expression (+ (+ 6 4) 5), :value 15} {:expression (+ (+ 6 5) 4), :value 15})

Notice that we've returned all possible + expressions that make 15. We've not taken any notice of the commutative properties of addition. Taking advantage of these properties is explored in "The Countdown Problem" [PDF] by Graham Hutton.

How does it fair on bigger solutions?


  countdown> (time (solve '(7 5 9 25 40 10) 753))
  "Elapsed time: 222.632493 msecs"
  {:expression (- (* (- (- (* 5 25) 9) 40) 10) 7), :value 753}

With the code as it stands we could add additional operators (exponent for example) without any code changes, but more operators would probably require something more sophisticated than brute force.

As usual, any suggestions for making the code clearer (or finding any bugs!) are greatly appreciated. Full code is on my Git repository.

Labels: clojure

# posted by Jeff Foster : 23:57 0 Comments Links to this post

Saturday, 14 February 2009

Huffman Encoding

Huffman Encoding is a simple technique for lossless data compression. The idea is simple; replace frequently occuring symbols with short bit patterns and infrequently occuring symbols with longer ones.

Firstly, we must produce a frequency table that gives weights for each symbols. ~~Here's a version~~ (update 14/2/9) Turns out there was a much better implementation of doing this (frequencies) already in Clojure contrib so I'll use that and save the world from seeing my bad version. Notes on why it was bad at end.

For example:


user> (frequencies "aaaabbbbcdddde")
{\e 1, \d 4, \c 1, \b 4, \a 4}

Once we've got the frequencies, we can construct a Huffman Coding Tree. The algorithm description (from Wikipedia) is:

Create a leaf node for each symbol and add it to the priority queue.
While there is more than one node in the queue:
1. Remove the node of highest priority (lowest probability) twice to get two nodes.
2. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
3. Add the new node to the queue.
The remaining node is the root node and the tree is complete.

This tree has the property that the path to each node has a unique prefix. We can translate this directly into Clojure as:


(defn coding-tree 
  "Given an ordered frequency list, create an encoding tree"
  [open]
  (prn open)
  (if (> (count open) 1)
    (let [new-node (apply tree-node (take 2 open))]
      (recur (add-to-queue new-node (drop 2 open))))
    (first open)))

Where add-to-queue simply inserts a node in the right place. See huffman.clj for full code.

The coding tree isn't enough on its own, we have to change this in to a map from symbol to bit-pattern. To get the bit pattern for any node we start from the root and follow a route to the symbol in question. When we take a left node we get a "1" and a right branch gets a "0". The lookup function takes an encoding tree and flattens it into a map.


(defn- lookup-helper
  [tree path]
  (if (nil? tree)
    nil
    (let [v (first (first tree))]
      (lazy-cat (if (= v \*) nil (list [v path] ))
                (lookup-helper (left-node tree)  (cons 0 path)) 
                (lookup-helper (right-node tree) (cons 1 path))))))

(defn lookup
  [tree]
  (into {} (lookup-helper tree nil)))

Lazy functions ensure that we don't get a stack overflow. The defn- indicates that lookup-helper is a private function.

Finally we need a function that given a sequence and an encoding table gives us the encoded series of bits.


(defn huffman-compress 
  [s table]
  (mapcat (partial get table) s))

Note that the sequence and the encoding table don't have to be the same. If, for example, the data to compress was in the English language, then you could use a known Huffman table based on Frequency Analysis of a typical corpus.

So how much compression can we get? Let's look at an example:



  user> (let [x "busy busy bee"]
       (compress x (huffman-coding-table x)))
  (1 0 0 0 1 1 1 0 1 1 1 0 1 1 0 0 0 1 1 1 0 1 1 1 0 1 1 0 0 0 1 0 0 1)

  user> (count *1)
  34

So "busy busy bee" encoded to 34 bits (*1 is used to refer to the last evaluated expression at the REPL). Compared to the 13*8 bits this would take with ASCII this is a good saving. How do we fair with bigger texts? Let's try Hamlet.


  user> (time (let [x (slurp "/home/jfoster/Desktop/2ws2610.txt")]
   (count (compress x (huffman-coding-table x)))))
  "Elapsed time: 592.317906 msecs"
  921595
  
  user> (* 8 (count (slurp "2ws2610.txt")))
  1544656

A pretty big saving again (down from ~1.5 million bits to 900000 bits). Note that in all these savings I'm not including the size of the tree!

In this use a symbol is a character, we could use words instead to get bigger savings (we wouldn't have to change the code at all). PKZIP use Huffman in their arsenal of compression techniques (see LZ77 and LZ78 for other examples).

As a side note, why was my version of frequencies less than good? (bad version preserved for posterity here).

I didn't know assoc took an optional argument representing the default, this means I could have avoided special casing the first occurrence of a symbol.
I didn't use a standard idiom (reduce) when I could have done. Counting frequencies is simply taking a big sequence and making a different one.

I should spend some more time reading source code - any other improvements that you can see are gratefully accepted!

Labels: clojure

# posted by Jeff Foster : 10:19 0 Comments Links to this post

Wednesday, 11 February 2009

Base 64 Decoding

For completeness.


(defn decode-num
  [num]
  (let [a (bit-and num 255)
        b (bit-shift-right (bit-and num 65280) 8)
        c (bit-shift-right (bit-and num 16711680) 16)]
    (list (char c) (char b) (char a))))
    
(defn decode
  "Lazily decode a sequence from base64"
  [s]
  (when-not (nil? s)
    (let [x (map (fn [x] (.indexOf *encode-table* (int x))) (take 4 s))
          num (+ (nth x 3) (bit-shift-left (nth x 1) 6) (bit-shift-left (nth x 2) 12) (bit-shift-left (nth x 0) 18))]
      (lazy-cat (decode-num num) (decode (drop 4 s))))))

Obviously base 64 decoding is a just what we did previously, only backwards!


  user> (apply str (decode (encode (decode (encode "The quick brown fox jumped over the lazy dog.")))))
 "The quick brown fox jumped over the lazy dog."

Labels: clojure

# posted by Jeff Foster : 00:01 0 Comments Links to this post

Sunday, 8 February 2009

Bit Shifting in Clojure

Base64 encoding is a way of converting a stream of binary data into a printable form. The name comes from the 64 allowable characters ([a-z][A-Z][0-9]+/=) that are used.

The algorithm is very simple. Get 3 bytes at a time (if you can't, just pad with a character, typically =), munge them together (making 24 bits). We then split this 24 bits into 4 lots of 6 bits which allows us to pick one of the 64 allowable characters. This involves dealing with a few of the bit operators in Clojure, as shown below:


(def *encode-table*
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=")

; Daft and way too slow
(defn encode-num 
  [num]
  (let [a (bit-and num 63)
        b (bit-shift-right (bit-and num 4032) 6)
        c (bit-shift-right (bit-and num 258048) 12)
        d (bit-shift-right (bit-and num 16515072) 18)]
    (map (fn [x] (nth *encode-table* x )) (list d b c a))))

(defn str-pad [x size padchar]
  (let [d (rem (count x) size)]
    (if (zero? d)
      x
      (concat x (take (- size d) (repeat padchar))))))

                   
(defn encode 
  "Lazily encode a sequence as base64"
  [s]
  (if (nil? s)
    nil
    (let [x (map int (str-pad (take 3 s) 3 \=))
          num (+ (nth x 2) (* 256 (nth x 1)) (* 256 256 (first x)))]
      (lazy-cat (encode-num num) (encode (drop 3 s))))))

The magic numbers in the bit-and allow us to select the right parts of the integer (63 is 111111, 4032 is 111111000000 and so on). This could be improved substantially with more bit-twiddling (see for example String Encoders).

One nice property of this is that by using lazy-cat we can deal with infinite sequences (just as long as you don't try and print the result!)


user> (apply str (take 10 (encode (range 0 1000000000000000000))))
"AEACAQwFBc"

In case you're wondering, I am really scraping the bottom of the barrel for little programming tasks to learn something new at the moment! I've ordered myself a copy of The Princeton Companion to Mathematics which'll hopefully provide more inspiration when it arrives.

(Update 24/2/2009) As Cubix pointed out, there's a bug in the code above as padding should be applied after encoding, not before. The full code is in the comments, but the main point is to change the code such that if we don't get three elements we don't do any encoding and instead use a helper function to do the last bytes. I'm sure there must be a further improvement as padding duplicates functionality in encode.


(defn encode
  "Lazily encode a sequence as base64"
  [s]
  (if s
    (let [x (map int (take 3 s))]
      (if (= 3 (count x))
        (let [num (+ (nth x 2) (* 256 (nth x 1)) (* 256 256 (first x)))]
          (lazy-cat (encode-num num) (encode (drop 3 s))))
        (padding x))))) ;;; helper function, see comments

Labels: clojure

# posted by Jeff Foster : 08:19 9 Comments Links to this post

Monday, 2 February 2009

Clojure Macros

Macros are one of the defining features of Lisp languages. A macro operates prior to compilation, allowing you to shape the code as you wish. Since Lisp code is homoiconic the macro language is the language.

Languages like C/C++ have macros, but they are in no way the same. You have a very limited language and you don't get the fine grained access to the code that you do with Lisp (since the code is just itself a Lisp data structure).

Pl Patterns tries to describe a taxonomy of macro use. One of the examples is debug printing with an example from the Arc language. Converted to Clojure this looks like this:


(defmacro dbg-prn
  "Debugging form that prints out results"
  [& more]
  `(let [start# ~more]
     (print '~more "==>" start# "\n")
     start#))

defmacro defines a new macro with similar structure to defn The ` is used to create a template expression, where we can evaluate certain items within the expression by using macro characters (#,~,`,list-frag?). In this example we used start# to create a uniquely named value for the let expression and ~more to evaluate the parameters.

So now when I'm trying to debug code there's no more repetition of print x, return x I can just edit my function definition by adding dbg-prn (without the normal hardship of wrapping extra brackets around).


  user> (dbg-prn + 1 2 3 4 5)
  (+ 1 2 3 4 5) ==> 15  ;; printed to std-out
  15

  user> (dbg-prn + (* 2 3) (* 4 5))
  (+ (* 2 3) (* 4 5)) ==> 26 
  26

You can use macroexpand-1 to expand out macros to see what they actually do:


  user> (macroexpand-1 '(dbg-prn + 1 1))
    (clojure.core/let [start__2150__auto__ (+ 1 1)] 
      (clojure.core/print (quote (+ 1 1)) "==>" start__2150__auto__ "\n") 
      start__2150__auto__)

Labels: clojure, macro

# posted by Jeff Foster : 22:59 0 Comments Links to this post

Sunday, 1 February 2009

Mandlebrot Fractals

The Mandlebrot Set is probably the most famous set of fractals. The maths behind it is dead simple and with just a few lines of code you can get some impressive results.

For any given pixel, you can work out a colour value thus:

(def *max-iteration* 512)

(defn process-pixel [x y]
((fn [x y xc yc accum]
(let [x1 (+ (- (* x x) (* y y)) xc)
y1 (+ (* 2 x y) yc)
sq (+ (* x1 x1) (* y1 y1))]
(cond
(> accum *max-iteration*) *max-iteration*
(> sq 2.0) accum
:else (recur x1 y1 xc yc (inc accum))))) x y x y 0))

The harder part is translating a number between 0 and *max-iteration* into a decent range of colours. I'll ignore this for now and stick with green!

Source code for version 0.1 is here. Next on the list:

Make it look like the Math - write complex number library
Make it run fast - optimize (uses more cores, minimize type coercions)
Make it look nice - find a better colour mapping function

Labels: clojure, fractals

# posted by Jeff Foster : 11:30 0 Comments Links to this post

Tuesday, 27 January 2009

Passing Parameters in Clojure

Clojure has support for a number of ways of passing parameters. A function body can have multiple function bodies associated with it, each with different arity. This isn't any where near as general as Haskell's pattern matching (see [1]).


  user> (defn foo ([x] x) ([x y] (+ x y)) ([x y z] (+ x y z)))
  #'user/foo
  user> (foo 1)
  1
  user> (foo 1 2)
  3
  user> (foo 1 2 3)
  6

Similarly, you can make a var arg function with the & notation.


  user> (defn bar ([x] x) ([x & rest-args] (reduce + (cons x rest-args))))
  #'user/bar
  user> (bar 1)
  1
  user> (bar 1 2 3 4 5 6 7)
  28

The last option for parameter passing is keywords, the parameter list becomes a map. This is discussed in more detail here.

Labels: clojure

# posted by Jeff Foster : 00:13 0 Comments Links to this post

Wednesday, 21 January 2009

Multi-methods in Clojure

Multimethods (more commonly known as multiple dispatch) means that a method call is dispatched dynamically based upon the type of multiple arguments.

In Java / C++ you'd typically approach this with the visitor pattern, with something like the following.


  interface Foo {
    // ... lots of exciting methods
    void accept( FooVisitor visitor );
  }

  interface FooVisitor {
    void visit( Bar bar );
    void visit( Baz baz );
  }

  class Bar implements Foo {

    //... implementation

    public void accept ( FooVisitor visitor ) {
      visitor.visit( this );
    }
  }

  class Baz implements Foo {

    //... implementation
    
    public void accept( FooVisitor visitor ) {
      visitor.visit( this );
    }
  }

Why's this not good? Well, the FooVisitor has to know about all the subclasses, which is pretty nasty. In addition, well, it's just a lot of typing - way too much scaffolding. Languages like Smalltalk and Lisp both support multi-methods natively which give you all the benefits of this "pattern", without the crazy verbosity of the Java version.

Clojure multimethods support dispatching on variables other than the runtime type of an argument. From the Clojure docs, you can dispatch on "types, values, attributes and metadata of, and relationships between, one or more arguments".

Clojure multi-methods consist of two things:

Here's a trivial example defining an increment function that dispatches on the class of the arguments. class is a built in function within Clojure core that gives the class of the object.


  (defmulti my-increment class)

  (defmethod my-increment String [s]
    (let [x (Integer/valueOf s)]
      (str (my-increment x))))

  (defmethod my-increment Integer [i]
    (+ i 1))

  user> (my-increment 4) ==> 5
  user> (my-increment "4") ==> "5"

But, we do not just have to be boring and dispatch only on the type of the argument, we could be dispatching on the value:


  (defmulti my-decrement identity) ;; identify is built in (fn [x] x)

  (defmethod my-decrement 0 [x]
    99999)

  (defmethod my-decrement :default [x]
    (- x 1))

  user> (my-decrement 2) ==> 1
  user> (my-decrement 1) ==> 0
  user> (my-decrement 0) ==> 99999

The dispatching function isn't limited to just the first argument, you could dispatch on the type of multiple arguments e.g.


  (defmulti my-add (fn [x y] (and (string? x) (string? y))))

  (defmethod my-add true [x y]
    (str x y))

  (defmethod my-add false [x y]
    (+ x y))

  user> (my-add 3 4) ==> 7
  user> (my-add "3" "4") ==> "34"

This gives about the most general method of method dispatch imaginable!

The documentation explains how this goes further still, allowing relationships to be created with derive and how you can add/remove/define an ordering implementations of methods at runtime.

Reading Programming Clojure shows that multi-methods are very rarely used in most of the current Clojure code (only about 1 occurrence per thousand lines of code). There's some guidelines in the book about choosing whether to use multi-methods, but it's summed up best with "try writing the function in both styles, and pick the one that seems more reliable".

Labels: clojure

# posted by Jeff Foster : 23:07 1 Comments Links to this post

Tuesday, 20 January 2009

Compojure

Compojure is a web-framework for Clojure.

Grab the latest source from Git and build with Ant.


  git clone git://github.com/weavejester/compojure.git
  cd compojure
  ant

Now you've got to add Compojure into your classpath. I use Emacs and Slime so I edited my .emacs file based on some suggestions from IRC and got (in addition to the usual)


(swank-clojure-config
 (setq swank-clojure-jar-path (concat clj-root "clojure/trunk/clojure.jar")))
 (setq swank-clojure-extra-classpaths (directory-files "~/lisp/clj/" t ".jar$"))

When ~/lisp/clj contains all my JARs (well symlinks anyway).

After this, I wasted (well wasted probably isn't true, I got some new knowledge) precious minutes of my life wondering why I couldn't get anything working, but on the plus side I found some functions that helped:


  user> (all-ns)
  ;; big list of name spaces, check that you've got Compojure on here!

  user> (System/getProperty "java.class.path")
  ;; big class path list, make sure you have Compojure + all the dependent JARs

Then, I got a bit lost because I'd mismatched versions of compojure.jar and clojure.ar! Rather than using the version of Clojure that came with Compojure, I was using the head revision - bad idea! Use a consistent set of JARs that come from the git repository.

And now I'm able to compile the examples from the Wikibook and all is well.

I also decided to finally by the Programming Clojure book, now all I need is a decent e-book reader!

Labels: clojure

# posted by Jeff Foster : 12:18 0 Comments Links to this post

Wednesday, 14 January 2009

Using Agents

I've had a hard time groking Clojure agents, so this is mostly just a series of daft micro-examples to understand them together with restating the bleeding obvious.

An agent is created with agent and the associated data. You can use deref or @ to access the data associated with an agent (same syntax as I mentioned previously for atoms).


  user> (agent '(1 2 3 4 5 6))
  #
  user> @(agent '(1 2 3 4 5 6))
  (1 2 3 4 5 6)

Agents are reactive - they'll only do something if you tell them so. All communication with agents is through functions. There are two commands for sending functions to agents, send (used for non-blocking calls) and send-off (used for potentially blocking calls). Both send and send-off return immediately, the difference being that send-off guarantees the message will be processed in a different thread.


  user> (let [a (agent 4)]
   (send a + 1) ; schedules a call of (apply + agent_state 1)
   (await a)
   (prn @a))
  5
  nil

Without the invocation of await, this may return 4 not 5. await blocks the current thread indefinitely until all actions have been completed (which makes it quite dangerous!).

What if the function you are sending has errors? Let's look at a divide by zero:


  user> (let [a (agent 4)]
     (send a / 0)
     (await a)
     (prn @a))
  ; Evaluation aborted.
  java.lang.Exception: Agent has errors (NO_SOURCE_FILE:0)
    [Thrown class clojure.lang.Compiler$CompilerException]

Errors in agents can be inspected with agent-errors which returns a sequence of exceptions. Once an agent is in an error state it can not process any more messages until the errors have been cleared with clear-agent-errors.


  user> (let [a (agent 4)]
   (send a / 0)
   (await a)
   (prn (agent-errors a))
   (clear-agent-errors a)
   (prn @a))

  (#)
  4
  nil

So agents seem incredibly simple - why are they so powerful?

Integration with STM, which leads too...
Always observable - no matter what calculation is going on, you can always get the current (valid) value with deref.
No message-loops - agents are purely reactive which (to me) seems a simpler programming model to understand
Open - agents are mutated by functions and therefore this can be extended by anyone (in contrast with having to add a new type of message for the loop to switch on)

Labels: clojure

# posted by Jeff Foster : 23:09 0 Comments Links to this post

Monday, 12 January 2009

Metadata in Clojure

Each Clojure function can have a documentation string associated with it. This must be declared immediately before the parameters, for example:


(defn foo-baz 
  "Foo-bazes bar"
  [bar]
  (+ bar 77))

user> (doc foo-baz)
-------------------------
user/foo-baz
([bar])
  Foo-bazes bar

This is actually short hand for:


(defn #^{:doc "Foo-bazes bar"} foo-baz 
  [bar]
  (+ bar 77))

#^ indicates to the reader that the following map is metadata to be associated with the next element read. You can use this to document def'd values too (e.g. (def #^{:doc "This is the value x"} x).

Metadata is associated with symbols or collections, and meta can be used to view the associated values. Initially I was doing (meta count) and expecting it to come up with the details about agent. This is wrong because count is an instance of the function, whereas what I need to pass as an argument to meta is the symbol associated with the function e.g.


user> (meta count)
nil
user> (meta #'count) ; ^#'count also works (^ is a reader character to get the metadat)
{:ns #, :name count, :doc "Returns the number of items in the collection.
(count nil) returns\n  0.  Also works on strings, arrays, and Java Collections and Maps", 
:arglists ([coll]), :line 842, :file "core.clj"}

Note how extra metadata has arrived when it was compiled, including the line number of the original file, name space etc.

There are many cool potential applications of metadata in Clojure, though I haven't seen any implemented yet!:

Aspect oriented programming (side note inspiration for this came from Lisp Advice (see Emacsspeak mentioned in the book Beautiful Code.)
Software tools - knowing the line number and file helps in a number of ways, maybe could be used to do semantic diff (for example, re-ordering my functions wouldn't make any difference because they'd semantically still be the same).

Labels: clojure, metadata

# posted by Jeff Foster : 23:19 2 Comments Links to this post

Friday, 9 January 2009

Generating Text

Often you need a chunk of vaguely real looking text to test some code, web layout, file handling etc. In ANSI Common Lisp (which I'm currently reading if you hadn't guessed) there's an example of how to generate random text.

The idea is simple - read in a list of words, for each pair of words keep a count of the number of occurrences of that pair. Once you've got that data, you can pick a word at random and then pick from a probability distribution what the next word should be. Apply that pattern until you've generated enough text. There's much more sophisticated work based on same ideas.

The example Lisp code is quite nasty and it's written in what feels like an iterative style. In Clojure we have access to a richer standard library (Java). For example, we can read in a list of words thus:


(defn file-as-wordlist [f]
  (filter (fn [x] (> (count x) 0)) (.split (slurp f) "\n|[ ]|\r|[.]|[,]|[\"]")))

This takes a file name as an arguments, slurps the entire file into memory and splits it using String.split

Next we need to be a frequency map which has an entry for each word, together with a count of each word that follows it. We use a map of Word => (Word => Count).


(defn build-frequency-map [words]
  (let [word-pairs (mapcat (fn [x y] (list [x y])) (cons (last words)  words) words)]
    (reduce (fn [accum v]
       (let [w1 (first v) w2 (second v) val (get accum w1)]
  (if (nil? val)
    (assoc accum w1 {w2 1})
    (let [currentVal (get val w2)]
      (if (nil? currentVal)
        (assoc accum w1 (conj val {w2 1}))
        (assoc accum w1 (conj val {w2 (inc currentVal)})))))))
     {} word-pairs)))

This is a beefy function, but I couldn't see how to simplify it. It also assumes that the last word maps to the first, which is daft but then this is only playing around! The idea is to get the two sequences, shift one to the left and then count the two in a map. For example the frequency map of Caroline Kennedy would be, well you know?, something like this:


user> (build-frequency-map ["you" "know" "it's" "great" "you" "know"])
{"know" {"it's" 1, "you" 1}, "great" {"you" 1}, "it's" {"great" 1}, "you" {"know" 2}}

So once we've generated a frequency map we need to work out what the next work should be based on the probability. "know" is followed by 1 occurrence of either "it's" or "you", so each should have a 50/50 chance of getting picked.


(defn frequency-map-count [m word]
  (let [v (get m word)]
    (if (nil? v)
      0
      (reduce (fn [x y] (+ x (second y))) 0 v))))

(defn next-word [m word]
  (let [following (get m word) p (rand-int (frequency-map-count m word))]
    ((fn [words prob]
       (let [word-count (second (first words))]
  (if (>= word-count prob)
    (first (first words))
    (recur (rest words) (- prob word-count))))) following p)))

frequency-map-count counts the number of occurrences of a word. next-word rolls a dice and selects the next word based on the number of occurrences.

Finally, all we need is a function to apply next-word


(defn generate-text [example n & [start]]
  (let [words (file-as-wordlist example) fm (build-frequency-map words)
        start-word (if (nil? start) "the" start)]
    (apply str (interpose " " (take n (iterate (partial next-word fm) start-word))))))

There's several nice things here from a functional programming point of view:
* Use of partial the function of a single argument (currying)
* Use of iterate/take to have an infinite stream of text (laziness)

So is it any good? Here's 250 words Dracula


user> (generate-text "/home/jfoster/Documents/345.txt" 250)

"the same condition of those which pretend and said Finis and I agree later period He evidently the geologic and women jumped down one whom each side manifestly as to set in them he had pressed upon us? Be not in through my poor Lucy noticed it up the section their swords can be more and had expected A great brain just finished the third person or sea I do for the belt with so sweet little before he woke bright smile of it afterwards when we did anent something like a night till tonight it and on in paragraph 1 October --As I describe when Arthur I have unchecked sway a low shrill call him what will have not friend John was no opinion there would indeed happiness were parted at rest Happily Van Helsing hypnotized me But you? Whilst we may deign to be at rest seemed to the facts of a cluster and mixed up the wide spread out one by the library of whips The soft voice gave them both quiet joy when we were they come to the big swan thought struck with my eyelids and pondered over to take away I took out of the free Nay sir bethink yourself It would like it so strange tongues fierce voice came close to bear you consated his hands Our dear Lucy after you shall not look into my hands our intent I will grace said nothing The opiate of white cloud in spirit Perhaps it"

Well, it's not going to win the Booker Prize, but it's almost believable! Full code available here

Other applications of this:

Predictive messaging

Generating auto-blog posts by "rewriting" other content

Language recognition based on work occurrence / frequency

Labels: clojure

# posted by Jeff Foster : 16:47 0 Comments Links to this post

Wednesday, 7 January 2009

Regular Expressions in Clojure

Regular Expressions are a incredibly powerful language for performing text manipulation.

ANSI Common Lisp doesn't feature regex support, though there are a large number of libraries that do (see here). Clojure uses the Java implementation of regular expressions (see the regex tutorial) and adds some nice syntactic sugar to the mix.

To define a regex pattern in Clojure you use the #"<regex>" syntax. This is actually the regex pattern, so evaluating it at the REPL gives you itself. The pattern is compiled, so putting anything nonsensical results in an error.


user> #"[0-9]+"
#"[0-9]+"

user> #"[]+"
; Evaluation aborted.
; clojure.lang.LispReader$ReaderException: java.util.regex.PatternSyntaxException: 
; Unclosed character class near index 2

The Clojure regex functions are well documented. Unfortunately, I didn't really read the documentation! re-find returns either the first match (directly as a string) OR a vector of matches if multiple matches. This is optimized for the normal case where a user enters the text and the regex is well known ahead of time e.g.


user>(re-find #"bar" "bar")
"bar"

user>(re-find #"(foo)|(bar)" "foo bar")
["foo" "foo" nil]

When learning regular expressions, I found Regex Coach invaluable (which is actually Lisp powered!). It's an application that lets you immediately see how a regular expression matches some text. Let's do the basics of this in Clojure.

You have two areas, one for the regular expression and one for the text. As you press keys the matching regular expression (if any) is highlighted in the text area.

Firstly we need a function that given a regular expression and some text returns where to do some highlighting:


(defn first-match [m]
  (if (coll? m) (first m) m))

(defn match [regex text]
  (let [m (first-match (re-find (re-pattern regex) text))]
    (if (nil? m)
      [0 0]
      (let [ind (.indexOf text m) len (.length m)]
 [ind (+ ind len)]))))

first-match is a helper function that gives the first match (handling the case of re-find returning multiple entries). match just gives you a vector [x y] representing the index of the match.

Next a bit of UI that features a couple of new things:

$ syntax for inner classes (a Java thing)

Exception handling

Each time a key is pressed match returns the area to highlight and Highlighter does the rest. The exception handling ensures that if there was an error compiling the regex then that's printed on the status bar.


(defn regexcoach []
  (let [frame (JFrame. "Regular Expression Coach") pane (JPanel.) regexText (JTextField.) 
 targetText (JTextField. "")
 statusBar (JLabel. "Match from 0 to 0")
        keyHandler (proxy [KeyAdapter] [] 
       (keyTyped [keyEvent] 
         (try       
   (let [m (match (.getText regexText) (.getText targetText)) 
         hl (.getHighlighter targetText)
         pen (DefaultHighlighter$DefaultHighlightPainter. Color/RED)]
     (.removeAllHighlights hl)
     (.addHighlight hl (first m) (second m) pen)
     (.setText statusBar (format "Match from %s to %s" (first m) (second m))))
   (catch PatternSyntaxException e (.setText statusBar (.getMessage e))))))]
    (doto regexText
      (.addKeyListener keyHandler))
    (doto targetText
      (.addKeyListener keyHandler))
    (doto pane
      (.setLayout (BoxLayout. pane BoxLayout/Y_AXIS))
      (.add (JLabel. "Regular Expression"))
      (.add regexText)
      (.add (JLabel. "Target String"))
      (.add targetText)
      (.add statusBar))
    (doto frame
      (.add pane)
      (.setSize 300 300)
      (.setVisible true))))

Full code here.

Labels: clojure, regex

# posted by Jeff Foster : 14:18 0 Comments Links to this post

Tuesday, 6 January 2009

Game of Life Part II

A helpful comment on my previous blog entry suggested that instead of crazy zip rubbish, I could instead view the grid as a Point => Value mapping. Is that a good change to make?

Here's the changes from the previous version. Full code on my Git repository.


(defstruct point :x :y)

(defn world-at [world point]
  (get world point))

(defn toggle-pos [world point]
  (if (zero? (world-at world point))
    (assoc world point 1)
    (assoc world point 0)))

(defn neighbours [p]
  (let [x (:x p) y (:y p)]
    [(struct point (dec x) (dec y)) (struct point x (dec y)) (struct point (inc x) (dec y))
     (struct point (dec x) y) (struct point (inc x) y)
     (struct point (dec x) (inc y)) (struct point x (inc y)) (struct point (inc x) (inc y))]))

(defn neighbour-count [world p]
  (reduce + (map (fn [x] (let [v (world-at world x)] (if (nil? v) 0 v))) (neighbours p))))

(defn new-state [world p]
  (let [neighbours (neighbour-count world p) alive (world-at world p)]
    (cond 
     (and (= alive 1) (< neighbours 2)) 0 ;; under population
     (and (= alive 1) (> neighbours 3)) 0 ;; over-crowding
     (and (= alive 1) (or (= 2 neighbours) (= 3 neighbours))) 1 ;; unchanged to the next generation
     (and (= 3 neighbours)) 1 ;; any tile with exactly 3 live neighbour cells becomes alive
     :else 0)))

(defn life-step [w]
  (into (hash-map) (map (fn [x] [(first x) (new-state w (first x))]) w)))

(defn create-world [w h]
  (let [x (range 0 w) y (range 0 h)]
    (apply hash-map (mapcat (fn [a] (mapcat (fn [b] (list (struct point a b) 0))  y)) x))))

Most of the functions above are much clearer (apart from create-world) than they were previously. In addition the SLOC has decreased from 74 to 66, so the code is more concise too.

I did sacrifice sparseness for neatness. I needed to have the values populated as dead such that life-step could just be written as a map function. If the values didn't exist, I'd have to create something from nothing. In this case, I think the trade off is OK.

Overall, I think this is a definite improvement over the previous version.

Labels: clojure

# posted by Jeff Foster : 00:01 0 Comments Links to this post

Sunday, 4 January 2009

Ray Tracing in Clojure (Part II)

In my previous post, I ported across a ray-tracing example from ANSI Common Lisp. Now, I'll try and work out how threading functions work.

Each area of the viewing image is independent, so let's divide up the viewing surface into a number of independent buffered images. We have a little helper function which given a width, height and tile size gives us back a list of positions to draw at.


(defn create-work-list [width height unitX unitY]
  (let [xs (range 0 width unitX) ys (range 0 height unitY)]
    (mapcat (fn [x] (mapcat (fn [y] (list (list x y))) ys)) xs)))

For example, if we want to draw use a tile size of 150x150 with a viewing surface of 300x300 we get the following offsets:


user> (create-work-list 300 300 150 150)
((0 0) (0 150) (150 0) (150 150))

Now we should make the ray-trace function take notice of these co-ordinates. The previous version of the ray-trace function wasn't very functional as it performed IO (drawing to the canvas). Side-effects are the enemy of referential transparency.


(defn ray-trace [world w h ox oy]
  (let [buffered-image (BufferedImage. w h BufferedImage/TYPE_BYTE_GRAY)]
    (doseq [x (range 0 (dec w))]
      (doseq [y (range 0 (dec h))]
 (.setRGB buffered-image x y (color-at (+ x ox) (+ y oy)))))
    buffered-image))

Now the function is pure because it always gives the same output for the same input and no IO occurs.

Finally, we need to distribute this work across all the cores. Clojure has a built in function, pmap that works the same as map but in parallel.

Initially I performed the drawing to the canvas within a pmap expression - this is very wrong! pmap is lazy, it's not evaluated unless it is needed. Clojure doesn't know that I intended that to always been evaluated, so it was only ever drawing the first four tiles (presumably because that's how many cores my desktop has).

If you want to force the evaluation there's a number of functions which do that:

doseq
dorun
doall

In this case I decided I'd use ray-trace in a pmap expression to produce a list of images, and then use doseq to perform the side effects.


(def canvas (proxy [JPanel] []
  (paintComponent [g]
    (proxy-super paintComponent g)    
    (.setColor g Color/RED)
    (let [width (.getWidth this) height (.getHeight this) unitX (/ width 16) unitY (/ height 16) work-list (create-work-list width height unitX unitY)]
      (doseq [image (pmap (fn [pos] (list (apply ray-trace (list world unitX unitY (first pos) (second pos))) (first pos) (second pos))) work-list)]
 (.drawImage g (first image) (second image) (nth image 2) unitX unitY nil))))))

The separation of IO and pure functions is something advocated in "A wish list for the next mainstream programming language". Clojure doesn't force this, whereas something like Haskell does. Haskell uses monads to achieve this, which is something I'll visit at some point. See LtU for some explanations.

Ok, enough theory - what difference does this actually make? Well for me, about a 4x difference, and I have 4 cores, so that's good! Timing's aren't that exciting, but you can see the difference with the system monitor.

Pretty much full CPU utilization across all cores, with only a few lines of code changed, funky!

Labels: clojure, ray-tracing

# posted by Jeff Foster : 09:09 0 Comments Links to this post

Saturday, 3 January 2009

Ray Tracing in Clojure

ANSI Common Lisp, by Paul Graham, is a great introduction to Lisp. Lots of compelling examples on run-length encoding (see previous post), poetry generation and ray-tracing.

This post looks at translating the example (from Chapter 9) into Clojure.

Ray-tracing is a very simple technique. From ACL:
"To generate a 3D image, we need to define at least four things: an eye, one or more light sources, a simulated world consisting of one or more surfaces, and a plane (the image plane) that serves as a window onto this world. The image we generate is the projection of the world onto a region of the image plane."

So how do we generate the pictures? That's pretty simple too, all we do is for every pixel in the image plane just trace the ray that the eye would see from there. Done.

We start off by defining some simple maths functions and an object to represent a point in 3D space.


(defn square [x] (* x x))

(defstruct point :x :y :z)

(defn magnitude [p]
  (Math/sqrt (+ (square (:x p)) (square (:y p)) (square (:z p)))))

(defn unit-vector [p]
  (let [d (magnitude p)]
    (struct point (/ (:x p) d) (/ (:y p) d) (/ (:z p) d))))

(defn point-subtract [p1 p2]
  (struct point 
   (- (:x p1) (:x p2))
   (- (:y p1) (:y p2))
   (- (:z p1) (:z p2))))

(defn distance [p1 p2]
  (magnitude (point-subtract p1 p2)))

(defn minroot [a b c]
  (if (zero? a)
    (/ (- c) b)
    (let [disc (- (square b) (* 4 a c))]
      (if (> disc 0)
 (let [discroot (Math/sqrt disc)]
   (min (/ (+ (- b) discroot) (* 2 a))
        (/ (- (- b) discroot) (* 2 a))))))))

The original Lisp code mixed the point structure with individual values. I felt this made the code a bit ugly and hard to read, so in here we try to use the point structure as much as possible. (struct point 1 2 3) feels like clunky syntax, but I was unable to find anything better. Perhaps an alternative is to just use a plain vector / map? Or wait for the future and see if struct support improves?

Anyway, the code above is self explanatory, minroot is the big one and that's just a solver for the quadratic equation. function.

Next we need to define some of the environment. For this we'll fix the image plan between (0,0) and (300,300) and we'll just render spheres. Each surface has a grey-scale colour associated with it (a surface).


(def eye (struct point 150 150 200))

(defstruct surface :color)

(defstruct sphere :color :radius :centre)

(defn defsphere [point r c]
  (struct sphere c r point))

(def world [(defsphere (struct point 150 150 -600) 250 0.32)
            (defsphere (struct point 175 175 -300) 100 0.64)])

One thing I did find was that Clojure doesn't support the :include for structs that Common Lisp does. For this example, the world is a couple of spheres one smaller than the other and in front (and slightly brighter).

The following functions determine where a sphere gets hit with a ray from a specific source (in this case a point) and the surface normal of a hit.


(defn sphere-normal [s pt]
  (let [c (:centre s)]
    (unit-vector (point-subtract c pt))))

(defn sphere-intersect [s pt ray]
  (let [c (:centre s)
 n (minroot (+ (square (:x ray)) (square (:y ray)) (square (:z ray)))
     (* 2 (+ (* (- (:x pt) (:x c)) (:x ray))
      (* (- (:y pt) (:y c)) (:y ray))
      (* (- (:z pt) (:z c)) (:z ray))))
     (+ (square (- (:x pt) (:x c)))
        (square (- (:y pt) (:y c)))
        (square (- (:z pt) (:z c)))
        (- (square (:radius s)))))]
 (if n
   (struct point (+ (:x pt) (* n (:x ray)))
                        (+ (:y pt) (* n (:y ray)))
   (+ (:z pt) (* n (:z ray)))))))

sphere-intersect can return nil if it doesn't hit. Now we define the Lambert function


(defn lambert [s intersection ray]
  (let [normal (sphere-normal s intersection)]
    (max 0 (+ (* (:x ray) (:x normal))
       (* (:y ray) (:y normal))
       (* (:z ray) (:z normal))))))

That's it for the machinery to actually generate the image - now we need some UI and something to actually draw it. The original code generated a PPM format image, but since Clojure has a decent UI toolkit with Swing, let's just render something in a Window instead. The UI just uses the "canvas" idiom I used for the bubble sort application.


(def canvas (proxy [JPanel] []
  (paintComponent [g]
    (proxy-super paintComponent g)    
    (.setColor g Color/RED)
    (ray-trace world 1 g (.getWidth this) (.getHeight this)))))

(defn raytraceapp []
  (let [frame (JFrame. "Ray Tracing")]
    (doto frame
      (.add canvas)
      (.setSize 300 300)
      (.setResizable false)
      (.setVisible true))))

All that remains is to define the ray-trace function


;; second item = what we hit
;; first item = where we hit
(defn first-hit [pt ray]
  (reduce 
   (fn [x y]
     (let [hx (first x) hy (first y)]
       (cond
 (nil? hx) y
 (nil? hy) x
 :else (let [d1 (distance hx pt) d2 (distance hy pt)]
    (if (< d1 d2) x y)))))
   (map (fn [obj]
   (let [h (sphere-intersect obj pt ray)]
     (if (not (nil? h))
       [h obj]))) world)))

(defn send-ray [src ray]
  (let [hit (first-hit src ray)]
    (if (not (nil? hit))
      (let [int (first hit) s (second hit)]
 (* (lambert s ray int) (:color s)))
      0)))

(defn color-at [x y]
  (let [ray (unit-vector (point-subtract (struct point x y 0) eye))]
    (* (send-ray eye ray) 255)))

(defn ray-trace [world res g w h]
  (let [buffered-image (BufferedImage. w h BufferedImage/TYPE_BYTE_GRAY)]
    (doseq [x (range 1 w)]
      (doseq [y (range 1 h)]
 (.setRGB buffered-image x y (color-at x y))))
    (.drawImage g buffered-image 0 0 Color/RED nil)))

The only major difference between this and the ACL code, is prefering to use map and reduce instead of the nested do code. This feels more functional to me and also opens up parallelism opportunities which I'll look at for the next post.

So what does it look like (not very good, but a background of code looks cool!)?

Labels: clojure

# posted by Jeff Foster : 13:18 0 Comments Links to this post

Tuesday, 30 December 2008

Mutants!

Part of the rationale of Clojure is to get rid of free-for-all mutation: "for the concurrent programming future, pervasive, unmoderated mutation simply has to go."

Atoms are a way of performing mutation in a safe way, free from any potential race conditions.

An atom is associated with a Clojure value, in this case an integer.


  (def position (atom 0))

If you want to observe the value, use deref or the @ notation.


  user> (prn @position)
  0
  nil
  user> (prn position)
  #
  nil
  user> (prn (deref position))
  0
  nil

So how do you actually change the value inside the atom? swap! and compare-and-set! change the values of atoms. Both have the ! (bang) suffix to indicate that they are destructive operations.

swap! takes two arguments, the atom itself and a function to apply to the value and swap with the current value of the atom. For example:


  user> (swap! position inc)
  1
  user> (swap! position inc)
  2

compare-and-set! (CAS)is a lower level function which sets the value of an atom given the original value and the new. The value is only changed if it is observed first to be equal to the original value. A Boolean return value indicates whether the atom was changed.


  user> (swap! position (fn [x] 0))
  0
  user> (compare-and-set! position 0 1)
  true
  user> (compare-and-set! position 0 2)
  false

I'll use atoms to change state when animating my little sorting visualization.

Labels: clojure

# posted by Jeff Foster : 12:50 0 Comments Links to this post

Monday, 29 December 2008

Clojure short-hand

After reading through some Clojure code and talking to people on IRC, I found a few ways of making code a bit more concise.

#(+ %1 %2) is short hand for (fn [x y] (+ x y)). For example, this is valid:


  user> (map #(+ %1 %2) (range 0 4 ) (range 0 4))
  (0 2 4 6)

Documentation for this is squirreled away on the Clojure Reader page.

Another useful function I didn't know was into which allows you to create a new data structure from an existing one. When you create literal sets, the reader creates hash types (i.e. unordered).


  user> #{1 2 3 4 5 6 7 98 100}
  #{1 2 98 3 4 100 5 6 7} ;; look the ordering has changed.

Using into, I can change that:


  user> (into (sorted-set) #{1 2 3 4 5 6 7 98 100})
  #{1 2 3 4 5 6 7 98 100}

Going back to my sort app, based on the Snake game code previously mentioned, it seems to get a drawing panel I should subclass a JPanel and override the paint method to get a canvas to draw on.



(def maxval 100)

(def model (take 100 (repeatedly (fn [] (rand-int maxval)))))

(def canvas (proxy [JPanel ActionListener] []
  (paintComponent [g]
    (proxy-super paintComponent g)
    (.setColor g Color/RED)
    (let [width (.getWidth this) height (.getHeight this) barHeight (/ height (inc (count model))) barWidthPerVal (/ width maxval)]
      (prn width height)
      (doseq [val (into (sorted-map) (zipmap (range 0 (count model)) model))] 
 (let [y (int (* (first val) barHeight)) barWidth (int (* (second val) barWidthPerVal))]
   (.fillRect g 0 y barWidth barHeight)))))
  (actionPerformed [e] (prn "Doing something"))))

this is still a keyword in Clojure, which is something that took me a while to work out, I guess I was hoping that I could do (.getHeight) and the this would be explicit?

proxy (as I've briefly mentioned before) allows you to override methods and implement interfaces. In this case I've overridden the painComponent method of JPanel and replaced it with some gubbins to draw the list I'm trying to sort.

I've also overridden the actionPerformed method which is where I'll mutate the model with one iteration of the chosen sort method. I've decided (again based on the Snake Clojure code) to use a Swing Timer to fire events to signal when to redraw.

On a side note, encapsulating an object behind functions is nice and simple. In the style of SICP we use a local let expression to hide x from the outside world.


(let [x (Timer. 1000 canvas)]
  (defn stop-timer [] (.stop x))
  (defn start-timer [] (.start x))
  (defn is-running [] (.isRunning x)))

Now all I need to write in my daft demo app is a way of generating all the intermediate steps for sorting algorithms. For my implementation of bubble-sort this is simple (it generates a list of the intermediate bits and pieces), but for quick sort it's little bit harder.

Labels: clojure

# posted by Jeff Foster : 23:31 0 Comments Links to this post

Saturday, 27 December 2008

Building a GUI with Clojure

Well, that's my most rubbish UI ever, but at least it gave me a chance to learn a few things. Starting from the example, I made a few very simple changes and ended up with the rubbish one above....

I'm still not fulling groking exactly how "real" applications are designed in a functional programming language. For example, in this daft app what should the separation of concerns be? Does MVC have a functional counterpart?

Next on this list is changing the code (prn alg) to actually render what's going on with quicksort and bubble sort.



(import '(javax.swing JFrame JLabel JTextField JButton JComboBox JPanel Timer)
       '(java.awt.event ActionListener)
       '(java.awt GridLayout))

(defn draw-sort [canvas alg]
 (prn alg))

(defn sortapp []
 (let [frame (JFrame. "Sort Visualizer")
 canvas (JPanel. true)
 algorithm-chooser (JComboBox.)
 run-button (JButton. "Run Algorithm")]
   (.addActionListener run-button
     (proxy [ActionListener] []
 (actionPerformed [evt]
  (draw-sort canvas (.toString (.getSelectedItem algorithm-chooser))))))
   (doto algorithm-chooser
     (.addItem "Quick sort")
     (.addItem "Bubble sort"))
   (doto frame
     (.setLayout (GridLayout. 2 2 3 3))
     (.add algorithm-chooser)
     (.add canvas)
     (.add run-button)
     (.setSize 300 300)
     (.setVisible true))))

Things I have learnt:

doto syntax saves lots of repetition!
proxy is used to generate implementations of interfaces dynamically

Things I need to find out:

What's the idiomatic way of drawing something on the screen? Do I need to be using timers? Probably!

Labels: clojure

# posted by Jeff Foster : 13:22 0 Comments Links to this post

Friday, 26 December 2008

Bubbling Clojure

There's quite a few sorting algorithms around. I wanted to do something similar to Sorting Algorithms, only in Clojure + Swing as some kind of learning exercise. (99 Problems is still on-going, but to be honest it's boring (!) I'll reach the end one day...).

Perhaps the simplest is bubble sort, go through the list and if you find two adjacent items out of order, swap them. Repeat until done.


(defn bubble [lst]
(if (or (nil? lst) (nil? (second lst)))
  lst
  (if (> (first lst) (second lst))
    (cons (second lst) (cons (first lst) (nthrest lst 2)))
    (lazy-cons (first lst) (bubble (rest lst))))))

(defn bubble-sort [lst]
(if (= (bubble lst) lst)
  lst
  (recur (bubble lst))))

I couldn't find a neater way of doing it. An alternative is to use iterate to apply bubble until no more swaps are necessary. This at least makes the time complexity pretty obvious.


(defn bubble-sort [lst]
 (last (take (* (count lst) (count lst)) (iterate bubble lst))))

We can simplify this further. The bubble function sucks because it only applies a single swap and then just rebuilds up the sequence. Applying the function as we go up we get:


(defn bubble [lst]
(if (or (nil? lst) (nil? (second lst)))
  lst
  (if (> (first lst) (second lst))
    (cons (second lst) (lazy-cons (first lst) (bubble (nthrest lst 2))))
    (lazy-cons (first lst) (bubble (rest lst))))))

Quicksort is a much better algorithm. The algorithm is:

Pick a pivot X

Move all items less than the pivot to the left and all greater to the right

Apply recursively to each side

This has a very simple implementation in Clojure


(defn qsort [lst]
(if (nil? lst)
  lst
  (concat
   (qsort (filter (partial > (first lst)) (rest lst)))
   (list (first lst))
   (qsort (filter (partial <= (first lst)) (rest lst))))))

How do the two compare?

First I needed to find out how to generate a big random sequence. repeatedly applies a function of no arguments and generates an infinite list. Using repeatedly and take, you can get a list of random elements like this:


(take 500 (repeatedly (fn [] (rand-int 100)))

So now I can compare the two:


user> (time (count (bubble-sort (take 1000 (repeatedly (fn [] (rand-int 100)))))))
"Elapsed time: 575.713898 msecs"
1000
user> (time (count (qsort (take 1000 (repeatedly (fn [] (rand-int 100)))))))
"Elapsed time: 38.933079 msecs"
1000

Things I still need to work out:

Why does my bubble sort implementation blow the stack with large lists? Shouldn't the laziness mean it shouldn't? last looks daft, replace with take/drop combo
Multiple return values (or faking them). It'd be nice to stop swapping in the bubble sort when no swaps are done. Clojure doesn't support multiple return values (for compatibility with Java)

Things I have learnt:

Iterate is a cool function

Labels: clojure

# posted by Jeff Foster : 15:28 0 Comments Links to this post

Tuesday, 23 December 2008

Sieve of Eratosthenes

The Sieve of Eratosthenes is an algorithm for calculating all the prime numbers up to a given value.

In Clojure a naive sieve algorithm might look a little like this:


(defn isprime [p]
  (and (> p 1)
       (every? (fn [x] (not (zero? (rem p x)))) (range 2 (inc (Math/sqrt p))))))

(defn sieve [x]
  (filter isprime (range 1 x)))

A number X is prime if it is only divisible by itself and items in the range 2 to Sqrt( X ). We can then apply the filter function.

We can use this in the REPL and see that it's suprisingly fast:


user> (time (count (sieve 1000)))
"Elapsed time: 1.18792 msecs"
167

user> (time (count (sieve 1000000)))
"Elapsed time: 7990.35914 msecs"
78497

This isn't in fact the Sieve algorithm at all. We recalculate the primes each time so calculating the next prime doesn't take into account the previous ones...

The Genuine Sieve of Eratosthenes (also discussed at LtU) gives some ideas on how to get it going faster...

Labels: clojure

# posted by Jeff Foster : 12:10 0 Comments Links to this post

Monday, 22 December 2008

99 Problems in Lisp (Part VIII)

P26 - Generating the combinations of K distinct objects chosen from N.


(defn append-prefix [prefix lst-elements]
 (mapcat (fn [x] (list (concat prefix (list x)))) lst-elements))

(defn combination [n lst]
 (if (> n (count lst))
   nil
   (let [elem-list (split lst (dec n)) rlist (nthrest lst (dec n))]
     (concat (append-prefix (first elem-list) rlist) (combination n (rest lst))))))

Only two left and I've done the lists part....

Labels: clojure

# posted by Jeff Foster : 13:40 0 Comments Links to this post

Sunday, 21 December 2008

99 Problems in Lisp (Part VII)

P23 - Extract a given number of randomly selected elements from a list. This is daft looking because the previously defined remove-at function is one based!


(defn rnd-select [lst n]
  (when (> n 0)
    (let [x (rand-int (count lst))]
      (lazy-cons (nth lst x) (rnd-select (remove-at lst (inc x)) (dec n))))))

P24 Select N different frombers from the set 1..m


(defn lotto-select [n rng]
  (rnd-select (range 1 rng) n))

P25 Permute a list


(defn rnd-permu [lst]
  (let [length (count lst)]
    (when (> length 0)
      (let [x (rand-int length)]
 (lazy-cons (nth lst x) (rnd-permu (remove-at lst (inc x))))))))

Labels: clojure

# posted by Jeff Foster : 06:58 0 Comments Links to this post

Saturday, 20 December 2008

Improving my Clojure style

P22 on my list of problems seems simple, "Create a list containing all integers within a given range.". My first naive (a polite way of saying crap!) implementation is shown below:


(defn my-range-crap [start end]
 ((fn [x y accum]
    (if (= x y)
      (concat accum (list x))
      (recur (inc x) y (concat accum (list x))))) start end nil))

The pattern (concat accum (list x)) is very ugly. There's a much simpler way of doing that using conj. From the documentation.

(conj coll item)
Conj[oin]. Returns a new collection with the item 'added'. (conj nil item) returns (item). The 'addition' may happen at different 'places' depending on the concrete type.

Much better, so now I can do:


(defn my-range-not-quite-as-crap [start end]
 ((fn [x y accum]
    (if (= x y)
      (conj accum x)
      (recur (inc x) y (conj accum x)))) start end nil))

Incidentally, this fixes another issue. If I'm using concat then (my-range-crap 1 50000) blows the stack. Not good. Using conj fixes the problem.

Talking on #Clojure, I got a much better solution from kotarak + poiuyt


(defn much-better-range
  [start end]
  (when (< start end)
    (lazy-cons start (much-better-range (inc start) end))))

Much more concise!

Using lazy-cons means the recursion can be explicit and that you get rid of the problems with stack overflow (it only evaluates when you need it).

Labels: clojure

# posted by Jeff Foster : 11:33 0 Comments Links to this post

99 Problems in Lisp (Part VI)

P19 Rotate a list N places to the left


(defn rotate [lst n]
  (if (> n 0)
    (take (count lst) (drop n (cycle lst)))
    (take (count lst) (drop (- (count lst) (Math/abs n)) (cycle lst)))))

P20 Remove the kth element from the list


(defn remove-at [lst n]
  (concat (take (dec n) lst) (drop n lst)))

P21 - Insert an element at a given position into a list


(defn insert-at [lst elt n]
  (concat (take (dec n) lst) (list elt) (drop (dec n) lst)))

Labels: clojure

# posted by Jeff Foster : 11:30 0 Comments Links to this post

99 Problems in Lisp (Part V)

I think that the style I have of using recur is probably very wrong. concat must be O(N) (since the only way to get to the end of the list is to walk it). I'm making N calls to it, so most of the code I've written is probably O(N^2).

P13 encode it directly


(defn encode-direct [lst]
  ((fn [xs accum]
     (if (= nil xs)
       accum
       (recur (drop-while (fn [x] (= x (first xs))) xs)
       (let [items (take-while (fn [x] (= x (first xs))) xs)]
  (if (= 1 (count items))
    (concat accum items)
    (concat accum (list (list (count items) (first items))))))))) lst nil))

P14 Duplicate the elements of a list


(defn dupli [lst]
  (mapcat (fn [x] (list x x)) lst))

P15 Replicate the elements of a list a given number of times


(defn repli [lst n]
  (mapcat (fn [x] (replicate n x)) lst))

P16 Drop every nth element from a list


(defn drop-nth [lst n]
  ((fn [xs i accum]
     (if (= nil xs)
       accum
       (if (= 0 (rem i n))
  (recur (rest xs) (inc i) accum)
  (recur (rest xs) (inc i) (concat accum (list (first xs))))))) lst 1 nil))

P17 split a list into two parts


(defn split [lst n]
  (list (take n lst) (drop n lst)))

P18 extract a slice from a list


(defn slice [lst i n]
  (take (inc (- n i)) (drop (dec i) lst)))

Labels: clojure

# posted by Jeff Foster : 01:13 0 Comments Links to this post

Friday, 19 December 2008

Using Git to store code

I decided I should set myself up some version control software as I sporadically do really stupid crap (well, not even sure about the sporadic bit!).

Git is all the rage these days. It's a distributed revision control system, so you have both a local and remote repository. Not exactly sure what I@m doing at the moment, but the cheat sheet is helping!

GitHub provides free hosting and allows you to use git. So, I created something to dump this Clojure code. See here

And a new release of Clojure today too...

Labels: clojure

# posted by Jeff Foster : 11:46 0 Comments Links to this post

Thursday, 18 December 2008

99 Problems in Lisp (Part IV)

P11 - Modified Run length encoding


(defn encode-modified [lst]
  ((fn [xs accum]
     (if (= nil xs)
       accum
       (recur (rest xs) 
       (concat accum 
        (list 
         (if (= (count (first xs)) 1)
    (ffirst xs)
    (list (count (first xs)) (ffirst xs)))))))) (pack-list lst) nil))

P12 - Decode a run length encoding list


(defn decode [lst]
  ((fn [xs accum]
     (if (= nil xs)
       accum
       (recur (rest xs)
       (if (list? (first xs))
  (concat accum (replicate (ffirst xs) (first (rfirst xs))))
  (concat accum (list (first xs))))))) lst nil))

Labels: clojure

# posted by Jeff Foster : 12:01 0 Comments Links to this post

Wednesday, 17 December 2008

99 Problems in Lisp (Part III)

So, onwards with spending a few minutes each day working my way through a random list of puzzles...

P08 - Eliminate consecutive duplicates of list elements


;; Ugly style
(defn eliminate-dupes [lst]
 ((fn [n last accum]
    (if (= nil n)
      accum
      (if (= (first n) last)
        (recur (rest n) last accum)
        (recur (rest n) (first n) (concat accum (list (first n))))))) lst nil '()))

;; Nicer functional style
(defn eliminate-dupes2 [lst]
 ((fn [n accum]
    (if (= nil n)
      accum
      (recur (drop-while (fn [x] (= x (first n))) n) 
      (concat accum (list (first n)))))) lst nil))

drop-while and take-while are functions that read from a (potentially) infinite sequence and either take or drop elements based on a predicate. They are (obviously) lazily evaluated!

P09 - Pack consecutive duplicates of list elements into sublists


;; P09 - pack consecutive duplicates of list elements into sublists
;; TODO should really use an accumulator
(defn pack-list [lst]
  (if (= lst nil)
    nil
    (cons (take-while (fn [x] (= x (first lst))) lst) 
   (pack-list (drop-while (fn [x] (= x (first lst))) lst)))))

(defn pack-list2 [lst]
  ((fn [xs accum]
     (if (= xs nil)
       accum
       (recur (drop-while (fn [x] (= x (first xs))) xs) 
       (concat accum (list (take-while (fn [x] (= x (first xs))) xs)))))) lst nil))

The transformation from non tail recursive, to tail recursive + accumulator is very formulaic. A quick search led me to LtU and in turn to LLVM.

LLVM is a low level virtual machine (doesn't provide GC / type system etc). It provides a framework for allowing optimizations to be generated and amongst these are including tail call optimization and transform. See the demo. Unfortunately, until it's possible to guarantee that this optimization is performed, you have to assume the worst and right code with explicit accumulators. Ho-hum!

P10 - Run length encoding of sublists



(defn encode [lst]
  ((fn [xs accum]
     (if (= nil xs)
       accum
       (recur (rest xs) (concat accum (list (list (count (first xs)) (ffirst xs))))))) (pack-list lst) nil))

count was a difficult one to find - it returns the length of a sequence, also works on strings, arrays and Java collections.

doc is an amazingly useful function that returns the documentation string associated with a function e.g.


user> (doc count)
-------------------------
clojure.core/count
([coll])
  Returns the number of items in the collection. (count nil) returns
  0.  Also works on strings, arrays, and Java Collections and Maps
nil

Labels: clojure

# posted by Jeff Foster : 23:22 0 Comments Links to this post

Monday, 15 December 2008

99 Problems in Lisp (Part II)

P06 - Find out whether a list is a palindrome?


(defn palindrome? [x]
  (= x (reverse x)))

P07 -Flatten a nested list structure.


(defn atom? [x]
  (or (nil? x) (not (seq? x))))

;; Not tail recursive and generally makes me feel sick that I've even typed
;; such a monstrosity!
(defn my-flatten [list]
  (if (atom? list)
    list
    (if (atom? (first list))
      (cons (first list) (my-flatten (rest list)))
      (concat (my-flatten (first list)) (my-flatten (rest list))))))

The above feels yucky! Using the example from the Haskell puzzles we can come up with a much cleaner solution.

How'd you flatten a list? If it's an atom then we're done (list x), whereas if it's a list then we want to flatten each element in turn (mapcat my-flatten2 x). This makes a really simple definition as below:


(defn my-flatten2 [x]
  (if (atom? x)
    (list x)
    (mapcat my-flatten2 x)))

mapcat is similar to the map function but collects all of the items together with concatenation. It takes a function and a series of lists as arguments, for example:


user> (mapcat (fn [x y] (list (+ x y))) '(1 2 3) '(3 4 5))
(4 6 8)

Labels: clojure

# posted by Jeff Foster : 23:39 0 Comments Links to this post

99 Problems in Lisp

Since the only way to get up to speed with a programming language is to actually use it, thought I'd work my way through 99 Problems in Lisp, only in Clojure.

Obviously, these are probably daft implementations, so any improvements welcome. Anyways, on with the problems. Instead of using recursion, I've tried to always go for recur since calling functions directly always runs the risk of running out of stack space. This actually "feels" surprisingly clean, no more repeating function names in bodies. Odd

P01 - Find the last item in a list


(defn last-in-list [x]
  ((fn [x last]
     (if (nil? x)
       last
       (recur (rest x) (first x)))) x nil))

P02 - Find the last but one in a list


(defn last-but-one-in-list [x]
  ((fn [x last]
     (if (nil? (rest (rest x)))
       last
       (recur (rest x) (first (rest x))))) x nil))

P03 - Find the K'th element of a list


(defn element-at [x n]
  (if (= n 0)
    (first x)
    (recur (rest x) (dec n))))

P04 - Find the length of a list


(defn length [x]
  ((fn [x acc]
     (if (nil? x)
       acc
       (recur (rest x) (inc acc)))) x 0))

P05 - Reverse a list


(defn my-reverse [x]
  ((fn [list acc]
     (if (nil? list)
       acc
       (recur (rest list) (cons (first list) acc)))) x nil))

Labels: clojure

# posted by Jeff Foster : 23:23 0 Comments Links to this post

Sunday, 14 December 2008

Recursion in Clojure

How do you write functions that don't explode the stack if Clojure doesn't have TCO?

Let's start with a bad definition of factorial:


(defn factorial [x]
  (if (= x 0)
      1
      (* x (factorial (- x 1)))))

Using it in the REPL is fine:


user> (factorial 100)
933262154439441526816992388562667004907
159682643816214685929638952175999932299
156089414639761565182862536979208272237
582511852109168640000000000000000000000
00

Up to a point...


user> (factorial 100000)
; Evaluation aborted
java.lang.StackOverflowError (NO_SOURCE_FILE:0)
  [Thrown class clojure.lang.Compiler$CompilerException]

Restarts:
 0: [ABORT] Return to SLIME's top level.
 1: [CAUSE] Throw cause of this exception

This is barfing because the evaluator has to keep around state for each call due to the expression (* x (factorial (- x 1))) . We need to make this function tail recursive.

recur can be thought of as the Clojure operator for looping. Think of it like a function call for the nearest enclosing let or function definition supplied with new variables. Naively we can switch over to using this by doing:


user> (defn factorial2 [x]
        (if (= x 0)
            1
            (* x (recur (- x 1)))))

But this is a compile-time error (which in itself is pretty neat!).


java.lang.UnsupportedOperationException: Can only recur from tail position (NO_SOURCE_FILE:4)

An accumulator parameter is an extra parameter to a function that's used to gather intermediate parts of the calculation. If we do this, we can make sure that the recur call is in the tail position. Using an anonymous function we get:


(defn factorial3 [x]
  ((fn [x y]
      (if (= x 0)
          y
          (recur (- x 1) (* x y)))) x 1))

Now when recur is used, it doesn't need to keep any of the previous stack frame around. This means we can finally calculate factorial 1000000, which begins with 282 and ends with lots of zeros!

Labels: clojure

# posted by Jeff Foster : 23:39 1 Comments Links to this post

Thursday, 11 December 2008

Clojure Videos

There's some great Clojure video online at Blip.TV. I started watching Clojure for Lisp Programmers. The following are brief notes on this.

Clojure was designed for concurrency and also targeted specifically at the Java Virtual Machine. Apparently it's OK to hate Java, but like the virtual machine. Clojure is a "language as platform" implementation rather than language + platform. This gives the benefits that you get lots of stuff for free (e.g. the virtual machine, garbage collection, byte code generation, JIT compilation etc). Compare this to most Lisp implementations where you have to build this yourself and Clojure's off to a good start!

The focus is on concurrency because we're approaching the limit of single-core speed ups (whether that's actually true or not is debatable, but certainly Intel and AMD seem to be pushing us to a multi-core future). As has been said many times before functional programming is ideally suited for concurrency - no shared state hugely simplifies things. However, Clojure isn't a pure language like Haskell; if you need to get "dirty" and modify state you can.

Clojure is not object-oriented because it encourages mutable state. "Mutable stateful objects are the new spaghetti code" was a great quote! Object-oriented polymorphism is very restrictive, CLOS has shown how useful basing it on multiple types can be, but can also base it on values, current state, relationships etc.

Clojure is a different kind of lisp for several reasons:
* First class support for sets, maps, lists and vectors (first class meaning Lisp reader support for example)
* Abstractions (all containers defined as seq interface)
* Thread aware
* Host embracing (e.g. tightly tied to the JVM)
* Not constrained by backwards compatability.

In compared to other Lisp's:
* Clojure is a lexically scoped Lisp1
* Common Lisp style macros and dynamic vars
* Dynamically compiled to JVM byte code
* No tail call optimization

In the video, Rich Hickey discussed the recent JVM summit and his (and most other attendees) hopes for getting support for TCO in the JVM. However according to a recent article support it not coming in Java 7. Worst yet, if Java 7 is 2010 then we've a long wait till Java 8!

One of the fundamental differences between Lisp and Clojure is that Lisp doesn't have the cons cell as the primary building block. Instead of a concrete implementation for lists, Clojure is based on the simple abstraction of first and rest. The seq interface is common across all containers (list,map,set,vector,string,regex matches,files etc). (seq x) will give you an available sequence interface from x. (conj x 6) will append a six on the end of any sequence.

Lazy seqs is also possible using the lazy-cons macro. As an example, take (which returns the first n values of a potentially infinite collection), can be defined as:


(defn take [n coll]
   (when (and (pos? n) (seq coll)) ;pos? positive number
        (lazy-cons (first coll)
            (take (dec n) (rest coll)))))

From the brief look of the video, the standard library has lots of neat little functions e.g.


(drop 2 [1 2 3 4]) ; (3 4)
(take 9 (cycle [1 2 3 4])) ; (1 2 3 4 1 2 3 4 1)
(interleave [:a :b :c :d :e] [1 2 3 4 5]) ;(:a 1 :b 2 :c 3 :d 4 :e 5)
(partition 3 [1 2 3 4 5 6]) ; ((1 2 3) (4 5 6))
(map vector [:a :b :c] [1 2 3]) ; ([:a 1] [:b 2] [:c 3])
(apply str (interpose \, "asdf")) ; "a,s,d,f"
(reduce + (range 100)) ; 4950

And I'm only an hour into the video before work has to kick in :(

Labels: clojure

# posted by Jeff Foster : 11:21 0 Comments Links to this post

Monday, 8 December 2008

Passing functions around in Clojure

fn is the Clojure equivalent of lambda from Lisp.


  (fn [x] (+ x 1)) ; #
  ((fn [x] (+ x 1)) 7) ; 8

These work as you'd expect when you pass them to functions such as map


  (map (fn [x] (+ x 1)) (list 1 2 3 4)) ; (2 3 4 5)
  (defn adder [x] (+ x 1)) ; #'user/adder
  (map #'adder (list 1 2 3 4) ; (2 3 4 5)

Sharp quote (#') works same as it does in Lisp it appears.

Labels: clojure

# posted by Jeff Foster : 23:42 0 Comments Links to this post

Clojure Java Interop

Use the "/" notation to call static methods


  (Math/pow 2 3) ; 8
  (Integer/toBinaryString 10) ; "1010"

  ;; not good style!
  (. Integer toBinaryString 10) ; legal syntax but / operator preferred for clarity!

Use the "." operator to call instance methods


  (. "shoes" length) ; 5
  (. "1234abcd" substring 0 4) ; "1234

Labels: clojure

# posted by Jeff Foster : 23:33 0 Comments Links to this post

Sunday, 7 December 2008

Five minutes with Clojure

After setting things up I had a spare five minutes to actually try some Clojure. Most frustrating thing so far is the lack of decent error messages. For example, getting messages like below isn't that helpful to me at the moment!


java.lang.IllegalArgumentException: Don't know how to create ISeq from: Integer (NO_SOURCE_FILE:1)
  [Thrown class clojure.lang.Compiler$CompilerException]

Restarts:
 0: [ABORT] Return to SLIME's top level.
 1: [CAUSE] Throw cause of this exception

Backtrace:
  ;;; snip

First class support for more data structures than Lisp is great. There's Set, Map and Vector represented by the syntax below.


  '(a b c) ; list 
  ['a 'b 'c] ; vector 
  {:a 1 :b 2 :c 3} ; map
  #{:a :b :c} ; set

I'm still a little confused with some of the syntax, for example with the latest version from SVN the following happens


  '(a b c) ==> (a b c)
  ('a 'b 'c) ==> c

Odd. Wonder why that is? a is a symbol bound to a function that apparently returns its first or second argument if it has 1 or 2 args. It's not defined for more arguments. Weird.

For declaring functions, you don't use defun. Instead you can use fn to create a function, for example (shamelessly stolen from Practical Common Lisp)


  (def make-cd (fn [title artist rating] (list title artist rating)))
  (make-cd "Technology Sucks" "AC" 1) ==> ("Technology Sucks" "AC" 1)

You can simplify the above definition using the defn macro.


  (defn make-cd [title artist rating] (list title artist rating))

macroexpand-1 shows that this expands into the above. defn allows you to provide some overloads. I couldn't think of a good example, but the syntax is pretty simple, just a list of forms.


  (defn crazy-function
    ([x] (list 42))
    ([x y] (list (+ x y)))
    ([x y &more] (list y)))

  (crazy-function 7) ==> 42
  (crazy-function 7 11) ==> 18
  (crazy-function 1 2 3 4 5) ==> 2

Labels: clojure

# posted by Jeff Foster : 23:21 0 Comments Links to this post

Clojure + Slime

Following the Bill Clementson's guide http://bc.tech.coop/blog/081205.html I was able to get Clojure + Slime playing together happily.

Now all I need to do is find the time to learn it....

Labels: clojure

# posted by Jeff Foster : 11:28 0 Comments Links to this post

Subscribe to Posts [Atom]