tmountain wrote:
> Very cool. I actually cleaned up the code a little bit more this
> morning trying to speed things up a bit. It's still not as fast as I'd
> like, but I'm not up to speed on Clojure optimization either, so I
> could be missing something.
>   

Two things stood out in your code:
- you call nth on seqs, which is a linear-time walk,
- you append elements to seqs.

Vectors would serve better than seqs here:
- nth on a vector is (effectively) constant-time random access,
- conj on a vector appends the element at the end.
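
To make the difference concrete, here is a minimal sketch using only clojure.core. The names v, l and s are made up for this illustration and do not appear in your code.

```clojure
;; conj adds where it is cheapest for the collection type:
;; at the end of a vector, at the front of a list.
(def v (conj [1 2 3] 4))        ; vector: 4 is appended
(def l (conj '(1 2 3) 4))       ; list: 4 is prepended

;; "Appending" to a seq needs concat, which builds a lazy seq that
;; must be walked from the head again to reach the new last element.
(def s (concat '(1 2 3) [4]))

;; nth works on both, but it indexes directly into the vector
;; while it has to step through the seq element by element.
(nth v 3)
(nth s 3)
```

Both nth calls return the same element; only the cost of getting there differs.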

Below is the "vectorized" version. On my box it runs twice as fast as
your original code.
(I also removed the in-loop string building, since it wasn't needed.)

(ns markov
  (:use clojure.contrib.str-utils))

(defn rand-elt
  "Return a random element of this vector or seq"
  [s]
  (nth s (rand-int (count s))))

(defn clean
  "Clean the given txt of symbols disruptive to markov chains"
  [txt]
  (let [new-txt (re-gsub #"[:;,^\"()]" "" txt)
        new-txt (re-gsub #"'(?!(d|t|ve|m|ll|s|de|re))" "" new-txt)]
    new-txt))

(defn chain-lengths
  "Return the set of key lengths found in the markov chain"
  [markov-chain]
  ;; the chain is a single map, so count each of its keys directly
  (set (map count (keys markov-chain))))

(defn max-chain-length
  "Return the length of the longest chain"
  [markov-chain]
  (apply max (chain-lengths markov-chain)))

(defn chain
  "Take a list of words and build a markov chain out of them.
  The length is the size of the key in number of words."
  ([words]
   (chain words 3))
  ([words length]
   (let [words (concat (repeat length nil) words)
         suffixes (take-while #(seq (drop length %)) (iterate rest words))]
     (reduce (fn [markov-chain suffix]
               ;; key = the first `length` words, value = the word that follows
               (merge-with into markov-chain
                 {(vec (take length suffix)) [(nth suffix length)]}))
       {} suffixes))))

(defn split-sentence
  "Convert a string to a collection on common boundaries"
  [text]
  (filter seq (re-split #"[,.!?()\d]+\s*" text)))

(defn file-chain
  "Create a markov chain from the contents of a given file"
  ([file]
   (file-chain file 3))
  ([file length]
   (let [sentences (split-sentence (slurp file))]
     (reduce #(merge-with into %1 (chain (re-split #"\s+" %2) length))
             {} sentences))))

(defn construct-sentence
  "Build a sentence from a markov chain structure.  Given a
  Markov chain (any size key), Seed (to start the sentence) and
  Proc (a function for choosing the next word), returns a sentence
  composed until it reaches the end of a chain (an end of sentence)."
  ([markov-chain]
   (construct-sentence markov-chain nil rand-elt))
  ([markov-chain seed]
   (construct-sentence markov-chain seed rand-elt))
  ([markov-chain seed proc]
   (let [seed (or seed (rand-elt (keys markov-chain)))
         next-key #(concat (rest %) [(proc (markov-chain %))])
         logorrhea (map first (iterate next-key seed))
         sentence (take-while identity (drop-while nil? logorrhea))]
     (str-join " " sentence))))
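
One detail that may look odd: next-key builds each new key with concat, so it is a seq, while the keys stored by chain are vectors. The lookup still works because Clojure compares sequential collections by their elements. A small sketch using only clojure.core (the word list, the 2-word key size and the name tiny-chain are invented for this illustration):

```clojure
;; Illustration only: a tiny chain with 2-word vector keys.
(def words ["the" "cat" "sat" "on" "the" "cat" "ran"])

(def tiny-chain
  (reduce (fn [m [a b c]]
            ;; into on a vector appends, so followers keep their order
            (merge-with into m {[a b] [c]}))
          {}
          (partition 3 1 words)))

;; Lookup with a vector key.
(tiny-chain ["the" "cat"])

;; Lookup with a seq built the way next-key builds it:
;; same elements, so it finds the same entry.
(tiny-chain (concat (rest ["on" "the"]) ["cat"]))
```

Both lookups return the vector of words seen after "the cat".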

hth,

Christophe

-- 
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.blogspot.com/ (en)


