Cool... I actually did a Markov chain generator myself as one of my
early Clojure projects. I posted about it at the DC Study group, here:

http://groups.google.com/group/clojure-study-dc/browse_thread/thread/26ccdc8acb102f9/d18d7627ddcaf167

It looks like yours is more succinct... I'll definitely have to take
some time and compare our approaches.

-Luke


On Apr 24, 8:47 am, tmountain <[email protected]> wrote:
> In an effort to learn more about Clojure, I decided to port a Markov
> text generator which a friend wrote in Python. After getting through a
> few snags, I completed the program and decided to have some fun
> feeding in some e-books downloaded from the Gutenberg project as
> input. In this case, I chose Sherlock Holmes and Bram Stoker's Dracula
> to create a bizarre mashup, which could be called Draclock Holmes or
> something similar. I had the program print out three-line snippets
> of text, and some of the resulting text resembles a sort of absurd
> poetry. I'd imagine if I let it churn and burn for a few hours, some
> real gems could emerge.
>
> acting in her interests
> Mina's morning and evening hypnotic answer is unvaried
> with devilish passion
>
> she succeeded somewhat
> swiftly and deftly
> His look is a warning
>
> together as we swept along
> found myself lying on my bed trembling all over
> Miss Stoner and I gazed at him in many tongues
>
> my power to reward you for your services
> common subject for conversation
> throwing open another door
>
> nine years in England
> strong-faced old man
> to mediaeval times
>
> Here's the code. I'm new to Clojure, so I'm open to suggestions. It's
> written in a purely functional, non-destructive fashion, although I'm
> sure a few things could be improved.
>
> (ns markov
>   (:use clojure.contrib.str-utils))
>
> (defn rand-nth
>   "return a random element from a collection"
>   [coll]
>   (nth (seq coll) (rand-int (count coll))))
>
> (defn clean
>   "clean given txt for symbols disruptive to markov chains"
>   [txt]
>   (let [new-txt (re-gsub #"[:;,^\"()]" "" txt)
>         new-txt (re-gsub #"'(?!(d|t|ve|m|ll|s|de|re))" "" new-txt)]
>     new-txt))
>
> (defn chain-lengths
>   "return a set of lengths for each element in the collection"
>   [markov-chain]
>   (let [markov-keys (map keys markov-chain)]
>     (set (for [x markov-keys] (count x)))))
>
> (defn max-chain-length
>   "return the length of the longest chain"
>   [markov-chain]
>   (apply max (chain-lengths markov-chain)))
>
> (defn flatten
>   "Flatten a collection"
>   [x]
>   (let [s? #(instance? clojure.lang.Sequential %)]
>     (filter (complement s?) (tree-seq s? seq x))))
>
> (defn build-chain
>   "Builds a markov chain"
>   [markov-chain keychain words]
>   (let [first-word (first words)]
>     (if (seq words)
>       (recur (assoc markov-chain keychain
>                     (cons first-word (get markov-chain keychain)))
>              (concat (rest keychain) [first-word])
>              (rest words))
>       (assoc markov-chain keychain []))))
>
> (defn chain
>   "Take a list of words and build a markov chain out of them.
>   The length is the size of the key in number of words."
>   ([words]
>    (chain words 3))
>   ([words length]
>    (build-chain {} (for [x (range length)] nil) (map clean words))))
>
> (defn split-sentence
>   "Convert a string to a collection on common boundaries"
>   [text]
>   (filter seq (re-split #"[,.!?()\d]+\s*" text)))
>
> (defn file-chain
>   "Create a markov chain from the contents of a given file"
>   ([file]
>    (file-chain file 3))
>   ([file length]
>    (let [sentences (split-sentence (slurp file))
>          flatten-list (fn [& x] (flatten (list x)))]
>      (loop [markov-chain {} words sentences]
>        (if (seq words)
>          (recur (merge-with flatten-list
>                             markov-chain
>                             (chain (re-split #"\s+" (first words))))
>                 (rest words))
>          markov-chain)))))
>
> (defn construct-sentence
>    "Build a sentence from a markov chain structure.  Given a
>    Markov chain (any size key),  Seed (to start the sentence) and
>    Proc (a function for choosing the next word), returns a sentence
>    composed until it reaches the end of a chain (an end of sentence)."
>   ([markov-chain]
>    (construct-sentence markov-chain nil rand-nth))
>   ([markov-chain seed]
>    (construct-sentence markov-chain seed rand-nth))
>   ([markov-chain seed proc]
>    (loop [words (if seed seed (rand-nth (keys markov-chain)))
>           sentence (str-join " " (filter identity words))]
>      (if (seq (markov-chain words))
>        (let [word-new (proc (markov-chain words))]
>          (recur (concat (rest words) [word-new])
>                 (str-join " " (into [sentence] [word-new]))))
>        sentence))))
>
> Example usage:
>
> (ns main (:use markov))
> (def markov (file-chain "draclock.txt"))
> (dotimes [_ 100]
>   (dotimes [_ 3] (println (construct-sentence markov)))
>   (println))
>
> Input files:
> http://www.gutenberg.org/files/345/345.txt - dracula
> http://www.gutenberg.org/dirs/etext99/advsh12.txt - sherlock holmes
>
> I just cat them together to make draclock.txt ;-)
>
> Cheers!
> Travis
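
For comparison, here is a rough sketch of the same idea (keys of n
consecutive words mapped to the words that follow them) written with
reduce and partition. It is not the code from either post. It assumes
Clojure 1.2 or later, where fnil and clojure.string are available. The
names markov-sketch and ::end are made up for illustration, and the
::end marker stands in for the empty-vector terminator used in the
quoted code.

```clojure
(ns markov-sketch
  (:require [clojure.string :as str]))

(defn build-chain
  "Map each key of `n` consecutive words to a vector of the words seen
  right after it. The word list is padded with `n` nils at the front and
  a ::end marker at the back, so sentence starts and ends are recorded."
  [n words]
  (let [padded (concat (repeat n nil) words [::end])]
    (reduce (fn [chain window]
              ;; first n items of the window form the key, the last item is the successor
              (update-in chain [(vec (butlast window))] (fnil conj []) (last window)))
            {}
            (partition (inc n) 1 padded))))

(defn construct-sentence
  "Walk the chain from the all-nil start key, choosing each next word
  with `pick`, until the ::end marker is chosen. Assumes n >= 1."
  [chain n pick]
  (loop [k (vec (repeat n nil))
         out []]
    (let [w (pick (get chain k))]
      (if (= w ::end)
        (str/join " " out)
        ;; slide the key window forward by one word
        (recur (conj (subvec k 1) w) (conj out w))))))
```

Chains built from separate sentences can be combined with
(apply merge-with into chains), and passing rand-nth as pick gives
random sentences, much like the three-argument construct-sentence in
the quoted code.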