In an effort to learn more about Clojure, I decided to port a Markov text generator that a friend wrote in Python. After getting through a few snags, I completed the program and decided to have some fun feeding it e-books downloaded from Project Gutenberg. In this case, I chose Sherlock Holmes and Bram Stoker's Dracula to create a bizarre mashup, which could be called Draclock Holmes or something along those lines. I had the program print out three-line snippets of text, and some of the results resemble a sort of absurd poetry. I'd imagine that if I let it churn and burn for a few hours, some real gems could emerge.
acting in her interests Mina's morning and evening hypnotic answer is unvaried with devilish passion she succeeded somewhat swiftly and deftly His look is a warning together as we swept along found myself lying on my bed trembling all over Miss Stoner and I gazed at him in many tongues my power to reward you for your services common subject for conversation throwing open another door nine years in England strong-faced old man to mediaeval times

Here's the code. I'm new to Clojure, so I'm open to suggestions. It's written in a purely functional, non-destructive fashion, although I'm sure a few things could be improved.

(ns markov
  (:use clojure.contrib.str-utils))

(defn rand-nth
  "Return a random element from a collection."
  [coll]
  (nth (seq coll) (rand-int (count coll))))

(defn clean
  "Clean the given text of symbols disruptive to Markov chains."
  [txt]
  (let [;; strip punctuation that would otherwise stick to words
        new-txt (re-gsub #"[:;,^\"()]" "" txt)
        ;; strip apostrophes except those that start a contraction suffix
        new-txt (re-gsub #"'(?!(d|t|ve|m|ll|s|de|re))" "" new-txt)]
    new-txt))

(defn chain-lengths
  "Return a set of lengths for each element in the collection."
  [markov-chain]
  (let [markov-keys (map keys markov-chain)]
    (set (for [x markov-keys] (count x)))))

(defn max-chain-length
  "Return the length of the longest chain."
  [markov-chain]
  (apply max (chain-lengths markov-chain)))

(defn flatten
  "Flatten a collection."
  [x]
  (let [s? #(instance? clojure.lang.Sequential %)]
    (filter (complement s?) (tree-seq s? seq x))))

(defn build-chain
  "Build a Markov chain."
  [markov-chain keychain words]
  (let [first-word (first words)]
    (if (seq words)
      ;; record first-word as a follower of the current key, then slide the key
      (recur (assoc markov-chain keychain
                    (cons first-word (get markov-chain keychain)))
             (concat (rest keychain) [first-word])
             (rest words))
      ;; no words left: the final key marks the end of a sentence
      (assoc markov-chain keychain []))))

(defn chain
  "Take a list of words and build a Markov chain out of them.
  The length is the size of the key in number of words."
  ([words] (chain words 3))
  ([words length]
     (build-chain {} (for [x (range length)] nil) (map clean words))))

(defn split-sentence
  "Convert a string to a collection on common boundaries."
  [text]
  (filter seq (re-split #"[,.!?()\d]+\s*" text)))

(defn file-chain
  "Create a Markov chain from the contents of a given file."
  ([file] (file-chain file 3))
  ([file length]
     (let [sentences (split-sentence (slurp file))
           flatten-list (fn [& x] (flatten (list x)))]
       (loop [markov-chain {}
              words sentences]
         (if (seq words)
           ;; pass length through so the two-argument arity is honored
           (recur (merge-with flatten-list
                              markov-chain
                              (chain (re-split #"\s+" (first words)) length))
                  (rest words))
           markov-chain)))))

(defn construct-sentence
  "Build a sentence from a Markov chain structure. Given a Markov chain
  (any size key), Seed (to start the sentence) and Proc (a function for
  choosing the next word), returns a sentence composed until it reaches
  the end of a chain (an end of sentence)."
  ([markov-chain] (construct-sentence markov-chain nil rand-nth))
  ([markov-chain seed] (construct-sentence markov-chain seed rand-nth))
  ([markov-chain seed proc]
     (loop [words (if seed seed (rand-nth (keys markov-chain)))
            sentence (str-join " " (filter identity words))]
       (if (seq (markov-chain words))
         (let [word-new (proc (markov-chain words))]
           (recur (concat (rest words) [word-new])
                  (str-join " " (into [sentence] [word-new]))))
         sentence))))

Example usage:

(ns main
  (:use markov))

(def markov (file-chain "draclock.txt"))

(doseq [x (range 100)]
  (doseq [x (range 3)]
    (println (construct-sentence markov)))
  (println))

Input files:

http://www.gutenberg.org/files/345/345.txt - dracula
http://www.gutenberg.org/dirs/etext99/advsh12.txt - sherlock holmes

I just cat them together to make draclock.txt ;-)

Cheers!
Travis