Following my recent adventure with word ranking, here's the parallel
version:
(use 'clojure.contrib.duck-streams)

(defn top-words-core [s]
  (reduce #(assoc %1 %2 (inc (%1 %2 0))) {}
          (re-seq #"\w+"
                  (.toLowerCase s))))

(defn format-words [words]
  (apply str
         (map #(format "%20s : %5d \r\n" (key %) (val %))
              (sort-by #(- (val %))
                       words))))

(defn split-string-in-two [s]
  (let [chunk-size (quot (count s) 2)]
    [(subs s 0 chunk-size), (subs s chunk-size)]))

(defn parallel-top-words [in-filepath out-filepath]
  (let [string (slurp in-filepath)
        agents (map #(agent %) (split-string-in-two string))]
    (doseq [a agents] (send a top-words-core))
    (apply await agents)
    (spit out-filepath
          (format-words
           (apply merge-with + (map deref agents))))))
(http://pastie.org/348106)
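
To make the merge step concrete, here is a minimal sketch on two tiny
strings instead of two file halves. The name count-words is only a
hypothetical stand-in for top-words-core above, and the input strings are
made up for illustration:

```clojure
;; Hypothetical stand-in for top-words-core: builds a word -> count map.
(defn count-words [s]
  (reduce #(assoc %1 %2 (inc (%1 %2 0))) {}
          (re-seq #"\w+" (.toLowerCase s))))

;; Each "chunk" gets its own frequency map; merge-with + adds the
;; counts for words that appear in both maps.
(def merged
  (merge-with + (count-words "the cat and the hat")
                (count-words "The hat")))

;; merged => {"the" 3, "cat" 1, "and" 1, "hat" 2}
```

One caveat with splitting at an arbitrary index: a word that straddles the
boundary would be counted as two fragments, so the counts for that one word
may be slightly off.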
On a 38MB file it takes 28s, compared to 38s for a similar but sequential
version.
1. Is there a better way to do it? Perhaps agents should share some
data structure?
2. Despite producing valid results, the program never ends. Why?
regards,
Piotrek