Parallel words frequency ranking

Piotr 'Qertoip' Włodarek Sun, 28 Dec 2008 19:50:53 -0800

Following my recent adventure with word ranking, here is the parallel
version:


(use 'clojure.contrib.duck-streams)

;; Count occurrences of each lower-cased word in s.
;; Returns a map of word -> count.
(defn top-words-core [s]
  (reduce #(assoc %1 %2 (inc (%1 %2 0))) {}
          (re-seq #"\w+"
                  (.toLowerCase s))))

(defn format-words [words]
  (apply str
         (map #(format "%20s : %5d \r\n" (key %) (val %))
              (sort-by #(- (val %))
                       words))))

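;; Split the text into two halves by character count.
;; Note: the cut may fall in the middle of a word.
(defn split-string-in-two [s]
  (let [chunk-size (quot (count s) 2)]
    [(subs s 0 chunk-size), (subs s chunk-size)]))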

(defn parallel-top-words [in-filepath out-filepath]
  (let [string  (slurp in-filepath)
        agents (map #(agent %) (split-string-in-two string))]

    (doseq [a agents] (send a top-words-core))
    (apply await agents)

    (spit out-filepath
        (format-words
          (apply merge-with + (map deref agents))))))


(http://pastie.org/348106)
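
A note on split-string-in-two: it cuts the text at the midpoint by
character count, so a word that straddles the cut is counted as two
fragments. Below is a minimal sketch of a boundary-aware split.
split-on-word-boundary is a hypothetical helper that is not part of the
code above, and top-words-core is repeated only so the snippet runs on
its own.

```clojure
;; Same counting function as in the post, repeated for self-containment.
(defn top-words-core [s]
  (reduce #(assoc %1 %2 (inc (%1 %2 0))) {}
          (re-seq #"\w+" (.toLowerCase s))))

;; Hypothetical helper (not in the original post): start at the midpoint
;; and move the cut forward until it no longer sits on a word character,
;; so that no word is divided between the two chunks.
(defn split-on-word-boundary [s]
  (let [mid (quot (count s) 2)
        cut (loop [i mid]
              (if (and (< i (count s))
                       (re-matches #"\w" (str (.charAt ^String s i))))
                (recur (inc i))
                i))]
    [(subs s 0 cut) (subs s cut)]))

(split-on-word-boundary "one two three four")
;; => ["one two three" " four"]

;; Per-chunk counts merged with merge-with + match the counts
;; computed over the whole string.
(apply merge-with +
       (map top-words-core
            (split-on-word-boundary "one two three four")))
```

With this split, merging the per-chunk maps should give the same counts
as running top-words-core on the whole text, which is not guaranteed
when the cut lands inside a word.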


On a 38 MB file it takes 28 s, compared to 38 s for a similar but
sequential version.

1. Is there a better way to do it? Perhaps agents should share some
data structure?
2. Despite producing valid results, the program never ends. Why?
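
Regarding question 2, one likely explanation (an assumption, not
something confirmed in this message): agents run their actions on
thread pools whose threads are not daemon threads, so the JVM keeps
running after the main code finishes unless (shutdown-agents) is
called. A minimal sketch with a single toy agent, where counter is a
name made up for illustration:

```clojure
;; Sketch only: one agent doing a trivial job.
(def counter (agent 0))

(send counter inc)   ; queue the action on the agent thread pool
(await counter)      ; block until the action has run
(println @counter)

;; Stop the agent thread pools so that a script run from the command
;; line can exit. No further agent actions can be sent after this.
(shutdown-agents)
```

In parallel-top-words the equivalent change would be to call
(shutdown-agents) once the output file has been written.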


regards,
Piotrek