Re: Exercise: words frequency ranking

Piotr 'Qertoip' Włodarek Sat, 27 Dec 2008 04:00:48 -0800

Thank you for all the improvements and suggestions. Based on your
feedback, here is my final version:



(defn read-words
  "Given a file, return a seq of every word in the file, normalizing
  words by converting them to lower case and extracting runs of word
  characters"
  [in-filepath]
  (re-seq #"\w+"
          (.toLowerCase (slurp in-filepath))))

(defn count-words
  "Given a collection, return a mapping of unique elements in the
  collection to the number of times that the element appears"
  [coll]
  (reduce #(merge-with + %1 {%2 1}) {} coll))

(defn format-words
  "Given a map from words to their frequencies, return a pretty
  string, sorted in descending order by number of appearances"
  [words]
  (apply str
         (map #(format "%20s : %5d \r\n" (key %) (val %))
              (sort-by #(- (val %))
                       words))))

(defn top-words
  "Compute the frequencies of each word in in-filepath. Output the
  results to out-filepath"
  [in-filepath out-filepath]
  (spit out-filepath
        (format-words (count-words (read-words in-filepath)))))
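
To make the counting and sorting steps concrete, here is a small sketch
that can be pasted into a REPL. It uses only core functions on a literal
collection; the names sample-counts and sample-ranked are hypothetical and
exist only for this illustration.

```clojure
;; Each step merges a one-entry map into the accumulator;
;; merge-with applies + when the key is already present.
(merge-with + {"the" 1} {"the" 1})   ; => {"the" 2}
(merge-with + {"the" 2} {"cat" 1})   ; => {"the" 2, "cat" 1}

;; The same reduce that count-words uses, on a small literal collection.
(def sample-counts
  (reduce #(merge-with + %1 {%2 1}) {} ["the" "cat" "the"]))
;; sample-counts => {"the" 2, "cat" 1}

;; Sorting by the negated count gives descending order, as in format-words.
(def sample-ranked
  (sort-by #(- (val %)) sample-counts))
;; (map key sample-ranked) => ("the" "cat")
```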


Some performance notes:

On a 5.2MB file, it takes 9s, compared to 7s for the improved Mibu
version and 7s for my initial one.

On a 38MB file, it takes 53s and about 270MB of memory. By comparison,
the initial version and the Mibu version each take 39s and also about
270MB of memory. I also like Ipetit's code, except that it needs 60s and
530MB of RAM.
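
The final version above reads the whole file into one string with slurp.
As an unmeasured sketch only, a line-at-a-time variant could look like the
following. It assumes a Clojure release that ships clojure.java.io (newer
than the one available when this was posted), and count-words-by-line is a
hypothetical name, not code from this thread. No timing or memory figures
are claimed for it.

```clojure
(require '[clojure.java.io :as io])

(defn count-words-by-line
  "Hypothetical variant: count lower-cased words in the file at
  in-filepath, reading one line at a time instead of slurping."
  [in-filepath]
  (with-open [rdr (io/reader in-filepath)]
    ;; reduce is eager, so every line is consumed before the reader closes.
    (reduce (fn [counts ^String line]
              ;; Inner reduce bumps the count of each word on this line.
              (reduce #(assoc %1 %2 (inc (get %1 %2 0)))
                      counts
                      (re-seq #"\w+" (.toLowerCase line))))
            {}
            (line-seq rdr))))
```

Because \w never matches a line terminator, splitting the input into lines
first should yield the same words as running re-seq over the whole file.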


regards,
Piotrek