Thank you for all the improvements and suggestions. Based on your feedback, here is my final version:

(defn read-words
  "Given a file, return a seq of every word in the file, normalizing
  words by converting them to lower case and extracting runs of word
  characters"
  [in-filepath]
  (re-seq #"\w+" (.toLowerCase (slurp in-filepath))))

(defn count-words
  "Given a collection, return a mapping of unique elements in the
  collection to the number of times that the element appears"
  [coll]
  (reduce #(merge-with + %1 {%2 1}) {} coll))

(defn format-words
  "Given a map from words to their frequencies, return a pretty string,
  sorted in descending order by number of appearances"
  [words]
  (apply str
         (map #(format "%20s : %5d \r\n" (key %) (val %))
              (sort-by #(- (val %)) words))))

(defn top-words
  "Compute the frequencies of each word in in-filepath. Output the
  results to out-filepath"
  [in-filepath out-filepath]
  (spit out-filepath
        (format-words (count-words (read-words in-filepath)))))

Some performance notes:

On a 5.2MB file, it takes 9s, compared to 7s for the improved Mibu version and 7s for my initial one.

On a 38MB file, it takes 53s and about 270MB of memory. My initial version and the Mibu version each take 39s, also with about 270MB of memory.

I also like Ipetit's code, except that it needs 60s and 530MB of RAM.

Regards,
Piotrek