A better implementation would split the different steps of the program into separate functions. This increases readability and testability of the source code, and encourages the reuse of code in new programs. I haven't tested this program, but hopefully you'll understand the general approach.
Also, you're using a def inside your defn. In all likelihood this doesn't do what you think it does (defines a locally-scoped variable) because all def's are global. One consequence of this: if more than one thread executes topwords (for different files arguments) at the same time, then a race condition could make one or both of them generate completely wrong results, and the resulting bug would be maddeningly difficult to track down. Instead of def, use let instead. (defn freqs "Given a collection, return a mapping of unique elements in the collection to the number of times that the element appears" [coll] (merge-with + {} (for [el coll] {el 1}))) (defn word-seq "Given a file, return a seq of every word in the file, normalizing words by coverting them to lower case and splitting on whitespace" [file] (seq (.split (.toLowerCase (slurp in-filepath "\\s+"))))) (defn freq-format "Given a map from words to their frequencies, return a pretty string, sorted in descending order by number of appearances" [freq] (apply str (map #(format "%20s : %5d \r\n" (key %) (val %)) (sort-by #(- (val %)) freq)))) (defn topwords "Compute the frequencies of each word in in-filepath. Output the results to out-filepath" [in-filepath out-filepath] (spit out-filepath (str (freq-format (freqs (word-seq in-filepath))) "\r\n"))) On Dec 25, 7:16 am, Piotr 'Qertoip' Włodarek <qert...@gmail.com> wrote: > Given the input text file, the program should write to disk a ranking > of words sorted by frequency, like: > > the : 52483 > and : 32558 > of : 23477 > a : 22486 > to : 21993 > > My first implementation: > > (defn topwords [in-filepath, out-filepath] > (def words (.split (.toLowerCase (slurp in-filepath)) "\\s+")) > > (spit out-filepath > (apply str > (concat > (map (fn [pair] (format "%20s : %5d \r\n" (key pair) > (val pair))) > (sort-by #( -(val %) ) > (reduce > (fn [counted-words word] > ( assoc counted-words > word > (inc (get counted-words > word 0)) )) > {} > words))) > ["\r\n"])))) > > Somehow I feel it's far from optimal. Could you please advise and > improve? What is the best, idiomatic implementation of this simple > problem? > > regards, > Piotrek --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---