On Sat, Dec 13, 2008 at 10:41 AM, Dmitri <dmitri.sotni...@gmail.com> wrote:
>
> I wrote a simple word counter described here http://ptrace.fefe.de/wp/
> it reads stdin and counts the occurrences of words, however I notice
> that it runs significantly slower than the java version in the link.

There are several differences that could be factors. For example, the
Java version uses StreamTokenizer, while your Clojure version uses
String.split with a regex that gets recompiled for each line read.

> I've also noticed that there is a significant speed difference between
> conj and assoc, why is that?
> If I understand correctly both should only create the delta of the new
> elements and the old structure, however assoc appears to perform much
> better.

user=> (let [c 1000000 p [1 1]]
         (time (reduce #(conj % [%2 %2]) {} (range c)))
         (time (reduce #(assoc % %2 %2) {} (range c)))
         nil)
"Elapsed time: 1544.180472 msecs"
"Elapsed time: 1894.318809 msecs"
nil
user=> (let [c 1000000 p [1 1]]
         (time (reduce #(conj % [%2 %2]) {} (range c)))
         (time (reduce #(assoc % %2 %2) {} (range c)))
         nil)
"Elapsed time: 1549.159812 msecs"
"Elapsed time: 1594.18912 msecs"

That's a million items added to a hash-map each way in about 1.5 seconds,
which is not too shabby. The speeds for conj and assoc seem very close,
though I'm actually seeing a slight advantage for conj.

I'm sorry for what follows; it's like a compulsion for me, and I hope it
doesn't put you off. Each of these functions takes the same input and
produces the same output as your original code, but each is implemented
a bit more succinctly:

(import '(java.io BufferedReader InputStreamReader))

;; Add one to the count for word, skipping empty strings
;; (splitting on " " yields "" for runs of spaces).
(defn inc-count [words word]
  (if (seq word)
    (assoc words word (inc (words word 0)))
    words))

;; Turn {word count} into [count word] pairs, highest count first.
(defn sort-words [words]
  (reverse (sort (map (fn [[k v]] [v k]) words))))

(defn print-words [words]
  (doseq [head words]
    (println head)))

;; Fold one line's array of words into the counts map.
(defn read-words [words line]
  (reduce inc-count words line))

(defn read-input []
  (with-open [buf (BufferedReader. (InputStreamReader. System/in))]
    (let [words (for [line (line-seq buf)] (.split line " "))]
      (print-words (sort-words (reduce read-words {} words))))))

(time (read-input))

--Chouser

You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
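On the String.split point in the reply: here is a minimal sketch, assuming words are separated by single spaces as in the code above, that compiles the pattern once with a regex literal and calls java.util.regex.Pattern's split method directly, so nothing is recompiled per line. The names space-re, split-line and count-words are hypothetical, chosen only for this illustration. Note that on JDK 7 and later, String.split already skips regex compilation for a single ordinary character such as " ", so the gain there may be small.

```clojure
;; Hypothetical helpers (not from the original thread): split lines with a
;; Pattern compiled once, instead of once per call to String.split.
(def space-re #" ") ; a regex literal is compiled once, when this form is read

(defn split-line
  "Splits line on single spaces using the precompiled pattern."
  [^String line]
  (.split ^java.util.regex.Pattern space-re line))

(defn count-words
  "Counts non-empty words across a seq of lines, skipping the empty
  strings that runs of spaces produce, as inc-count does."
  [lines]
  (reduce (fn [words word]
            (if (seq word)
              (assoc words word (inc (words word 0)))
              words))
          {}
          (mapcat split-line lines)))

;; Example: (count-words ["a  b" "a"]) returns a map equal to {"a" 2, "b" 1}
```

The counting logic is the same as inc-count and read-words above; only the splitting step changes, so the two can be timed against each other on the same input.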
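On conj versus assoc: for a map, conj of a two-element [key value] vector (or a map entry) is handled by calling assoc with that key and value, so both loops in the timing test perform the same persistent-map update; the conj version only adds a small vector per element. The sketch below runs both reductions on a smaller range and checks that they build equal maps. The names via-conj, via-assoc, single-pair and merged are illustrative only. Given that shared code path, the drop of the assoc time from 1894 to 1594 msecs between the two runs plausibly reflects JVM warm-up rather than a real difference between the functions.

```clojure
;; Both reductions from the timing test, on a smaller range.
;; For maps, conj of a [k v] vector ends up as assoc of k and v.
(def via-conj  (reduce #(conj % [%2 %2]) {} (range 1000)))
(def via-assoc (reduce #(assoc % %2 %2) {} (range 1000)))

;; conj on a map also accepts a whole map, whose entries are added one by one.
(def single-pair (conj {} [:k 1]))          ; same result as (assoc {} :k 1)
(def merged      (conj {:a 1} {:b 2 :c 3})) ; adds both :b and :c
```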