Thank you, Christophe, that makes total sense. Interestingly, on my machine, the reducers version is still slower:
Run # 0 "Elapsed time: 1105.586 msecs" "Elapsed time: 685.8 msecs" "Elapsed time: 549.969 msecs" Run # 1 "Elapsed time: 497.831 msecs" "Elapsed time: 489.932 msecs" "Elapsed time: 520.38 msecs" Run # 2 "Elapsed time: 363.873 msecs" "Elapsed time: 489.529 msecs" "Elapsed time: 556.117 msecs" Run # 3 "Elapsed time: 357.822 msecs" "Elapsed time: 429.894 msecs" "Elapsed time: 540.975 msecs" Run # 4 "Elapsed time: 425.276 msecs" "Elapsed time: 431.819 msecs" "Elapsed time: 528.8 msecs" I guess the JVM finds some serious optimizations during the runs for the non-reducer versions. I saw the same difference when using criterium for benchmarking. Balint On Friday, December 7, 2012 1:44:58 PM UTC+1, Christophe Grand wrote: > > Absolutely > > > On Fri, Dec 7, 2012 at 1:29 PM, László Török <ltor...@gmail.com<javascript:> > > wrote: > >> Hi, >> >> shouldn't >> >> (fn [groups a] >> (assoc groups (f a) (conj (get groups a []) a))) >> >> be >> >> (fn [groups a] >> (let [k (f a)] >> (assoc groups k (conj (get groups k []) a)))) >> >> ? >> >> Las >> >> 2012/12/7 Christophe Grand <chris...@cgrand.net <javascript:>> >> >>> Hi, >>> >>> There are indeed too much allocationfoing on in your r/map. >>> You don't need the rmap, >>> >>> Start from a plain old reduce like your reduce-by-naive, replace reduce >>> by r/fold, remove the seed and add the combine-fn (which now provides the >>> seed): >>> >>> (defn group-by-red [f coll] >>> (r/fold >>> (partial merge-with concat) >>> (fn [groups a] >>> (assoc groups (f a) (conj (get groups a []) a))) >>> coll)) >>> >>> >>> Crude benchmark; >>> (let [v (vec (range 1000000))] >>> (dotimes [n 5] >>> (println "Run" n) >>> (time (group-by #(mod % 29) v)) >>> (time (group-by-naive #(mod % 29) v)) >>> (time (group-by-red #(mod % 29) v)))) >>> Run 0 >>> "Elapsed time: 938.878 msecs" >>> "Elapsed time: 778.161 msecs" >>> "Elapsed time: 1035.997 msecs" >>> Run 1 >>> "Elapsed time: 715.352 msecs" >>> "Elapsed time: 771.913 msecs" >>> "Elapsed time: 840.264 msecs" >>> Run 2 >>> "Elapsed time: 656.588 msecs" >>> "Elapsed time: 700.114 msecs" >>> "Elapsed time: 777.323 msecs" >>> Run 3 >>> "Elapsed time: 731.781 msecs" >>> "Elapsed time: 713.547 msecs" >>> "Elapsed time: 875.801 msecs" >>> Run 4 >>> "Elapsed time: 689.711 msecs" >>> "Elapsed time: 710.728 msecs" >>> "Elapsed time: 1112.609 msecs" >>> nil >>> >>> Let's play with the granularity (defaults to 512) >>> (defn group-by-red [f coll] >>> (r/fold 2048 >>> (partial merge-with concat) >>> (fn [groups a] >>> (assoc groups (f a) (conj (get groups a []) a))) >>> coll)) >>> >>> Re-benchmark: >>> Run 0 >>> "Elapsed time: 810.676 msecs" >>> "Elapsed time: 736.764 msecs" >>> "Elapsed time: 547.651 msecs" >>> Run 1 >>> "Elapsed time: 681.249 msecs" >>> "Elapsed time: 850.046 msecs" >>> "Elapsed time: 521.27 msecs" >>> Run 2 >>> "Elapsed time: 669.385 msecs" >>> "Elapsed time: 712.15 msecs" >>> "Elapsed time: 518.502 msecs" >>> Run 3 >>> "Elapsed time: 673.108 msecs" >>> "Elapsed time: 745.688 msecs" >>> "Elapsed time: 542.837 msecs" >>> Run 4 >>> "Elapsed time: 654.196 msecs" >>> "Elapsed time: 723.074 msecs" >>> "Elapsed time: 506.861 msecs" >>> >>> Note that you are not exactly comparing apples to apples since >>> group-by-red is returning maps whose values are unrealized lazy sequence >>> (concats of concats of ocncats of vactors) instead of vectors >>> >>> (defn group-by-red [f coll] >>> (r/fold 2048 >>> (partial merge-with into) >>> (fn [groups a] >>> (assoc groups (f a) (conj (get groups a []) a))) >>> coll)) >>> >>> Run 0 >>> "Elapsed time: 763.455 msecs" >>> "Elapsed time: 763.501 msecs" >>> "Elapsed time: 681.075 msecs" >>> Run 1 >>> "Elapsed time: 645.52 msecs" >>> "Elapsed time: 731.545 msecs" >>> "Elapsed time: 476.381 msecs" >>> Run 2 >>> "Elapsed time: 660.775 msecs" >>> "Elapsed time: 728.19 msecs" >>> "Elapsed time: 475.543 msecs" >>> Run 3 >>> "Elapsed time: 657.255 msecs" >>> "Elapsed time: 725.995 msecs" >>> "Elapsed time: 494.038 msecs" >>> Run 4 >>> "Elapsed time: 647.53 msecs" >>> "Elapsed time: 731.085 msecs" >>> "Elapsed time: 538.649 msecs" >>> >>> hth, >>> >>> Christophe >>> >>> >>> On Fri, Dec 7, 2012 at 10:30 AM, Balint Erdi >>> <balin...@gmail.com<javascript:> >>> > wrote: >>> >>>> BTW I understood the most about reducers (still not quite there yet, >>>> though :) ) from Rich Hickey's talk at EuroClojure: >>>> >>>> https://vimeo.com/45561411 >>>> >>>> >>>> On Friday, December 7, 2012 10:21:59 AM UTC+1, Balint Erdi wrote: >>>>> >>>>> Hey, >>>>> >>>>> Reducers is fascinating and quite complex at the same time. I feel >>>>> like "best practices" around it has not quite solidified yet. >>>>> >>>>> Here is how I made your example work: >>>>> >>>>> (ns group-by-reducers.core >>>>> (:require [clojure.core.reducers :as r :only [fold reduce map]]) >>>>> (:require [criterium.core :as c])) >>>>> >>>>> (defn group-by-naive [f coll] >>>>> (reduce >>>>> (fn [groups a] >>>>> (assoc groups (f a) (conj (get groups a []) a))) >>>>> {} >>>>> coll)) >>>>> >>>>> (defn group-by-red [f coll] >>>>> (letfn [(reduce-groups >>>>> ([] {}) >>>>> ([g1 g2] >>>>> (merge-with concat g1 g2)))] >>>>> (r/fold reduce-groups (r/map #(hash-map (f %) [%]) coll)))) >>>>> >>>>> (defn -main >>>>> [& args] >>>>> (c/quick-bench (group-by #(mod % 29) (range 10000))) >>>>> (c/quick-bench (group-by-naive #(mod % 29) (range 10000))) >>>>> (c/quick-bench (group-by-red #(mod % 29) (range 10000)))) >>>>> >>>>> (available as a gist here: >>>>> https://gist.github.com/**4232044<https://gist.github.com/4232044> >>>>> ) >>>>> >>>>> The core and naive versions perform roughly equally (~4ms) while the >>>>> reducers version takes 10x as much time (~40ms) on my two-core laptop. >>>>> >>>>> I'm absolutely sure I'm doing something wrong (there is probably too >>>>> much memory allocated by the map function) and would like to hear the >>>>> opinion of someone more knowledgeable with reducers. I just did not want >>>>> the subject to be buried. >>>>> >>>>> Hope that still helps a bit, >>>>> Balint >>>>> >>>>> ps. There are some tips and tricks in this post: >>>>> http://www.thebusby.com/**2012/07/tips-tricks-with-** >>>>> clojure-reducers.html<http://www.thebusby.com/2012/07/tips-tricks-with-clojure-reducers.html> >>>>> >>>>> On Monday, December 3, 2012 5:04:17 PM UTC+1, Las wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> As I was trying to wrap my head around the reducers library[1], I >>>>>> thought implementing group-by would be a good exercise to gain some >>>>>> insight. >>>>>> >>>>>> After spending a few hours with it, I'm still pretty much clueless, >>>>>> so hope to find someone here to help me out: >>>>>> >>>>>> So if I understood the reducer lingo introduced in [2],[3] and >>>>>> group-by correctly, it reduces the following reducing function on a >>>>>> collection >>>>>> >>>>>> (fn group-by-reducef [keyfn ret x] >>>>>> (let [k (keyfn x)] >>>>>> (assoc ret k (conj (get ret k []) x)))) >>>>>> >>>>>> where keyfn is provided by a partial function application. >>>>>> >>>>>> fold needs a combining function that takes two result maps that have >>>>>> already been grouped and merges them. >>>>>> A naive implementation could look like >>>>>> >>>>>> (defn group-by-combinef >>>>>> ([] {}) >>>>>> ([g1 g2] >>>>>> (persistent! >>>>>> (reduce (fn [res k v] >>>>>> (assoc! res k (into (get res k []) v))) >>>>>> (transient g1) g2)))) >>>>>> >>>>>> (defn group-by [f coll] >>>>>> (fold (partial gr-by-reducef f) gr-by-combinef coll)) >>>>>> >>>>>> Now couple of questions: >>>>>> 1) I expected fold to actually perform the operation, how can I force >>>>>> it to give me the result? >>>>>> 2) Can somehow the actual reducing at the leaf nodes still take >>>>>> advantage of transient collections? >>>>>> 3) I took a look at flatten as it seems the "closest" match. Again, >>>>>> if I call (flatten [[1 2] [2 4]]), I don't actually get the result. How >>>>>> do >>>>>> I get to the result? >>>>>> >>>>>> >>>>>> Thanks! >>>>>> >>>>>> [1] https://github.com/**clojure/clojure/blob/master/** >>>>>> src/clj/clojure/core/reducers.**clj<https://github.com/clojure/clojure/blob/master/src/clj/clojure/core/reducers.clj> >>>>>> [2] http://clojure.com/blog/**2012/05/08/reducers-a-library-** >>>>>> and-model-for-collection-**processing.html<http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html> >>>>>> [3] >>>>>> http://clojure.com/blog/**2012/05/15/anatomy-of-reducer.**html<http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html> >>>>>> -- >>>>>> László Török >>>>>> >>>>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "Clojure" group. >>>> To post to this group, send email to clo...@googlegroups.com<javascript:> >>>> Note that posts from new members are moderated - please be patient with >>>> your first post. >>>> To unsubscribe from this group, send email to >>>> clojure+u...@googlegroups.com <javascript:> >>>> For more options, visit this group at >>>> http://groups.google.com/group/clojure?hl=en >>>> >>> >>> >>> >>> -- >>> On Clojure http://clj-me.cgrand.net/ >>> Clojure Programming http://clojurebook.com >>> Training, Consulting & Contracting http://lambdanext.eu/ >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Clojure" group. >>> To post to this group, send email to clo...@googlegroups.com<javascript:> >>> Note that posts from new members are moderated - please be patient with >>> your first post. >>> To unsubscribe from this group, send email to >>> clojure+u...@googlegroups.com <javascript:> >>> For more options, visit this group at >>> http://groups.google.com/group/clojure?hl=en >>> >> >> >> >> -- >> László Török >> >> -- >> You received this message because you are subscribed to the Google >> Groups "Clojure" group. >> To post to this group, send email to clo...@googlegroups.com<javascript:> >> Note that posts from new members are moderated - please be patient with >> your first post. >> To unsubscribe from this group, send email to >> clojure+u...@googlegroups.com <javascript:> >> For more options, visit this group at >> http://groups.google.com/group/clojure?hl=en >> > > > > -- > On Clojure http://clj-me.cgrand.net/ > Clojure Programming http://clojurebook.com > Training, Consulting & Contracting http://lambdanext.eu/ > > -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en