Thank you, Christophe, that makes total sense.

Interestingly, on my machine, the reducers version is still slower:

Run # 0
"Elapsed time: 1105.586 msecs"
"Elapsed time: 685.8 msecs"
"Elapsed time: 549.969 msecs"
Run # 1
"Elapsed time: 497.831 msecs"
"Elapsed time: 489.932 msecs"
"Elapsed time: 520.38 msecs"
Run # 2
"Elapsed time: 363.873 msecs"
"Elapsed time: 489.529 msecs"
"Elapsed time: 556.117 msecs"
Run # 3
"Elapsed time: 357.822 msecs"
"Elapsed time: 429.894 msecs"
"Elapsed time: 540.975 msecs"
Run # 4
"Elapsed time: 425.276 msecs"
"Elapsed time: 431.819 msecs"
"Elapsed time: 528.8 msecs"

I guess the JVM finds some serious optimizations during the runs for the 
non-reducer versions. I saw the same difference when using criterium for 
benchmarking.

Balint

On Friday, December 7, 2012 1:44:58 PM UTC+1, Christophe Grand wrote:
>
> Absolutely
>
>
> On Fri, Dec 7, 2012 at 1:29 PM, László Török <ltor...@gmail.com<javascript:>
> > wrote:
>
>> Hi,
>>
>> shouldn't 
>>
>> (fn [groups a]
>>       (assoc groups (f a) (conj (get groups a []) a)))
>>
>> be 
>>
>> (fn [groups a]
>>       (let [k (f a)]
>>             (assoc groups k (conj (get groups k []) a))))
>>
>> ?
>>
>> Las
>>
>> 2012/12/7 Christophe Grand <chris...@cgrand.net <javascript:>>
>>
>>> Hi,
>>>
>>> There are indeed too much allocationfoing on in your r/map.
>>> You don't need the rmap,
>>>
>>> Start from a plain old reduce like your reduce-by-naive, replace reduce 
>>> by r/fold, remove the seed and add the combine-fn (which now provides the 
>>> seed):
>>>
>>> (defn group-by-red [f coll]
>>>   (r/fold
>>>     (partial merge-with concat)
>>>     (fn [groups a]
>>>       (assoc groups (f a) (conj (get groups a []) a)))
>>>     coll))
>>>
>>>
>>> Crude benchmark;
>>>  (let [v (vec (range 1000000))] 
>>>   (dotimes [n 5]
>>>     (println "Run" n)
>>>     (time (group-by #(mod % 29) v))
>>>     (time (group-by-naive #(mod % 29) v))
>>>     (time (group-by-red #(mod % 29) v))))
>>> Run 0
>>> "Elapsed time: 938.878 msecs"
>>> "Elapsed time: 778.161 msecs"
>>> "Elapsed time: 1035.997 msecs"
>>> Run 1
>>> "Elapsed time: 715.352 msecs"
>>> "Elapsed time: 771.913 msecs"
>>> "Elapsed time: 840.264 msecs"
>>> Run 2
>>> "Elapsed time: 656.588 msecs"
>>> "Elapsed time: 700.114 msecs"
>>> "Elapsed time: 777.323 msecs"
>>> Run 3
>>> "Elapsed time: 731.781 msecs"
>>> "Elapsed time: 713.547 msecs"
>>> "Elapsed time: 875.801 msecs"
>>> Run 4
>>> "Elapsed time: 689.711 msecs"
>>> "Elapsed time: 710.728 msecs"
>>> "Elapsed time: 1112.609 msecs"
>>> nil
>>>
>>> Let's play with the granularity (defaults to 512)
>>> (defn group-by-red [f coll]
>>>   (r/fold 2048
>>>     (partial merge-with concat)
>>>     (fn [groups a]
>>>       (assoc groups (f a) (conj (get groups a []) a)))
>>>     coll))
>>>
>>> Re-benchmark:
>>> Run 0
>>> "Elapsed time: 810.676 msecs"
>>> "Elapsed time: 736.764 msecs"
>>> "Elapsed time: 547.651 msecs"
>>> Run 1
>>> "Elapsed time: 681.249 msecs"
>>> "Elapsed time: 850.046 msecs"
>>> "Elapsed time: 521.27 msecs"
>>> Run 2
>>> "Elapsed time: 669.385 msecs"
>>> "Elapsed time: 712.15 msecs"
>>> "Elapsed time: 518.502 msecs"
>>> Run 3
>>> "Elapsed time: 673.108 msecs"
>>> "Elapsed time: 745.688 msecs"
>>> "Elapsed time: 542.837 msecs"
>>> Run 4
>>> "Elapsed time: 654.196 msecs"
>>> "Elapsed time: 723.074 msecs"
>>> "Elapsed time: 506.861 msecs"
>>>
>>> Note that you are not exactly comparing apples to apples since 
>>> group-by-red is returning maps whose values are unrealized lazy sequence 
>>> (concats of concats of ocncats of vactors) instead of vectors
>>>
>>> (defn group-by-red [f coll]
>>>   (r/fold 2048
>>>     (partial merge-with into)
>>>     (fn [groups a]
>>>       (assoc groups (f a) (conj (get groups a []) a)))
>>>     coll))
>>>
>>> Run 0
>>> "Elapsed time: 763.455 msecs"
>>> "Elapsed time: 763.501 msecs"
>>> "Elapsed time: 681.075 msecs"
>>> Run 1
>>> "Elapsed time: 645.52 msecs"
>>> "Elapsed time: 731.545 msecs"
>>> "Elapsed time: 476.381 msecs"
>>> Run 2
>>> "Elapsed time: 660.775 msecs"
>>> "Elapsed time: 728.19 msecs"
>>> "Elapsed time: 475.543 msecs"
>>> Run 3
>>> "Elapsed time: 657.255 msecs"
>>> "Elapsed time: 725.995 msecs"
>>> "Elapsed time: 494.038 msecs"
>>> Run 4
>>> "Elapsed time: 647.53 msecs"
>>> "Elapsed time: 731.085 msecs"
>>> "Elapsed time: 538.649 msecs"
>>>
>>> hth,
>>>
>>> Christophe
>>>
>>>
>>> On Fri, Dec 7, 2012 at 10:30 AM, Balint Erdi 
>>> <balin...@gmail.com<javascript:>
>>> > wrote:
>>>
>>>> BTW I understood the most about reducers (still not quite there yet, 
>>>> though :) ) from Rich Hickey's talk at EuroClojure:
>>>>
>>>> https://vimeo.com/45561411
>>>>
>>>>
>>>> On Friday, December 7, 2012 10:21:59 AM UTC+1, Balint Erdi wrote:
>>>>>
>>>>> Hey,
>>>>>
>>>>> Reducers is fascinating and quite complex at the same time. I feel 
>>>>> like "best practices" around it has not quite solidified yet.
>>>>>
>>>>> Here is how I made your example work:
>>>>>
>>>>> (ns group-by-reducers.core
>>>>>   (:require [clojure.core.reducers :as r :only [fold reduce map]])
>>>>>   (:require [criterium.core :as c]))
>>>>>
>>>>> (defn group-by-naive [f coll]
>>>>>   (reduce
>>>>>     (fn [groups a]
>>>>>       (assoc groups (f a) (conj (get groups a []) a)))
>>>>>     {}
>>>>>     coll))
>>>>>
>>>>> (defn group-by-red [f coll]
>>>>>   (letfn [(reduce-groups
>>>>>             ([] {})
>>>>>             ([g1 g2]
>>>>>               (merge-with concat g1 g2)))]
>>>>>     (r/fold reduce-groups (r/map #(hash-map (f %) [%]) coll))))
>>>>>
>>>>> (defn -main
>>>>>   [& args]
>>>>>   (c/quick-bench (group-by #(mod % 29) (range 10000)))
>>>>>   (c/quick-bench (group-by-naive #(mod % 29) (range 10000)))
>>>>>   (c/quick-bench (group-by-red #(mod % 29) (range 10000))))
>>>>>
>>>>> (available as a gist here: 
>>>>> https://gist.github.com/**4232044<https://gist.github.com/4232044>
>>>>> )
>>>>>
>>>>> The core and naive versions perform roughly equally (~4ms) while the 
>>>>> reducers version takes 10x as much time (~40ms) on my two-core laptop.
>>>>>
>>>>> I'm absolutely sure I'm doing something wrong (there is probably too 
>>>>> much memory allocated by the map function) and would like to hear the 
>>>>> opinion of someone more knowledgeable with reducers. I just did not want 
>>>>> the subject to be buried.
>>>>>
>>>>> Hope that still helps a bit,
>>>>> Balint
>>>>>
>>>>> ps. There are some tips and tricks in this post: 
>>>>> http://www.thebusby.com/**2012/07/tips-tricks-with-**
>>>>> clojure-reducers.html<http://www.thebusby.com/2012/07/tips-tricks-with-clojure-reducers.html>
>>>>>
>>>>> On Monday, December 3, 2012 5:04:17 PM UTC+1, Las wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> As I was trying to wrap my head around the reducers library[1], I 
>>>>>> thought implementing group-by would be a good exercise to gain some 
>>>>>> insight.
>>>>>>
>>>>>> After spending a few hours with it, I'm still pretty much clueless, 
>>>>>> so hope to find someone here to help me out:
>>>>>>
>>>>>> So if I understood the reducer lingo introduced in [2],[3] and 
>>>>>> group-by correctly, it reduces the following reducing function on a 
>>>>>> collection
>>>>>>
>>>>>> (fn group-by-reducef [keyfn ret x]
>>>>>>      (let [k (keyfn x)]
>>>>>>         (assoc ret k (conj (get ret k []) x))))
>>>>>>
>>>>>> where keyfn is provided by a partial function application.
>>>>>>
>>>>>> fold needs a combining function that takes two result maps that have 
>>>>>> already been grouped and merges them.
>>>>>> A naive implementation could look like
>>>>>>
>>>>>> (defn group-by-combinef
>>>>>>   ([] {})
>>>>>>   ([g1 g2]
>>>>>>      (persistent!
>>>>>>       (reduce (fn [res k v]
>>>>>>                 (assoc! res k (into (get res k []) v)))
>>>>>>               (transient g1) g2))))
>>>>>>
>>>>>> (defn group-by [f coll]
>>>>>>   (fold (partial gr-by-reducef f) gr-by-combinef coll))
>>>>>>
>>>>>> Now couple of questions:
>>>>>> 1) I expected fold to actually perform the operation, how can I force 
>>>>>> it to give me the result?
>>>>>> 2) Can somehow the actual reducing at the leaf nodes still take 
>>>>>> advantage of transient collections?
>>>>>> 3) I took a look at flatten as it seems the "closest" match. Again, 
>>>>>> if I call (flatten [[1 2] [2 4]]), I don't actually get the result. How 
>>>>>> do 
>>>>>> I get to the result?
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> [1] https://github.com/**clojure/clojure/blob/master/**
>>>>>> src/clj/clojure/core/reducers.**clj<https://github.com/clojure/clojure/blob/master/src/clj/clojure/core/reducers.clj>
>>>>>> [2] http://clojure.com/blog/**2012/05/08/reducers-a-library-**
>>>>>> and-model-for-collection-**processing.html<http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html>
>>>>>> [3] 
>>>>>> http://clojure.com/blog/**2012/05/15/anatomy-of-reducer.**html<http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html>
>>>>>> -- 
>>>>>> László Török
>>>>>>
>>>>>>   -- 
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clo...@googlegroups.com<javascript:>
>>>> Note that posts from new members are moderated - please be patient with 
>>>> your first post.
>>>> To unsubscribe from this group, send email to
>>>> clojure+u...@googlegroups.com <javascript:>
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>>
>>>
>>>
>>>
>>> -- 
>>> On Clojure http://clj-me.cgrand.net/
>>> Clojure Programming http://clojurebook.com
>>> Training, Consulting & Contracting http://lambdanext.eu/ 
>>>
>>>  -- 
>>> You received this message because you are subscribed to the Google
>>> Groups "Clojure" group.
>>> To post to this group, send email to clo...@googlegroups.com<javascript:>
>>> Note that posts from new members are moderated - please be patient with 
>>> your first post.
>>> To unsubscribe from this group, send email to
>>> clojure+u...@googlegroups.com <javascript:>
>>> For more options, visit this group at
>>> http://groups.google.com/group/clojure?hl=en
>>>
>>
>>
>>
>> -- 
>> László Török
>>
>>  -- 
>> You received this message because you are subscribed to the Google
>> Groups "Clojure" group.
>> To post to this group, send email to clo...@googlegroups.com<javascript:>
>> Note that posts from new members are moderated - please be patient with 
>> your first post.
>> To unsubscribe from this group, send email to
>> clojure+u...@googlegroups.com <javascript:>
>> For more options, visit this group at
>> http://groups.google.com/group/clojure?hl=en
>>
>
>
>
> -- 
> On Clojure http://clj-me.cgrand.net/
> Clojure Programming http://clojurebook.com
> Training, Consulting & Contracting http://lambdanext.eu/ 
>
> 

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to