Hi,

There are indeed too much allocationfoing on in your r/map.
You don't need the rmap,

Start from a plain old reduce like your reduce-by-naive, replace reduce by
r/fold, remove the seed and add the combine-fn (which now provides the
seed):

(defn group-by-red [f coll]
  (r/fold
    (partial merge-with concat)
    (fn [groups a]
      (assoc groups (f a) (conj (get groups a []) a)))
    coll))


Crude benchmark;
 (let [v (vec (range 1000000))]
  (dotimes [n 5]
    (println "Run" n)
    (time (group-by #(mod % 29) v))
    (time (group-by-naive #(mod % 29) v))
    (time (group-by-red #(mod % 29) v))))
Run 0
"Elapsed time: 938.878 msecs"
"Elapsed time: 778.161 msecs"
"Elapsed time: 1035.997 msecs"
Run 1
"Elapsed time: 715.352 msecs"
"Elapsed time: 771.913 msecs"
"Elapsed time: 840.264 msecs"
Run 2
"Elapsed time: 656.588 msecs"
"Elapsed time: 700.114 msecs"
"Elapsed time: 777.323 msecs"
Run 3
"Elapsed time: 731.781 msecs"
"Elapsed time: 713.547 msecs"
"Elapsed time: 875.801 msecs"
Run 4
"Elapsed time: 689.711 msecs"
"Elapsed time: 710.728 msecs"
"Elapsed time: 1112.609 msecs"
nil

Let's play with the granularity (defaults to 512)
(defn group-by-red [f coll]
  (r/fold 2048
    (partial merge-with concat)
    (fn [groups a]
      (assoc groups (f a) (conj (get groups a []) a)))
    coll))

Re-benchmark:
Run 0
"Elapsed time: 810.676 msecs"
"Elapsed time: 736.764 msecs"
"Elapsed time: 547.651 msecs"
Run 1
"Elapsed time: 681.249 msecs"
"Elapsed time: 850.046 msecs"
"Elapsed time: 521.27 msecs"
Run 2
"Elapsed time: 669.385 msecs"
"Elapsed time: 712.15 msecs"
"Elapsed time: 518.502 msecs"
Run 3
"Elapsed time: 673.108 msecs"
"Elapsed time: 745.688 msecs"
"Elapsed time: 542.837 msecs"
Run 4
"Elapsed time: 654.196 msecs"
"Elapsed time: 723.074 msecs"
"Elapsed time: 506.861 msecs"

Note that you are not exactly comparing apples to apples since group-by-red
is returning maps whose values are unrealized lazy sequence (concats of
concats of ocncats of vactors) instead of vectors

(defn group-by-red [f coll]
  (r/fold 2048
    (partial merge-with into)
    (fn [groups a]
      (assoc groups (f a) (conj (get groups a []) a)))
    coll))

Run 0
"Elapsed time: 763.455 msecs"
"Elapsed time: 763.501 msecs"
"Elapsed time: 681.075 msecs"
Run 1
"Elapsed time: 645.52 msecs"
"Elapsed time: 731.545 msecs"
"Elapsed time: 476.381 msecs"
Run 2
"Elapsed time: 660.775 msecs"
"Elapsed time: 728.19 msecs"
"Elapsed time: 475.543 msecs"
Run 3
"Elapsed time: 657.255 msecs"
"Elapsed time: 725.995 msecs"
"Elapsed time: 494.038 msecs"
Run 4
"Elapsed time: 647.53 msecs"
"Elapsed time: 731.085 msecs"
"Elapsed time: 538.649 msecs"

hth,

Christophe

On Fri, Dec 7, 2012 at 10:30 AM, Balint Erdi <balint.e...@gmail.com> wrote:

> BTW I understood the most about reducers (still not quite there yet,
> though :) ) from Rich Hickey's talk at EuroClojure:
>
> https://vimeo.com/45561411
>
>
> On Friday, December 7, 2012 10:21:59 AM UTC+1, Balint Erdi wrote:
>>
>> Hey,
>>
>> Reducers is fascinating and quite complex at the same time. I feel like
>> "best practices" around it has not quite solidified yet.
>>
>> Here is how I made your example work:
>>
>> (ns group-by-reducers.core
>>   (:require [clojure.core.reducers :as r :only [fold reduce map]])
>>   (:require [criterium.core :as c]))
>>
>> (defn group-by-naive [f coll]
>>   (reduce
>>     (fn [groups a]
>>       (assoc groups (f a) (conj (get groups a []) a)))
>>     {}
>>     coll))
>>
>> (defn group-by-red [f coll]
>>   (letfn [(reduce-groups
>>             ([] {})
>>             ([g1 g2]
>>               (merge-with concat g1 g2)))]
>>     (r/fold reduce-groups (r/map #(hash-map (f %) [%]) coll))))
>>
>> (defn -main
>>   [& args]
>>   (c/quick-bench (group-by #(mod % 29) (range 10000)))
>>   (c/quick-bench (group-by-naive #(mod % 29) (range 10000)))
>>   (c/quick-bench (group-by-red #(mod % 29) (range 10000))))
>>
>> (available as a gist here: 
>> https://gist.github.com/**4232044<https://gist.github.com/4232044>
>> )
>>
>> The core and naive versions perform roughly equally (~4ms) while the
>> reducers version takes 10x as much time (~40ms) on my two-core laptop.
>>
>> I'm absolutely sure I'm doing something wrong (there is probably too much
>> memory allocated by the map function) and would like to hear the opinion of
>> someone more knowledgeable with reducers. I just did not want the subject
>> to be buried.
>>
>> Hope that still helps a bit,
>> Balint
>>
>> ps. There are some tips and tricks in this post: http://www.thebusby.com/
>> **2012/07/tips-tricks-with-**clojure-reducers.html<http://www.thebusby.com/2012/07/tips-tricks-with-clojure-reducers.html>
>>
>> On Monday, December 3, 2012 5:04:17 PM UTC+1, Las wrote:
>>>
>>> Hi,
>>>
>>> As I was trying to wrap my head around the reducers library[1], I
>>> thought implementing group-by would be a good exercise to gain some insight.
>>>
>>> After spending a few hours with it, I'm still pretty much clueless, so
>>> hope to find someone here to help me out:
>>>
>>> So if I understood the reducer lingo introduced in [2],[3] and group-by
>>> correctly, it reduces the following reducing function on a collection
>>>
>>> (fn group-by-reducef [keyfn ret x]
>>>      (let [k (keyfn x)]
>>>         (assoc ret k (conj (get ret k []) x))))
>>>
>>> where keyfn is provided by a partial function application.
>>>
>>> fold needs a combining function that takes two result maps that have
>>> already been grouped and merges them.
>>> A naive implementation could look like
>>>
>>> (defn group-by-combinef
>>>   ([] {})
>>>   ([g1 g2]
>>>      (persistent!
>>>       (reduce (fn [res k v]
>>>                 (assoc! res k (into (get res k []) v)))
>>>               (transient g1) g2))))
>>>
>>> (defn group-by [f coll]
>>>   (fold (partial gr-by-reducef f) gr-by-combinef coll))
>>>
>>> Now couple of questions:
>>> 1) I expected fold to actually perform the operation, how can I force it
>>> to give me the result?
>>> 2) Can somehow the actual reducing at the leaf nodes still take
>>> advantage of transient collections?
>>> 3) I took a look at flatten as it seems the "closest" match. Again, if I
>>> call (flatten [[1 2] [2 4]]), I don't actually get the result. How do I get
>>> to the result?
>>>
>>>
>>> Thanks!
>>>
>>> [1] https://github.com/**clojure/clojure/blob/master/**
>>> src/clj/clojure/core/reducers.**clj<https://github.com/clojure/clojure/blob/master/src/clj/clojure/core/reducers.clj>
>>> [2] http://clojure.com/blog/**2012/05/08/reducers-a-library-**
>>> and-model-for-collection-**processing.html<http://clojure.com/blog/2012/05/08/reducers-a-library-and-model-for-collection-processing.html>
>>> [3] 
>>> http://clojure.com/blog/**2012/05/15/anatomy-of-reducer.**html<http://clojure.com/blog/2012/05/15/anatomy-of-reducer.html>
>>> --
>>> László Török
>>>
>>>   --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clojure@googlegroups.com
> Note that posts from new members are moderated - please be patient with
> your first post.
> To unsubscribe from this group, send email to
> clojure+unsubscr...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
>



-- 
On Clojure http://clj-me.cgrand.net/
Clojure Programming http://clojurebook.com
Training, Consulting & Contracting http://lambdanext.eu/

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en

Reply via email to