Thinking about it a bit more, it would be good to have an interface, e.g. 
Spliceable, which 'into' could take advantage of when it found itself 
concatenating two seqs of the same implementation...
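
Something along these lines, perhaps - a rough sketch only (the Spliceable 
interface and into+ are hypothetical names; today only the two-argument 
splice statics from my fork exist):

(defn into+
  "Like into, but splices structurally when both collections implement a
   hypothetical clojure.lang.Spliceable interface and share a class."
  [to from]
  (if (and (instance? clojure.lang.Spliceable to)
           (instance? (class to) from))
    (.splice ^clojure.lang.Spliceable to from) ; reuse both trees' structure
    (into to from)))                           ; ordinary per-element copy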

Further digging might demonstrate that a similar trick could be used with 
other seq types?

Splice currently only supports 2 params - splice(left, right). If you could 
splice(lhs, rhses...) then I think it could be used more efficiently in a 
reducers combination phase... I would like to be able to recombine the 
subresults of multiple cores in a single iteration.
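
Even with the current two-argument splice, the combine phase of a fold could 
use it today; a variadic version would just let the final recombination of 
all the per-core subresults happen in a single pass. A sketch with the 
two-argument version (assuming it were exposed; note that r/fold only 
parallelises vectors at the moment):

(require '[clojure.core.reducers :as r])

(defn parallel-set
  "Builds a hash-set from a vector in parallel, splicing the per-core
   subresults together instead of copying them entry by entry."
  [v]
  (r/fold
   (fn ([] #{})                                             ; identity for each chunk
       ([l r] (clojure.lang.PersistentHashSet/splice l r))) ; structural combine
   conj
   v))

;; e.g. (parallel-set (vec (range 10000000)))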

Finally, I would also be keen to investigate my idea about the efficient 
'cleave'-ing of tree-based seqs so that they can be used as inputs to the 
reducers library, as mentioned in my original post...


Jules


On Sunday, 16 February 2014 10:57:35 UTC, Jules wrote:
>
> I would have thought so - it's only my first cut - it seems to work, but I 
> wouldn't like to stake my life on it. It really needs a developer who is 
> familiar with PersistentHashMap to look it over and give it the thumbs 
> up... Still, I guess if it were marked "experimental"... :-)
>
> Jules
>
>
> On Sunday, 16 February 2014 10:49:38 UTC, Mikera wrote:
>>
>> Wow - that's a pretty big win. I think we should try and get this into 
>> Clojure ASAP.
>>
>> Are we too late for 1.6?
>>
>> On Sunday, 16 February 2014 18:48:09 UTC+8, Jules wrote:
>>>
>>> Thanks, Mikera
>>>
>>> You are right about merge:
>>>
>>> user=> (def m1 (apply hash-map (range 10000000)))
>>> #'user/m1
>>> user=> (def m2 (apply hash-map (range 5000000 15000000)))
>>> #'user/m2
>>> user=> (time (def m3 (merge m1 m2)))
>>> "Elapsed time: 5432.184582 msecs"
>>> #'user/m3
>>> user=> (time (def m4 (clojure.lang.PersistentHashMap/splice m1 m2)))
>>> "Elapsed time: 1064.268269 msecs"
>>> #'user/m4
>>> user=> (= m3 m4)
>>> true
>>> user=> 
>>>
>>> As you would expect, a splice is faster and causes less of a memory 
>>> spike.
>>>
>>>
>>> Jules
>>>
>>>
>>> On Sunday, 16 February 2014 10:01:04 UTC, Mikera wrote:
>>>>
>>>> +1 for this approach - I've wanted something like this several times.
>>>>
>>>> It's only an "optimisation", but it's a very useful one. The same 
>>>> technique can probably be used to accelerate "merge" significantly, which 
>>>> is a pretty common operation when you are building map-like structures.
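>>>>
>>>> As a rough sketch (not an actual patch), merge could dispatch to the 
>>>> structural splice when both arguments are plain hash-maps, falling back 
>>>> to the current behaviour otherwise:
>>>>
>>>> (defn merge+ [m1 m2]
>>>>   (if (and (instance? clojure.lang.PersistentHashMap m1)
>>>>            (instance? clojure.lang.PersistentHashMap m2))
>>>>     (clojure.lang.PersistentHashMap/splice m1 m2) ; structural splice (from the fork)
>>>>     (merge m1 m2)))                               ; ordinary per-entry copy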
>>>>
>>>> On Sunday, 16 February 2014 07:06:24 UTC+8, Jules wrote:
>>>>>
>>>>> Guys,
>>>>>
>>>>> I've been playing with reducers on and off for a while but have been 
>>>>> frustrated because they don't seem to fit a particular use case that I 
>>>>> have in mind... specifically: getting as many associations into a 
>>>>> hash-map as I can in as short a time as possible.
>>>>>
>>>>> My understanding of the reason for this is that reducers use a 
>>>>> divide-and-conquer strategy. The incoming sequence is divided up, each 
>>>>> sub-sequence is reduced into a sub-result (possibly in parallel), and then 
>>>>> the sub-results are combined into the final outgoing result.
>>>>>
>>>>> Unfortunately, there does not seem to be a better way of combining two 
>>>>> hash-maps other than to read each entry from one and create a new 
>>>>> corresponding association in the other. This means that each recombination 
>>>>> in the above process essentially repeats most of the work already 
>>>>> performed in the previous reduction stage.
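>>>>>
>>>>> Concretely, the combine step today amounts to re-associating every entry 
>>>>> of one map into the other, rebuilding each touched path node by node:
>>>>>
>>>>> (defn naive-combine [m1 m2]
>>>>>   (reduce-kv assoc m1 m2)) ; roughly what (merge m1 m2) / (into m1 m2) do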
>>>>>
>>>>> Hash-sets are implemented via hash-maps, and are simpler for 
>>>>> demonstrating this problem:
>>>>>
>>>>> user=> (def a (doall (range 10000000)))
>>>>> #'user/a
>>>>> user=> (def b (doall (range 5000000 15000000)))
>>>>> #'user/b
>>>>> user=> (time (def c (into #{} a)))
>>>>> "Elapsed time: 6319.392669 msecs"
>>>>> #'user/c
>>>>> user=> (time (def d (into #{} b)))
>>>>> "Elapsed time: 5389.805233 msecs"
>>>>> #'user/d
>>>>> user=> (time (def e (into c d)))
>>>>> "Elapsed time: 8486.032191 msecs"
>>>>> #'user/e
>>>>>
>>>>>
>>>>> In the example above, you can see that the reduction into hash-sets of 
>>>>> two overlapping lists of 10,000,000 elements takes 6.3 and 5.4 seconds. 
>>>>> This stage can be carried out in parallel, i.e. the elapsed time for this 
>>>>> stage would be 6.3 seconds - but we now have two hash-sets and we want 
>>>>> one, so we have to combine them.
>>>>>
>>>>>
>>>>> user=> (time (def e (into c d)))
>>>>> "Elapsed time: 8486.032191 msecs"
>>>>> #'user/e
>>>>>
>>>>> As you can see, all the advantages of splitting the original sequence 
>>>>> into two and processing the halves in parallel are lost, since the 
>>>>> recombination of their results takes 8.5 seconds - more than we saved by 
>>>>> doing the reduction in parallel.
>>>>>
>>>>> So, what can we do about it ?
>>>>>
>>>>> I had a look at the code for PersistentHashMap (PersistentHashSet uses 
>>>>> PersistentHashMap internally). I realised that it was possible to "splice" 
>>>>> together the internal structure of two hash-maps into a single one without 
>>>>> repeating most of the work required to build one from scratch. So, I had a 
>>>>> go at implementing it:
>>>>>
>>>>>
>>>>> user=> (time (def f (clojure.lang.PersistentHashSet/splice c d)))
>>>>> "Elapsed time: 3052.690911 msecs"
>>>>> #'user/f
>>>>>
>>>>> and:
>>>>>
>>>>> user=> (= e f)
>>>>> true
>>>>>
>>>>> Whilst this still adds 3 seconds to our time, that 3 seconds is roughly 
>>>>> half of what the second reduction would have added had we executed it 
>>>>> serially rather than in parallel.
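>>>>>
>>>>> Back of the envelope, using the stand-alone timings above as rough 
>>>>> proxies: serial would be ~6.3s + ~5.4s = ~11.7s, whereas parallel is 
>>>>> ~max(6.3s, 5.4s) + ~3.1s for the splice = ~9.4s.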
>>>>>
>>>>> This means that we can now reduce large datasets into sets/maps more 
>>>>> quickly in parallel than we can in serial :-) As an added benefit, because 
>>>>> splice reuses as much of the internal structure of both inputs as 
>>>>> possible, its impact in terms of heap consumption and churn is smaller - 
>>>>> although I think that a full implementation might add some Java-side code 
>>>>> complexity.
>>>>>
>>>>> If you would like to give 'splice' a try, you will need to clone 
>>>>> my fork of Clojure on GitHub:
>>>>>
>>>>> https://github.com/JulesGosnell/clojure
>>>>>
>>>>> Please let me know if you try out the code. I would be interested to 
>>>>> hear if people think it is worth pursuing.
>>>>>
>>>>> I was also thinking that it should be possible to use a similar trick 
>>>>> to quickly and cheaply split a map/set into [roughly] equal-sized pieces 
>>>>> (assuming a good hash distribution). This would enable the use of a 
>>>>> map/set as an input sequence to the parallel reduction process outlined 
>>>>> above. Currently, I believe that only a vector can be used in this way. 
>>>>> It would be harder to arrange for 'count' to be implemented efficiently 
>>>>> on these sub-maps/sets, but this is not important for the reduction 
>>>>> process.
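>>>>>
>>>>> A sketch of the shape this could take (cleave here is a hypothetical 
>>>>> static that splits the root node's children roughly in half, sharing 
>>>>> everything below):
>>>>>
>>>>> (defn pfold-map
>>>>>   "Folds a hash-map in parallel by repeatedly cleaving it, reducing the
>>>>>    pieces serially and combining the partial results on the way back up."
>>>>>   [m combinef reducef depth]
>>>>>   (if (zero? depth)
>>>>>     (reduce reducef (combinef) m)                         ; small enough: serial
>>>>>     (let [[l r] (clojure.lang.PersistentHashMap/cleave m) ; hypothetical split
>>>>>           rhs   (future (pfold-map r combinef reducef (dec depth)))]
>>>>>       (combinef (pfold-map l combinef reducef (dec depth)) @rhs))))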
>>>>>
>>>>> BTW - benchmarks were run on a 3.2GHz Phenom II / clojure/master / 
>>>>> openjdk-1.7.0_51 / Fedora 20, with min and max 4GB RAM.
>>>>>
>>>>> regards,
>>>>>
>>>>>
>>>>>
>>>>> Jules
>>>>>
>>>>>
>>>>>
