I did give this some thought :-) Reading the code for subvec, it looks like if you take a subvec of a subvec, the outer subvec cuts the intermediate subvec out of the chain by recalculating a new projection onto the underlying vector, adjusted for the offsets imposed by the intermediate subvec. i.e. a lookup will only ever travel through a single subvec before hitting the underlying vector.
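The flattening trick described above can be illustrated with a small stand-alone sketch - plain Java over an int[], purely illustrative, not the actual clojure.lang.APersistentVector$SubVector code:

```java
// A simplified read-only slice over an int[], illustrating how subvec
// can avoid nesting: slicing a slice re-projects onto the original array.
final class Slice {
    final int[] backing;  // always the *original* array, never another Slice
    final int start;      // offset into backing
    final int count;

    Slice(int[] backing, int start, int count) {
        this.backing = backing;
        this.start = start;
        this.count = count;
    }

    int get(int i) {
        if (i < 0 || i >= count) throw new IndexOutOfBoundsException();
        return backing[start + i];  // one offset calculation, no chain of views
    }

    // Slicing a Slice composes the two offsets and points straight at
    // backing, cutting the intermediate Slice out of every future lookup.
    Slice subSlice(int from, int to) {
        return new Slice(backing, start + from, to - from);
    }

    public static void main(String[] args) {
        int[] v = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        Slice outer = new Slice(v, 2, 6);        // views elements 2..7
        Slice inner = outer.subSlice(1, 4);      // views elements 3..5
        System.out.println(inner.get(0));        // 3
        System.out.println(inner.backing == v);  // true: still one hop to v
    }
}
```

The key point is that `subSlice` never stores a reference to the intermediate view, only to the original backing store.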
I thought about doing something like this for supervec, but unlike subvec, where you only ever have to deal with a single underlying vector, with supervec I would have to manage an array of underlying vectors. I decided, for the first cut, to just go with a binary impl. In future I could try something like subvec, but I am also considering multivec :-) - both a subvec and a supervec. This would be able to collapse subvecs and supervecs into a single layer / set of offset calcs... The jury is still out, but I thought I should put my ideas out onto the forum to kick off useful discussions just like this one... thanks for your interest,

Jules

On Thursday, 20 February 2014 23:51:46 UTC, shlomi...@gmail.com wrote:
>
> Hey Jules,
>
> Really nice stuff you're making!
> One note about your SuperVecs idea though: it seems that using that
> approach we could get a deeply nested tree to represent a vector, and I
> think this is the exact thing vectors are trying to avoid...
>
>
> On Friday, February 21, 2014 1:39:56 AM UTC+2, Jules wrote:
>>
>> Hi, Andy.
>>
>> No, I haven't - thanks for the pointer - I'll take a look.
>>
>> I have a very specific usecase - speeding up the re-combination of the
>> results of parallel reductions.
>>
>> I've found that the cost of this recombination often impacts badly on
>> the overall timing of such a reduction, but with a little thought we can
>> speed it up, because we know that both seqs involved in the
>> recombination are of the same type...
>>
>> Given the proliferation of cores and FP's promise of simple and elegant
>> parallel code, I think that it is critical that Clojure really delivers
>> on this promise, and this issue is the last blocker for me for a couple
>> of usecases that I have :-)
>>
>> I hope that explains where I am coming from - does it seem reasonable?
>> Am I missing anything?
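The binary SuperVector view mentioned at the top can be sketched as a plain Java class. The field names below (left, right, middle, count) follow the description later in this thread, but the class itself is illustrative, not the actual APersistentVector$SuperVector:

```java
import java.util.List;

// Illustrative concatenated view over two lists: O(1) to build, with one
// extra comparison per lookup (the "tiny extra runtime overhead").
final class ConcatView<T> {
    final List<T> left, right;
    final int middle;  // length of the left list
    final int count;   // middle + right.size()

    ConcatView(List<T> left, List<T> right) {
        this.left = left;
        this.right = right;
        this.middle = left.size();
        this.count = middle + right.size();
    }

    T get(int i) {
        if (i < 0 || i >= count) throw new IndexOutOfBoundsException();
        // A single offset test dispatches to the correct underlying list.
        return i < middle ? left.get(i) : right.get(i - middle);
    }

    public static void main(String[] args) {
        ConcatView<Integer> v =
            new ConcatView<>(List.of(1, 2, 3), List.of(4, 5, 6));
        System.out.println(v.count);   // 6
        System.out.println(v.get(4));  // 5
    }
}
```

Note that nesting such views (the concern quoted above) degrades lookups to O(depth); a subvec-style flattening step, or the multivec idea, is what would keep that depth bounded.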
>>
>> thanks again,
>>
>>
>> Jules
>>
>>
>> On Thursday, 20 February 2014 23:25:29 UTC, Andy Fingerhut wrote:
>>>
>>> Have you looked at core.rrb-vector, which implements Relaxed Radix
>>> Trees for vectors, supporting subvec and the concatenation of two
>>> arbitrary vectors in O(log n) time?
>>>
>>> https://github.com/clojure/core.rrb-vector
>>>
>>> Unless I am missing something, the fast concatenation is what you are
>>> trying to achieve?
>>>
>>> Andy
>>>
>>>
>>> On Thu, Feb 20, 2014 at 2:59 PM, Jules <jules....@gmail.com> wrote:
>>>
>>>>
>>>> So, having broken the back of fast re-combination of hash sets and
>>>> maps, I wanted to take a look at doing a similar sort of thing for
>>>> vectors - another type of seq that I use very heavily in this sort of
>>>> situation.
>>>>
>>>> Let's see the cake first:
>>>>
>>>> seqspert.vector=> (def a (into [] (range 10000000)))
>>>> #'seqspert.vector/a
>>>> seqspert.vector=> (time (def b (into a a)))
>>>> "Elapsed time: 3695.682654 msecs"
>>>> #'seqspert.vector/b
>>>> seqspert.vector=> (time (def c (supervec a a)))
>>>> "Elapsed time: 0.253922 msecs"
>>>> #'seqspert.vector/c
>>>> seqspert.vector=> (= b c)
>>>> true
>>>>
>>>> As you can see, the usual method of combining two vectors into a third
>>>> takes 3-4 secs to combine two 10M-entry vectors.
>>>>
>>>> Using 'supervec' it takes MUCH less time...
>>>>
>>>> How?
>>>>
>>>> I looked at vector's impl and quickly realised that I would not be
>>>> able to use the same trick that I had used for hash set/map. Then I
>>>> looked at subvec and realised that I could do a very similar trick to
>>>> create supervec.
>>>>
>>>> Subvec provides a view on a subseq of a vector that behaves like a
>>>> full vector.
>>>> Supervec provides a similar view that makes two vectors behave like
>>>> a single one:
>>>>
>>>> seqspert.vector=> (supervec [1 2 3] [4 5 6])
>>>> [1 2 3 4 5 6]
>>>> seqspert.vector=> (type (supervec [1 2 3] [4 5 6]))
>>>> clojure.lang.APersistentVector$SuperVector
>>>>
>>>> We can use seqspert (see my other posting this evening) to look inside
>>>> a SuperVector:
>>>>
>>>> seqspert.vector=> (decloak (supervec [1 2 3] [4 5 6]))
>>>> #seqspert.vector.SuperVector{:left #seqspert.vector.Vector{:cnt 3,
>>>> :shift 5, :root #seqspert.vector.VectorNode{:array
>>>> [nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
>>>> nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil]},
>>>> :tail [1 2 3]}, :right #seqspert.vector.Vector{:cnt 3, :shift 5,
>>>> :root #seqspert.vector.VectorNode{:array
>>>> [nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
>>>> nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil]},
>>>> :tail [4 5 6]}, :middle 3, :count 6}
>>>> seqspert.vector=> (p/pprint (decloak (supervec [1 2 3] [4 5 6])))
>>>> {:left
>>>>  {:cnt 3,
>>>>   :shift 5,
>>>>   :root
>>>>   {:array
>>>>    [nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil]},
>>>>   :tail [1 2 3]},
>>>>  :right
>>>>  {:cnt 3,
>>>>   :shift 5,
>>>>   :root
>>>>   {:array
>>>>    [nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil]},
>>>>   :tail [4 5 6]},
>>>>  :middle 3,
>>>>  :count 6}
>>>> nil
>>>> seqspert.vector=>
>>>>
>>>> It looks very complicated - but that is just the two Clojure vectors
>>>> held inside! A SuperVector has fields left and right (the two vectors
>>>> that we are combining), middle (the length of the left vector), and
>>>> count (the sum of the counts of left and right).
>>>>
>>>> With just this info and very little code, I found myself with a
>>>> working impl (except 'pop', but who pops a vector?).
>>>>
>>>> OK - there is a tiny extra runtime overhead associated with using a
>>>> supervec - but, when you look at what is involved in dereffing into a
>>>> vector, this quickly appears insignificant. It's about the same as
>>>> using a subvec.
>>>>
>>>> The SuperVector code is checked into my clojure fork on github - see
>>>> the beginning of this thread. Seqspert is also at github - see the
>>>> announcement I posted this evening.
>>>>
>>>> Finally, 'supervec' is currently defined as:
>>>>
>>>> (defn supervec [l r] (APersistentVector$SuperVector. {} l r))
>>>>
>>>> I am thinking of renaming 'splice' from my earlier posting to
>>>> 'combine' and creating a Combinable interface, to be implemented by
>>>> any seq type that is able to do a fast re-combination with another seq
>>>> of the same type. This could be leveraged by e.g. 'into' to deliver
>>>> significant performance benefits and footprint reduction.
>>>>
>>>> more as and when,
>>>>
>>>>
>>>> Jules
>>>>
>>>>
>>>> On Saturday, 15 February 2014 23:06:24 UTC, Jules wrote:
>>>>
>>>>> Guys,
>>>>>
>>>>> I've been playing with reducers on and off for a while but have been
>>>>> frustrated because they don't seem to fit a particular usecase that I
>>>>> have in mind... specifically: getting as many associations into a
>>>>> hash-map as I can in as short a time as possible.
>>>>>
>>>>> My understanding of the reason for this is that reducers practice a
>>>>> divide-and-conquer strategy. The incoming sequence is divided up.
>>>>> Each sub-sequence is reduced into a sub-result (possibly in parallel)
>>>>> and then the sub-results are combined into the final outgoing result.
>>>>>
>>>>> Unfortunately, there does not seem to be a better way of combining
>>>>> two hash-maps than to read each entry from one and create a new
>>>>> corresponding association in the other. This means that each
>>>>> recombination in the above process essentially repeats most of the
>>>>> work already performed in the previous reduction stage.
>>>>>
>>>>> Hash-sets are implemented via hash-maps, and are simpler to
>>>>> demonstrate this problem with:
>>>>>
>>>>> user=> (def a (doall (range 10000000)))
>>>>> #'user/a
>>>>> user=> (def b (doall (range 5000000 15000000)))
>>>>> #'user/b
>>>>> user=> (time (def c (into #{} a)))
>>>>> "Elapsed time: 6319.392669 msecs"
>>>>> #'user/c
>>>>> user=> (time (def d (into #{} b)))
>>>>> "Elapsed time: 5389.805233 msecs"
>>>>> #'user/d
>>>>> user=> (time (def e (into c d)))
>>>>> "Elapsed time: 8486.032191 msecs"
>>>>> #'user/e
>>>>>
>>>>>
>>>>> In the example above, you can see that the reduction into hash-sets
>>>>> of two overlapping lists of 10,000,000 elements takes 6.3 and 5.4
>>>>> seconds. This stage can be carried out in parallel, i.e. time elapsed
>>>>> for this stage would be 6.3 seconds - but we now have two hash-sets
>>>>> and we want one, so we have to combine them.
>>>>>
>>>>>
>>>>> user=> (time (def e (into c d)))
>>>>> "Elapsed time: 8486.032191 msecs"
>>>>> #'user/e
>>>>>
>>>>> As you can see, all the advantages of splitting the original sequence
>>>>> into 2 and processing the two halves in parallel are lost, since the
>>>>> recombination of their results takes 8.5 seconds - more than we saved
>>>>> by doing the reduction in parallel.
>>>>>
>>>>> So, what can we do about it?
>>>>>
>>>>> I had a look at the code for PersistentHashMap (PersistentHashSet
>>>>> uses PersistentHashMap internally).
>>>>> I realised that it was possible to "splice" together the internal
>>>>> structure of two hash-maps into a single one without repeating most
>>>>> of the work required to build one from scratch. So, I had a go at
>>>>> implementing it:
>>>>>
>>>>>
>>>>> user=> (time (def f (clojure.lang.PersistentHashSet/splice c d)))
>>>>> "Elapsed time: 3052.690911 msecs"
>>>>> #'user/f
>>>>>
>>>>> and:
>>>>>
>>>>> user=> (= e f)
>>>>> true
>>>>>
>>>>> Whilst this still adds 3 seconds to our time, that 3 seconds is half
>>>>> the time that we would have added had we executed the second
>>>>> reduction in serial rather than in parallel.
>>>>>
>>>>> This means that we can now reduce large datasets into sets/maps more
>>>>> quickly in parallel than we can in serial :-) As an added benefit,
>>>>> because splice reuses as much of the internal structure of both
>>>>> inputs as possible, its impact in terms of heap consumption and churn
>>>>> is less - although I think that a full implementation might add some
>>>>> Java-side code complexity.
>>>>>
>>>>> If you would like to give 'splice' a try, you will need to clone my
>>>>> fork of clojure at github:
>>>>>
>>>>> https://github.com/JulesGosnell/clojure
>>>>>
>>>>> Please let me know if you try out the code. I would be interested to
>>>>> hear whether people think it is worth pursuing.
>>>>>
>>>>> I was also thinking that it should be possible to use a similar trick
>>>>> to quickly and cheaply split a map/set into [roughly] equal-sized
>>>>> pieces (assuming a good hash distribution). This would enable the use
>>>>> of a map/set as an input sequence to the parallel reduction process
>>>>> outlined above. Currently, I believe that only a vector can be used
>>>>> in this way. It would be harder to arrange that 'count' could be
>>>>> implemented efficiently on these sub-maps/sets, but this is not
>>>>> important for the reduction process.
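The divide/reduce/combine shape whose combine step splice is meant to accelerate looks roughly like this. A hypothetical stand-alone Java sketch: reduceRange and combine are illustrative names, and combine here is a plain copying merge, i.e. exactly the O(n) cost that a structural splice would avoid:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the reduce-then-combine shape discussed above. The point of
// splice is to make the final combine step cheap; here combine is a
// copying merge, which is the cost the post is trying to eliminate.
public class ReduceCombine {
    static Set<Integer> reduceRange(int from, int to) {
        Set<Integer> s = new HashSet<>();
        for (int i = from; i < to; i++) s.add(i);
        return s;
    }

    // Stand-in for PersistentHashSet/splice: a structural splice would
    // reuse the internal trees of both inputs instead of copying entries.
    static Set<Integer> combine(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new HashSet<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Reduce the two halves in parallel...
        Future<Set<Integer>> lo = pool.submit(() -> reduceRange(0, 500));
        Future<Set<Integer>> hi = pool.submit(() -> reduceRange(500, 1000));
        // ...then pay the combine cost once at the end. If combine is
        // O(n), it can swallow the time saved by the parallel stage.
        Set<Integer> all = combine(lo.get(), hi.get());
        pool.shutdown();
        System.out.println(all.size());  // 1000
    }
}
```

With this shape, the wall-clock win of parallelising the reduce stage survives only if combine is substantially cheaper than re-reducing one half, which is the whole argument for a structural splice.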
>>>>>
>>>>> BTW - benchmarks were run on a 3.2GHz Phenom II / clojure/master /
>>>>> openjdk-1.7.0_51 / Fedora 20 with min and max 4GB RAM.
>>>>>
>>>>> regards,
>>>>>
>>>>>
>>>>>
>>>>> Jules
>>>>>
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clo...@googlegroups.com
>>>> Note that posts from new members are moderated - please be patient
>>>> with your first post.
>>>> To unsubscribe from this group, send email to
>>>> clojure+u...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it,
>>>> send an email to clojure+u...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.