I did give this some thought :-) Reading the code for subvec, it looks like if you take a subvec of a subvec, the outer subvec cuts the intermediate subvec out of the chain by recalculating a new projection onto the underlying vector, adjusted for the offsets imposed by the intermediate subvec. i.e. a lookup will only ever travel through a single subvec before hitting the underlying vector.
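The flattening trick described above can be illustrated with a small stand-alone sketch - plain Java over an int[], purely illustrative, not the actual clojure.lang.APersistentVector$SubVector code:

```java
// A simplified read-only slice over an int[], illustrating how subvec
// can avoid nesting: slicing a slice re-projects onto the original array.
final class Slice {
    final int[] backing;  // always the *original* array, never another Slice
    final int start;      // offset into backing
    final int count;

    Slice(int[] backing, int start, int count) {
        this.backing = backing;
        this.start = start;
        this.count = count;
    }

    int get(int i) {
        if (i < 0 || i >= count) throw new IndexOutOfBoundsException();
        return backing[start + i];  // one offset calculation, no chain of views
    }

    // Slicing a Slice composes the two offsets and points straight at
    // backing, cutting the intermediate Slice out of every future lookup.
    Slice subSlice(int from, int to) {
        return new Slice(backing, start + from, to - from);
    }

    public static void main(String[] args) {
        int[] v = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        Slice outer = new Slice(v, 2, 6);        // views elements 2..7
        Slice inner = outer.subSlice(1, 4);      // views elements 3..5
        System.out.println(inner.get(0));        // 3
        System.out.println(inner.backing == v);  // true: still one hop to v
    }
}
```

The key point is that `subSlice` never stores a reference to the intermediate view, only to the original backing store.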
I thought about doing something like this for supervec, but unlike subvec, where you only ever have to deal with a single underlying vector, with supervec I would have to manage an array of underlying vectors. I decided, for the first cut, to just go with a binary impl. In future I could try something like subvec, but I am also considering multivec :-) - both a subvec and a supervec. This would be able to collapse subvecs and supervecs into a single layer / set of offset calcs... The jury is still out, but I thought I should put my ideas out onto the forum to kick off useful discussions just like this one... thanks for your interest,

Jules

On Thursday, 20 February 2014 23:51:46 UTC, shlomi...@gmail.com wrote:
>
> Hey Jules,
>
> Really nice stuff you're making!
> One note about your SuperVecs idea though: it seems that using that
> approach we could get a deeply nested tree to represent a vector, and I
> think this is the exact thing vectors are trying to avoid...
>
>
> On Friday, February 21, 2014 1:39:56 AM UTC+2, Jules wrote:
>>
>> Hi, Andy.
>>
>> No, I haven't - thanks for the pointer - I'll take a look.
>>
>> I have a very specific usecase - speeding up the re-combination of the
>> results of parallel reductions.
>>
>> I've found that the cost of this recombination often impacts badly on
>> the overall timing of such a reduction, but with a little thought we can
>> speed it up, because we know that both seqs involved in the
>> recombination are of the same type...
>>
>> Given the proliferation of cores and FP's promise of simple and elegant
>> parallel code, I think that it is critical that Clojure really delivers
>> on this promise, and this issue is the last blocker for me for a couple
>> of usecases that I have :-)
>>
>> I hope that explains where I am coming from - does it seem reasonable?
>> Am I missing anything?
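The binary SuperVector view mentioned at the top can be sketched as a plain Java class. The field names below (left, right, middle, count) follow the description later in this thread, but the class itself is illustrative, not the actual APersistentVector$SuperVector:

```java
import java.util.List;

// Illustrative concatenated view over two lists: O(1) to build, with one
// extra comparison per lookup (the "tiny extra runtime overhead").
final class ConcatView<T> {
    final List<T> left, right;
    final int middle;  // length of the left list
    final int count;   // middle + right.size()

    ConcatView(List<T> left, List<T> right) {
        this.left = left;
        this.right = right;
        this.middle = left.size();
        this.count = middle + right.size();
    }

    T get(int i) {
        if (i < 0 || i >= count) throw new IndexOutOfBoundsException();
        // A single offset test dispatches to the correct underlying list.
        return i < middle ? left.get(i) : right.get(i - middle);
    }

    public static void main(String[] args) {
        ConcatView<Integer> v =
            new ConcatView<>(List.of(1, 2, 3), List.of(4, 5, 6));
        System.out.println(v.count);   // 6
        System.out.println(v.get(4));  // 5
    }
}
```

Note that nesting such views (the concern quoted above) degrades lookups to O(depth); a subvec-style flattening step, or the multivec idea, is what would keep that depth bounded.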
>>
>> thanks again,
>>
>>
>> Jules
>>
>>
>> On Thursday, 20 February 2014 23:25:29 UTC, Andy Fingerhut wrote:
>>>
>>> Have you looked at core.rrb-vector, which implements Relaxed Radix
>>> Trees for vectors, supporting subvec and the concatenation of two
>>> arbitrary vectors in O(log n) time?
>>>
>>> https://github.com/clojure/core.rrb-vector
>>>
>>> Unless I am missing something, the fast concatenation is what you are
>>> trying to achieve?
>>>
>>> Andy
>>>
>>>
>>> On Thu, Feb 20, 2014 at 2:59 PM, Jules <jules....@gmail.com> wrote:
>>>
>>>>
>>>> So, having broken the back of fast re-combination of hash sets and
>>>> maps, I wanted to take a look at doing a similar sort of thing for
>>>> vectors - another type of seq that I use very heavily in this sort of
>>>> situation.
>>>>
>>>> Let's see the cake first:
>>>>
>>>> seqspert.vector=> (def a (into [] (range 10000000)))
>>>> #'seqspert.vector/a
>>>> seqspert.vector=> (time (def b (into a a)))
>>>> "Elapsed time: 3695.682654 msecs"
>>>> #'seqspert.vector/b
>>>> seqspert.vector=> (time (def c (supervec a a)))
>>>> "Elapsed time: 0.253922 msecs"
>>>> #'seqspert.vector/c
>>>> seqspert.vector=> (= b c)
>>>> true
>>>>
>>>> As you can see, the usual method of combining two vectors into a third
>>>> takes 3-4 secs to combine two 10M-entry vectors.
>>>>
>>>> Using 'supervec' it takes MUCH less time...
>>>>
>>>> How?
>>>>
>>>> I looked at vector's impl and quickly realised that I would not be
>>>> able to use the same trick that I had used for hash set/map. Then I
>>>> looked at subvec and realised that I could do a very similar trick to
>>>> create supervec.
>>>>
>>>> Subvec provides a view on a subseq of a vector that behaves like a
>>>> full vector.
>>>> Supervec provides a similar view that makes two vectors behave like
>>>> a single one:
>>>>
>>>> seqspert.vector=> (supervec [1 2 3] [4 5 6])
>>>> [1 2 3 4 5 6]
>>>> seqspert.vector=> (type (supervec [1 2 3] [4 5 6]))
>>>> clojure.lang.APersistentVector$SuperVector
>>>>
>>>> We can use seqspert (see my other posting this evening) to look inside
>>>> a SuperVector:
>>>>
>>>> seqspert.vector=> (decloak (supervec [1 2 3] [4 5 6]))
>>>> #seqspert.vector.SuperVector{:left #seqspert.vector.Vector{:cnt 3,
>>>> :shift 5, :root #seqspert.vector.VectorNode{:array
>>>> [nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
>>>> nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil]},
>>>> :tail [1 2 3]}, :right #seqspert.vector.Vector{:cnt 3, :shift 5,
>>>> :root #seqspert.vector.VectorNode{:array
>>>> [nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil
>>>> nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil]},
>>>> :tail [4 5 6]}, :middle 3, :count 6}
>>>> seqspert.vector=> (p/pprint (decloak (supervec [1 2 3] [4 5 6])))
>>>> {:left
>>>>  {:cnt 3,
>>>>   :shift 5,
>>>>   :root
>>>>   {:array
>>>>    [nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil]},
>>>>   :tail [1 2 3]},
>>>>  :right
>>>>  {:cnt 3,
>>>>   :shift 5,
>>>>   :root
>>>>   {:array
>>>>    [nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil
>>>>     nil nil nil nil nil nil nil nil]},
>>>>   :tail [4 5 6]},
>>>>  :middle 3,
>>>>  :count 6}
>>>> nil
>>>> seqspert.vector=>
>>>>
>>>> It looks very complicated - but that is just the two Clojure vectors
>>>> held inside! A SuperVector has fields left and right (the two vectors
>>>> that we are combining), middle (the length of the left vector), and
>>>> count (the sum of the counts of left and right).
>>>>
>>>> With just this info and very little code, I found myself with a
>>>> working impl (except 'pop', but who pops a vector?).
>>>>
>>>> OK - there is a tiny extra runtime overhead associated with using a
>>>> supervec - but, when you look at what is involved in dereffing into a
>>>> vector, this quickly appears insignificant. It's about the same as
>>>> using a subvec.
>>>>
>>>> The SuperVector code is checked into my clojure fork on github - see
>>>> the beginning of this thread. Seqspert is also at github - see the
>>>> announcement I posted this evening.
>>>>
>>>> Finally, 'supervec' is currently defined as:
>>>>
>>>> (defn supervec [l r] (APersistentVector$SuperVector. {} l r))
>>>>
>>>> I am thinking of renaming 'splice' from my earlier posting to
>>>> 'combine' and creating a Combinable interface, to be implemented by
>>>> any seq type that is able to do a fast re-combination with another seq
>>>> of the same type. This could be leveraged by e.g. 'into' to deliver
>>>> significant performance benefits and footprint reduction.
>>>>
>>>> more as and when,
>>>>
>>>>
>>>> Jules
>>>>
>>>>
>>>> On Saturday, 15 February 2014 23:06:24 UTC, Jules wrote:
>>>>
>>>>> Guys,
>>>>>
>>>>> I've been playing with reducers on and off for a while but have been
>>>>> frustrated because they don't seem to fit a particular usecase that I
>>>>> have in mind... specifically: getting as many associations into a
>>>>> hash-map as I can in as short a time as possible.
>>>>>
>>>>> My understanding of the reason for this is that reducers practice a
>>>>> divide-and-conquer strategy. The incoming sequence is divided up.
>>>>> Each sub-sequence is reduced into a sub-result (possibly in parallel)
>>>>> and then the sub-results are combined into the final outgoing result.
>>>>>
>>>>> Unfortunately, there does not seem to be a better way of combining
>>>>> two hash-maps than to read each entry from one and create a new
>>>>> corresponding association in the other. This means that each
>>>>> recombination in the above process essentially repeats most of the
>>>>> work already performed in the previous reduction stage.
>>>>>
>>>>> Hash-sets are implemented via hash-maps, and are simpler to
>>>>> demonstrate this problem with:
>>>>>
>>>>> user=> (def a (doall (range 10000000)))
>>>>> #'user/a
>>>>> user=> (def b (doall (range 5000000 15000000)))
>>>>> #'user/b
>>>>> user=> (time (def c (into #{} a)))
>>>>> "Elapsed time: 6319.392669 msecs"
>>>>> #'user/c
>>>>> user=> (time (def d (into #{} b)))
>>>>> "Elapsed time: 5389.805233 msecs"
>>>>> #'user/d
>>>>> user=> (time (def e (into c d)))
>>>>> "Elapsed time: 8486.032191 msecs"
>>>>> #'user/e
>>>>>
>>>>>
>>>>> In the example above, you can see that the reduction into hash-sets
>>>>> of two overlapping lists of 10,000,000 elements takes 6.3 and 5.4
>>>>> seconds. This stage can be carried out in parallel, i.e. time elapsed
>>>>> for this stage would be 6.3 seconds - but we now have two hash-sets
>>>>> and we want one, so we have to combine them.
>>>>>
>>>>>
>>>>> user=> (time (def e (into c d)))
>>>>> "Elapsed time: 8486.032191 msecs"
>>>>> #'user/e
>>>>>
>>>>> As you can see, all the advantages of splitting the original sequence
>>>>> into 2 and processing the two halves in parallel are lost, since the
>>>>> recombination of their results takes 8.5 seconds - more than we saved
>>>>> by doing the reduction in parallel.
>>>>>
>>>>> So, what can we do about it?
>>>>>
>>>>> I had a look at the code for PersistentHashMap (PersistentHashSet
>>>>> uses PersistentHashMap internally).
>>>>> I realised that it was possible to "splice" together the internal
>>>>> structure of two hash-maps into a single one without repeating most
>>>>> of the work required to build one from scratch. So, I had a go at
>>>>> implementing it:
>>>>>
>>>>>
>>>>> user=> (time (def f (clojure.lang.PersistentHashSet/splice c d)))
>>>>> "Elapsed time: 3052.690911 msecs"
>>>>> #'user/f
>>>>>
>>>>> and:
>>>>>
>>>>> user=> (= e f)
>>>>> true
>>>>>
>>>>> Whilst this still adds 3 seconds to our time, that 3 seconds is half
>>>>> the time that we would have added had we executed the second
>>>>> reduction in serial rather than in parallel.
>>>>>
>>>>> This means that we can now reduce large datasets into sets/maps more
>>>>> quickly in parallel than we can in serial :-) As an added benefit,
>>>>> because splice reuses as much of the internal structure of both
>>>>> inputs as possible, its impact in terms of heap consumption and churn
>>>>> is less - although I think that a full implementation might add some
>>>>> Java-side code complexity.
>>>>>
>>>>> If you would like to give 'splice' a try, you will need to clone my
>>>>> fork of clojure at github:
>>>>>
>>>>> https://github.com/JulesGosnell/clojure
>>>>>
>>>>> Please let me know if you try out the code. I would be interested to
>>>>> hear whether people think it is worth pursuing.
>>>>>
>>>>> I was also thinking that it should be possible to use a similar trick
>>>>> to quickly and cheaply split a map/set into [roughly] equal-sized
>>>>> pieces (assuming a good hash distribution). This would enable the use
>>>>> of a map/set as an input sequence to the parallel reduction process
>>>>> outlined above. Currently, I believe that only a vector can be used
>>>>> in this way. It would be harder to arrange that 'count' could be
>>>>> implemented efficiently on these sub-maps/sets, but this is not
>>>>> important for the reduction process.
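The divide/reduce/combine shape whose combine step splice is meant to accelerate looks roughly like this. A hypothetical stand-alone Java sketch: reduceRange and combine are illustrative names, and combine here is a plain copying merge, i.e. exactly the O(n) cost that a structural splice would avoid:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the reduce-then-combine shape discussed above. The point of
// splice is to make the final combine step cheap; here combine is a
// copying merge, which is the cost the post is trying to eliminate.
public class ReduceCombine {
    static Set<Integer> reduceRange(int from, int to) {
        Set<Integer> s = new HashSet<>();
        for (int i = from; i < to; i++) s.add(i);
        return s;
    }

    // Stand-in for PersistentHashSet/splice: a structural splice would
    // reuse the internal trees of both inputs instead of copying entries.
    static Set<Integer> combine(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new HashSet<>(a);
        out.addAll(b);
        return out;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Reduce the two halves in parallel...
        Future<Set<Integer>> lo = pool.submit(() -> reduceRange(0, 500));
        Future<Set<Integer>> hi = pool.submit(() -> reduceRange(500, 1000));
        // ...then pay the combine cost once at the end. If combine is
        // O(n), it can swallow the time saved by the parallel stage.
        Set<Integer> all = combine(lo.get(), hi.get());
        pool.shutdown();
        System.out.println(all.size());  // 1000
    }
}
```

With this shape, the wall-clock win of parallelising the reduce stage survives only if combine is substantially cheaper than re-reducing one half, which is the whole argument for a structural splice.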
>>>>>
>>>>> BTW - benchmarks were run on a 3.2GHz Phenom II / clojure/master /
>>>>> openjdk-1.7.0_51 / Fedora 20 with min and max 4GB RAM.
>>>>>
>>>>> regards,
>>>>>
>>>>>
>>>>>
>>>>> Jules
>>>>>
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To post to this group, send email to clo...@googlegroups.com
>>>> Note that posts from new members are moderated - please be patient
>>>> with your first post.
>>>> To unsubscribe from this group, send email to
>>>> clojure+u...@googlegroups.com
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/clojure?hl=en
>>>> ---
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Clojure" group.
>>>> To unsubscribe from this group and stop receiving emails from it,
>>>> send an email to clojure+u...@googlegroups.com.
>>>> For more options, visit https://groups.google.com/groups/opt_out.