On Mon, Jan 24, 2011 at 10:29 AM, Michael Gardner <gardne...@gmail.com> wrote: > Thanks for the work, Ken.
You're welcome. > Clojure's multi-threaded performance can be mysterious indeed I think that may be generally true of multi-processing of every kind. :) > Anyway, for now I will probably just stick with pmap + shuffle, since it's > dead-simple > and should be enough my purposes. Probably a good idea, if keeping the input in order is not important. If you do need to keep the ordering, you can still do the items in a somewhat perturbed order, if you're willing to give up some laziness: (defn permute "Applies a permutation to a vector. Permutation should be a vector of same length with (range n) in some shuffled order. If v is shorter than perm, just returns v." [perm v] (if (< (count v) (count perm)) v (vec (map #(nth v %) perm)))) (defn i-perm "Inverts a permutation." [perm] (let [r (range (count perm))] (persistent! (reduce (fn [out [pi ix]] (assoc! out pi ix)) (transient (vec r)) (map vector perm r))))) (defn shuffle "Shuffles a vector" [v] (let [s (count v)] (loop [out (transient v) n (dec s)] (if (zero? n) (persistent! out) (let [i (rand-int (inc n)) t (out n)] (recur (assoc! (assoc! out n (out i)) i t) (dec n))))))) Test that shuffle works: user=> (frequencies (map shuffle (take 500 (repeat [1 2 3])))) {[3 1 2] 88, [2 3 1] 91, [1 2 3] 83, [2 1 3] 84, [3 2 1] 82, [1 3 2] 72} All six possible permutations occur, at reasonably similar rates. (defn jumbleize "Returns a pair of functions one of which will jumble and one of which will unjumble a seq; the seq is chunked into parts of size n and those parts are jumbled. The functions are lazy in that only one chunk is realized at a time." [n] (let [rn (vec (range n)) perms (repeatedly #(shuffle rn)) ips (map i-perm perms) jumble (fn [s perm-seq] (mapcat permute perm-seq (map vec (partition n n [] s))))] [#(jumble % perms) #(jumble % ips)])) (defn jumbled-pmap "Like pmap, only the order in which it sees sequence elements is jumbled somewhat, which may improve efficiency if easy and difficult jobs have a regular arrangement in the input. Chunks the input by chunk-size and jumbles each chunk internally before calling pmap; puts the output back in order." [chunk-size f & colls] (let [[jumbler unjumbler] (jumbleize chunk-size)] (unjumbler (apply pmap f (map jumbler colls))))) user=> (drop 97 (jumbled-pmap 16 #(* % %) (range 100))) (9409 9604 9801) user=> (take 3 (drop 50 (jumbled-pmap 16 #(* % %) (range 100)))) (2500 2601 2704) user=> (take 3 (drop 50 (jumbled-pmap 16 #(+ (* 1000 %1) %2) (range 100) (range 100 180)))) (50150 51151 52152) That last, BTW, shows it working properly even with multiple input colls of different lengths. user=> (take 3 (drop 50 (jumbled-pmap 16 #(+ (* 1000 %1) %2) (iterate inc 0) (drop 100 (iterate inc 0))))) (50150 51151 52152) And this shows that it's still lazy enough not to blow up on infinite inputs. :) The internal sequences of random permutations and their inverses, as well as the input sequences, lose their heads: user=> (take 3 (drop 50000000 (jumbled-pmap 16 #(+ (* 1000 %1) %2) (iterate inc 0) (drop 100 (iterate inc 0))))) (50050000100 50050001101 50050002102) Memory use stayed constant at a bit more than 64MB, which was probably the -Xms parameter to the JVM instance Enclojure ran its REPL in. -- You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en