On Jun 24, 2:43 am, Christophe Grand <christo...@cgrand.net> wrote: > On Wed, Jun 24, 2009 at 7:53 AM, Richard Newman <holyg...@gmail.com> wrote: > > > > Even with the optimization, sort somehow beats top for speed. It looks > > > like top is best used to avoid major memory consumption for long seqs; > > > if you have the memory and need the speed, sort's better. > > > This is an interesting characteristic I've noticed in sorting code. In > > the past (I'm thinking of one Common Lisp app in particular) I've > > spent time carefully crafting tree-based accumulators to collect > > values into a sorted collection... only to find that it was slower > > than just collecting the whole damn list and sorting it, even for very > > large collections. Seems all those tree/map operations take time, and > > library sort functions are fast! > > Then use sort! > > Prototype implementation: > (defn top2 [n comparator coll] > (let [qtop #(take n (sort comparator (concat %1 %2)))] > (reduce qtop (partition (* 10 n) coll)))) > > user=> (time (top2 10 #(> %1 %2) (range 5000000))) > "Elapsed time: 2990.945916 msecs" > (4999999 4999998 4999997 4999996 4999995 4999994 4999993 4999992 4999991 > 4999990) > user=> (time (take 10 (sort #(> %1 %2) (range 5000000)))) > "Elapsed time: 4900.345365 msecs" > (4999999 4999998 4999997 4999996 4999995 4999994 4999993 4999992 4999991 > 4999990)
Interesting. This sorts the top n and a block of new values, keeps the top n, and repeats. But there's a magic number there, the 10 in the last line. Let's make it a parameter: user=> (defn top2 [n magic comparator coll] (let [qtop #(take n (sort comparator (concat %1 %2)))] (reduce qtop (partition (* magic n) coll)))) #'user/top2 Now we need to time it while varying "magic". Because "time" doesn't give programmatic access to the time results: user=> (defmacro nanotime [& body] `(let [start (System/nanotime)] ~...@body (- (System/nanotime) start))) #'user/nanotime And here's the actual timing: user=> (reduce (fn [m [k v]] (assoc m k v)) (sorted-map) (map (fn [x] [(inc x) (nanotime (top2 10 (inc x) #(> %1 %2) (range 5000000)))]) (range 50))) {1 5452225640, 2 3988404840, 3 3915592360, 4 3871197280, 5 3867781520, 6 3212079480, 7 3017036040, 8 2948082640, 9 3042258600, 10 3451045200, 11 3354897120, 12 3588940040, 13 3417661520, 14 3662480640, 15 3399276720, 16 3775287280, 17 3472468440, 18 3632485520, 19 3573744640, 20 5075479520, 21 3676971680, 22 3371000880, 23 3348059360, 24 3279114720, 25 3265615800, 26 3293525960, 27 3332391880, 28 3365864600, 29 3348332480, 30 3347164440, 31 3370046040, 32 3491169480, 33 3391363560, 34 3458141240, 35 3431893440, 36 3483074840, 37 3457540320, 38 3454273840, 39 3453719200, 40 3416257240, 41 3422167400, 42 3392571160, 43 3365732840, 44 3492066200, 45 3384547200, 46 3401731000, 47 3426350920, 48 3429235560, 49 3583296480, 50 3710430000} Apparently, a magic factor of 8 gives the fastest time. The trend is back upward from there. It might not hold as the number to take varies away from 10, however. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---