On Jun 24, 2:43 am, Christophe Grand <christo...@cgrand.net> wrote:
> On Wed, Jun 24, 2009 at 7:53 AM, Richard Newman <holyg...@gmail.com> wrote:
>
> > > Even with the optimization, sort somehow beats top for speed. It looks
> > > like top is best used to avoid major memory consumption for long seqs;
> > > if you have the memory and need the speed, sort's better.
>
> > This is an interesting characteristic I've noticed in sorting code. In
> > the past (I'm thinking of one Common Lisp app in particular) I've
> > spent time carefully crafting tree-based accumulators to collect
> > values into a sorted collection... only to find that it was slower
> > than just collecting the whole damn list and sorting it, even for very
> > large collections. Seems all those tree/map operations take time, and
> > library sort functions are fast!
>
> Then use sort!
>
> Prototype implementation:
> (defn top2 [n comparator coll]
>   (let [qtop #(take n (sort comparator (concat %1 %2)))]
>     (reduce qtop (partition (* 10 n) coll))))
>
> user=> (time (top2 10 #(> %1 %2) (range 5000000)))
> "Elapsed time: 2990.945916 msecs"
> (4999999 4999998 4999997 4999996 4999995 4999994 4999993 4999992 4999991
> 4999990)
> user=> (time (take 10 (sort #(> %1 %2) (range 5000000))))
> "Elapsed time: 4900.345365 msecs"
> (4999999 4999998 4999997 4999996 4999995 4999994 4999993 4999992 4999991
> 4999990)

Interesting. This sorts the top n and a block of new values, keeps the
top n, and repeats. But there's a magic number there, the 10 in the
last line.

Let's make it a parameter:

user=> (defn top2 [n magic comparator coll]
  (let [qtop #(take n (sort comparator (concat %1 %2)))]
    (reduce qtop (partition (* magic n) coll))))
#'user/top2

Now we need to time it while varying "magic".

Because "time" doesn't give programmatic access to the time results:

user=> (defmacro nanotime [& body] `(let [start (System/nanotime)]
~...@body (- (System/nanotime) start)))
#'user/nanotime

And here's the actual timing:

user=> (reduce (fn [m [k v]] (assoc m k v)) (sorted-map) (map (fn [x]
[(inc x) (nanotime (top2 10 (inc x) #(> %1 %2) (range 5000000)))])
(range 50)))
{1 5452225640, 2 3988404840, 3 3915592360, 4 3871197280, 5 3867781520,
6 3212079480, 7 3017036040, 8 2948082640, 9 3042258600, 10 3451045200,
11 3354897120, 12 3588940040, 13 3417661520, 14 3662480640, 15
3399276720, 16 3775287280, 17 3472468440, 18 3632485520, 19
3573744640, 20 5075479520, 21 3676971680, 22 3371000880, 23
3348059360, 24 3279114720, 25 3265615800, 26 3293525960, 27
3332391880, 28 3365864600, 29 3348332480, 30 3347164440, 31
3370046040, 32 3491169480, 33 3391363560, 34 3458141240, 35
3431893440, 36 3483074840, 37 3457540320, 38 3454273840, 39
3453719200, 40 3416257240, 41 3422167400, 42 3392571160, 43
3365732840, 44 3492066200, 45 3384547200, 46 3401731000, 47
3426350920, 48 3429235560, 49 3583296480, 50 3710430000}

Apparently, a magic factor of 8 gives the fastest time. The trend is
back upward from there.

It might not hold as the number to take varies away from 10, however.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to