On Jun 23, 4:25 am, Jules <julesjac...@gmail.com> wrote:
> Let N be the total number of elements in your collection (e.g.
> 1000,000), and n the number of elements that you want to take out of
> this collection (e.g 10).
>
> By sorting the collection of N elements you spend N log N time. By
> repeatedly adding an to the small collection and removing the minimum
> you spend N log n time (if you have a good sorted collection that
> takes log n time to insert).
>
> The algorithm spelled out in imperative pseudocode is:
>
> def top_n(n,coll)
>   let result = new SortedCollection(take n from coll)
>   for x in (drop n from coll):
>     result.add(x)
>     result.deleteMinimum() // adding 1 element and removing 1 keeps
> the result collection at size n
>   return result
>
> Hope this helps,
> Jules

Here's the functional version (working Clojure code):

(defn top [n comptr coll]
  (let [m (reduce #(assoc %1 %2 true) (sorted-map-by comptr)
            (take n coll))]
    (keys
      (reduce #(let [t (assoc %1 %2 true)]
                 (dissoc t (first (last t)))) m (drop n coll)))))

Why a map with useless values? There's no sorted-set-by in
clojure.core. :(

This is the result:

user=> (time (top 10 #(> %1 %2) (range 5000000)))
"Elapsed time: 15755.155 msecs"
(4999999 4999998 4999997 4999996 4999995 4999994 4999993 4999992
4999991 4999990)
user=> (time (take 10 (sort #(> %1 %2) (range 5000000))))
"Elapsed time: 7938.67752 msecs"
(4999999 4999998 4999997 4999996 4999995 4999994 4999993 4999992
4999991 4999990)

This is a worst-case sort, in a sense, since the sequence has to be
completely reversed. Curiously, take 10 sort is faster than top 10 by
a factor of 2. It seems sorting five million items all at once is
faster than five million sequential sorted inserts into a ten-item
tree.

So top is using about twice as much time, and one five-hundred-
thousandth as much memory. Lesson: use top when you don't want a big
lazy sequence residing in memory all at once and use sort when you
want speed. :)

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to