Let N be the total number of elements in your collection (e.g. 1000,000), and n the number of elements that you want to take out of this collection (e.g 10).
By sorting the collection of N elements you spend N log N time. By repeatedly adding an to the small collection and removing the minimum you spend N log n time (if you have a good sorted collection that takes log n time to insert). The algorithm spelled out in imperative pseudocode is: def top_n(n,coll) let result = new SortedCollection(take n from coll) for x in (drop n from coll): result.add(x) result.deleteMinimum() // adding 1 element and removing 1 keeps the result collection at size n return result Hope this helps, Jules On Jun 23, 10:14 am, Daniel Lyons <fus...@storytotell.org> wrote: > On Jun 23, 2009, at 1:47 AM, Christophe Grand wrote: > > > > > > > I don't know if it has an official name but basically it's a > > modified tree-sort: for each item you insert a value in a sorted > > coll of size N and remove one item from this sorted coll, both ops > > are O(sqrt(N)) thus O(n*sqrt(N)) for processing an input whose > > length is n. > > > I'm away from a REPL but I had something like that in mind: > > (defn top-n [coll n] > > (let [add #(assoc %1 %2 (inc (%1 %2 0)) > > prune-min #(let [[[min n]] (rseq %)] (if (== 1 n) (dissoc % > > min) (assoc % (dec n)))) > > seed (reduce add sorted-map (take n coll))] > > (reduce (comp prune-min add) seed (drop n coll)))) > > > (top 3 [5 4 3 5 8 2 7]) ; should return {5 1, 7 1, 8 1} > > ; but > > (top 3 [5 4 3 5 8 2]) ; should return {5 2, 8 1} > > I have to admit I don't really understand your code, so my apologies > if I've missed something obvious. > > I think if you consider each element of N and do an operation that > costs sqrt(N) with it, you'd arrive at O(N*sqrt(N)), which I think > would be worse than O(N log N) which is what you'd get from just > sorting the whole structure and taking the top n items. Is it really > sqrt(N)? I'm under the impression insertion into a balanced binary > tree is O(log(N)), but wouldn't you still need to have N of them? This > algorithm reminds me of heap sort. > > — > Daniel Lyons --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Clojure" group. To post to this group, send email to clojure@googlegroups.com Note that posts from new members are moderated - please be patient with your first post. To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/clojure?hl=en -~----------~----~----~----~------~----~------~--~---