Let N be the total number of elements in your collection (e.g.
1000,000), and n the number of elements that you want to take out of
this collection (e.g 10).

By sorting the collection of N elements you spend N log N time. By
repeatedly adding an to the small collection and removing the minimum
you spend N log n time (if you have a good sorted collection that
takes log n time to insert).

The algorithm spelled out in imperative pseudocode is:

def top_n(n,coll)
  let result = new SortedCollection(take n from coll)
  for x in (drop n from coll):
    result.add(x)
    result.deleteMinimum() // adding 1 element and removing 1 keeps
the result collection at size n
  return result

Hope this helps,
Jules

On Jun 23, 10:14 am, Daniel Lyons <fus...@storytotell.org> wrote:
> On Jun 23, 2009, at 1:47 AM, Christophe Grand wrote:
>
>
>
>
>
> > I don't know if it has an official name but basically it's a  
> > modified tree-sort: for each item you insert a value in a sorted  
> > coll of size N and remove one item from this sorted coll, both ops  
> > are O(sqrt(N)) thus O(n*sqrt(N)) for processing an input whose  
> > length is n.
>
> > I'm away from a REPL but I had something like that in mind:
> > (defn top-n [coll n]
> >   (let [add #(assoc %1 %2 (inc (%1 %2 0))
> >          prune-min #(let [[[min n]] (rseq %)] (if (== 1 n) (dissoc %  
> > min) (assoc % (dec n))))
> >          seed (reduce add sorted-map (take n coll))]
> >     (reduce (comp prune-min add) seed (drop n coll))))
>
> > (top 3 [5 4 3 5 8 2 7]) ; should return {5 1, 7 1, 8 1}
> > ; but
> > (top 3 [5 4 3 5 8 2]) ; should return {5 2, 8 1}
>
> I have to admit I don't really understand your code, so my apologies  
> if I've missed something obvious.
>
> I think if you consider each element of N and do an operation that  
> costs sqrt(N) with it, you'd arrive at O(N*sqrt(N)), which I think  
> would be worse than O(N log N) which is what you'd get from just  
> sorting the whole structure and taking the top n items. Is it really  
> sqrt(N)? I'm under the impression insertion into a balanced binary  
> tree is O(log(N)), but wouldn't you still need to have N of them? This  
> algorithm reminds me of heap sort.
>
> —
> Daniel Lyons
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to