John Patterson wrote:


Yonik Seeley wrote:
On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote:
Most things in an inverted index are sorted (terms, matching document
ids, term positions within a field, etc).  Can you be more specific
about what you are trying to accomplish?


Sorry, I mean sorting the documents in an order other than the order in
which they are added.  Then my search could just return docs in index order.
For the most common sort I could collect only the first x docs and then
short-circuit the search, as we previously discussed.

Nutch already addresses both questions (see org.apache.nutch.indexer.IndexSorter and org.apache.nutch.searcher.LuceneQueryOptimizer$LimitedCollector).
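To make the short-circuit idea concrete, here is a minimal sketch in plain Java. It is not the Nutch or Lucene API: the class and method names are hypothetical, and it assumes the index has already been re-sorted so that ascending doc id order is the desired result order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical sketch, not the Nutch LimitedCollector or any Lucene class.
final class EarlyTerminationSketch {

    /**
     * Scans doc ids 0..maxDoc-1 in index order and returns the first
     * `limit` ids accepted by `matches`.
     *
     * Assumption: the index was sorted beforehand, so index order is
     * already the order the user wants and no per-query sort is needed.
     */
    static List<Integer> firstMatches(int maxDoc, IntPredicate matches, int limit) {
        List<Integer> hits = new ArrayList<>();
        // The loop condition stops the scan as soon as `limit` hits are collected.
        for (int docId = 0; docId < maxDoc && hits.size() < limit; docId++) {
            if (matches.test(docId)) {
                hits.add(docId);
            }
        }
        return hits;
    }
}
```

For example, `EarlyTerminationSketch.firstMatches(100, id -> id % 3 == 0, 4)` visits only doc ids 0 through 9 before stopping. The trade-off is that the total hit count is no longer known exactly once the scan is cut short.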


I was wondering if it is possible to apply a sort at merge time?

One method that I'm familiar with is the following: you can split the result set into several large-ish bins, and apply arbitrary sorting methods within each bin. Studies show that if you pick the right bin size, users will rarely look into the second and subsequent bins, so the task is reduced to sorting the first bin, e.g. the 100 top-scoring docs.
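A rough sketch of that binning approach, again in plain Java rather than against the Lucene API. The `Hit` class, its `date` field, and the method name are all hypothetical, chosen only to illustrate the idea of cutting the first bin by score and then re-ordering inside it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of "sort within the first bin"; not a Lucene class.
final class BinSortSketch {

    // Minimal stand-in for a search hit; `date` is an assumed sort key.
    static final class Hit {
        final int docId;
        final float score;
        final long date;

        Hit(int docId, float score, long date) {
            this.docId = docId;
            this.score = score;
            this.date = date;
        }
    }

    /**
     * Takes the `binSize` top-scoring hits (the first bin) and re-orders
     * only those with the caller-supplied comparator.
     */
    static List<Hit> firstBin(List<Hit> hits, int binSize, Comparator<Hit> withinBin) {
        // Copy first so the caller's list is not modified.
        List<Hit> byScore = new ArrayList<>(hits);
        byScore.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());

        // Everything after the cut-off belongs to later bins and is ignored here.
        int end = Math.min(binSize, byScore.size());
        List<Hit> bin = new ArrayList<>(byScore.subList(0, end));

        // The arbitrary sort touches at most binSize entries.
        bin.sort(withinBin);
        return bin;
    }
}
```

With a bin size of 100, the arbitrary sort costs the same no matter how many documents matched. A real implementation would normally pick the top-scoring hits with a bounded priority queue instead of sorting the full hit list as this sketch does.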

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

