John Patterson wrote:
Yonik Seeley wrote:
On 10/26/07, John Patterson <[EMAIL PROTECTED]> wrote:
Most things in an inverted index are sorted (terms, matching document
ids, term positions within a field, etc). Can you be more specific
about what you are trying to accomplish?
Sorry, I mean sorting the documents in an order other than the order they
are added. Then my search could just return docs in index order. For the
most common sorting I could collect only the first x docs and then
short-circuit the search like we previously discussed.
These questions already have an answer in Nutch (see the
org.apache.nutch.indexer.IndexSorter, and
org.apache.nutch.searcher.LuceneQueryOptimizer$LimitedCollector).
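
To make the short-circuit idea concrete, here is a minimal sketch in plain Java. It is not the Nutch LimitedCollector code and uses no Lucene API; the class name, the method name and the IntPredicate standing in for "does this doc match the query" are all made up for illustration. It assumes the index has already been rewritten so that doc id order is the desired sort order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Illustrative only: a stand-in for a limited collector, not Nutch or Lucene code. */
final class EarlyTerminationSketch {

    /**
     * Walks doc ids 0..maxDoc-1 in index order and stops as soon as
     * `limit` matching docs have been collected.
     * Assumption: doc id order already equals the desired sort order,
     * so the first `limit` matches are also the top `limit` results.
     */
    static List<Integer> firstMatches(int maxDoc, IntPredicate matches, int limit) {
        List<Integer> hits = new ArrayList<>();
        // The second loop condition is the short-circuit: no doc past the
        // last needed hit is ever tested.
        for (int docId = 0; docId < maxDoc && hits.size() < limit; docId++) {
            if (matches.test(docId)) {
                hits.add(docId);
            }
        }
        return hits;
    }
}
```

Calling firstMatches(maxDoc, matcher, x) returns at most x doc ids, and the cost depends on how far into the index the x-th match sits rather than on the total number of matches.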
I was wondering if it is possible to apply a sort at merge time?
One method that I'm familiar with is the following: you can split the
result set into several large-ish bins, and apply arbitrary sorting
methods within each bin. Studies show that if you pick the right bin
size, users will rarely look into the second and the following bins, so
the task is reduced to the sorting of the first bin, e.g. 100 top
scoring docs.
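
The binning approach could look roughly like the sketch below, again in plain Java rather than against any Lucene or Nutch API. The Hit class, its fields and the method name are hypothetical; the input is assumed to be the result list already ordered by descending score, and only the first bin is re-sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: re-sorts the first bin of a score-ordered result list. */
final class BinnedSortSketch {

    /** Hypothetical hit: doc id, relevance score, and a secondary key (here a date). */
    static final class Hit {
        final int docId;
        final float score;
        final long date;

        Hit(int docId, float score, long date) {
            this.docId = docId;
            this.score = score;
            this.date = date;
        }
    }

    /**
     * Returns a copy of `byScoreDesc` in which the first `binSize` hits are
     * re-ordered by `withinBin`; everything after the first bin keeps its
     * original score order.
     */
    static List<Hit> resortFirstBin(List<Hit> byScoreDesc, int binSize,
                                    Comparator<Hit> withinBin) {
        List<Hit> result = new ArrayList<>(byScoreDesc);
        // Clamp the bin boundary to the valid range of the list.
        int end = Math.max(0, Math.min(binSize, result.size()));
        // subList is a view, so sorting it reorders `result` in place.
        result.subList(0, end).sort(withinBin);
        return result;
    }
}
```

With binSize = 100 and a comparator on date, this yields the 100 top scoring docs ordered by date, followed by the remaining docs in score order. The expensive arbitrary sort touches at most binSize elements, whatever the size of the full result set.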
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com