Re: Sorting posting lists before intersection

Andrzej Bialecki Mon, 13 Oct 2008 08:00:50 -0700

Renaud Delbru wrote:

Hi Andrzej,
sorry for the late reply.
I have looked at the code. As far as I understand, you sort the posting lists based on the first doc skip: the first posting list will be the one with the biggest first document skip. Is the sparseness of posting lists a good predictor for sampling and ordering them? Do you know of any evaluation of such a technique?

It is _some_ predictor ... :) whether it's a good one is another question. It's certainly very inexpensive: we don't do any additional IO except what we have to do anyway, which is scorer.skipTo().
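A minimal sketch of the first-skip heuristic, assuming a hypothetical `Postings` class that only supports `skipTo(target)` (this is not Lucene's actual `Scorer` API). Every list is positioned once with `skipTo(0)`, which the intersection needs anyway, so the ordering costs no extra IO; the list whose first match lies furthest ahead is then placed first in a leapfrog intersection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FirstSkipOrdering {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Hypothetical stand-in for a Scorer: a sorted posting list that only supports skipTo.
    static final class Postings {
        final String name;
        final int[] docs; // ascending doc ids
        int pos = -1;     // -1 means "not positioned yet"

        Postings(String name, int... docs) { this.name = name; this.docs = docs; }

        int doc() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        // Move to the first doc >= target; never moves backwards.
        // (Linear scan for brevity; a real index would use skip data.)
        int skipTo(int target) {
            if (pos < 0) pos = 0;
            while (pos < docs.length && docs[pos] < target) pos++;
            return doc();
        }
    }

    // Position every list once (work the intersection does anyway), then put the list
    // whose first match lies furthest ahead first, treating a big first skip as a sparseness hint.
    static Postings[] orderByFirstSkip(Postings... lists) {
        for (Postings p : lists) p.skipTo(0);
        Postings[] sorted = lists.clone();
        Arrays.sort(sorted, (x, y) -> Integer.compare(y.doc(), x.doc()));
        return sorted;
    }

    // Leapfrog intersection: every list must land exactly on the candidate doc.
    static List<Integer> intersect(Postings... lists) {
        List<Integer> out = new ArrayList<>();
        if (lists.length == 0) return out;
        Postings[] ps = orderByFirstSkip(lists);
        int target = ps[0].doc();
        while (target != NO_MORE_DOCS) {
            boolean allMatch = true;
            for (Postings p : ps) {
                int d = p.skipTo(target);
                if (d != target) { // overshot (or exhausted): d becomes the new candidate
                    target = d;
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                out.add(target);
                target = ps[0].skipTo(target + 1);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Postings a = new Postings("a", 1, 3, 5, 7, 9, 11);
        Postings b = new Postings("b", 5, 9, 20);
        Postings c = new Postings("c", 2, 5, 9, 11);
        System.out.println(intersect(a, b, c)); // [5, 9]
    }
}
```

Here `b` goes first because its first match (doc 5) is furthest ahead. Whether that actually reflects sparseness depends on the data, which is exactly the open question in the thread.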

In the general case it's costly to calculate the frequency (or sparseness) of matches in a scorer without actually running the scorer through all its matches.

In order to implement sorting based on frequency, we need the document frequency of each term. This information should be propagated through the Scorer classes (from TermScorer to higher-level classes such as ConjunctiveScorer). This will require a call to IndexReader.docFreq(term) for each of the term queries. Does the docFreq call mean another IO access?
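For the case where every clause really is a single term, ordering by document frequency could look like the sketch below. It assumes a hypothetical in-memory index (`Map<String, int[]>` of sorted doc ids) whose list length stands in for `IndexReader.docFreq(term)`; the rarest term's postings become the candidate set, and each candidate is checked against the remaining lists in ascending-frequency order so that mismatches are found early.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class DocFreqOrdering {
    // Hypothetical stand-in for IndexReader.docFreq(term): the length of the term's posting list.
    static int docFreq(Map<String, int[]> index, String term) {
        int[] docs = index.get(term);
        return docs == null ? 0 : docs.length;
    }

    // Rarest term first; List.sort is stable, so ties keep their query order.
    static List<String> orderByDocFreq(Map<String, int[]> index, List<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        sorted.sort(Comparator.comparingInt(t -> docFreq(index, t)));
        return sorted;
    }

    // Candidates come from the rarest list; each is looked up in the others by binary search.
    static List<Integer> intersect(Map<String, int[]> index, List<String> terms) {
        List<String> order = orderByDocFreq(index, terms);
        List<Integer> result = new ArrayList<>();
        if (order.isEmpty() || docFreq(index, order.get(0)) == 0) return result;
        int[] rarest = index.get(order.get(0));
        outer:
        for (int doc : rarest) {
            for (int i = 1; i < order.size(); i++) {
                if (Arrays.binarySearch(index.get(order.get(i)), doc) < 0) continue outer;
            }
            result.add(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = Map.of(
                "lucene", new int[] {1, 2, 3, 4, 5, 6, 7, 8},
                "nutch", new int[] {2, 4, 8},
                "hadoop", new int[] {4, 6, 8, 10});
        List<String> q = Arrays.asList("lucene", "nutch", "hadoop");
        System.out.println(orderByDocFreq(index, q)); // [nutch, hadoop, lucene]
        System.out.println(intersect(index, q));      // [4, 8]
    }
}
```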

It sounds like you plan to order scorers by term frequency ... but in the general case they won't all be TermScorers, so the frequency of documents matching a scorer won't have any particular connection to a single term's frequency.
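To illustrate why a non-term scorer has no single frequency to sort by: from the term document frequencies alone you can only bound how many documents a nested scorer matches. The sketch below uses a hypothetical query tree (`Term`, `And`, `Or`; not Lucene's classes): a conjunction matches at most as many documents as its rarest child, and a disjunction at most the sum of its children (capped at `maxDoc`). The true count can be anywhere below that bound.

```java
import java.util.Arrays;
import java.util.List;

class MatchCountBound {
    // Hypothetical query tree; not Lucene's Query/Scorer classes.
    interface Node { long upperBound(long maxDoc); }

    static final class Term implements Node {
        final long docFreq;
        Term(long docFreq) { this.docFreq = docFreq; }
        public long upperBound(long maxDoc) { return Math.min(docFreq, maxDoc); }
    }

    static final class And implements Node {
        final List<Node> children;
        And(Node... children) { this.children = Arrays.asList(children); }
        // A doc must match every child, so no more docs than the rarest child can match.
        public long upperBound(long maxDoc) {
            long best = maxDoc;
            for (Node c : children) best = Math.min(best, c.upperBound(maxDoc));
            return best;
        }
    }

    static final class Or implements Node {
        final List<Node> children;
        Or(Node... children) { this.children = Arrays.asList(children); }
        // A doc matching several children counts once, so the sum is only an upper bound.
        public long upperBound(long maxDoc) {
            long sum = 0;
            for (Node c : children) sum += c.upperBound(maxDoc);
            return Math.min(sum, maxDoc);
        }
    }

    public static void main(String[] args) {
        long maxDoc = 1000;
        Node q = new And(new Term(50), new Or(new Term(300), new Term(900)));
        System.out.println(q.upperBound(maxDoc)); // 50
    }
}
```

Two scorers with the same bound may still differ greatly in actual sparseness, which is why the first-skip hint is attractive: it observes the scorer itself rather than estimating it.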

To answer your question: the docFreq call uses TermInfo information, which is backed by a small RAM cache. If you're lucky it won't cause any IO; otherwise it needs to read this info from the .ti file.
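A toy model of the lookup path described above: a small LRU cache in front of the term dictionary, where only cache misses count as IO. `CachedDocFreq`, the cache size and the in-memory `onDisk` map standing in for the dictionary file are all hypothetical; this shows the hit/miss behaviour, not Lucene's actual implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class CachedDocFreq {
    // Hypothetical stand-in for the on-disk term dictionary; every lookup here counts as one IO.
    private final Map<String, Integer> onDisk;
    private final Map<String, Integer> cache;
    private int diskReads = 0;

    CachedDocFreq(Map<String, Integer> onDisk, int cacheSize) {
        this.onDisk = onDisk;
        // Access-ordered LinkedHashMap that drops its least recently used entry: a simple LRU cache.
        this.cache = new LinkedHashMap<String, Integer>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
                return size() > cacheSize;
            }
        };
    }

    int docFreq(String term) {
        Integer cached = cache.get(term);
        if (cached != null) return cached; // hit: served from RAM, no IO
        diskReads++;                        // miss: read from the term dictionary
        int df = onDisk.getOrDefault(term, 0);
        cache.put(term, df);
        return df;
    }

    int diskReads() { return diskReads; }

    public static void main(String[] args) {
        Map<String, Integer> disk = new HashMap<>();
        disk.put("a", 10);
        disk.put("b", 20);
        disk.put("c", 30);
        CachedDocFreq dict = new CachedDocFreq(disk, 2);
        dict.docFreq("a"); // miss
        dict.docFreq("a"); // hit
        dict.docFreq("b"); // miss
        dict.docFreq("c"); // miss, evicts "a"
        System.out.println(dict.diskReads()); // 3
    }
}
```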


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

