Re: Query processing with Lucene

Marjan Celikik Tue, 08 Jan 2008 13:24:47 -0800

Doron Cohen wrote:

Hi Marjan,


Lucene processes the query in what can be called
one-doc-at-a-time fashion.

For the example query - x y - (not the phrase query "x y") - all
documents containing either x or y are considered a match.

When processing the query - x y - the posting lists of these two
index terms are traversed, and for each document met on the way,
a score is computed (taking into account both terms), and "collected".
At the end of the traversal, usually the best N collected docs are returned
as the search result. So, this is an exhaustive computation creating a union
of the two posting lists. For the query - +x +y - an intersection rather
than a union is required, and the way Lucene does it is again to traverse
the two posting lists, except that only documents seen in both lists
are scored and collected. This allows the search to be optimized by
skipping large chunks of the posting lists, especially when
one term is rarer than the other.
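
As a rough illustration of the traversal described above (this is not Lucene's actual code): the sketch below models each posting list as a sorted int array of document IDs. The class PostingSketch and the methods union, intersect and advance are hypothetical names made up for this example, and a binary search stands in for the skip data that a real index would use. Scoring and collecting are left out; only the set of visited documents is shown.

```java
import java.util.Arrays;

// Hypothetical sketch: posting lists are sorted arrays of document IDs.
public class PostingSketch {

    // Query "x y": visit every document that appears in either list.
    static int[] union(int[] a, int[] b) {
        int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length || j < b.length) {
            int doc;
            if (j == b.length || (i < a.length && a[i] < b[j])) {
                doc = a[i++];              // only list a has this doc
            } else if (i == a.length || b[j] < a[i]) {
                doc = b[j++];              // only list b has this doc
            } else {
                doc = a[i];                // doc is in both lists
                i++;
                j++;
            }
            out[n++] = doc;                // a real engine would score and collect here
        }
        return Arrays.copyOf(out, n);
    }

    // Smallest index >= from whose entry is >= target (assumed stand-in for skipping).
    static int advance(int[] list, int from, int target) {
        int lo = from, hi = list.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // Query "+x +y": only documents seen in both lists are kept.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];           // match: score and collect here
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i = advance(a, i, b[j]);   // jump list a forward to b's current doc
            } else {
                j = advance(b, j, a[i]);   // jump list b forward to a's current doc
            }
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(String[] args) {
        int[] x = {1, 3, 5, 9};
        int[] y = {3, 4, 9, 12};
        System.out.println(Arrays.toString(union(x, y)));
        System.out.println(Arrays.toString(intersect(x, y)));
    }
}
```

The jump in intersect is where a rare term pays off: when one list is short, each advance call on the longer list can pass over many entries without looking at them one by one.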

Thank you for your answer.

I am having trouble finding the function which traverses the documents such that they get scored. Can you please tell me where the posting lists (for a +x +y query) get intersected after they get read (by next(), I guess) from the index?

In particular, I am interested in how Lucene gets the new positions (offsets) of the documents seen in both posting lists, i.e. positions (in a document) for the query word x, and positions for the query word y.


Thank you in advance!

Marjan.

