Doron Cohen wrote:
Hi Marjan,
Lucene process the query in what can be called
one-doc-at-a-time.
For the example query - x y - (not the phrase query "x y") - all
documents containing either x or y are considered a match.
When processing the query - x y - the posting lists of these two
index terms are traversed, and for each document met on the way,
a score is computed (taking into account both terms), and "collected".
At the end of the traversal, usually best N collected docs are returned as
search result. So, this is an exhaustive computation creating a union of
the two posting. For the query - +x +y - in intersection rather than
union is required, and the way Lucene does it is again to traverse
the two posting lists, just that only documents seen in both lists
are scored and collected. This allows to optimize the search,
skipping large chunks of the posting lists, especially when
one term is rarer than the other.
Thank you for your answer.
I am having trouble finding the function which traverses the documents
such that they get scored. Can you
please tell me where the posting lists (for a +x +y query) get
intersected after they get read (by next() I guess)
from the index?
In particular, I am interested in how does Lucene get the new positions
(offsets) of the documents seen
in both posting lists, i.e. positions (in a document) for the query word
x, and positions for the query word y.
Thank you in advance!
Marjan.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]