This is done by Lucene's scorers. You should, however, start with http://lucene.apache.org/java/docs/scoring.html; scorers are described in the "Algorithm" section. "Offsets" are used by the phrase scorers and by the span scorer.
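
To make the idea concrete, here is a rough, self-contained sketch of how two sorted posting lists can be intersected for a query like +x +y, and how the per-document positions of each term become available once both lists sit on the same document. It is not Lucene's code: the class ConjunctionSketch, the Postings and Match types, and the methods advance(), intersect() and isPhrase() are all made-up names for illustration, and a real index would use skip data instead of the linear scan shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of conjunction scoring. Hypothetical names, not Lucene's API. */
final class ConjunctionSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** One term's postings: sorted doc ids plus the term's sorted positions per doc. */
    static final class Postings {
        final int[] docs;
        final int[][] positions;
        int idx = -1; // cursor into docs; -1 means "not started"

        Postings(int[] docs, int[][] positions) {
            this.docs = docs;
            this.positions = positions;
        }

        /** Moves to the first doc >= target (stays put if already there) and returns it. */
        int advance(int target) {
            // Assumption: linear scan for simplicity; a real index would skip ahead.
            while (idx < 0 || (idx < docs.length && docs[idx] < target)) {
                idx++;
            }
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        /** Positions of the term in the current doc. */
        int[] currentPositions() {
            return positions[idx];
        }
    }

    /** A doc seen in both lists, with each term's positions in that doc. */
    static final class Match {
        final int doc;
        final int[] xPos;
        final int[] yPos;

        Match(int doc, int[] xPos, int[] yPos) {
            this.doc = doc;
            this.xPos = xPos;
            this.yPos = yPos;
        }
    }

    /** Intersects two posting lists: whichever list is behind jumps to the other's doc. */
    static List<Match> intersect(Postings x, Postings y) {
        List<Match> hits = new ArrayList<>();
        int dx = x.advance(0);
        int dy = y.advance(0);
        while (dx != NO_MORE_DOCS && dy != NO_MORE_DOCS) {
            if (dx < dy) {
                dx = x.advance(dy);
            } else if (dy < dx) {
                dy = y.advance(dx);
            } else {
                // Both lists are on the same doc: this is where it would be scored.
                hits.add(new Match(dx, x.currentPositions(), y.currentPositions()));
                dx = x.advance(dx + 1);
                dy = y.advance(dy + 1);
            }
        }
        return hits;
    }

    /** True if some position of y directly follows a position of x, i.e. phrase "x y". */
    static boolean isPhrase(int[] xPos, int[] yPos) {
        int i = 0;
        int j = 0;
        while (i < xPos.length && j < yPos.length) {
            int want = xPos[i] + 1;
            if (yPos[j] == want) {
                return true;
            }
            if (yPos[j] < want) {
                j++;
            } else {
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Postings x = new Postings(new int[] {1, 4, 7, 9},
                                  new int[][] {{0, 5}, {2}, {3, 8}, {1}});
        Postings y = new Postings(new int[] {4, 9, 12},
                                  new int[][] {{3, 10}, {0, 5}, {6}});
        for (Match m : intersect(x, y)) {
            System.out.println("doc " + m.doc
                    + " x=" + Arrays.toString(m.xPos)
                    + " y=" + Arrays.toString(m.yPos)
                    + " phrase=" + isPhrase(m.xPos, m.yPos));
        }
    }
}
```

On the toy data in main(), only documents 4 and 9 appear in both lists. The position check is what a phrase-style scorer would additionally need: in document 4, y at position 3 directly follows x at position 2, while in document 9 it does not.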

Doron

On Jan 8, 2008 11:24 PM, Marjan Celikik <[EMAIL PROTECTED]> wrote:

> Doron Cohen wrote:
> > Hi Marjan,
> >
> > Lucene processes the query in what can be called
> > one-doc-at-a-time.
> >
> > For the example query - x y - (not the phrase query "x y") - all
> > documents containing either x or y are considered a match.
> >
> > When processing the query - x y - the posting lists of these two
> > index terms are traversed, and for each document met on the way,
> > a score is computed (taking into account both terms), and "collected".
> > At the end of the traversal, usually the best N collected docs are
> > returned as the search result. So, this is an exhaustive computation
> > creating a union of the two posting lists. For the query - +x +y - an
> > intersection rather than a union is required, and the way Lucene does
> > it is again to traverse the two posting lists, just that only
> > documents seen in both lists are scored and collected. This allows
> > optimizing the search by skipping large chunks of the posting lists,
> > especially when one term is rarer than the other.
> >
> Thank you for your answer.
>
> I am having trouble finding the function which traverses the documents
> such that they get scored. Can you please tell me where the posting
> lists (for a +x +y query) get intersected after they get read (by
> next() I guess) from the index?
>
> In particular, I am interested in how Lucene gets the new positions
> (offsets) of the documents seen in both posting lists, i.e. positions
> (in a document) for the query word x, and positions for the query
> word y.
>
> Thank you in advance!
>
> Marjan.
>