Re: Query processing with Lucene

Doron Cohen Tue, 08 Jan 2008 13:49:50 -0800

This is done by Lucene's scorers. You should however start
in http://lucene.apache.org/java/docs/scoring.html, - scorers
are described in the "Algorithm" section. "Offsets" are used
by Phrase Scorers and by Span Scorer.


Doron

On Jan 8, 2008 11:24 PM, Marjan Celikik < [EMAIL PROTECTED]> wrote:

> Doron Cohen wrote:
> > Hi Marjan,
> >
> > Lucene process the query in what can be called
> > one-doc-at-a-time.
> >
> > For the example query - x y - (not the phrase query "x y") - all
> > documents containing either x or y are considered a match.
> >
> > When processing the query - x y - the posting lists of these two
> > index terms are traversed, and for each document met on the way,
> > a score is computed (taking into account both terms), and "collected".
> > At the end of the traversal, usually best N collected docs are returned
> as
> > search result. So, this is an exhaustive computation creating a union of
> > the two posting. For the query - +x +y - in intersection rather than
> > union is required, and the way Lucene does it is again to traverse
> > the two posting lists, just that only documents seen in both lists
> > are scored and collected. This allows to optimize the search,
> > skipping large chunks of the posting lists, especially when
> > one term is rarer than the other.
> >
> Thank you for your answer.
>
> I am having trouble finding the function which traverses the documents
> such that they get scored. Can you
> please tell me where the posting lists (for a +x +y query) get
> intersected after they get read (by next() I guess)
> from the index?
>
> In particular, I am interested in how does Lucene get the new positions
> (offsets) of the documents seen
> in both posting lists, i.e. positions (in a document) for the query word
> x, and positions for the query word y.
>
> Thank you in advance!
>
> Marjan.
>

Re: Query processing with Lucene

Reply via email to