Re: Query processing with Lucene

Doron Cohen Tue, 08 Jan 2008 11:30:21 -0800

Hi Marjan,

Lucene processes a query in what can be called a
one-doc-at-a-time manner.

For the example query - x y - (not the phrase query "x y") - all
documents containing either x or y are considered a match.

When processing the query - x y - the posting lists of these two
index terms are traversed, and for each document met on the way,
a score is computed (taking into account both terms), and "collected".
At the end of the traversal, usually the best N collected docs are returned as
the search result. So, this is an exhaustive computation creating a union of
the two posting lists. For the query - +x +y - an intersection rather than
a union is required, and the way Lucene does it is again to traverse
the two posting lists, except that only documents seen in both lists
are scored and collected. This allows the search to be optimized by
skipping large chunks of the posting lists, especially when
one term is rarer than the other.
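
To make the two traversals concrete, here is a minimal sketch in plain Java.
It is not Lucene's actual code: the class and method names are made up for
illustration, posting lists are assumed to be sorted arrays of doc ids,
scoring and top-N collection are left out, and a binary search stands in
for the skipping that Lucene does on its real posting lists.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical class, not part of Lucene.
public class PostingTraversal {

    // Query "x y": visit every doc id found in either sorted list (union).
    // Assumes doc ids are smaller than Integer.MAX_VALUE, used here as a sentinel.
    public static List<Integer> union(int[] x, int[] y) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length || j < y.length) {
            int dx = i < x.length ? x[i] : Integer.MAX_VALUE;
            int dy = j < y.length ? y[j] : Integer.MAX_VALUE;
            int doc = Math.min(dx, dy);   // next doc met on the way
            hits.add(doc);                // a real search would score it here
            if (dx == doc) i++;
            if (dy == doc) j++;
        }
        return hits;
    }

    // Query "+x +y": keep only doc ids present in both lists (intersection).
    // The list that is behind jumps forward to the other list's current doc.
    public static List<Integer> intersect(int[] x, int[] y) {
        List<Integer> hits = new ArrayList<>();
        int i = 0, j = 0;
        while (i < x.length && j < y.length) {
            if (x[i] == y[j]) {
                hits.add(x[i]);
                i++;
                j++;
            } else if (x[i] < y[j]) {
                i = advance(x, i, y[j]);
            } else {
                j = advance(y, j, x[i]);
            }
        }
        return hits;
    }

    // Returns the first index >= from whose doc id is >= target,
    // or list.length if there is none. Skips the entries in between.
    static int advance(int[] list, int from, int target) {
        int lo = from, hi = list.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] x = {1, 3, 5, 8, 13, 21};   // frequent term
        int[] y = {8, 21};                // rare term
        System.out.println(union(x, y));
        System.out.println(intersect(x, y));
    }
}
```

With a rare y, intersect() reads only a few entries of x, while union()
has to visit every entry of both lists.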

You can read more about Lucene scoring in Lucene's documentation;
http://lucene.apache.org/java/docs/scoring.html is a good starting
point.

HTH,
Doron

On Jan 6, 2008 2:13 PM, Marjan Celikik <[EMAIL PROTECTED]> wrote:

> Dear all,
>
> Maybe this topic is already discussed (then can I get a reference
> please?)... I would like to know how does Lucene actually process the
> query. For example, take a 2-word query "x y". Does Lucene fetch the
> lists of "x" and "y" and intersect them, or do they do something more
> fancy, for example, top-k techniques that try to avoid a full scan of
> the index lists for "x" and "y" ?
>
> Marjan.
>
