Hi Marjan, Lucene process the query in what can be called one-doc-at-a-time.
For the example query - x y - (not the phrase query "x y") - all documents containing either x or y are considered a match. When processing the query - x y - the posting lists of these two index terms are traversed, and for each document met on the way, a score is computed (taking into account both terms), and "collected". At the end of the traversal, usually best N collected docs are returned as search result. So, this is an exhaustive computation creating a union of the two posting. For the query - +x +y - in intersection rather than union is required, and the way Lucene does it is again to traverse the two posting lists, just that only documents seen in both lists are scored and collected. This allows to optimize the search, skipping large chunks of the posting lists, especially when one term is rarer than the other. You can read more on Lucene scoring in Lucene's documentation, http://lucene.apache.org/java/docs/scoring.html is a good starting point, HTH, Doron On Jan 6, 2008 2:13 PM, Marjan Celikik <[EMAIL PROTECTED]> wrote: > Dear all, > > Maybe this topic is already discussed (then can I get a reference > please?)... I would like to know how does Lucene actually process the > query. For example, take a 2-word query "x y". Does Lucene fetch the > lists of "x" and "y" and intersect them, or do they do something more > fancy, for example, top-k techniques that try to avoid a full scan of > the index lists for "x" and "y" ? > > Marjan. >