thanks! So it seems like lucene does a sequential read of both terms
information in the index and then skips around using docid iterators? Is it
able to push the skipping around into the index so it doesn't need to read
as much?

On Thu, Sep 25, 2008 at 10:56 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Thu, Sep 25, 2008 at 1:39 PM, David Lee <[EMAIL PROTECTED]> wrote:
> > I was wondering when lucene queries two or more terms, does that mean the
> > time it takes will be twice as long? For example if I search +lucene
> > +apache, then does lucene get all the documents that match 'lucene' and
> all
> > the documents that match 'apache', and then combine them together? Or can
> it
> > limit the amount of things it needs to retrieve from the index for
> 'apache'
> > based on what it has already retrieved for 'lucene'?
>
> Closer to the latter.  Lucene evaluates all clauses in parallel.  A
> document id iterator is created for each term, and they are all
> "skipped" to the highest id yet seen until all iterators are on the
> same id, which yields a match.  So an AND query of a rare term and a
> common term can be faster than the common term alone.
>
> > Is there documentation on how queries work in lucene in regards to how it
> > deals with the actual index files?
>
> TermScorer and ConjunctionScorer are the classes you would want to look at.
>
> -Yonik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

Reply via email to