Re: Index statistics

Michael McCandless Tue, 05 Jul 2011 12:39:39 -0700

This API doesn't exist today.

Lucene has long needed a way for query impls to do this, so that we
can properly plan/optimize how the query is run.  E.g., an AND query
would use this to pick the more restrictive clause to drive the
intersection.
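
A minimal sketch of that planning step, not Lucene code: each clause is
modeled as a sorted int[] of matching doc IDs, and the array length
stands in for whatever hit-count estimate is available. The class and
method names here (ConjunctionSketch, intersect) are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class ConjunctionSketch {

    // Intersects sorted doc ID lists, letting the most restrictive
    // (shortest) list drive the iteration.
    static List<Integer> intersect(List<int[]> postings) {
        List<int[]> byCost = new ArrayList<>(postings);
        // Assumption: the array length plays the role of the estimated
        // hit count for each clause.
        byCost.sort(Comparator.comparingInt(p -> p.length));

        List<Integer> result = new ArrayList<>();
        if (byCost.isEmpty()) {
            return result;
        }

        int[] lead = byCost.get(0);
        for (int doc : lead) {
            boolean inAll = true;
            // Probe the remaining clauses only for docs the lead clause matched.
            for (int i = 1; i < byCost.size(); i++) {
                if (Arrays.binarySearch(byCost.get(i), doc) < 0) {
                    inAll = false;
                    break;
                }
            }
            if (inAll) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

Driving with the shortest list means the outer loop runs once per
match of the rarest clause, which is why a cheap estimate per clause
would be useful before the query runs.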

For TermQuery you could just call IndexReader.docFreq?  (It doesn't
take deletions into account, so it'll always be an upper bound.)

For other queries... you could pull the scorer, iterate over some
number of docs, and then "guesstimate" how many matches there would be
for the full index, based on what docID you got up to vs how many docs
you asked for?  This would assume matches are uniformly distributed
throughout the index (e.g., that docs are indexed in random order),
which is typically not the case in practice.
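
A rough sketch of that extrapolation, assuming the scorer can be
treated as an iterator over matching doc IDs in increasing order. A
plain PrimitiveIterator.OfInt stands in for the scorer here, and
HitCountEstimator, estimate, maxDoc and sampleSize are illustrative
names, not Lucene API.

```java
import java.util.PrimitiveIterator;

// Hypothetical helper, not part of Lucene.
class HitCountEstimator {

    // docIds:     matching doc IDs in increasing order (stand-in for a scorer)
    // maxDoc:     one greater than the largest doc ID in the index
    // sampleSize: how many matches to pull before extrapolating
    static long estimate(PrimitiveIterator.OfInt docIds, int maxDoc, int sampleSize) {
        if (sampleSize <= 0) {
            throw new IllegalArgumentException("sampleSize must be positive");
        }
        int seen = 0;
        int lastDoc = -1;
        while (seen < sampleSize && docIds.hasNext()) {
            lastDoc = docIds.nextInt();
            seen++;
        }
        // The iterator ran dry, so the count is exact rather than estimated.
        if (!docIds.hasNext()) {
            return seen;
        }
        // 'seen' matches fell in doc IDs [0, lastDoc]; scale that density
        // up to the whole index. Only valid if matches are spread uniformly.
        return Math.round((double) seen * maxDoc / (lastDoc + 1));
    }
}
```

With matches on every 10th doc of a 1000-doc index, a sample of 10
extrapolates to about 100. If the 50 real matches all sit in docs
0..49, the same sample extrapolates to 1000, which shows how far off
the guess gets when matches are clustered.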

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jul 5, 2011 at 2:19 PM, Andres Taylor
<andres.tay...@neotechnology.com> wrote:
> Hi there,
>
> I work with Neo4j <http://neo4j.org/>, a NoSQL graph
> database tightly coupled with Lucene. I am now working on an optimizing
> execution engine. To do this well, I would like to know more about the
> existing Lucene indices. Ideally, I'd like to be able to ask a Lucene index
> how many hits a query might give me, before I actually run the query. The
> answer will probably just be an estimation, but that's fine.
>
> Is this possible today?
>
> Best regards,
>
> Andrés
>

