Hello.  I've got a problem that perhaps some of you can help with.

I have an application that has to use fairly long queries (containing about 30 
terms OR'ed together) against an index of about 500K documents.  Because of the 
limited vocabulary I'm indexing and querying over (~2000 terms), the size of 
the query, and the number of documents involved, my search times are running a 
little long.  I'd like to speed them up if possible.

One approach I've tried is structuring the queries such that one (or more) of a 
subset of the entire 30 terms is required, the rest being optional, as in:

+(term1 term2 term3 ... term10) term11 term12 term13 ... term30

This yielded an average search time of about 50 msecs.
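
To be explicit about how that query is meant to be read: a document must 
contain at least one term from the parenthesized group, and the remaining 
terms only contribute to the score.  Here is a toy model of that reading in 
plain Python.  It is not Lucene code; the matches and score helpers and the 
count-of-hits scoring are made up for illustration, and real Lucene scoring is 
more involved.

```python
def matches(doc_terms, required_group):
    """A doc qualifies only if it contains at least one required-group term."""
    return not required_group.isdisjoint(doc_terms)


def score(doc_terms, required_group, optional_terms):
    """Toy score: number of query terms present (NOT Lucene's real scoring)."""
    if not matches(doc_terms, required_group):
        return 0
    return len(doc_terms & (required_group | optional_terms))


required = {"term1", "term2", "term3"}
optional = {"term11", "term12"}

docs = {
    "a": {"term1", "term11", "term12"},  # required hit plus two optional hits
    "b": {"term2"},                      # required hit only
    "c": {"term11", "term12"},           # optional hits only, so excluded
}

# Keep only docs that satisfy the required group, best toy score first.
ranked = sorted(
    (d for d in docs if matches(docs[d], required)),
    key=lambda d: score(docs[d], required, optional),
    reverse=True,
)
print(ranked)
```

Document "c" never reaches the ranking even though it has two optional hits, 
which is the behavior I was counting on to cut down the scoring work.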

I then assumed that if I reduced the size of the required set from 10 to 5, I 
would get fewer documents to score against and query performance would 
increase.  So I tried something like this:

+(term1 term2 term3 term4 term5) term6 term7 ... term30

To my surprise, the performance of the overall query didn't improve (actually, 
it was slower, at about 63 msecs on average).  My expectation about the way 
Lucene would interpret and execute this query was apparently incorrect.
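
The expectation itself can at least be sanity-checked on a toy index: because 
the 5-term group is a subset of the 10-term group, it can never admit more 
documents.  The sketch below is plain Python over a synthetic, randomly 
generated index (the sizes, seed, and term names are arbitrary assumptions, 
scaled down from the 500K documents mentioned above).  It says nothing about 
how Lucene actually executes the query, only that the candidate set shrinks.

```python
import random

random.seed(0)  # fixed seed so the synthetic index is reproducible
vocab = [f"term{i}" for i in range(1, 2001)]  # ~2000-term vocabulary
# 5,000 synthetic docs of 40 distinct terms each (scaled down from 500K)
index = [set(random.sample(vocab, 40)) for _ in range(5000)]


def count_candidates(required_group):
    """Count docs containing at least one term of the required group."""
    return sum(1 for doc in index if not required_group.isdisjoint(doc))


group10 = set(vocab[:10])  # stands in for +(term1 ... term10)
group5 = set(vocab[:5])    # stands in for +(term1 ... term5)

n10 = count_candidates(group10)
n5 = count_candidates(group5)
print(n10, n5)
```

So fewer documents qualify with the smaller group, yet the measured time did 
not drop, which suggests the cost is not driven by the candidate count alone.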

The obvious answer here might be to use a filter for the first (required) 
clause and then query again using that filter for the other terms.  The 
problem I foresee with that solution is that I can't easily re-use the filters, 
both because of the sheer number of combinations of terms and because I need to 
re-open my readers/searchers every few minutes so that the steady stream of 
updates becomes visible to queries.  As I understand it, re-using a filter 
(rather than creating it, using it, and discarding it) is integral to its value 
as a time saver, so filters may not be appropriate in this case.
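
To illustrate the re-use concern, here is a rough sketch in plain Python 
(again not Lucene code; FilterCache, build_filter, and the generation counter 
are hypothetical stand-ins for a cached filter and a re-opened reader).  A 
cached filter only pays off when the same required group comes up again 
against the same reader, and every re-open throws the cache away.

```python
class FilterCache:
    """Caches per-group doc-id sets; valid only for one reader generation."""

    def __init__(self, index):
        self.index = index    # list of term sets, one per doc
        self.generation = 0   # bumped on every simulated reader re-open
        self.cache = {}
        self.builds = 0       # how many times a filter was actually computed

    def reopen(self, new_index):
        """Simulate re-opening the reader: all cached filters become stale."""
        self.index = new_index
        self.generation += 1
        self.cache.clear()

    def build_filter(self, group):
        """Return the set of doc ids containing any term in the group."""
        key = (self.generation, frozenset(group))
        if key not in self.cache:
            self.builds += 1
            self.cache[key] = {
                i for i, doc in enumerate(self.index)
                if not doc.isdisjoint(group)
            }
        return self.cache[key]


fc = FilterCache([{"term1", "term7"}, {"term2"}, {"term9"}])
first = fc.build_filter({"term1", "term2"})  # computed
again = fc.build_filter({"term2", "term1"})  # same group: cache hit
fc.reopen([{"term1"}, {"term2"}, {"term9"}, {"term1", "term9"}])
after = fc.build_filter({"term1", "term2"})  # recomputed after re-open
print(fc.builds, sorted(first), sorted(after))
```

With many distinct term combinations and a re-open every few minutes, most 
lookups would land on the "computed" path rather than the cache hit.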

Any thoughts or advice would be appreciated.  Many thanks in advance!

Greg Conway
Textwise Labs





