Hello. I've got a problem perhaps some of you can help with. I have an application that has to run fairly long queries (about 30 terms OR'ed together) against an index of roughly 500K documents. Because of the limited vocabulary I'm indexing and querying over (~2,000 terms), the size of the queries, and the number of documents involved, my search times are running a little long, and I'd like to speed them up if possible.
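For concreteness, the queries are built roughly like this (a minimal sketch only: the field name and the way the terms arrive are placeholders, and the exact BooleanQuery calls will differ a bit between Lucene versions):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    // Build the plain OR query: every term is an optional (SHOULD) clause,
    // so a document matches if it contains any one of the ~30 terms.
    BooleanQuery buildOrQuery(String field, String[] terms) {
        BooleanQuery query = new BooleanQuery();
        for (String term : terms) {
            query.add(new TermQuery(new Term(field, term)),
                      BooleanClause.Occur.SHOULD);
        }
        return query;
    }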
One approach I've tried is structuring the queries so that one (or more) of a subset of the 30 terms is required and the rest are optional, as in:

    +(term1 term2 term3 ... term10) term11 term12 term13 ... term30

This yielded an average search time of about 50 msec. I then assumed that if I reduced the required set from 10 terms to 5, there would be fewer documents to score against and query performance would improve, so I tried something like this:

    +(term1 term2 term3 term4 term5) term6 term7 ... term30

To my surprise, the overall query got no faster (it was actually slower, at about 63 msec on average). My expectation about how Lucene interprets and executes this kind of query was apparently incorrect.

The obvious answer here might be to use a filter for the first (required) clause and then query again using that filter for the other terms (I've sketched roughly what I mean in a P.S. below). The problem I foresee with that solution is that I can't easily re-use the filters, both because of the sheer number of combinations of terms and because I need to re-open my readers/searchers every few minutes to expose a steady stream of updates to querying. As I understand it, re-using a filter (rather than creating it, using it once, and discarding it) is integral to its value as a time saver, so filters may not be appropriate in this case.

Any thoughts or advice would be appreciated. Many thanks in advance!

Greg Conway
Textwise Labs
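P.S. For reference, here is roughly the filter-based variant I was considering (a sketch only: the field name, the requiredTerms/optionalTerms arrays, and the searcher are placeholders, and I'm assuming the QueryWrapperFilter/CachingWrapperFilter classes and the search(Query, Filter, int) overload found in recent releases):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.*;

    // Turn the "required" subset into a (cacheable) filter...
    BooleanQuery required = new BooleanQuery();
    for (String term : requiredTerms) {
        required.add(new TermQuery(new Term("contents", term)),
                     BooleanClause.Occur.SHOULD);
    }
    Filter filter = new CachingWrapperFilter(new QueryWrapperFilter(required));

    // ...and score only the remaining, optional terms against it.
    BooleanQuery optional = new BooleanQuery();
    for (String term : optionalTerms) {
        optional.add(new TermQuery(new Term("contents", term)),
                     BooleanClause.Occur.SHOULD);
    }
    TopDocs hits = searcher.search(optional, filter, 100);

The caching, of course, only pays off if the same filter (against the same reader) gets re-used, which is exactly what I don't think I can count on here.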