2500 vs 84. Wow. That's quite a few OR clauses I would be saving by
following your advice of indexing only the parts of the datetime I plan
to search on. Every ms counts.
Now I have a clear picture of how range queries work. Great stuff. Thanks.
Btw, coming from a DB background, I'm used to writing queries with the
most selective comparison, the one likely to return the fewest rows,
first in the WHERE clause. A DB can still be pretty dumb with bad
statistics and choose the wrong execution plan, so I like to optimize
for that whenever possible and force the issue.
If I have a sample Lucene query:
"+a:abc +b:cde +d:bbd +date:[2001 TO 2005] -e:noway"
Does Lucene's execution engine try to figure out, via statistics or a
guesstimate, which path to take first? Or does it just brute-force it
and follow the execution plan from left to right? Or does it evaluate
each clause individually, without running the next search on the
results of the prior one, and then combine the results at the end?
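The third option, evaluating each clause on its own and then merging the per-clause results, is easiest to picture with sorted lists of document ids. The sketch below is a generic document-at-a-time intersection, not Lucene's scorer code: each required term clause is assumed to be a sorted `int[]` of matching doc ids, the prohibited clause is another sorted array, the date range is left out for brevity, and all names are hypothetical. It shows that with this style of evaluation the result does not depend on the order in which the clauses are written, and the loop stops as soon as any list, typically the rarest one, runs out.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {
    // Returns the doc ids present in every required list and absent from the
    // prohibited list. All arrays are assumed sorted ascending without duplicates.
    static List<Integer> intersect(int[][] required, int[] prohibited) {
        List<Integer> hits = new ArrayList<>();
        if (required.length == 0) return hits;
        int[] pos = new int[required.length]; // one cursor per required list
        int p = 0;                            // cursor into the prohibited list
        while (true) {
            // The candidate is the largest doc id any cursor currently points at.
            int target = Integer.MIN_VALUE;
            for (int i = 0; i < required.length; i++) {
                if (pos[i] >= required[i].length) return hits; // a list ran out: done
                target = Math.max(target, required[i][pos[i]]);
            }
            // Advance each cursor to the first doc id >= target.
            boolean allMatch = true;
            for (int i = 0; i < required.length; i++) {
                while (pos[i] < required[i].length && required[i][pos[i]] < target) pos[i]++;
                if (pos[i] >= required[i].length) return hits;
                if (required[i][pos[i]] != target) allMatch = false;
            }
            if (allMatch) {
                // Keep the match only if the prohibited list does not contain it.
                while (p < prohibited.length && prohibited[p] < target) p++;
                if (p >= prohibited.length || prohibited[p] != target) hits.add(target);
                for (int i = 0; i < required.length; i++) pos[i]++; // step past the match
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 3, 5, 7, 9};   // docs matching +a:abc
        int[] b = {3, 4, 5, 9};      // docs matching +b:cde
        int[] d = {0, 3, 5, 9, 12};  // docs matching +d:bbd
        int[] e = {5};               // docs matching -e:noway
        System.out.println(intersect(new int[][] {a, b, d}, e)); // [3, 9]
        System.out.println(intersect(new int[][] {d, b, a}, e)); // [3, 9]
    }
}
```

Real engines usually jump ahead using skip data instead of stepping one entry at a time, but the shape of the merge is the same.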
Xing
Erik Hatcher wrote:
On Jul 11, 2005, at 1:45 AM, [EMAIL PROTECTED] wrote:
Did a Google search on the problem I hit when running the range query
"+datefield:[199801 TO 200512]" (date stored as "YYYYMMDD"), which
returns 1 million hits.
error: org.apache.lucene.search.BooleanQuery$TooManyClauses
Adding "-Dorg.apache.lucene.maxClauseCount=2400" to the java options
allowed the query to run without error. The actual value needed is
between 2300 and 2400; at 2300 the query fails.
My question is: how does Lucene perform a range query? As a bunch of
smaller boolean queries? How does one estimate the number of clauses
required for a general query, and more specifically for a range query?
RangeQuery expands under the covers to a BooleanQuery with all matching
terms OR'd together.
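To see why the clause count tracks the number of distinct indexed terms, here is a small sketch of that expansion in plain Java, with no Lucene dependency. It assumes the indexed terms are held in a sorted set and compared as plain strings; the class, the `expand` method and the `TooManyClauses` exception are hypothetical stand-ins, not Lucene's own code. The limit is read from the `org.apache.lucene.maxClauseCount` property mentioned above, with 1024 assumed as the default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class RangeExpansionSketch {
    // Limit read from the same system property as the -D flag; 1024 is an
    // assumed fallback default.
    static final int DEFAULT_MAX_CLAUSES =
            Integer.getInteger("org.apache.lucene.maxClauseCount", 1024);

    // Hypothetical stand-in for BooleanQuery$TooManyClauses.
    static class TooManyClauses extends RuntimeException {
        TooManyClauses(int clauses, int limit) {
            super(clauses + " clauses exceed the limit of " + limit);
        }
    }

    // Returns every indexed term t with lower <= t <= upper in plain String
    // order: one OR'd clause per distinct term that actually occurs in the index.
    static List<String> expand(NavigableSet<String> indexedTerms,
                               String lower, String upper, int maxClauses) {
        List<String> clauses = new ArrayList<>(indexedTerms.subSet(lower, true, upper, true));
        if (clauses.size() > maxClauses) {
            throw new TooManyClauses(clauses.size(), maxClauses);
        }
        return clauses;
    }

    static List<String> expand(NavigableSet<String> indexedTerms, String lower, String upper) {
        return expand(indexedTerms, lower, upper, DEFAULT_MAX_CLAUSES);
    }

    public static void main(String[] args) {
        NavigableSet<String> terms =
                new TreeSet<>(List.of("19971231", "19980101", "20051130", "20051201"));
        // 19971231 sorts below the range; 20051201 sorts after "200512", so both are left out.
        System.out.println(expand(terms, "199801", "200512")); // [19980101, 20051130]
    }
}
```

Because every matching term becomes its own clause, the clause count grows with the number of distinct values in the range, not with the number of hits.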
In your case, if you've indexed a term for every day in that range
using YYYYMMDD, then you've got roughly 2,524 terms = 7 * 365 - 31
(minus 31 because you'd omit December '05, since you are only going up
to 200512). If all you need is YYYYMM range searching, then index it
that way (that'd be 7 years * 12 months/year = 84 terms).
Erik
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]