David Smiley created LUCENE-4418:
------------------------------------
Summary: Improve RecursivePrefixTreeFilter's performance heuristic tunables
Key: LUCENE-4418
URL: https://issues.apache.org/jira/browse/LUCENE-4418
Project: Lucene - Core
Issue Type: Improvement
Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor
RecursivePrefixTreeFilter recursively decomposes grid cells until it reaches a
threshold grid level (e.g. 4 away from max levels), at which point it switches
to a brute force scan, because scanning is faster once the number of terms is
small. For example, with max levels of 10 and a threshold of 4, it switches to
scanning at level 6. Ideally, the filter would know exactly how many terms
there are in that grid cell -- i.e. given a hi & lo term, determine how many
indexed terms are in-between without actually iterating to find out.
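
To make the arithmetic concrete, here is a minimal sketch of the fixed-level
rule described above. The class and method names are hypothetical and are not
taken from the Lucene source; rejecting a threshold larger than max levels is
also an assumption of this sketch.

```java
// Illustrative sketch only: names are hypothetical, not Lucene's API.
final class FixedLevelThresholdSketch {

    /**
     * Grid level at which recursion stops and brute-force scanning begins,
     * given the maximum number of levels and the "levels away from max"
     * threshold.
     */
    static int scanLevel(int maxLevels, int levelsFromMax) {
        if (maxLevels < 1 || levelsFromMax < 0 || levelsFromMax > maxLevels) {
            // Assumption: out-of-range settings are rejected rather than clamped.
            throw new IllegalArgumentException("invalid levels");
        }
        return maxLevels - levelsFromMax;
    }

    /** True once a cell is at or below the scan level in the grid hierarchy. */
    static boolean shouldScan(int cellLevel, int maxLevels, int levelsFromMax) {
        return cellLevel >= scanLevel(maxLevels, levelsFromMax);
    }
}
```

With max levels of 10 and a threshold of 4, scanLevel(10, 4) is 6, so a cell
at level 5 is still decomposed while a cell at level 6 is scanned.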
Instead, it could use the # docs that a grid cell has as a heuristic. I think
it's much better than a fixed level because it adapts to the density of the
actual indexed data. It's not perfect, though: many documents could refer to
the same indexed point, or a few documents with multi-valued data could refer
to many indexed points.
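
A rough sketch of what a doc-count-based decision might look like follows. It
is not the filter's actual code: the names are hypothetical, the cutoff
parameter (maxDocsForScan) is an assumed tunable whose default is yet to be
determined, and how the per-cell doc count is obtained is left out.

```java
// Illustrative sketch only: names and the cutoff parameter are assumptions.
final class DocCountHeuristicSketch {

    /**
     * Decide whether to stop decomposing a cell and scan its terms instead.
     *
     * @param cellLevel      grid level of the current cell
     * @param maxLevels      deepest grid level in the prefix tree
     * @param docsInCell     number of documents that fall in this cell
     * @param maxDocsForScan hypothetical tunable: scan once a cell holds
     *                       at most this many documents
     */
    static boolean shouldScan(int cellLevel, int maxLevels,
                              long docsInCell, long maxDocsForScan) {
        if (cellLevel >= maxLevels) {
            // Assumption: at the deepest level there is nothing finer to
            // recurse into, so stop regardless of the doc count.
            return true;
        }
        // Doc count stands in for the unknown term count. It can overestimate
        // (many docs at one point) or underestimate (multi-valued docs).
        return docsInCell <= maxDocsForScan;
    }
}
```

Unlike the fixed-level rule, a sparse cell switches to scanning early and a
dense cell keeps being decomposed, regardless of its level.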
Before I do this, I need to re-invigorate my testing efforts so I can come up
with a default threshold. The right value also depends on things like query
shape complexity.