[
https://issues.apache.org/jira/browse/LUCENE-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-5418:
---------------------------------------
Attachment: LUCENE-5418.patch
Patch attached; I think it's ready.
I simplified DrillSideways by removing the collector method and cutting over to
DISI (DocIdSetIterator) instead of DocsEnum, and by using Bits.get from a
Filter when it supports random access. I also cut DrillDownQuery over to
FilteredQuery.
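
To illustrate the "use Bits.get when the Filter supports random access" idea, here is a minimal, self-contained Java sketch. It is a toy model and not code from the patch: DocBits, DocSet, docSet and filterHits are hypothetical stand-ins, loosely modelled on the Lucene 4.x convention that DocIdSet.bits() returns null when random access is not supported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

/** Toy model only: none of these types are Lucene classes. */
final class RandomAccessFilterSketch {

    /** Hypothetical stand-in for a random-access view (in the spirit of Bits.get). */
    interface DocBits {
        boolean get(int doc);
    }

    /** Hypothetical stand-in for a filter's doc set. */
    interface DocSet {
        /** Returns null when random access is not supported. */
        DocBits bits();

        /** Iterates accepted doc IDs in ascending order. */
        PrimitiveIterator.OfInt iterator();
    }

    /** Builds a doc set over [0, maxDoc) from a predicate (helper for this sketch). */
    static DocSet docSet(IntPredicate accepts, int maxDoc, boolean randomAccess) {
        return new DocSet() {
            @Override
            public DocBits bits() {
                if (!randomAccess) {
                    return null;
                }
                return accepts::test;
            }

            @Override
            public PrimitiveIterator.OfInt iterator() {
                return IntStream.range(0, maxDoc).filter(accepts).iterator();
            }
        };
    }

    /** Keeps the query hits (ascending doc IDs) that the filter accepts. */
    static List<Integer> filterHits(int[] sortedQueryHits, DocSet set) {
        List<Integer> kept = new ArrayList<>();
        DocBits bits = set.bits();
        if (bits != null) {
            // Random access: check only the docs the query actually matched.
            for (int doc : sortedQueryHits) {
                if (bits.get(doc)) {
                    kept.add(doc);
                }
            }
            return kept;
        }
        // No random access: walk the filter's iterator in step with the hits.
        PrimitiveIterator.OfInt it = set.iterator();
        int filterDoc = it.hasNext() ? it.nextInt() : Integer.MAX_VALUE;
        for (int doc : sortedQueryHits) {
            while (filterDoc < doc) {
                filterDoc = it.hasNext() ? it.nextInt() : Integer.MAX_VALUE;
            }
            if (filterDoc == doc) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] hits = {2, 5, 8};
        System.out.println(filterHits(hits, docSet(d -> d % 2 == 0, 10, true)));
        System.out.println(filterHits(hits, docSet(d -> d % 2 == 0, 10, false)));
    }
}
```

Both paths keep the same hits. They differ only in how many filter entries get examined, which is what matters when each examination is expensive.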
> Don't use .advance on costly (e.g. distance range facets) filters
> -----------------------------------------------------------------
>
> Key: LUCENE-5418
> URL: https://issues.apache.org/jira/browse/LUCENE-5418
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Fix For: 5.0, 4.7
>
> Attachments: LUCENE-5418.patch
>
>
> If you use a distance filter today (see
> http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html
> ), then drill down on one of those ranges, under the hood Lucene is using
> .advance on the Filter, which is very costly because we end up computing
> distance on (possibly many) hits that don't match the query.
> It is faster to find the hits matching the Query first, and only then
> check the filter against those hits.
> FilteredQuery can already do this today, when you use its
> QUERY_FIRST_FILTER_STRATEGY. This essentially accomplishes the same thing as
> Solr's "post filters" (I think?) but with a far simpler, better approach
> that needs less code.
> E.g., I believe ElasticSearch uses this API when it applies costly filters.
> Longer term, I think a Query/Filter ought to know for itself that it's
> expensive. Then, when such a Query/Filter is MUST'd onto a BooleanQuery
> (e.g. ConstantScoreQuery), or the Filter is a clause in BooleanFilter, or
> it's passed to IndexSearcher.search, we should also be "smart" and not call
> .advance on such clauses. But that'd be a biggish change ... so for today
> the "workaround" is that the user must carefully construct the FilteredQuery
> themselves.
> In the meantime, as another workaround, I want to fix DrillSideways so that
> when you drill down on such filters it doesn't use .advance; this should give
> a good speedup for the "normal path" API usage with a costly filter.
> I'm iterating on the lucene server branch (LUCENE-5376) but once it's working
> I plan to merge this back to trunk / 4.7.
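
The cost argument in the description (calling .advance on a costly filter versus checking the filter only on query hits) can be sketched with a small, self-contained Java toy model. Nothing here is Lucene code: CountingFilter, countLeapfrog and countQueryFirst are hypothetical names, the modulo predicate merely stands in for an expensive distance computation, and the filter's advance is assumed to test each candidate doc in turn.

```java
import java.util.function.IntPredicate;

/** Toy model only: counts how often a costly per-document check runs. */
final class CostlyFilterSketch {

    /** Wraps a predicate and counts calls; stands in for a costly filter. */
    static final class CountingFilter {
        private final IntPredicate costlyCheck;
        int evaluations = 0;

        CountingFilter(IntPredicate costlyCheck) {
            this.costlyCheck = costlyCheck;
        }

        /** Random-access check of a single doc. */
        boolean get(int doc) {
            evaluations++;
            return costlyCheck.test(doc);
        }

        /** advance-style scan: first accepted doc >= target, or maxDoc if none. */
        int advance(int target, int maxDoc) {
            for (int doc = target; doc < maxDoc; doc++) {
                if (get(doc)) {
                    return doc;
                }
            }
            return maxDoc;
        }
    }

    /** Leapfrog intersection: the filter is advanced to each query candidate. */
    static int countLeapfrog(int[] sortedQueryHits, CountingFilter filter, int maxDoc) {
        int matches = 0;
        int i = 0;
        int filterDoc = -1;
        while (i < sortedQueryHits.length) {
            int queryDoc = sortedQueryHits[i];
            if (filterDoc < queryDoc) {
                filterDoc = filter.advance(queryDoc, maxDoc);
            }
            if (filterDoc == queryDoc) {
                matches++;
                i++;
            } else {
                // The filter landed past this hit; skip query hits before it.
                while (i < sortedQueryHits.length && sortedQueryHits[i] < filterDoc) {
                    i++;
                }
            }
        }
        return matches;
    }

    /** Query first: the costly check runs once per query hit and nowhere else. */
    static int countQueryFirst(int[] sortedQueryHits, CountingFilter filter) {
        int matches = 0;
        for (int doc : sortedQueryHits) {
            if (filter.get(doc)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;
        int[] queryHits = {10, 499, 900};
        IntPredicate costly = doc -> doc % 100 == 99;
        CountingFilter a = new CountingFilter(costly);
        CountingFilter b = new CountingFilter(costly);
        System.out.println("leapfrog matches=" + countLeapfrog(queryHits, a, maxDoc)
            + " checks=" + a.evaluations);
        System.out.println("query-first matches=" + countQueryFirst(queryHits, b)
            + " checks=" + b.evaluations);
    }
}
```

Both strategies return the same matches, but in this model the leapfrog variant pays for every doc the filter scans past, while the query-first variant pays once per query hit.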