[
https://issues.apache.org/jira/browse/LUCENE-6547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610054#comment-14610054
]
Michael McCandless commented on LUCENE-6547:
--------------------------------------------
Thanks Nick!
bq. Addresses LUCENE-6562 to share ranges across segments. To guard against OOM
exceptions the ranges needed to be purged once all segments have been visited.
See GeoPointTermQuery line 87 for this check.
OK, it looks like this patch also absorbed the latest patch on
LUCENE-6562? It's clever how you clear ranges after the last segment!
Unfortunately I think it's unsafe ... e.g. if IndexSearcher is using
an executor, multiple threads can call getTermsEnum I think ... and
also, in the cached case, on opening a new NRT reader, I think the
query would only see e.g. the 1 or 2 new segments.
Net/net I'm worried about the complexity of getting this working
correctly; I think for now it's best/simpler to just re-compute the ranges
per segment?
We should anyway try to keep the issue separate from this one...
{quote}
bq. Can we remove the separate bbox from GeoPointInPolygonQuery's ctor?
Yes! I believe so. The intent was to improve performance for detailed polygons
(those with many vertices) by providing precomputed or cached bounding boxes
rather than having the query recompute. Might be worth benchmarking to be
certain whether we want to take away this utility. Sure it can be used
maliciously, maybe good documentation can raise awareness?
{quote}
But that cost will be miniscule compared to the overall query execution time
right?
E.g. to filter just a single "borderline" hit, we also must walk the full
polygon? Multiplied by the number of borderline hits ...
It just strikes me as such a trivial optimization that it's not worth polluting
the API over ... APIs are hard enough as it is :) And e.g. I've already
screwed it up at least once when I passed an invalid bbox before (when working
on the test).
{quote}
Large ranges were previously discouraged because it takes on the order of
~2secs to correctly compute the ranges for large search areas. Since this
occurred for every segment large search over large data was quite time
consuming. Reusing ranges across segments has brought this down to the amount
of time it takes to compute the ranges.
{quote}
OK that's a good improvement but I think large ranges are still very dangerous
for this query? E.g. the query will suddenly take ... 10s of MBs heap to hold
its ranges as well? Maybe (separately!) we could improve how we store the
ranges in RAM, e.g. use two PrefixCodedTerms or something.
I think, like AutomatonQuery, we need to place a default limit on the number of
ranges? User can increase it if they know what they are doing...
I put this nocommit in my last patch:
bq. // nocommit why on earth are we needing to pass lat1/lon1 here :) the
query doesn't get these...
Any idea how to fix? maxLat/maxLon cannot possibly matter here: they are
randomly generated and not used by the query, yet they are used by
radiusQueryCanBeWrong. This must be a test bug?
> Add dateline crossing support to GeoPointInBBox and GeoPointDistance Queries
> ----------------------------------------------------------------------------
>
> Key: LUCENE-6547
> URL: https://issues.apache.org/jira/browse/LUCENE-6547
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Nicholas Knize
> Attachments: LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch,
> LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch,
> LUCENE-6547.patch
>
>
> The current GeoPointInBBoxQuery only supports bounding boxes that are within
> the standard -180:180 longitudinal bounds. While its perfectly fine to
> require users to split dateline crossing bounding boxes in two,
> GeoPointDistanceQuery should support distance queries that cross the
> dateline. Since morton encoding doesn't support unwinding this issue will
> add dateline crossing to GeoPointInBBoxQuery and GeoPointDistanceQuery
> classes.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]