[ 
https://issues.apache.org/jira/browse/LUCENE-6547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610054#comment-14610054
 ] 

Michael McCandless commented on LUCENE-6547:
--------------------------------------------

Thanks Nick!

bq. Addresses LUCENE-6562 to share ranges across segments. To guard against OOM 
exceptions the ranges needed to be purged once all segments have been visited. 
See GeoPointTermQuery line 87 for this check.

OK, it looks like this patch also absorbed the latest patch on
LUCENE-6562?  It's clever how you clear ranges after the last segment!

Unfortunately I think it's unsafe ... e.g. if IndexSearcher is using
an executor, multiple threads can call getTermsEnum I think ... and
also, in the cached case, on opening a new NRT reader, I think the
query would only see e.g. the 1 or 2 new segments.

Net/net I'm worried about the complexity of getting this working
correctly; I think for now it's best/simpler to just re-compute the ranges
per segment?

We should anyway try to keep the issue separate from this one...

{quote}
bq. Can we remove the separate bbox from GeoPointInPolygonQuery's ctor?

Yes! I believe so. The intent was to improve performance for detailed polygons 
(those with many vertices) by providing precomputed or cached bounding boxes 
rather than having the query recompute. Might be worth benchmarking to be 
certain whether we want to take away this utility. Sure it can be used 
maliciously, maybe good documentation can raise awareness?
{quote}

But that cost will be miniscule compared to the overall query execution time 
right?

E.g. to filter just a single "borderline" hit, we also must walk the full 
polygon?  Multiplied by the number of borderline hits ...

It just strikes me as such a trivial optimization that it's not worth polluting 
the API over ... APIs are hard enough as it is :)  And e.g. I've already 
screwed it up at least once when I passed an invalid bbox before (when working 
on the test).

{quote}
Large ranges were previously discouraged because it takes on the order of 
~2secs to correctly compute the ranges for large search areas. Since this 
occurred for every segment large search over large data was quite time 
consuming. Reusing ranges across segments has brought this down to the amount 
of time it takes to compute the ranges.
{quote}

OK that's a good improvement but I think large ranges are still very dangerous 
for this query?  E.g. the query will suddenly take ... 10s of MBs heap to hold 
its ranges as well?  Maybe (separately!) we could improve how we store the 
ranges in RAM, e.g. use two PrefixCodedTerms or something.

I think, like AutomatonQuery, we need to place a default limit on the number of 
ranges?  User can increase it if they know what they are doing...

I put this nocommit in my last patch:

bq. // nocommit why on earth are we needing to pass lat1/lon1 here :)  the 
query doesn't get these...

Any idea how to fix?  maxLat/maxLon cannot possibly matter here: they are 
randomly generated and not used by the query, yet they are used by 
radiusQueryCanBeWrong.  This must be a test bug?


> Add dateline crossing support to GeoPointInBBox and GeoPointDistance Queries
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-6547
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6547
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Nicholas Knize
>         Attachments: LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch, 
> LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch, LUCENE-6547.patch, 
> LUCENE-6547.patch
>
>
> The current GeoPointInBBoxQuery only supports bounding boxes that are within 
> the standard -180:180 longitudinal bounds. While its perfectly fine to 
> require users to split dateline crossing bounding boxes in two, 
> GeoPointDistanceQuery should support distance queries that cross the 
> dateline.  Since morton encoding doesn't support unwinding this issue will 
> add dateline crossing to GeoPointInBBoxQuery and GeoPointDistanceQuery 
> classes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to