[
https://issues.apache.org/jira/browse/LUCENE-7179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226899#comment-15226899
]
Robert Muir commented on LUCENE-7179:
-------------------------------------
{quote}
Because it's a 32-bit space, though, data truncation is inevitable.
{quote}
No, we are choosing to truncate the *user's data*. This is something to be
taken seriously.
I care very much about the details of exactly how this truncation happens;
that is what I keep bringing up on this issue:
* stability
* rounding
* overflow
GeoPoint has a much slower bounding box query than LatLonPoint because it can't
take advantage of *exactly how its truncation happens* for these silly reasons.
If these were fixed, this query would be faster.
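To illustrate the point about bounding boxes, here is a minimal sketch (not Lucene's actual code) of a box check done entirely in integer space. It assumes a floor-based encoder; the class `IntBoxSketch` and the names `encodeLat`, `encodeLon`, and `contains` are hypothetical, and dateline-crossing boxes are ignored. Because a floor-based encoding preserves ordering, the query bounds can be encoded once and every candidate point then costs four int comparisons, with no decoding back to doubles.

```java
final class IntBoxSketch {
    // Degrees covered by one integer step (assumed 32-bit encoding per dimension).
    static final double LAT_STEP = 180.0 / (1L << 32);
    static final double LON_STEP = 360.0 / (1L << 32);

    // Hypothetical encoders: always floor, so ordering of inputs is preserved.
    static int encodeLat(double lat) {
        if (lat == 90.0) {
            lat = Math.nextDown(lat); // keep the maximum inside the int range
        }
        return (int) Math.floor(lat / LAT_STEP);
    }

    static int encodeLon(double lon) {
        if (lon == 180.0) {
            lon = Math.nextDown(lon); // keep the maximum inside the int range
        }
        return (int) Math.floor(lon / LON_STEP);
    }

    private final int minLat, maxLat, minLon, maxLon;

    // The query bounds are quantized exactly once, at construction time.
    IntBoxSketch(double minLat, double maxLat, double minLon, double maxLon) {
        this.minLat = encodeLat(minLat);
        this.maxLat = encodeLat(maxLat);
        this.minLon = encodeLon(minLon);
        this.maxLon = encodeLon(maxLon);
    }

    // Per-point check on already-encoded values: four int comparisons.
    boolean contains(int encodedLat, int encodedLon) {
        return encodedLat >= minLat && encodedLat <= maxLat
            && encodedLon >= minLon && encodedLon <= maxLon;
    }

    public static void main(String[] args) {
        IntBoxSketch box = new IntBoxSketch(40.0, 41.0, -75.0, -73.0);
        System.out.println(box.contains(encodeLat(40.7128), encodeLon(-74.0060)));
    }
}
```

Note that this matches against the quantized values, which is only safe to reason about if the encoder's rounding direction is fixed and known.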
I just commented on the port of distance sort (LUCENE-7180) about how important
an integer-space bounding box is for reducing CPU cost in compareBottom.
And you can see speedups in LUCENE-7177 for GeoPoint's polygon queries, which
are based on operations in integer space (I had to incorporate significant hair
to accommodate the current untamed quantization).
So there are three use cases on the table right now for why we should fix this:
faster bounding box, sorting, and polygon queries. And the code can still be simple and
the quantization "effect" becomes easier to reason about: e.g. fixing rounding means
the 'double' we treat it as is always the closest double in our integer space
that is <= the user's value, instead of "rounded haphazardly in an unknown
direction". It just requires that we pay attention, fix the bugs, and write good
tests.
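As a rough illustration of the rounding and stability points, the following sketch quantizes a latitude by always flooring. It is an assumption-laden example, not the patch under discussion: the class `QuantizeSketch`, the constant `LAT_STEP`, and the methods `encodeLat` and `decodeLat` are hypothetical names. The step `180 / 2^32` is exactly representable as a double, so decoding an encoded value and encoding it again yields the same integer, which is the "quantize once" behavior the tests would want.

```java
final class QuantizeSketch {
    // Width of one integer step in degrees: 180 degrees spread over 2^32 values.
    static final double LAT_STEP = 180.0 / (1L << 32);

    // Hypothetical encoder: always rounds toward negative infinity (floor).
    static int encodeLat(double lat) {
        if (Double.isNaN(lat) || lat < -90.0 || lat > 90.0) {
            throw new IllegalArgumentException("latitude out of range: " + lat);
        }
        // 90.0 would map to 2^31, one past Integer.MAX_VALUE, so nudge it down.
        if (lat == 90.0) {
            lat = Math.nextDown(lat);
        }
        return (int) Math.floor(lat / LAT_STEP);
    }

    // Hypothetical decoder: the double that the integer value stands for.
    static double decodeLat(int encoded) {
        return encoded * LAT_STEP;
    }

    public static void main(String[] args) {
        double lat = 40.7128;
        int once = encodeLat(lat);
        double decoded = decodeLat(once);
        int twice = encodeLat(decoded);
        System.out.println("decoded <= original: " + (decoded <= lat));
        System.out.println("re-encoding is stable: " + (once == twice));
    }
}
```

With round-to-nearest instead of floor, the decoded value could land on either side of the input, which is the harder-to-reason-about behavior described above.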
> GeoPoint and LatLonPoint test data should quantize once
> -------------------------------------------------------
>
> Key: LUCENE-7179
> URL: https://issues.apache.org/jira/browse/LUCENE-7179
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Nicholas Knize
> Attachments: LUCENE-7179.patch
>
>
> {{LatLonPoint}} and {{GeoPointField}} tests pre-quantize test data to ensure
> consistency with indexed (encoded) data. The pre-quantized data is then
> indexed, undergoing another quantization. To guarantee numerical stability
> this should be changed such that the test data is quantized after indexing.