[ https://issues.apache.org/jira/browse/LUCENE-7179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15226899#comment-15226899 ]

Robert Muir commented on LUCENE-7179:
-------------------------------------

{quote}
Because it's a 32 bit space, though, data truncation is inevitable.
{quote}

No, we are choosing to truncate the *user's data*. This is something to be 
taken seriously. 

I care very much about exactly how this truncation happens; that is what I 
keep bringing up on this issue:
* stability
* rounding
* overflow

GeoPoint has a much slower bounding box query than LatLonPoint because, for 
these silly reasons, it can't take advantage of *exactly how its truncation 
happens*. If these were fixed, this query would be faster.

I just commented on the port of distance sort (LUCENE-7180) about how important 
an integer-space bounding box is for reducing CPU in compareBottom.
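To illustrate what an integer-space bounding box buys, here is a minimal sketch, assuming a floor-based encoding of each coordinate into 2^32 equal-width buckets. The class and method names (`IntBoxSketch`, `encodeLat`, `encodeLon`, `contains`) are hypothetical and are not Lucene's API. The box corners are encoded once up front, and each candidate point is then tested with four int comparisons instead of being decoded back to doubles.

```java
/**
 * Hypothetical sketch: a bounding-box check done entirely on encoded 32-bit ints.
 * The floor-based encoding below is an illustrative assumption, not Lucene's API.
 */
final class IntBoxSketch {
    // Assumed quantization: 2^32 equal-width buckets across each coordinate's range.
    static final double LAT_STEP = 180.0 / (1L << 32);
    static final double LON_STEP = 360.0 / (1L << 32);

    static int encodeLat(double lat) {
        // 90.0 / LAT_STEP == 2^31, one past Integer.MAX_VALUE: put it in the top bucket explicitly.
        if (lat == 90.0) lat = Math.nextDown(lat);
        return (int) Math.floor(lat / LAT_STEP);
    }

    static int encodeLon(double lon) {
        // Same edge case for longitude: 180.0 / LON_STEP == 2^31.
        if (lon == 180.0) lon = Math.nextDown(lon);
        return (int) Math.floor(lon / LON_STEP);
    }

    // Box corners, encoded once when the query is built.
    final int minLat, maxLat, minLon, maxLon;

    IntBoxSketch(double minLat, double maxLat, double minLon, double maxLon) {
        this.minLat = encodeLat(minLat);
        this.maxLat = encodeLat(maxLat);
        this.minLon = encodeLon(minLon);
        this.maxLon = encodeLon(maxLon);
    }

    // Per-document check: four int comparisons on the encoded values, no decoding.
    boolean contains(int encLat, int encLon) {
        return encLat >= minLat && encLat <= maxLat
            && encLon >= minLon && encLon <= maxLon;
    }

    public static void main(String[] args) {
        IntBoxSketch box = new IntBoxSketch(40.0, 41.0, -75.0, -73.0);
        System.out.println(box.contains(encodeLat(40.7), encodeLon(-74.0))); // true
        System.out.println(box.contains(encodeLat(42.0), encodeLon(-74.0))); // false
    }
}
```

Because the encoding is monotonic, every point whose raw coordinates lie inside the box also passes the int check; the check is exact with respect to the quantized values, so it may also accept a point that shares an edge bucket with a corner. This only works if the rounding direction is known and consistent.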

And you can see speedups in LUCENE-7177 for GeoPoint's polygon queries, which 
are based on operations in integer space (I had to incorporate significant hair 
to accommodate the current untamed quantization).

So there are 3 use cases on the table right now for why we should fix this: 
faster bounding box, sorting, and polygon queries. And the code can still be 
simple, and the quantization "effect" is easier to reason about: e.g. fixing 
rounding means the 'double' we treat it as is always the closest double in our 
integer space that is <= the user's value, instead of "rounded haphazardly in 
an unknown direction". It just requires that we pay attention, fix the bugs, 
and write good tests.
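The rounding rule described above can be sketched as follows, assuming latitude is mapped onto 2^32 equal-width buckets over [-90, 90]. `FloorQuantizeSketch`, its constant, and the special case for 90.0 are illustrative assumptions, not Lucene's actual encoding. Using `Math.floor` makes the decoded value the largest point in the integer space that is <= the input. Because every decoded value lands exactly on a bucket boundary, quantizing it a second time returns it unchanged, which is the stability needed for test data to be quantized exactly once.

```java
/**
 * Hypothetical sketch of "always round down" latitude quantization into a 32-bit int.
 * Names and constants are assumptions for illustration, not Lucene's API.
 */
final class FloorQuantizeSketch {
    // 2^32 buckets across [-90, 90]; each bucket is LAT_STEP degrees wide.
    static final double LAT_STEP = 180.0 / (1L << 32);

    static int encode(double lat) {
        if (lat < -90.0 || lat > 90.0) {
            throw new IllegalArgumentException("latitude out of range: " + lat);
        }
        // 90.0 / LAT_STEP == 2^31, one past Integer.MAX_VALUE: put it in the top bucket explicitly.
        if (lat == 90.0) lat = Math.nextDown(lat);
        // Math.floor rounds toward negative infinity for both positive and negative inputs.
        return (int) Math.floor(lat / LAT_STEP);
    }

    static double decode(int enc) {
        // Lower edge of bucket 'enc'; this product is exact in double precision.
        return enc * LAT_STEP;
    }

    static double quantize(double lat) {
        return decode(encode(lat));
    }

    public static void main(String[] args) {
        double lat = 40.7128;
        double q = quantize(lat);
        System.out.println(q <= lat);                      // true: rounded down
        System.out.println(lat < decode(encode(lat) + 1)); // true: next bucket starts above the value
        System.out.println(quantize(q) == q);              // true: quantizing twice is a no-op
        System.out.println(quantize(-40.7128) <= -40.7128); // true: negatives also round down
    }
}
```

By contrast, a bare truncating `(int)` cast rounds toward zero, so negative inputs would round up while positive inputs round down; that sign-dependent behavior is one example of rounding "in an unknown direction".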


> GeoPoint and LatLonPoint test data should quantize once
> -------------------------------------------------------
>
>                 Key: LUCENE-7179
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7179
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Nicholas Knize
>         Attachments: LUCENE-7179.patch
>
>
> {{LatLonPoint}} and {{GeoPointField}} tests pre-quantize test data to ensure 
> consistency with indexed (encoded) data. The pre-quantized data is then 
> indexed, undergoing a second quantization. To guarantee numerical stability, 
> this should be changed so that the test data is quantized after indexing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
