Ignacio Vera created LUCENE-8888:
------------------------------------

             Summary: Improve distribution of points with data dimension in BKD 
tree leaves
                 Key: LUCENE-8888
                 URL: https://issues.apache.org/jira/browse/LUCENE-8888
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Ignacio Vera


LUCENE-8688 introduced a new storage strategy for leaves that contain 
duplicated points. This works well with indexed dimensions, because the process 
of partitioning the space and the final sorting of leaves group together points 
with equal indexed dimensions.

This is not always the case when the points contain data dimensions. If two 
points have the same indexed dimensions but different data dimensions, their 
distribution in the leaves may not be optimal.

A good example is a user indexing a bounding box with LatLonShape. The 
tessellation of a bounding box yields two triangles with the same indexed 
dimensions but different data dimensions. If two documents index the same 
bounding box, the leaf contains the triangles of one document followed by the 
triangles of the second document. This is because the current 
sorting/selection algorithms use one indexed dimension and tie-break on the 
docID.

The optimal distribution in the case above is to group the equal triangles 
together. Therefore, the proposal here is to update the selection/sorting 
algorithms to use the data dimensions, when they exist, as tie-breakers before 
using the docID.
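
The difference between the two orderings can be sketched with a small, 
self-contained example. This is not the BKD writer code: the class 
TieBreakSketch, its Point type and all method names are hypothetical, and the 
points are simplified to one indexed dimension of one byte plus a one-byte data 
payload identifying the triangle (real LatLonShape points carry more dimensions).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch only; every name here is hypothetical, not Lucene API. */
final class TieBreakSketch {

  /** A point split into indexed bytes, data-only bytes, and the owning docID. */
  static final class Point {
    final byte[] indexed;
    final byte[] data;
    final int docID;

    Point(byte[] indexed, byte[] data, int docID) {
      this.indexed = indexed;
      this.data = data;
      this.docID = docID;
    }
  }

  /** Ordering as described in the issue: one indexed dimension, then docID. */
  static Comparator<Point> currentOrder(int dim, int bytesPerDim) {
    int from = dim * bytesPerDim, to = from + bytesPerDim;
    Comparator<Point> byDim =
        (a, b) -> Arrays.compareUnsigned(a.indexed, from, to, b.indexed, from, to);
    return byDim.thenComparingInt(p -> p.docID);
  }

  /** Proposed ordering: one indexed dimension, then data dimensions, then docID. */
  static Comparator<Point> proposedOrder(int dim, int bytesPerDim) {
    int from = dim * bytesPerDim, to = from + bytesPerDim;
    Comparator<Point> byDim =
        (a, b) -> Arrays.compareUnsigned(a.indexed, from, to, b.indexed, from, to);
    Comparator<Point> byData = (a, b) -> Arrays.compareUnsigned(a.data, b.data);
    return byDim.thenComparing(byData).thenComparingInt(p -> p.docID);
  }

  /** Sorts a leaf holding the same bounding box from two documents and renders it. */
  static String render(Comparator<Point> order) {
    byte[] box = {7}; // identical indexed value for all four triangles
    List<Point> leaf = new ArrayList<>(List.of(
        new Point(box, new byte[] {1}, 0),   // doc 0, first triangle
        new Point(box, new byte[] {2}, 0),   // doc 0, second triangle
        new Point(box, new byte[] {1}, 1),   // doc 1, first triangle
        new Point(box, new byte[] {2}, 1))); // doc 1, second triangle
    leaf.sort(order);
    StringBuilder sb = new StringBuilder();
    for (Point p : leaf) {
      if (sb.length() > 0) sb.append(' ');
      sb.append('t').append(p.data[0]).append("@doc").append(p.docID);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(render(currentOrder(0, 1)));  // t1@doc0 t2@doc0 t1@doc1 t2@doc1
    System.out.println(render(proposedOrder(0, 1))); // t1@doc0 t1@doc1 t2@doc0 t2@doc1
  }
}
```

With the docID-only tie-break the equal triangles are interleaved by document; 
with the data dimensions as an intermediate tie-breaker the equal triangles 
become adjacent, which is the grouping the duplicate-point leaf storage from 
LUCENE-8688 can take advantage of.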




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
