[ 
https://issues.apache.org/jira/browse/LUCENE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878449#comment-16878449
 ] 

ASF subversion and git services commented on LUCENE-8888:
---------------------------------------------------------

Commit 5bf6cf2eddf60a0d2696f31b9a252eb7af6f9c32 in lucene-solr's branch 
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5bf6cf2 ]

LUCENE-8888: Improve distribution of points with data dimensions in BKD tree 
leaves (#747)



> Improve distribution of points with data dimension in BKD tree leaves
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-8888
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8888
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> LUCENE-8688 introduced a new storage strategy for leaves that contain 
> duplicated points. This works well with indexed dimensions, because the 
> partitioning of the space and the final sorting of leaves group together 
> points with equal indexed dimensions.
> This is not always the case if the points contain data dimensions. If two 
> points have the same indexed dimensions but different data dimensions, 
> their distribution across the leaves may not be optimal.
> A good example is a user indexing a bounding box using LatLonShape. 
> The resulting tessellation of a bounding box is two triangles with the same 
> indexed dimensions but different data dimensions. If two documents index 
> the same bounding box, the leaf contains the triangles from one document 
> followed by the triangles of the second document. This is because the 
> current sorting/selection algorithms use one indexed dimension and 
> tie-break on the docID.
> The optimal distribution in the case above groups the equal triangles 
> together. Therefore, the proposal here is to update the selection/sorting 
> algorithms to use the data dimensions, when they exist, as tie-breakers 
> before using the docID.
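
The effect of the proposed tie-break can be illustrated with a minimal
sketch. This is not the code from the commit: the class TieBreakSketch, its
Point type, and the use of plain int values (instead of the packed byte
values the BKD tree actually stores) are all assumptions made for
illustration. It models two documents indexing the same bounding box, each
contributing two triangles with equal indexed dimensions and a single data
dimension that tells the triangles apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TieBreakSketch {

    /** Hypothetical point: indexed dimensions first, then data dimensions. */
    static final class Point {
        final int docID;
        final int[] dims;
        Point(int docID, int... dims) { this.docID = docID; this.dims = dims; }
    }

    /**
     * Orders points by one indexed dimension. Ties are broken by the data
     * dimensions (only if useDataDims is true) and finally by docID.
     */
    static Comparator<Point> comparator(int sortDim, int numIndexDims, boolean useDataDims) {
        return (a, b) -> {
            int cmp = Integer.compare(a.dims[sortDim], b.dims[sortDim]);
            if (cmp != 0) return cmp;
            if (useDataDims) {
                for (int d = numIndexDims; d < a.dims.length; d++) {
                    cmp = Integer.compare(a.dims[d], b.dims[d]);
                    if (cmp != 0) return cmp;
                }
            }
            return Integer.compare(a.docID, b.docID);
        };
    }

    /** Two documents, same bounding box: 2 indexed dims (10, 20), 1 data dim. */
    static List<Point> sample() {
        List<Point> points = new ArrayList<>();
        points.add(new Point(0, 10, 20, 1)); // doc 0, first triangle
        points.add(new Point(0, 10, 20, 2)); // doc 0, second triangle
        points.add(new Point(1, 10, 20, 1)); // doc 1, first triangle
        points.add(new Point(1, 10, 20, 2)); // doc 1, second triangle
        return points;
    }

    /** Sorts a copy of the points and returns the data dimension of each, in leaf order. */
    static int[] sortedDataValues(List<Point> points, boolean useDataDims) {
        List<Point> copy = new ArrayList<>(points);
        copy.sort(comparator(0, 2, useDataDims)); // List.sort is stable
        return copy.stream().mapToInt(p -> p.dims[2]).toArray();
    }

    public static void main(String[] args) {
        // docID tie-break only: triangles are interleaved per document
        System.out.println(Arrays.toString(sortedDataValues(sample(), false))); // [1, 2, 1, 2]
        // data dimensions before docID: equal triangles end up adjacent
        System.out.println(Arrays.toString(sortedDataValues(sample(), true)));  // [1, 1, 2, 2]
    }
}
```

With the docID-only tie-break the leaf order alternates between the two
triangles, whereas comparing the data dimensions first places the equal
triangles next to each other, which is the grouping the description above
asks for.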



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

