[ https://issues.apache.org/jira/browse/LUCENE-8888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878449#comment-16878449 ]
ASF subversion and git services commented on LUCENE-8888:
---------------------------------------------------------
Commit 5bf6cf2eddf60a0d2696f31b9a252eb7af6f9c32 in lucene-solr's branch
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5bf6cf2 ]
LUCENE-8888: Improve distribution of points with data dimensions in BKD tree
leaves (#747)
> Improve distribution of points with data dimension in BKD tree leaves
> ---------------------------------------------------------------------
>
> Key: LUCENE-8888
> URL: https://issues.apache.org/jira/browse/LUCENE-8888
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> LUCENE-8688 introduced a new storage strategy for leaves that contain
> duplicated points. This works well with indexed dimensions, because the
> partitioning of the space and the final sorting of leaves group together
> points with equal indexed dimensions.
> This is not always the case when the points contain data dimensions. If two
> points have the same indexed dimensions but different data dimensions, their
> distribution across the leaves may not be optimal.
> A good example is a user indexing a bounding box using LatLonShape. The
> resulting tessellation of a bounding box is two triangles with the same
> indexed dimensions but different data dimensions. If two documents index the
> same bounding box, the leaf contains the triangles from one document followed
> by the triangles of the second document. This is because the current
> sorting/selection algorithms use one indexed dimension and tie-break on the
> docID.
> The optimal distribution in the case above is to group the equal triangles
> together. Therefore, the proposal here is to update the selection/sorting
> algorithms to use the data dimensions, when they exist, as tie-breakers
> before using the docID.
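
The tie-breaking change described above can be illustrated with a small
sketch. This is not the code from the commit: the Point class, its fields,
the two comparators and the order() helper are all hypothetical names made up
for illustration, and each point is reduced to a single indexed dimension
plus a packed data-dimension value. It only shows how comparing the data
dimensions before the docID groups equal triangles from different documents.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class DataDimTieBreak {
    // Hypothetical stand-in for a point stored in a leaf (not a Lucene class).
    static final class Point {
        final String label;   // only used to print the resulting order
        final byte[] indexed; // value of the one indexed dimension being sorted on
        final byte[] data;    // packed data dimensions
        final int docID;

        Point(String label, byte[] indexed, byte[] data, int docID) {
            this.label = label;
            this.indexed = indexed;
            this.data = data;
            this.docID = docID;
        }
    }

    // Ordering as described for the current behaviour: indexed dimension, then docID.
    static final Comparator<Point> CURRENT = (a, b) -> {
        int cmp = Arrays.compareUnsigned(a.indexed, b.indexed);
        return cmp != 0 ? cmp : Integer.compare(a.docID, b.docID);
    };

    // Proposed ordering: indexed dimension, then data dimensions, then docID.
    static final Comparator<Point> PROPOSED = (a, b) -> {
        int cmp = Arrays.compareUnsigned(a.indexed, b.indexed);
        if (cmp != 0) return cmp;
        cmp = Arrays.compareUnsigned(a.data, b.data);
        if (cmp != 0) return cmp;
        return Integer.compare(a.docID, b.docID);
    };

    // Two documents index the same bounding box; each yields triangles A and B
    // that share the indexed value but differ in their data dimensions.
    static String order(Comparator<Point> comparator) {
        byte[] sameIndexed = {1};
        List<Point> leaf = Arrays.asList(
            new Point("A0", sameIndexed, new byte[] {10}, 0),
            new Point("B0", sameIndexed, new byte[] {20}, 0),
            new Point("A1", sameIndexed, new byte[] {10}, 1),
            new Point("B1", sameIndexed, new byte[] {20}, 1));
        leaf.sort(comparator); // List.sort is stable
        return leaf.stream().map(p -> p.label).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println("docID tie-break:      " + order(CURRENT));
        System.out.println("data-dims tie-break:  " + order(PROPOSED));
    }
}
```

With the docID-only tie-break the triangles stay grouped per document
(A0 B0 A1 B1); with the data dimensions compared first, the equal triangles
end up adjacent (A0 A1 B0 B1), which is the layout the duplicate-point leaf
strategy from LUCENE-8688 can take advantage of.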