[
https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ignacio Vera resolved LUCENE-8868.
----------------------------------
Resolution: Fixed
Assignee: Ignacio Vera
Fix Version/s: 8.2
master (9.0)
> New storing strategy for BKD tree leaves with low cardinality
> -------------------------------------------------------------
>
> Key: LUCENE-8868
> URL: https://issues.apache.org/jira/browse/LUCENE-8868
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Assignee: Ignacio Vera
> Priority: Major
> Fix For: master (9.0), 8.2
>
> Time Spent: 3h 50m
> Remaining Estimate: 0h
>
> Currently if a leaf on the BKD tree contains only few values, then the leaf
> is treated the same way as it all values are different. It many cases it can
> be much more efficient to store the distinct values with the cardinality.
> The strategy is the following:
> 1. When writing a leaf block the cardinality is computed.
> 2. Perform some naive calculation to compute if it is better to store the
> leaf as a low cardinality leaf. The storage cost are calculated as follows:
> * low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2)
> where two is the estimated size of storing the cardinality. This is an
> overestimation as in some cases you will only need one byte to store the
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not
> taking into account the runlen compression.
> 3. If the tree has low cardinality then we set the compressed dim to -2. Note
> that -1 is when all values are equal.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]