[ 
https://issues.apache.org/jira/browse/LUCENE-8868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera resolved LUCENE-8868.
----------------------------------
       Resolution: Fixed
         Assignee: Ignacio Vera
    Fix Version/s: 8.2
                   master (9.0)

> New storing strategy for BKD tree leaves with low cardinality
> -------------------------------------------------------------
>
>                 Key: LUCENE-8868
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8868
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Assignee: Ignacio Vera
>            Priority: Major
>             Fix For: master (9.0), 8.2
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently if a leaf on the BKD tree contains only few values, then the leaf 
> is treated the same way as it all values are different. It many cases it can 
> be much more efficient to store the distinct values with the cardinality.
> The strategy is the following:
> 1.  When writing a leaf block the cardinality is computed.
> 2. Perform some naive calculation to compute if it is better to store the 
> leaf as a low cardinality leaf. The storage cost are calculated as follows:
> *   low cardinality: leafCardinality * (packedBytesLength - prefixLenSum + 2) 
> where two is the estimated size of storing the cardinality. This is an 
> overestimation as in some cases you will only need one byte to store the 
> cardinality.
> * High cardinality: count * (packedBytesLength - prefixLenSum). We are not 
> taking into account the runlen compression.
> 3. If the tree has low cardinality then we set the compressed dim to -2. Note 
> that -1 is when all values are equal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to