iverase edited a comment on issue #730: LUCENE-8868: New storing strategy for BKD tree leaves with low cardinality
URL: https://github.com/apache/lucene-solr/pull/730#issuecomment-504378432

I had a look at how good the formula for deciding when to use this optimisation is, and the results are very interesting. I have only done it for 1D so far, but it seems we are underestimating the compression in one dimension, so the result is a bigger index.

The test randomly ingests 10M `IntPoint` values, and the cardinality is increased in each iteration. The index size is measured after indexing the data single-threaded and force-merging into one segment.

- Assuming a vint size of 1 byte, the results are pretty bad for some cardinalities.

```
leafCardinality * (packedBytesLength - prefixLenSum + 1) <= count * (packedBytesLength - prefixLenSum)
```

- Assuming a vint size of 2 bytes, the results improve, but there is still a region where the index becomes bigger.

```
leafCardinality * (packedBytesLength - prefixLenSum + 2) <= count * (packedBytesLength - prefixLenSum)
```

- Assuming a vint size of 3 bytes, the results improve, but there is still a region where the index becomes bigger.

```
leafCardinality * (packedBytesLength - prefixLenSum + 3) <= count * (packedBytesLength - prefixLenSum)
```
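To make the three variants easier to compare, here is a minimal Java sketch of the inequality with the assumed vint size as a parameter. The class name, method name, and the `vintBytes` parameter are hypothetical and are not taken from the patch. The leaf values in `main` (512 points, 4-byte values, 2 shared prefix bytes, 250 distinct values) are made up for illustration and are not measurements from the test.

```java
// Illustrative only: class, method, and parameter names are assumptions,
// not the identifiers used in the pull request.
final class LowCardinalityEstimate {

  /**
   * Evaluates the inequality from the comment:
   *   leafCardinality * (packedBytesLength - prefixLenSum + vintBytes)
   *     <= count * (packedBytesLength - prefixLenSum)
   * and returns true when the low-cardinality layout is estimated
   * to be no larger than the regular layout.
   */
  static boolean useLowCardinality(int leafCardinality, int count,
                                   int packedBytesLength, int prefixLenSum,
                                   int vintBytes) {
    // Bytes left per value once the shared prefix is removed.
    long suffixBytes = (long) packedBytesLength - prefixLenSum;
    // Left-hand side: each distinct value costs its suffix plus the assumed vint size.
    long lowCardinalityCost = (long) leafCardinality * (suffixBytes + vintBytes);
    // Right-hand side: every point costs its suffix.
    long regularCost = (long) count * suffixBytes;
    return lowCardinalityCost <= regularCost;
  }

  public static void main(String[] args) {
    // Made-up leaf: 512 points, 4-byte values, 2 shared prefix bytes,
    // 250 distinct values.
    for (int vintBytes = 1; vintBytes <= 3; vintBytes++) {
      boolean use = useLowCardinality(250, 512, 4, 2, vintBytes);
      System.out.println("vintBytes=" + vintBytes + " -> " + use);
    }
  }
}
```

For this made-up leaf the right-hand side is 512 * 2 = 1024, while the left-hand side is 750, 1000, and 1250 for vint sizes of 1, 2, and 3 bytes. The first two variants therefore pick the low-cardinality layout and the third does not, which shows how the assumed vint size shifts the cutoff cardinality.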
