iverase edited a comment on issue #730: LUCENE-8868: New storing strategy for 
BKD tree leaves with low cardinality
URL: https://github.com/apache/lucene-solr/pull/730#issuecomment-504378432
 
 
   I had a look at how well the formula decides when to use this optimisation, 
and the results are very interesting. I have only done it for 1D so far, but it 
seems we are underestimating the compression in one dimension, so the result is 
a bigger index.
   
   The test ingests 10M random intPoint values, and the cardinality is 
increased in each iteration. The size of the index is measured after indexing 
the data single-threaded and after a force merge into one segment.
   
   - Assuming a vint size of 1 byte, the results are pretty bad for some 
cardinalities.
   
   ```
   leafCardinality * (packedBytesLength - prefixLenSum + 1) <= count * (packedBytesLength - prefixLenSum)
   ```
   
   
![image](https://user-images.githubusercontent.com/29038686/59916904-c23aab80-9420-11e9-8e23-1e6c538be2ca.png)
   
   
   - Assuming a vint size of 2 bytes, the results improve, but there is still 
a region where the index becomes bigger.
   
   ```
   leafCardinality * (packedBytesLength - prefixLenSum + 2) <= count * (packedBytesLength - prefixLenSum)
   ```
   
   
![image](https://user-images.githubusercontent.com/29038686/59916137-a2a28380-941e-11e9-8aab-d3bb9c80f7f2.png)
     
   - Assuming a vint size of 3 bytes, the results improve, but there is still 
a region where the index becomes bigger.
   
   ```
   leafCardinality * (packedBytesLength - prefixLenSum + 3) <= count * (packedBytesLength - prefixLenSum)
   ```
   
   
![image](https://user-images.githubusercontent.com/29038686/59916754-51938f00-9420-11e9-8d68-98c9427a41fd.png)
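   The three checks above differ only in the number of bytes assumed for the 
vint. A minimal sketch of that check with the vint size as a parameter is 
below. The class name, method names and the `vintBytes` parameter are 
hypothetical and are not taken from the patch. The leaf of 1024 points and 
`prefixLenSum = 0` used in `main` are assumptions chosen only to show where the 
cut-off moves.

```java
final class LowCardinalityHeuristic {

  /**
   * Returns true when storing each distinct value once, followed by a vint,
   * is estimated to take no more bytes than storing every value on its own.
   * vintBytes is the assumed size of that vint (1, 2 or 3 in the comment above).
   */
  static boolean useLowCardinality(
      int leafCardinality, int count, int packedBytesLength, int prefixLenSum, int vintBytes) {
    int suffixBytes = packedBytesLength - prefixLenSum;
    long lowCardinalityCost = (long) leafCardinality * (suffixBytes + vintBytes);
    long defaultCost = (long) count * suffixBytes;
    return lowCardinalityCost <= defaultCost;
  }

  /** Largest leaf cardinality for which useLowCardinality still returns true. */
  static int maxCardinality(int count, int packedBytesLength, int prefixLenSum, int vintBytes) {
    int suffixBytes = packedBytesLength - prefixLenSum;
    return (int) (((long) count * suffixBytes) / (suffixBytes + vintBytes));
  }

  public static void main(String[] args) {
    int count = 1024;          // assumed number of points in the leaf
    int packedBytesLength = 4; // one int dimension
    int prefixLenSum = 0;      // assumed: no common prefix in the leaf
    for (int vintBytes = 1; vintBytes <= 3; vintBytes++) {
      System.out.println(
          "vintBytes=" + vintBytes + " maxCardinality="
              + maxCardinality(count, packedBytesLength, prefixLenSum, vintBytes));
    }
  }
}
```

   Under these assumptions, a larger assumed vint size lowers the cardinality 
at which the optimisation stops being selected, which matches the direction of 
the three experiments.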
   
   
