msokolov commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1976556731
I tried increasing the usage of dense encoding by enabling it whenever it would consume up to 3/2 as many bits as the packed-bits encoding, rather than only when it would use at most the same amount. The Wikipedia index produced is larger than the first candidate, but still smaller than the baseline. Query performance does seem somewhat improved. I'm going to stop posting walls of numbers here until I have a chance to try some further improvements:

1. writing only the bytes needed rather than full-width longs
2. possibly we can save some time in advance by combining bit-counting with finding the next set bit
3. maybe there is a way to save the overhead of the conditionals that determine which block encoding to decode, e.g. by introducing a block reader, although then we might get a function-call overhead in place of the conditional
4. I want to test with other Amazon indexes as well
5. it would be nice to have a theory about why the faceting test cases seem to see worse perf. I guess they are different in that they do not use top-N collection, so no score-based skipping?
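Item 2 in the list above could look something like the following sketch: a single pass over a dense block's words that accumulates the set-bit count (rank) while locating the next set bit at or after a target, instead of doing two separate scans. This is illustrative only, not Lucene's actual block-decoding code; the names (`advanceAndRank`, the packed return value) are hypothetical.

```java
// Hypothetical sketch of combining bit counting with next-set-bit search
// over one dense block stored as long[] words (64 bits per long).
// Not Lucene's actual code; names and layout are illustrative.
public class DenseBlockSketch {

  /**
   * Finds the first set bit at or after {@code target} and, in the same pass,
   * counts the set bits that precede it (its rank within the block).
   * Returns (nextSetBit << 32) | rank, or -1 if no set bit remains.
   */
  static long advanceAndRank(long[] words, int target) {
    int wordIdx = target >>> 6;
    long rank = 0;
    // Count set bits in all full words before the word containing target.
    for (int i = 0; i < wordIdx; i++) {
      rank += Long.bitCount(words[i]);
    }
    long belowMask = (1L << (target & 63)) - 1; // bits strictly below target
    rank += Long.bitCount(words[wordIdx] & belowMask);
    long word = words[wordIdx] & ~belowMask; // keep bits at or above target
    while (word == 0) {
      if (++wordIdx == words.length) {
        return -1; // block exhausted, no set bit at or after target
      }
      word = words[wordIdx]; // skipped words are zero, so rank is unchanged
    }
    int next = (wordIdx << 6) + Long.numberOfTrailingZeros(word);
    return ((long) next << 32) | rank;
  }
}
```

For a block with set bits {3, 64, 70, 130} (words `{8L, 65L, 4L}`), advancing to target 65 yields next = 70 with rank = 2 from a single scan.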
