msokolov commented on issue #13147: URL: https://github.com/apache/lucene/issues/13147#issuecomment-1976556731
I tried increasing the usage of dense encoding by enabling it whenever it would consume up to 3/2 as many bits as the packed-bits encoding, rather than only when it would use at most the same amount. The Wikipedia index produced is larger than the first candidate, but still smaller than the baseline. Query performance does seem somewhat improved. I'm going to stop posting walls of numbers here until I have a chance to try some further improvements:

1. writing only the bytes needed rather than full-width longs
2. possibly we can save some time in advance by combining bit-counting with finding the next set bit
3. maybe there is a way to save the overhead of the conditionals that determine which block encoding to decode, e.g. by introducing a block reader, although then we might get a function-call overhead in place of the conditional
4. I want to test with other Amazon indexes as well
5. it would be nice to have a theory about why the faceting test cases seem to see worse perf. I guess they are different in that they do not use top-N collection, so no score-based skipping?
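Item 2 in the list above could look something like the following sketch: a single pass over a dense block's words that accumulates the set-bit count (rank) while locating the next set bit at or after a target, instead of doing two separate scans. This is illustrative only, not Lucene's actual block-decoding code; the names (`advanceAndRank`, the packed return value) are hypothetical.

```java
// Hypothetical sketch of combining bit counting with next-set-bit search
// over one dense block stored as long[] words (64 bits per long).
// Not Lucene's actual code; names and layout are illustrative.
public class DenseBlockSketch {

  /**
   * Finds the first set bit at or after {@code target} and, in the same pass,
   * counts the set bits that precede it (its rank within the block).
   * Returns (nextSetBit << 32) | rank, or -1 if no set bit remains.
   */
  static long advanceAndRank(long[] words, int target) {
    int wordIdx = target >>> 6;
    long rank = 0;
    // Count set bits in all full words before the word containing target.
    for (int i = 0; i < wordIdx; i++) {
      rank += Long.bitCount(words[i]);
    }
    long belowMask = (1L << (target & 63)) - 1; // bits strictly below target
    rank += Long.bitCount(words[wordIdx] & belowMask);
    long word = words[wordIdx] & ~belowMask; // keep bits at or above target
    while (word == 0) {
      if (++wordIdx == words.length) {
        return -1; // block exhausted, no set bit at or after target
      }
      word = words[wordIdx]; // skipped words are zero, so rank is unchanged
    }
    int next = (wordIdx << 6) + Long.numberOfTrailingZeros(word);
    return ((long) next << 32) | rank;
  }
}
```

For a block with set bits {3, 64, 70, 130} (words `{8L, 65L, 4L}`), advancing to target 65 yields next = 70 with rank = 2 from a single scan.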
