uschindler commented on PR #13133: URL: https://github.com/apache/lucene/pull/13133#issuecomment-1964672015
> I'm surprised by how slow this is with AVX off, given that this can be implemented with SSE2 :(.

This is why we try to avoid the incubating vector API as much as possible. The code needs to be tested on all platforms and bit sizes with extensive benchmarking. The problem is:

- If Hotspot does not have an optimization for the actual CPU -> slow
- If you use a JDK with Graal -> slow
- If you use a non-Hotspot JVM (e.g., OpenJ9) -> slow
- If you disable tiered compilation / use the client compiler -> slow

In fact, the code is slow because, without hardware support, it executes exactly as written in Java, producing hundreds of object instances.

Another thing: once you have fixed the code, make sure to show a benchmark on real queries. Just because group-vint decoding is 30% faster does not mean you would see any difference in production. Normally we only accept incubating-vector optimizations if the results are at least 4 times faster than scalar code (e.g., the float dot product is 12 to 16 times faster, yet the effect on query/merging performance is not 16x, only about 15%). So if the effect on query performance is <5%, I would disagree with merging this. Sorry!
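For context, "group vint" here refers to a group-varint-style layout: four integers share a single tag byte whose 2-bit fields give each value's byte length. The scalar decode loop below is a minimal, hypothetical sketch of that idea (not Lucene's exact implementation); it is this kind of branchy byte-at-a-time loop that the PR tries to replace with a SIMD shuffle, and it is what the vector-API code effectively falls back to when there is no hardware support:

```java
import java.util.Arrays;

// Hypothetical sketch of group-varint coding: 4 ints per group, one tag byte
// storing each int's byte length (1-4, encoded as 0-3 in two bits).
public class GroupVarint {
  // Number of bytes needed for v, treated as unsigned (1..4).
  static int byteLen(int v) {
    if ((v & 0xFFFFFF00) == 0) return 1;
    if ((v & 0xFFFF0000) == 0) return 2;
    if ((v & 0xFF000000) == 0) return 3;
    return 4;
  }

  // Encode exactly 4 ints into a tag byte followed by their value bytes.
  public static byte[] encode(int[] vals) {
    byte[] out = new byte[1 + 16]; // worst case: tag + 4 x 4 bytes
    int tag = 0, pos = 1;
    for (int i = 0; i < 4; i++) {
      int len = byteLen(vals[i]);
      tag |= (len - 1) << (2 * i);
      for (int b = 0; b < len; b++) {
        out[pos++] = (byte) (vals[i] >>> (8 * b)); // little-endian value bytes
      }
    }
    out[0] = (byte) tag;
    return Arrays.copyOf(out, pos);
  }

  // Scalar decode of one group: this data-dependent inner loop is exactly
  // what a SIMD byte shuffle would collapse into a few instructions.
  public static int[] decode(byte[] in) {
    int tag = in[0] & 0xFF, pos = 1;
    int[] vals = new int[4];
    for (int i = 0; i < 4; i++) {
      int len = ((tag >>> (2 * i)) & 0x3) + 1;
      int v = 0;
      for (int b = 0; b < len; b++) {
        v |= (in[pos++] & 0xFF) << (8 * b);
      }
      vals[i] = v;
    }
    return vals;
  }
}
```

On SSE2/AVX hardware, the per-group work reduces to loading the bytes, shuffling them into four 32-bit lanes via a tag-indexed lookup table, and storing, which is why a pure-scalar fallback is so much slower.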
