On Thu, 16 Nov 2023 10:05:14 GMT, Tobias Hartmann <thartm...@openjdk.org> wrote:

>> No, we don't mix: the SSE code is used as fallback only when the length is
>> below 32 (if length is above 32 we check the tail with AVX code by
>> shifting).
>>
>> I would suggest factoring out so that the implementations don't mix as much,
>> mainly to reduce the number of possible variants to test and not to
>> constrain one too much with the design of the other. We now have AVX3-only,
>> AVX3+SSE, SSE-only and plain, and I suggest dropping AVX3+SSE and fixing the
>> AVX3-only so that it more efficiently handles strings of length 16-31 by
>> duplicating (or using AVX instructions for copying 16 and 8 chars at once).
>> Some code duplication perhaps, but simpler flow through each variant.
>
> That seems reasonable but would be out of scope for this RFE.

Yes, for now we mainly need to make sure this works. There are a few regressions in microbenchmarks that I'm trying to get on top of, and the x64 intrinsics in particular seem problematic, but it seems reasonable not to hold up this PR and to work on such improvements separately.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16425#discussion_r1395517581
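
For readers following the thread, the control flow described in the quoted text (a narrow fallback below 32 elements, and a tail that is checked "by shifting" the wide window back) can be sketched in plain Java. This is only an illustration under stated assumptions, not the HotSpot intrinsic: the class and method names are hypothetical, the block size of 32 is taken from the threshold mentioned above, and the "is every char Latin-1" check is just a convenient stand-in for whatever the wide vector instruction actually does.

```java
public final class OverlappingTailSketch {
    // Assumption: one wide vector operation covers 32 elements.
    static final int BLOCK = 32;

    // Hypothetical stand-in for a single wide vector check:
    // true if all BLOCK chars starting at off fit in one byte.
    static boolean blockIsLatin1(char[] src, int off) {
        int acc = 0;
        for (int i = 0; i < BLOCK; i++) {
            acc |= src[off + i];
        }
        return (acc & 0xFF00) == 0;
    }

    // Stand-in for the narrower fallback path used for short inputs.
    static boolean scalarIsLatin1(char[] src, int off, int len) {
        for (int i = off; i < off + len; i++) {
            if (src[i] > 0xFF) {
                return false;
            }
        }
        return true;
    }

    static boolean isLatin1(char[] src) {
        int len = src.length;
        if (len < BLOCK) {
            // Short input: only the fallback path runs.
            return scalarIsLatin1(src, 0, len);
        }
        int i = 0;
        for (; i + BLOCK <= len; i += BLOCK) {
            if (!blockIsLatin1(src, i)) {
                return false;
            }
        }
        if (i < len) {
            // Tail: shift the window back so it ends exactly at len.
            // Some chars are checked twice, which is harmless for a pure check.
            return blockIsLatin1(src, len - BLOCK);
        }
        return true;
    }
}
```

With this shape the wide path never hands its tail to the narrow one, so each variant can be tested on its own. The re-check trick is only safe when processing an element twice has no side effects; a copying or compressing loop would have to make sure the overlapping stores write the same values.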