On Fri, 30 Aug 2024 22:04:39 GMT, Francesco Nigro <d...@openjdk.org> wrote:
> All of these strategies are better than what we have now, probably because
> the existing intrinsics still make some poor decisions, but I haven't dug
> into the perfasm output yet to see what they do wrong; maybe it is something
> that could be fixed in the intrinsic itself?

I'm no intrinsics expert, but if I had to guess I'd say that the intrinsics we have do not specialize for small sizes. Also, the use of vector instructions typically comes with additional alignment constraints, meaning that we need a pre-loop (and sometimes a post-loop). This logic, while faster for bigger sizes, has some drawbacks for smaller sizes, where the fixed setup cost can dominate.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20712#issuecomment-2324276883
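
To illustrate the pre-loop / main loop / post-loop shape mentioned in the reply above, here is a minimal sketch in plain Java. It is not the actual intrinsic code: `FillSketch`, `WIDTH` and `SMALL` are hypothetical names, the "wide store" is simulated with an inner byte loop, and alignment is modelled on the array index rather than on a real memory address.

```java
// Hypothetical sketch only; the real intrinsics are hand-written stubs,
// not Java code, and their thresholds and widths may differ.
class FillSketch {
    static final int WIDTH = 8;   // assumed "vector" width in bytes
    static final int SMALL = 16;  // assumed cutoff for the small-size path

    // Entry point: specialize for small sizes, otherwise take the bulk path.
    static void fill(byte[] a, int off, int len, byte v) {
        if (len < SMALL) {
            // Small sizes: a plain loop, no alignment bookkeeping at all.
            for (int i = off; i < off + len; i++) {
                a[i] = v;
            }
            return;
        }
        fillBulk(a, off, len, v);
    }

    // Bulk path: pre-loop, wide main loop, post-loop.
    static void fillBulk(byte[] a, int off, int len, byte v) {
        int i = off;
        int end = off + len;
        // Pre-loop: byte by byte until the index is a multiple of WIDTH.
        while (i < end && (i % WIDTH) != 0) {
            a[i++] = v;
        }
        // Main loop: WIDTH bytes per iteration, standing in for one wide store.
        for (; i + WIDTH <= end; i += WIDTH) {
            for (int j = 0; j < WIDTH; j++) {
                a[i + j] = v;
            }
        }
        // Post-loop: whatever is left over after the last full-width step.
        while (i < end) {
            a[i++] = v;
        }
    }
}
```

For a short fill that goes straight to `fillBulk`, most of the work can end up in the pre-loop and post-loop plus their branches, which is the drawback for small sizes described above; the `SMALL` check in `fill` is one way such a specialization could look.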