On Wed, 7 Jan 2026 17:33:42 GMT, Yi Wu <[email protected]> wrote:

> You mean move it down, like Op_AddReductionVI and Op_AddReductionVL to use
> return !VM_Version::use_neon_for_vector(length_in_bytes);?
Yes, that is what I meant.

> It doesn't to make much of a difference.

So what does `8B/16B/32B` mean? I guess it is the actual vector size of the reduction operation? But how did you test these cases? I noticed that the benchmark code does not show any parallelization differences between them. Is the vectorization factor decided by using different values of the `MaxVectorSize` VM option? If so, then I think the partial cases are not exercised. Could you please check whether an instruction for `VectorMaskGenNode` appears in the generated code? I assume there should be a difference, because for partial cases (vector_size < MaxVectorSize) the code used SVE predicated instructions before this change, while it uses NEON instructions after it. And the latency/throughput of the SVE reduction instructions is much worse than that of the NEON ones.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2670981173
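To make the partial-case concern concrete, here is a small Java sketch of the classification discussed above. It is only a model of the argument, not HotSpot code: the class and method names (`PartialReductionModel`, `isPartial`, `fitsNeon`) are hypothetical, and the sizes listed are illustrative. The only hardware fact it relies on is that NEON registers are 128 bits (16 bytes) wide.

```java
public class PartialReductionModel {
    // NEON registers are 128 bits wide, so 16 bytes is the largest NEON vector.
    static final int NEON_MAX_BYTES = 16;

    // Hypothetical helper: a "partial" case is one where the vector used by
    // the operation is smaller than the configured MaxVectorSize.
    static boolean isPartial(int vectorBytes, int maxVectorSize) {
        return vectorBytes < maxVectorSize;
    }

    // Hypothetical helper: true if the vector fits in a single NEON register.
    static boolean fitsNeon(int vectorBytes) {
        return vectorBytes <= NEON_MAX_BYTES;
    }

    public static void main(String[] args) {
        int[] maxVectorSizes = {16, 32};   // illustrative MaxVectorSize values
        int[] vectorSizes = {8, 16, 32};   // the 8B/16B/32B cases from the thread
        for (int max : maxVectorSizes) {
            for (int vec : vectorSizes) {
                if (vec > max) {
                    continue; // a vector cannot exceed MaxVectorSize
                }
                System.out.printf("MaxVectorSize=%d vector=%dB partial=%b fitsNeon=%b%n",
                        max, vec, isPartial(vec, max), fitsNeon(vec));
            }
        }
    }
}
```

In this model, a run where `MaxVectorSize` always equals the vector size never produces `partial=true`. The combinations of interest are those such as an 8B or 16B vector under `MaxVectorSize=32`, which are both partial and small enough for NEON.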
