On Wed, 7 Jan 2026 17:33:42 GMT, Yi Wu <[email protected]> wrote:

>You mean move it down, like Op_AddReductionVI and Op_AddReductionVL to use 
>return !VM_Version::use_neon_for_vector(length_in_bytes);?

Yes, that is what I meant.

> It doesn't seem to make much of a difference.

So what does `8B/16B/32B` mean? I guess it is the actual vector size of the 
reduction operation. But how did you test these cases? As far as I can see, the 
benchmark code does not differ in parallelization between them. Is the 
vectorization factor decided by running with different values of the 
`MaxVectorSize` VM option? If so, then I think the partial cases are not 
exercised. Could you please check whether an instruction for 
`VectorMaskGenNode` appears in the generated code? I would expect a difference, 
because for partial cases (vector_size < MaxVectorSize) the code used SVE 
predicated instructions before this change, while it uses NEON instructions 
after it. And the latency/throughput of the SVE reduction instructions is much 
worse than that of the NEON ones.
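
To make the partial case concrete, here is a minimal Java sketch of the 
classification I have in mind. It is only a model, not HotSpot code: the class 
and method names are hypothetical, and the 16-byte threshold in 
`useNeonForVector` is my assumption about what 
`VM_Version::use_neon_for_vector` checks.

```java
final class PartialVectorCases {
    // Assumption: a vector is "NEON-sized" when it fits in a 128-bit register.
    // This models VM_Version::use_neon_for_vector; it is not the real code.
    static boolean useNeonForVector(int lengthInBytes) {
        return lengthInBytes <= 16;
    }

    // A case is partial when the operation is narrower than MaxVectorSize.
    static boolean isPartial(int lengthInBytes, int maxVectorSize) {
        return lengthInBytes < maxVectorSize;
    }

    // Cases where the generated code is expected to differ: partial vectors
    // that are small enough to be handled by NEON.
    static boolean expectsDifference(int lengthInBytes, int maxVectorSize) {
        return isPartial(lengthInBytes, maxVectorSize)
            && useNeonForVector(lengthInBytes);
    }

    public static void main(String[] args) {
        int[] sizes = {8, 16, 32};
        for (int max : sizes) {
            for (int len : sizes) {
                if (len > max) {
                    continue; // an operation cannot exceed MaxVectorSize
                }
                System.out.println("MaxVectorSize=" + max + " length=" + len
                    + " partial=" + isPartial(len, max)
                    + " expectsDifference=" + expectsDifference(len, max));
            }
        }
    }
}
```

Under this model, a run where the vector size always equals `MaxVectorSize` 
never reaches a case with `expectsDifference` set, which is why I suspect the 
partial cases are not covered.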

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2670981173
