On Thu, 24 Apr 2025 09:37:07 GMT, erifan <d...@openjdk.org> wrote:

>> src/hotspot/share/opto/vectornode.cpp line 2243:
>> 
>>> 2241:     in1 = in1->in(1);
>>> 2242:   }
>>> 2243:   if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 ||
>> 
>> The outcnt checks on lines 2243 and 2238 can be removed. Idealization looks 
>> for a specific graph pattern and replaces it with a new node whose inputs 
>> are the same as the inputs of the pattern. GVN will retain any intermediate 
>> node that has users beyond the pattern being replaced.
>
> Thanks for the explanation. A more important reason to check outcnt here 
> is to skip this optimization when VectorMaskCmp has more than one use, 
> because in that case the optimization may not be worthwhile. For example:
> 
> 
>   public static void testVectorMaskCmp() {
>     IntVector bv = IntVector.fromArray(I_SPECIES, ib, 0);
>     IntVector av = IntVector.fromArray(I_SPECIES, ia, 0);
>     VectorMask<Integer> m1 = av.compare(VectorOperators.NE, bv);  // two uses
>     VectorMask<Integer> m2 = m1.not();
>     m1.intoArray(m, 0);
>     av.lanewise(VectorOperators.ABS, m2).intoArray(ia, 0);
>   }
> 
> 
> If we skip the outcnt check and apply the optimization anyway, the graph 
> ends up with two VectorMaskCmp nodes, and two VectorMaskCmp instructions 
> are eventually emitted. That is a poor trade because VectorMaskCmp has much 
> higher latency than an xor instruction on aarch64.

Thanks, we can add this comment to the code where we are checking outcnt. What 
if all the other users are also XorNodes?
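
To make the question concrete, here is a toy model in plain Java. It is not 
HotSpot code: the Node class, the opcode strings and both predicates are 
hypothetical stand-ins. It only contrasts the strict single-use check with a 
relaxed check that accepts a compare whose users are all xor nodes. It also 
assumes every such xor is a mask negation that would itself match the pattern.

```java
import java.util.ArrayList;
import java.util.List;

public class OutcntSketch {
    // Toy stand-in for an IR node; it only tracks an opcode label and its users.
    static final class Node {
        final String op;
        final List<Node> users = new ArrayList<>();

        Node(String op, Node... inputs) {
            this.op = op;
            for (Node in : inputs) {
                in.users.add(this); // register this node as a user of each input
            }
        }
    }

    // Strict check: the compare feeds exactly one user.
    static boolean singleUse(Node cmp) {
        return cmp.users.size() == 1;
    }

    // Relaxed check: every user is an xor. Assumption: each xor negates the
    // mask, so folding all of them leaves no user of the original compare.
    static boolean allUsersAreXor(Node cmp) {
        return !cmp.users.isEmpty()
            && cmp.users.stream().allMatch(u -> u.op.equals("Xor"));
    }

    public static void main(String[] args) {
        Node allTrue = new Node("MaskAll");

        // Shape of the quoted test: one xor user plus one store user.
        Node cmpA = new Node("VectorMaskCmp");
        new Node("Xor", cmpA, allTrue);
        new Node("Store", cmpA);

        // Shape being asked about: two users, both xor.
        Node cmpB = new Node("VectorMaskCmp");
        new Node("Xor", cmpB, allTrue);
        new Node("Xor", cmpB, allTrue);

        System.out.println("xor+store: single=" + singleUse(cmpA)
            + " allXor=" + allUsersAreXor(cmpA));
        System.out.println("xor+xor:   single=" + singleUse(cmpB)
            + " allXor=" + allUsersAreXor(cmpB));
    }
}
```

In this model the xor+store shape fails both checks, so the original compare 
would stay alive next to a new one. The xor+xor shape fails only the strict 
check. Whether the relaxed form pays off in C2 still depends on every xor 
user actually being transformed.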

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2059874975
