On Fri, 25 Apr 2025 09:17:02 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> Thanks for this information. Another, more important reason to check 
>> outcnt here is to prevent this optimization when VectorMaskCmp has more 
>> than one use, because the optimization may not be worthwhile in that 
>> case. For example:
>> 
>> 
>>   public static void testVectorMaskCmp() {
>>     IntVector bv = IntVector.fromArray(I_SPECIES, ib, 0);
>>     IntVector av = IntVector.fromArray(I_SPECIES, ia, 0);
>>     VectorMask<Integer> m1 = av.compare(VectorOperators.NE, bv);  // two uses
>>     VectorMask<Integer> m2 = m1.not();
>>     m1.intoArray(m, 0);
>>     av.lanewise(VectorOperators.ABS, m2).intoArray(ia, 0);
>>   }
>> 
>> 
>> If we do not check outcnt and still apply this optimization, two 
>> VectorMaskCmp nodes will be generated, and in the end two VectorMaskCmp 
>> instructions will be emitted. That is a poor trade because VectorMaskCmp 
>> has much higher latency than an xor instruction on aarch64.
>
> Thanks, we can add this comment to the code where we are checking outcnt. 
> What if all the other users are also XorNodes?

At present, you are checking for exactly one XOR user; shouldn't the check 
cover both cases, a single XOR user or all users being XORs?
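
To make the question concrete, here is a minimal sketch of such a check on a toy node model. This is not the C2 code from the PR: `ToyNode`, `allUsersAreXor`, and the op labels such as "XorV" are hypothetical names used only for illustration, and the real check would walk the outputs of the VectorMaskCmp node instead.

```java
import java.util.ArrayList;
import java.util.List;

public class XorUserCheck {
    // Hypothetical stand-in for an IR node; not a real C2 class.
    public static final class ToyNode {
        final String op;                               // illustrative label only
        final List<ToyNode> users = new ArrayList<>(); // nodes that consume this one

        public ToyNode(String op) { this.op = op; }

        public ToyNode addUser(ToyNode user) {
            users.add(user);
            return this;
        }

        boolean isXor() { return op.equals("XorV"); }
    }

    // True when the node has at least one user and every user is an XOR.
    // A single XOR user is just the one-element case of the same rule.
    public static boolean allUsersAreXor(ToyNode node) {
        if (node.users.isEmpty()) {
            return false;
        }
        for (ToyNode user : node.users) {
            if (!user.isXor()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Mirrors the quoted example: one XOR user (m1.not()) and one store (m1.intoArray).
        ToyNode cmp = new ToyNode("VectorMaskCmp");
        cmp.addUser(new ToyNode("XorV")).addUser(new ToyNode("StoreVectorMask"));
        System.out.println(allUsersAreXor(cmp)); // prints false: the store still needs the original compare
    }
}
```

With a rule like this, the transformation would still be skipped for the two-use example quoted above, but it would apply when every user of the compare is an XOR, since the original VectorMaskCmp would then have no remaining users.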

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2077378879
