Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

Jatin Bhateja Wed, 23 Apr 2025 05:17:18 -0700

On Fri, 18 Apr 2025 01:36:10 GMT, erifan <[email protected]> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
>> negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
>> `-XX:UseSVE=2`:
>> 
>> Benchmark                    Unit    Before          Score Error     After   
>>         Score Error     Uplift
>> testCompareEQMaskNotByte     ops/s   7912127.225     2677.289518     
>> 10266136.26     8955.008548     1.29
>> testCompareEQMaskNotDouble   ops/s   884737.6799     446.963779      
>> 1179760.772     448.031844      1.33
>> testCompareEQMaskNotFloat    ops/s   1765045.787     682.332214      
>> 2359520.803     896.305743      1.33
>> testCompareEQMaskNotInt              ops/s   1787221.411     977.743935      
>> 2353952.519     960.069976      1.31
>> testCompareEQMaskNotLong     ops/s   895297.1974     673.44808       
>> 1178449.02      323.804205      1.31
>> testCompareEQMaskNotShort    ops/s   3339987.002     3415.2226       
>> 4712761.965     2110.862053     1.41
>> testCompareGEMaskNotByte     ops/s   7907615.16      4094.243652     
>> 10251646.9      9486.699831     1.29
>> testCompareGEMaskNotInt              ops/s   1683738.958     4233.813092     
>> 2352855.205     1251.952546     1.39
>> testCompareGEMaskNotLong     ops/s   854496.1561     8594.598885     
>> 1177811.493     521.1229        1.37
>> testCompareGEMaskNotShort    ops/s   3341860.309     1578.975338     
>> 4714008.434     1681.10365      1.41
>> testCompareGTMaskNotByte     ops/s   7910823.674     2993.367032     
>> 10245063.58     9774.75138      1.29
>> testCompareGTMaskNotInt              ops/s   1673393.928     3153.099431     
>> 2353654.521     1190.848583     1.4
>> testCompareGTMaskNotLong     ops/s   849405.9159     2432.858159     
>> 1177952.041     359.96413       1.38
>> testCompareGTMaskNotShort    ops/s   3339509.141     3339.976585     
>> 4711442.496     2673.364893     1.41
>> testCompareLEMaskNotByte     ops/s   7911340.004     3114.69191      
>> 10231626.5      27134.20035     1.29
>> testCompareLEMaskNotInt              ops/s   1675812.113     1340.969885     
>> 2353255.341     1452.4522       1.4
>> testCompareLEMaskNotLong     ops/s   848862.8036     6564.841731     
>> 1177763.623     539.290106      1.38
>> testCompareLEMaskNotShort    ops/s   3324951.54      2380.29473      
>> 4712116.251     1544.559684     1.41
>> testCompareLTMaskNotByte     ops/s   7910390.844     2630.861436     
>> 10239567.69     6487.441672     1.29
>> testCompareLTMaskNotInt              ops/s   16721...
>
> erifan has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains two additional commits since the 
> last revision:
> 
>  - Merge branch 'master' into JDK-8354242
>  - 8354242: VectorAPI: combine vector not operation with compare
>    
>    This patch optimizes the following patterns:
>    For integer types:
>    ```
>    (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>        => (VectorMaskCmp src1 src2 ncond)
>    (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>        => (VectorMaskCmp src1 src2 ncond)
>    ```
>    cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>    negative comparison of cond.
>    
>    For float and double types:
>    ```
>    (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>        => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>    (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>        => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>    ```
>    cond can be eq or ne.
>    
>    Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>    With option `-XX:UseSVE=2`:
>    ```
>    Benchmark                  Unit    Before          Score Error     After   
>         Score Error     Uplift
>    testCompareEQMaskNotByte   ops/s   7912127.225     2677.289518     
> 10266136.26     8955.008548     1.29
>    testCompareEQMaskNotDouble ops/s   884737.6799     446.963779      
> 1179760.772     448.031844      1.33
>    testCompareEQMaskNotFloat  ops/s   1765045.787     682.332214      
> 2359520.803     896.305743      1.33
>    testCompareEQMaskNotInt            ops/s   1787221.411     977.743935      
> 2353952.519     960.069976      1.31
>    testCompareEQMaskNotLong   ops/s   895297.1974     673.44808       
> 1178449.02      323.804205      1.31
>    testCompareEQMaskNotShort  ops/s   3339987.002     3415.2226       
> 4712761.965     2110.862053     1.41
>    testCompareGEMaskNotByte   ops/s   7907615.16      4094.243652     
> 10251646.9      9486.699831     1.29
>    testCompareGEMaskNotInt            ops/s   1683738.958     4233.813092     
> 2352855.205     1251.952546     1.39
>    testCompareGEMaskNotLong   ops/s   854496.1561     8594.598885     
> 1177811.493     521.1229        1.37
>    testCompareGEMaskNotShort  ops/s   3341860.309     1578.975338     
> 4714008.434     1681.10365      1.41
>    testCompareGTMaskNotByte   ops/s   7910823.674     2993.367032     
> 10245063.58     9774.75138      1.29
>    testCompareGTMaskNotInt            ops/s   1673393.928     3153.099431     
> 2353654.521     1190.848583     1.4
>    testCompareGTMaskNotLong   ops/s   849405.9159     2432.858159     
> 1177952.041     359.96413       1.38
>    testCompareGTMaskNotShort  ops/s   3339509.141     3339.976585     
> 4711442.496     2673.364893     1.41
>    testCompareLEMaskNotByte   ops/s   7911340.004...


src/hotspot/share/opto/vectornode.cpp line 2234:

> 2232:   // XorV/XorVMask is commutative, swap VectorMaskCmp/Op_VectorMaskCast 
> to in1.
> 2233:   if (in1->Opcode() != Op_VectorMaskCmp && in1->Opcode() != 
> Op_VectorMaskCast) {
> 2234:     swap(in1, in2);

Swapping inputs like this without refreshing GVN bookkeeping is not safe. I 
guess you wanted to use Node::swap_edges.

src/hotspot/share/opto/vectornode.cpp line 2265:

> 2263:     vmcmp = new VectorMaskCastNode(phase->transform(vmcmp), vmcast_vt);
> 2264:   }
> 2265:   return vmcmp;

It would be preferable if you could kindly re-factor the code such that we only 
call VectorNode::Ideal once at return to comply with aesthetics of other 
idealization routines.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2055908691
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2055911839

Re: RFR: 8354242: VectorAPI: combine vector not operation with compare [v2]

Reply via email to