On Fri, 18 Apr 2025 01:36:10 GMT, erifan <d...@openjdk.org> wrote:

>> This patch optimizes the following patterns:
>> For integer types:
>> 
>> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>>     => (VectorMaskCmp src1 src2 ncond)
>> 
>> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the 
>> negative comparison of cond.
>> 
>> For float and double types:
>> 
>> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>>     => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>> 
>> cond can be eq or ne.
>> 
>> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option 
>> `-XX:UseSVE=2`:
>> 
>> Benchmark                    Unit    Before          Score Error     After   
>>         Score Error     Uplift
>> testCompareEQMaskNotByte     ops/s   7912127.225     2677.289518     
>> 10266136.26     8955.008548     1.29
>> testCompareEQMaskNotDouble   ops/s   884737.6799     446.963779      
>> 1179760.772     448.031844      1.33
>> testCompareEQMaskNotFloat    ops/s   1765045.787     682.332214      
>> 2359520.803     896.305743      1.33
>> testCompareEQMaskNotInt              ops/s   1787221.411     977.743935      
>> 2353952.519     960.069976      1.31
>> testCompareEQMaskNotLong     ops/s   895297.1974     673.44808       
>> 1178449.02      323.804205      1.31
>> testCompareEQMaskNotShort    ops/s   3339987.002     3415.2226       
>> 4712761.965     2110.862053     1.41
>> testCompareGEMaskNotByte     ops/s   7907615.16      4094.243652     
>> 10251646.9      9486.699831     1.29
>> testCompareGEMaskNotInt              ops/s   1683738.958     4233.813092     
>> 2352855.205     1251.952546     1.39
>> testCompareGEMaskNotLong     ops/s   854496.1561     8594.598885     
>> 1177811.493     521.1229        1.37
>> testCompareGEMaskNotShort    ops/s   3341860.309     1578.975338     
>> 4714008.434     1681.10365      1.41
>> testCompareGTMaskNotByte     ops/s   7910823.674     2993.367032     
>> 10245063.58     9774.75138      1.29
>> testCompareGTMaskNotInt              ops/s   1673393.928     3153.099431     
>> 2353654.521     1190.848583     1.4
>> testCompareGTMaskNotLong     ops/s   849405.9159     2432.858159     
>> 1177952.041     359.96413       1.38
>> testCompareGTMaskNotShort    ops/s   3339509.141     3339.976585     
>> 4711442.496     2673.364893     1.41
>> testCompareLEMaskNotByte     ops/s   7911340.004     3114.69191      
>> 10231626.5      27134.20035     1.29
>> testCompareLEMaskNotInt              ops/s   1675812.113     1340.969885     
>> 2353255.341     1452.4522       1.4
>> testCompareLEMaskNotLong     ops/s   848862.8036     6564.841731     
>> 1177763.623     539.290106      1.38
>> testCompareLEMaskNotShort    ops/s   3324951.54      2380.29473      
>> 4712116.251     1544.559684     1.41
>> testCompareLTMaskNotByte     ops/s   7910390.844     2630.861436     
>> 10239567.69     6487.441672     1.29
>> testCompareLTMaskNotInt              ops/s   16721...
>
> erifan has updated the pull request with a new target base due to a merge or 
> a rebase. The incremental webrev excludes the unrelated changes brought in by 
> the merge/rebase. The pull request contains two additional commits since the 
> last revision:
> 
>  - Merge branch 'master' into JDK-8354242
>  - 8354242: VectorAPI: combine vector not operation with compare
>    
>    This patch optimizes the following patterns:
>    For integer types:
>    ```
>    (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1))
>        => (VectorMaskCmp src1 src2 ncond)
>    (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1))
>        => (VectorMaskCmp src1 src2 ncond)
>    ```
>    cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the
>    negative comparison of cond.
>    
>    For float and double types:
>    ```
>    (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1))
>        => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>    (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1))
>        => (VectorMaskCast (VectorMaskCmp src1 src2 ncond))
>    ```
>    cond can be eq or ne.
>    
>    Benchmarks on Nvidia Grace machine with 128-bit SVE2:
>    With option `-XX:UseSVE=2`:
>    ```
>    Benchmark                  Unit    Before          Score Error     After   
>         Score Error     Uplift
>    testCompareEQMaskNotByte   ops/s   7912127.225     2677.289518     
> 10266136.26     8955.008548     1.29
>    testCompareEQMaskNotDouble ops/s   884737.6799     446.963779      
> 1179760.772     448.031844      1.33
>    testCompareEQMaskNotFloat  ops/s   1765045.787     682.332214      
> 2359520.803     896.305743      1.33
>    testCompareEQMaskNotInt            ops/s   1787221.411     977.743935      
> 2353952.519     960.069976      1.31
>    testCompareEQMaskNotLong   ops/s   895297.1974     673.44808       
> 1178449.02      323.804205      1.31
>    testCompareEQMaskNotShort  ops/s   3339987.002     3415.2226       
> 4712761.965     2110.862053     1.41
>    testCompareGEMaskNotByte   ops/s   7907615.16      4094.243652     
> 10251646.9      9486.699831     1.29
>    testCompareGEMaskNotInt            ops/s   1683738.958     4233.813092     
> 2352855.205     1251.952546     1.39
>    testCompareGEMaskNotLong   ops/s   854496.1561     8594.598885     
> 1177811.493     521.1229        1.37
>    testCompareGEMaskNotShort  ops/s   3341860.309     1578.975338     
> 4714008.434     1681.10365      1.41
>    testCompareGTMaskNotByte   ops/s   7910823.674     2993.367032     
> 10245063.58     9774.75138      1.29
>    testCompareGTMaskNotInt            ops/s   1673393.928     3153.099431     
> 2353654.521     1190.848583     1.4
>    testCompareGTMaskNotLong   ops/s   849405.9159     2432.858159     
> 1177952.041     359.96413       1.38
>    testCompareGTMaskNotShort  ops/s   3339509.141     3339.976585     
> 4711442.496     2673.364893     1.41
>    testCompareLEMaskNotByte   ops/s   7911340.004...

src/hotspot/share/opto/vectornode.cpp line 2243:

> 2241:     in1 = in1->in(1);
> 2242:   }
> 2243:   if (in1->Opcode() != Op_VectorMaskCmp || in1->outcnt() > 1 ||

Checks on outcnt on line 2243 and 2238 can be removed. Idealization looks for a 
specific graph palette and replaces it with a new node whose inputs are the 
same as the inputs of the palette. GVN will do the retention job if any 
intermediate node has users beyond the pattern being replaced.

test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 38:

> 36:  * @summary test combining vector not operation with compare
> 37:  * @modules jdk.incubator.vector
> 38:  * @requires ((os.arch!="x86" & os.arch!="i386" & os.arch!="amd64" & 
> os.arch!="x86_64") | vm.cpu.features ~= ".*avx.*")

You can remove this platform limitation and forward the constraints to @IR 
rules using applyIfCPUFeatureOr

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2055741909
PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2055787414

Reply via email to