On Wed, 14 May 2025 02:44:14 GMT, erifan <d...@openjdk.org> wrote: >> This patch optimizes the following patterns: >> For integer types: >> >> (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) >> => (VectorMaskCmp src1 src2 ncond) >> (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) >> => (VectorMaskCmp src1 src2 ncond) >> >> cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the >> negative comparison of cond. >> >> For float and double types: >> >> (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) >> => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) >> >> cond can be eq or ne. >> >> Benchmarks on Nvidia Grace machine with 128-bit SVE2: With option >> `-XX:UseSVE=2`: >> >> Benchmark Unit Before Score Error After >> Score Error Uplift >> testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 >> 10266136.26 8955.008548 1.29 >> testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 >> 1179760.772 448.031844 1.33 >> testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 >> 2359520.803 896.305743 1.33 >> testCompareEQMaskNotInt ops/s 1787221.411 977.743935 >> 2353952.519 960.069976 1.31 >> testCompareEQMaskNotLong ops/s 895297.1974 673.44808 >> 1178449.02 323.804205 1.31 >> testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 >> 4712761.965 2110.862053 1.41 >> testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 >> 10251646.9 9486.699831 1.29 >> testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 >> 2352855.205 1251.952546 1.39 >> testCompareGEMaskNotLong ops/s 854496.1561 8594.598885 >> 1177811.493 521.1229 1.37 >> testCompareGEMaskNotShort ops/s 3341860.309 1578.975338 >> 4714008.434 1681.10365 1.41 >> testCompareGTMaskNotByte ops/s 7910823.674 2993.367032 >> 10245063.58 9774.75138 1.29 >> testCompareGTMaskNotInt ops/s 1673393.928 3153.099431 >> 2353654.521 1190.848583 1.4 >> testCompareGTMaskNotLong ops/s 849405.9159 2432.858159 >> 1177952.041 359.96413 1.38 >> testCompareGTMaskNotShort ops/s 3339509.141 3339.976585 >> 4711442.496 2673.364893 1.41 >> testCompareLEMaskNotByte ops/s 7911340.004 3114.69191 >> 10231626.5 27134.20035 1.29 >> testCompareLEMaskNotInt ops/s 1675812.113 1340.969885 >> 2353255.341 1452.4522 1.4 >> testCompareLEMaskNotLong ops/s 848862.8036 6564.841731 >> 1177763.623 539.290106 1.38 >> testCompareLEMaskNotShort ops/s 3324951.54 2380.29473 >> 4712116.251 1544.559684 1.41 >> testCompareLTMaskNotByte ops/s 7910390.844 2630.861436 >> 10239567.69 6487.441672 1.29 >> testCompareLTMaskNotInt ops/s 16721... > > erifan has updated the pull request with a new target base due to a merge or > a rebase. The incremental webrev excludes the unrelated changes brought in by > the merge/rebase. The pull request contains 10 additional commits since the > last revision: > > - Refactor the JTReg tests for compare.xor(maskAll) > > Also made a bit change to support pattern `VectorMask.fromLong()`. > - Merge branch 'master' into JDK-8354242 > - Refactor code > > Add a new function XorVNode::Ideal_XorV_VectorMaskCmp to do this > optimization, making the code more modular. > - Merge branch 'master' into JDK-8354242 > - Update the jtreg test > - Merge branch 'master' into JDK-8354242 > - Addressed some review comments > > 1. Call VectorNode::Ideal() only once in XorVNode::Ideal. > 2. Improve code comments. > - Merge branch 'master' into JDK-8354242 > - Merge branch 'master' into JDK-8354242 > - 8354242: VectorAPI: combine vector not operation with compare > > This patch optimizes the following patterns: > For integer types: > ``` > (XorV (VectorMaskCmp src1 src2 cond) (Replicate -1)) > => (VectorMaskCmp src1 src2 ncond) > (XorVMask (VectorMaskCmp src1 src2 cond) (MaskAll m1)) > => (VectorMaskCmp src1 src2 ncond) > ``` > cond can be eq, ne, le, ge, lt, gt, ule, uge, ult and ugt, ncond is the > negative comparison of cond. > > For float and double types: > ``` > (XorV (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (Replicate -1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > (XorVMask (VectorMaskCast (VectorMaskCmp src1 src2 cond)) (MaskAll m1)) > => (VectorMaskCast (VectorMaskCmp src1 src2 ncond)) > ``` > cond can be eq or ne. > > Benchmarks on Nvidia Grace machine with 128-bit SVE2: > With option `-XX:UseSVE=2`: > ``` > Benchmark Unit Before Score Error After > Score Error Uplift > testCompareEQMaskNotByte ops/s 7912127.225 2677.289518 > 10266136.26 8955.008548 1.29 > testCompareEQMaskNotDouble ops/s 884737.6799 446.963779 > 1179760.772 448.031844 1.33 > testCompareEQMaskNotFloat ops/s 1765045.787 682.332214 > 2359520.803 896.305743 1.33 > testCompareEQMaskNotInt ops/s 1787221.411 977.743935 > 2353952.519 960.069976 1.31 > testCompareEQMaskNotLong ops/s 895297.1974 673.44808 > 1178449.02 323.804205 1.31 > testCompareEQMaskNotShort ops/s 3339987.002 3415.2226 > 4712761.965 2110.862053 1.41 > testCompareGEMaskNotByte ops/s 7907615.16 4094.243652 > 10251646.9 9486.699831 1.29 > testCompareGEMaskNotInt ops/s 1683738.958 4233.813092 > 2352855...
@erifan Looks like you really improved things, nice work! I have some more comments below :) src/hotspot/share/opto/vectornode.cpp line 2213: > 2211: Node* in1 = in(1); > 2212: Node* in2 = in(2); > 2213: // Transformations for predicated IRs are not supported for now. Suggestion: // Transformations for predicated vectors are not supported for now. src/hotspot/share/opto/vectornode.cpp line 2215: > 2213: // Transformations for predicated IRs are not supported for now. > 2214: if (is_predicated_vector() || in1->is_predicated_vector() || > 2215: in2->is_predicated_vector()) { I would either put all on the same line, or all on separate lines. src/hotspot/share/opto/vectornode.cpp line 2219: > 2217: } > 2218: > 2219: // XorV/XorVMask is commutative, swap VectorMaskCmp/Op_VectorMaskCast > to in1. Suggestion: // XorV/XorVMask is commutative, swap VectorMaskCmp/VectorMaskCast to in1. Would look a little cleaner, and you did also not write `Op_VectorMaskCmp` either ;) src/hotspot/share/opto/vectornode.cpp line 2225: > 2223: } > 2224: > 2225: const TypeVect* vmcast_vt = nullptr; Suggestion: const TypeVect* vector_mask_cast_vt = nullptr; I think it would not hurt to write it out. Otherwise, the reader always has to reconstruct that in their head. src/hotspot/share/opto/vectornode.cpp line 2230: > 2228: vmcast_vt = in1->as_Vector()->vect_type(); > 2229: in1 = in1->in(1); > 2230: } Add a comment why you check `in1->outcnt() == 1`. src/hotspot/share/opto/vectornode.cpp line 2233: > 2231: if (in2->Opcode() == Op_VectorMaskCast) { > 2232: in2 = in2->in(1); > 2233: } Wow, this seems to be an addition that is not covered in the patterns you mention above, right? But is that even necessary? I suppose here `in2 = VectorMaskCast(all_ones_vector)`. Would we not already want to transform this pattern in `VectorMaskCast::Ideal`, is that not possible and more powerful? src/hotspot/share/opto/vectornode.cpp line 2244: > 2242: // BoolTest doesn't support unsigned comparisons. > 2243: BoolTest::mask neg_cond = > 2244: (BoolTest::mask) (((VectorMaskCmpNode*) in1)->get_predicate() ^ > 4); What is the hard-coded `^ 4` here? This whole line looks like we are looking at internals of the `VectorMaskCmpNode` or its predicate, and we should probably do that in some method there? Or maybe it should be part of the `BoolTest(::mask)` interface? src/hotspot/share/opto/vectornode.cpp line 2251: > 2249: predicate_node, vt); > 2250: if (vmcast_vt != nullptr) { > 2251: // We optimized out an VectorMaskCast, and in order to ensure type Suggestion: // We optimized out a VectorMaskCast, and in order to ensure type src/hotspot/share/opto/vectornode.cpp line 2253: > 2251: // We optimized out an VectorMaskCast, and in order to ensure type > 2252: // correctness, we need to regenerate one. VectorMaskCast will be > encoded as > 2253: // empty for types with the same size. Suggestion: // a no-op (identity function) for types with the same size. Or what do you mean by "empty"? `TOP`? All zeros? test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 49: > 47: private static final VectorSpecies<Long> L_SPECIES = > LongVector.SPECIES_MAX; > 48: private static final VectorSpecies<Float> F_SPECIES = > FloatVector.SPECIES_MAX; > 49: private static final VectorSpecies<Double> D_SPECIES = > DoubleVector.SPECIES_MAX; @jatin-bhateja Do you think it is sufficient to only test on `MAX` sizes here? test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 96: > 94: Generator<Long> lGen = RD.uniformLongs(Long.MIN_VALUE, > Long.MAX_VALUE); > 95: Generator<Float> fGen = RD.uniformFloats(Float.MIN_VALUE, > Float.MAX_VALUE); > 96: Generator<Double> dGen = RD.uniformDoubles(Double.MIN_VALUE, > Double.MAX_VALUE); Are you sure you only want to draw from the uniform distribution? If you don't super care about the distribution, please just take `RD.ints/longs/floats/doubles()`. That way, you get all sorts of distributions, and also some that include NaN values etc. I think that would be important for your float cmp cases, no? test/hotspot/jtreg/compiler/vectorapi/VectorMaskCompareNotTest.java line 237: > 235: // Byte tests > 236: @Test > 237: @IR(counts = { IRNode.XOR_V_MASK, "= 0", IRNode.XOR_VB, "= 0" }, Could you still assert the presence of some other vectors, just to make sure we are indeed getting vectors here? test/micro/org/openjdk/bench/jdk/incubator/vector/MaskCompareNotBenchmark.java line 49: > 47: private static final VectorSpecies<Long> L_SPECIES = > LongVector.SPECIES_MAX; > 48: private static final VectorSpecies<Float> F_SPECIES = > FloatVector.SPECIES_MAX; > 49: private static final VectorSpecies<Double> D_SPECIES = > DoubleVector.SPECIES_MAX; Are you taking `SPECIES_MAX` on purpose here, or could we take `SPECIES_PREFERRED` instead? @jatin-bhateja What is the best to do in these tests? I suppose best would be to test with all vector lengths... ------------- PR Review: https://git.openjdk.org/jdk/pull/24674#pullrequestreview-2874740189 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111679899 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111679178 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111681447 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111684880 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111688348 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111695428 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111699396 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111705378 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111706856 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111726416 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111711829 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111717556 PR Review Comment: https://git.openjdk.org/jdk/pull/24674#discussion_r2111723220