Re: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v4]

2023-05-30 Thread Chang Peng
On Tue, 30 May 2023 22:20:23 GMT, David Holmes wrote:
> What testing was done on this fix before integration? I don't even see GitHub Actions being run.
@dholmes-ora I did see earlier that GitHub Actions ran (in the 'Checks' tab) and finished, and I believed the Windows failure is not relat

Integrated: 8307795: AArch64: Optimize VectorMask.truecount() on Neon

2023-05-30 Thread Chang Peng
On Mon, 15 May 2023 02:58:46 GMT, Chang Peng wrote:
> At the Java level of the Vector API, a vector mask is represented as a boolean array with 0x00/0x01 values (8 bits per element), i.e. the in-memory format. When it is loaded into a vector register, e.g. Neon, the in-memory forma

Re: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v4]

2023-05-28 Thread Chang Peng
Unit
> testInt    723.822 ± 1.029   1182.375 ± 12.363   ops/ms
> testLong   632.154 ± 0.197   1382.74 ± 2.188     ops/ms
> testShort  788.665 ± 1.852   1152.38 ± 3.77      ops/ms
>
> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd

Re: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v3]

2023-05-18 Thread Chang Peng
On Mon, 15 May 2023 10:59:11 GMT, Andrew Haley wrote:
> > > This looks like it might be removed by loop opts. I think you might need a blackhole somewhere.
> >
> > `m` will be updated in every iteration of this loop, so `m` is not actually loop-invariant. I can see the assemb

Re: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v3]

2023-05-18 Thread Chang Peng
Unit
> testInt    723.822 ± 1.029   1182.375 ± 12.363   ops/ms
> testLong   632.154 ± 0.197   1382.74 ± 2.188     ops/ms
> testShort  788.665 ± 1.852   1152.38 ± 3.77      ops/ms
>
> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337bd

Re: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon [v2]

2023-05-18 Thread Chang Peng
Unit
> testInt    723.822 ± 1.029   1182.375 ± 12.363   ops/ms
> testLong   632.154 ± 0.197   1382.74 ± 2.188     ops/ms
> testShort  788.665 ± 1.852   1152.38 ± 3.77      ops/ms
>
> [1]: https://github.com/openjdk/jdk/blob/e1e758a7b43c29840296d337b

Re: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon

2023-05-15 Thread Chang Peng
On Mon, 15 May 2023 08:57:30 GMT, Andrew Haley wrote:
> This looks like it might be removed by loop opts. I think you might need a blackhole somewhere.
`m` will be updated in every iteration of this loop, so `m` is not actually loop-invariant. I can see the assembly code of this l

Re: RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon

2023-05-15 Thread Chang Peng
On Mon, 15 May 2023 08:56:37 GMT, Andrew Haley wrote:
> That makes sense. Is it likely that there are more of these combined operations on vector masks that could be matched? If so, it might make sense to do the job earlier, in the C2 optimizer.
Thanks for your review. I have tried to opt

RFR: 8307795: AArch64: Optimize VectorMask.truecount() on Neon

2023-05-14 Thread Chang Peng
At the Java level of the Vector API, a vector mask is represented as a boolean array with 0x00/0x01 values (8 bits per element), i.e. the in-memory format. When it is loaded into a vector register, e.g. Neon, the in-memory format is converted to the in-register format, with a value of 0/-1 for each lane (lane wid
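
The two mask formats described above can be modelled in plain Java. This is only a sketch for illustration, not the patch under review: the class and method names are hypothetical, byte-wide lanes are assumed, and arrays stand in for a Neon register. It shows that the number of set lanes can be read from either representation: the sum of the 0x00/0x01 bytes, or the negated sum of the 0/-1 lanes.

```java
// Hypothetical model of the two vector-mask formats; not HotSpot code.
class MaskFormats {
    // In-memory format: one byte per element, 0x00 for false and 0x01 for true.
    static byte[] toInMemory(boolean[] mask) {
        byte[] bytes = new byte[mask.length];
        for (int i = 0; i < mask.length; i++) {
            bytes[i] = (byte) (mask[i] ? 1 : 0);
        }
        return bytes;
    }

    // In-register format (byte lanes assumed): 0 for false, -1 (all bits set) for true.
    static byte[] toInRegister(byte[] inMemory) {
        byte[] lanes = new byte[inMemory.length];
        for (int i = 0; i < inMemory.length; i++) {
            lanes[i] = (byte) -inMemory[i];
        }
        return lanes;
    }

    // Counting on the in-memory format: each true element contributes 1.
    static int trueCountInMemory(byte[] inMemory) {
        int sum = 0;
        for (byte b : inMemory) {
            sum += b;
        }
        return sum;
    }

    // Counting on the in-register format: each true lane contributes -1,
    // so the sum has to be negated.
    static int trueCountInRegister(byte[] lanes) {
        int sum = 0;
        for (byte lane : lanes) {
            sum += lane;
        }
        return -sum;
    }

    public static void main(String[] args) {
        boolean[] mask = {true, false, true, true, false, false, true, false};
        byte[] inMemory = toInMemory(mask);
        byte[] inRegister = toInRegister(inMemory);
        System.out.println("in-memory count:   " + trueCountInMemory(inMemory));
        System.out.println("in-register count: " + trueCountInRegister(inRegister));
    }
}
```

Both counts agree for any mask, which is why a true-count does not inherently depend on which of the two formats the mask happens to be in.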

RFR: 8307523: [vectorapi] Optimize MaskFromLongBenchmark.java

2023-05-05 Thread Chang Peng
To avoid dead-code elimination, a use point, laneIsSet(), is added in each benchmark method in MaskFromLongBenchmark.java. However, laneIsSet() [1] is currently implemented via toLong(), so it may generate a fromLong-toLong pair [2], making this benchmark ineffective after inlining laneIsSe
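
Why a fromLong-toLong pair can defeat the benchmark is easier to see in a small model. The sketch below is plain Java with hypothetical stand-ins for `VectorMask.fromLong`, `toLong` and `laneIsSet`; it assumes at most 64 lanes and is not the JDK implementation. Because `toLong(fromLong(bits))` simply returns the low bits of the input, a lane test built on `toLong()` can be answered from the original long, leaving nothing of `fromLong` to measure.

```java
// Hypothetical plain-Java stand-ins for the mask operations; not the JDK code.
class MaskLongRoundTrip {
    // Build a mask from the low laneCount bits of a long (laneCount <= 64 assumed).
    static boolean[] fromLong(long bits, int laneCount) {
        boolean[] mask = new boolean[laneCount];
        for (int i = 0; i < laneCount; i++) {
            mask[i] = ((bits >>> i) & 1L) != 0;
        }
        return mask;
    }

    // Pack a mask back into a long, lane i going to bit i.
    static long toLong(boolean[] mask) {
        long bits = 0L;
        for (int i = 0; i < mask.length; i++) {
            if (mask[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // A lane test implemented on top of toLong(), as the message describes.
    static boolean laneIsSetViaToLong(boolean[] mask, int lane) {
        return ((toLong(mask) >>> lane) & 1L) != 0;
    }

    // The same answer computed directly from the input long, without any mask.
    static boolean laneIsSetFolded(long bits, int lane) {
        return ((bits >>> lane) & 1L) != 0;
    }

    public static void main(String[] args) {
        long bits = 0b1011L;
        boolean[] mask = fromLong(bits, 4);
        System.out.println("round trip: " + toLong(mask));
        System.out.println("via toLong: " + laneIsSetViaToLong(mask, 2));
        System.out.println("folded:     " + laneIsSetFolded(bits, 2));
    }
}
```

For any lane index below the lane count, the two lane tests return the same value, which is the kind of simplification an optimizing compiler is free to make once the calls are inlined.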