On Sat, 6 May 2023 02:01:20 GMT, Chang Peng <d...@openjdk.org> wrote:
> To avoid dead code elimination, a use-point laneIsSet() is added in each
> benchmark method in MaskFromLongBenchmark.java.
>
> However, laneIsSet() [1] is currently implemented via toLong(), so it may
> generate a fromLong-toLong pair [2], making this benchmark ineffective
> once laneIsSet() is inlined into the outer method. The assembly of the
> maskFromLong_byte128 benchmark on SVE2 is shown in [3]. We cannot see the
> bdep instruction used by fromLong on AArch64 [4]. So, in this case, we cannot
> measure fromLong()'s performance by using this benchmark.
>
> This patch uses trueCount() [5] instead of toLong() to measure
> fromLong()'s performance effectively. After this patch, we can see the bdep
> instruction in the hot loop [6] of the maskFromLong_byte128 benchmark.
>
> [1]:
> https://github.com/openjdk/jdk/blob/96fa2751e8bbc05d6d064d80c07720cc9db05c54/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractMask.java#L70
> [2]:
> https://github.com/openjdk/jdk/blob/ff368d504e9101e11c7182185f56255f429d31e3/src/hotspot/share/opto/vectornode.cpp#L1736
> [3]: https://gist.github.com/changpeng1997/467f6056f78d99c055030fa5888b6baa
> [4]:
> https://github.com/openjdk/jdk/blob/787832a58677205c9a11ae100dd8a2fbddb30a4a/src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp#L1099
> [5]:
> https://docs.oracle.com/en/java/javase/16/docs/api/jdk.incubator.vector/jdk/incubator/vector/VectorMask.html#trueCount()
> [6]: https://gist.github.com/changpeng1997/79bea0a9f80530bec89978950897000d

Storing the mask into a boolean array should be safer, since `trueCount` can itself be implemented as `bitCount(toLong())`.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13851#issuecomment-1536988490
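To illustrate the concern in the reply above, here is a minimal plain-Java model rather than the Vector API itself. The class `MaskModel` and all of its methods are hypothetical stand-ins invented for this sketch; it assumes at most 64 lanes. It shows that a `trueCount` built on `toLong()` still forms a fromLong-toLong pair that reduces to simple bit masking, whereas storing into a boolean array needs the lanes themselves.

```java
// Plain-Java model of a vector mask. MaskModel and its methods are
// illustrative stand-ins, not the jdk.incubator.vector implementation.
final class MaskModel {
    private final boolean[] lanes;

    private MaskModel(boolean[] lanes) {
        this.lanes = lanes;
    }

    // Models fromLong: bit i of 'bits' sets lane i (laneCount <= 64 assumed).
    static MaskModel fromLong(int laneCount, long bits) {
        boolean[] lanes = new boolean[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = ((bits >>> i) & 1L) != 0;
        }
        return new MaskModel(lanes);
    }

    // Models toLong: packs the lanes back into a long.
    long toLong() {
        long bits = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i]) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    // One possible trueCount, built on toLong(). With this shape,
    // fromLong(...).trueCount() again contains a fromLong-toLong pair.
    int trueCount() {
        return Long.bitCount(toLong());
    }

    // Models storing the mask into a boolean array: every lane is written out.
    void intoArray(boolean[] dst, int offset) {
        System.arraycopy(lanes, 0, dst, offset, lanes.length);
    }

    // What a folded fromLong-toLong pair reduces to: masking the input bits,
    // with no mask ever materialized.
    static long foldedRoundTrip(int laneCount, long bits) {
        long laneMask = (laneCount == 64) ? -1L : (1L << laneCount) - 1;
        return bits & laneMask;
    }
}
```

In this model, `fromLong(n, bits).trueCount()` always equals `Long.bitCount(foldedRoundTrip(n, bits))`, so an optimizer that recognizes the pair never has to build the mask. The `intoArray` path has no such shortcut in the model; whether a real JIT behaves the same way is an assumption to verify in the generated assembly.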