On Wed, 26 Feb 2025 07:04:58 GMT, Nicole Xu <d...@openjdk.org> wrote:

>> Suite `MaskedLogicOpts.maskedLogicOperationsLong512()` failed on both x86
>> and AArch64 with the following error:
>>
>>
>> java.lang.IndexOutOfBoundsException: Index 252 out of bounds for length 249
>>
>>
>> The variable `long256_arr_idx` is misused when indexing `LongVector l2`,
>> `l3`, `l4`, `l5` in function `maskedLogicOperationsLongKernel()` resulting
>> in the IndexOutOfBoundsException error. On the other hand, the unified index
>> for 128-bit, 256-bit and 512-bit species might not be proper since it leaves
>> gaps in between when accessing the data for 128-bit and 256-bit species.
>> This will unnecessarily include the noise due to cache misses or (on some
>> targets) prefetching additional cache lines which are not usable, thereby
>> impacting the crispness of microbenchmark.
>>
>> Hence, we improved the benchmark from several aspects,
>> 1. Used sufficient number of predicated operations within the vector loop
>> while minimizing the noise due to memory operations.
>> 2. Modified the index computation logic which can now withstand any ARRAYLEN
>> without resulting in an IOOBE.
>> 3. Removed redundant vector read/writes to instance fields, thus eliminating
>> significant boxing penalty which translates into throughput gains.
>
> Nicole Xu has updated the pull request incrementally with two additional
> commits since the last revision:
>
>  - 8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to
>    IndexOutOfBoundsException
>
>    Suite MaskedLogicOpts.maskedLogicOperationsLong512() failed on both x86
>    and AArch64 with the following error:
>
>    ```
>    java.lang.IndexOutOfBoundsException: Index 252 out of bounds for length 249
>    ```
>
>    The variable `long256_arr_idx` is misused when indexing `LongVector l2`,
>    `l3`, `l4`, `l5` in function `maskedLogicOperationsLongKernel()`
>    resulting in the IndexOutOfBoundsException error. On the other hand, the
>    unified index for 128-bit, 256-bit and 512-bit species might not be
>    proper since it leaves gaps in between when accessing the data
>    for 128-bit and 256-bit species. This will unnecessarily include the
>    noise due to cache misses or (on some targets) prefetching additional
>    cache lines which are not usable, thereby impacting the crispness of
>    microbenchmark.
>
>    Hence, we improved the benchmark from several aspects,
>    1. Used sufficient number of predicated operations within the vector
>    loop while minimizing the noise due to memory operations.
>    2. Modified the index computation logic which can now withstand any
>    ARRAYLEN without resulting in an IOOBE.
>    3. Removed redundant vector read/writes to instance fields, thus
>    eliminating significant boxing penalty which translates into throughput
>    gains.
>
>    Change-Id: Ie8a9d495b1ca5e36f1eae069ff70a815a2de00c0
>  - Revert "8346954: [JMH] jdk.incubator.vector.MaskedLogicOpts fails due to
>    IndexOutOfBoundsException"
>
>    This reverts commit 083bedec04d5ab78a420e156e74c1257ce30aee8.

LGTM

-------------

Marked as reviewed by jbhateja (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22963#pullrequestreview-2643351624
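
For readers following the quoted failure, here is a minimal plain-Java sketch of why an index advanced in 4-lane steps (256-bit `long` vectors) can overrun when reused for an 8-lane load (512-bit), and of one way to keep a per-species index in range for any array length. It is an illustration only, not the code from the PR: the class `LaneIndexSketch`, the helpers `fits` and `nextIndex`, and the array length of 256 are all assumptions. That length is at least consistent with the reported bound, since 256 - 8 + 1 = 249.

```java
// Hypothetical illustration; names and the array length are assumptions,
// not taken from MaskedLogicOpts.
final class LaneIndexSketch {

    // True if a vector load of `lanes` elements starting at `idx`
    // stays inside an array of length `arrayLen`.
    static boolean fits(int idx, int lanes, int arrayLen) {
        return idx >= 0 && idx + lanes <= arrayLen;
    }

    // Advance a per-species index by one vector width, wrapping to 0
    // when a full vector would no longer fit. Works for any arrayLen >= lanes.
    static int nextIndex(int idx, int lanes, int arrayLen) {
        int next = idx + lanes;
        return (next + lanes <= arrayLen) ? next : 0;
    }

    public static void main(String[] args) {
        int arrayLen = 256;   // assumed ARRAYLEN
        int idx = 252;        // last valid start for a 4-lane (256-bit) load

        System.out.println("4 lanes at 252 fits: " + fits(idx, 4, arrayLen));
        System.out.println("8 lanes at 252 fits: " + fits(idx, 8, arrayLen));

        // Walking with an index owned by the 8-lane species never overruns.
        int i = 0;
        for (int step = 0; step < 100; step++) {
            if (!fits(i, 8, arrayLen)) {
                throw new IllegalStateException("index out of range: " + i);
            }
            i = nextIndex(i, 8, arrayLen);
        }
        System.out.println("8-lane walk stayed in bounds");
    }
}
```

Keeping one index per species also gives contiguous accesses for the narrower species, which is the gap concern raised in the description.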