On Tue, 5 Aug 2025 11:39:43 GMT, Galder Zamarreño <gal...@openjdk.org> wrote:

>> I've added support to vectorize `MoveD2L`, `MoveL2D`, `MoveF2I` and
>> `MoveI2F` nodes. The implementation follows a similar pattern to what is
>> done with conversion (`Conv*`) nodes. The tests in
>> `TestCompatibleUseDefTypeSize` have been updated with the new expectations.
>>
>> Also added a JMH benchmark which measures throughput (the higher the number
>> the better) for methods that exercise these nodes. On darwin/aarch64 it
>> shows:
>>
>> Benchmark                                (seed)  (size)   Mode  Cnt      Base      Patch  Units    Diff
>> VectorBitConversion.doubleToLongBits          0    2048  thrpt    8  1168.782   1157.717  ops/ms    -1%
>> VectorBitConversion.doubleToRawLongBits       0    2048  thrpt    8  3999.387   7353.936  ops/ms   +83%
>> VectorBitConversion.floatToIntBits            0    2048  thrpt    8  1200.338   1188.206  ops/ms    -1%
>> VectorBitConversion.floatToRawIntBits         0    2048  thrpt    8  4058.248  14792.474  ops/ms  +264%
>> VectorBitConversion.intBitsToFloat            0    2048  thrpt    8  3050.313  14984.246  ops/ms  +391%
>> VectorBitConversion.longBitsToDouble          0    2048  thrpt    8  3022.691   7379.360  ops/ms  +144%
>>
>> The improvements observed are a result of vectorization. The lack of
>> vectorization in `doubleToLongBits` and `floatToIntBits` demonstrates that
>> these changes do not affect their performance. These methods do not
>> vectorize because of flow control.
>>
>> I've run the tier1-3 tests on linux/aarch64 and didn't observe any
>> regressions.
>
> Galder Zamarreño has updated the pull request incrementally with one
> additional commit since the last revision:
>
>   Check at the very least that auto vectorization is supported

src/hotspot/share/opto/superword.cpp line 1635:

> 1633:     } else if (VectorNode::is_convert_opcode(opc)) {
> 1634:       retValue = VectorCastNode::implemented(opc, size, velt_basic_type(p0->in(1)), velt_basic_type(p0));
> 1635:     } else if (VectorNode::is_reinterpret_opcode(opc)) {

How does this affect `Op_ReinterpretHF2S` that is also in `VectorNode::is_reinterpret_opcode`?
I'm afraid that we need to test this on real hardware or with Intel's SDE, to make sure it runs on a VM that actually supports Float16. Otherwise these instructions may not be used, and hence not tested properly. @galderz Can you run the relevant tests?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/26457#discussion_r2265119804
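
For context on the quoted benchmark numbers, here is a minimal Java sketch (not taken from the PR) of the two loop shapes being compared. `BitConversionLoops`, `rawBits` and `canonicalBits` are hypothetical names chosen for illustration. `Double.doubleToLongBits` is specified to collapse every NaN to `0x7ff8000000000000L`, which implies a per-element NaN check, while `Double.doubleToRawLongBits` is a plain bit move. Whether either loop is actually vectorized depends on the JIT and the hardware, so treat this as an illustration of the distinction, not a reproduction of the benchmark.

```java
final class BitConversionLoops {
    // Straight-line body: each element is a plain bit reinterpretation,
    // the kind of operation the quoted description says is now vectorized.
    static void rawBits(double[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.doubleToRawLongBits(in[i]);
        }
    }

    // Same reinterpretation, except that every NaN input must map to the
    // canonical NaN 0x7ff8000000000000L. That requires a NaN check per
    // element, which is the flow control the quoted description refers to.
    static void canonicalBits(double[] in, long[] out) {
        for (int i = 0; i < in.length; i++) {
            out[i] = Double.doubleToLongBits(in[i]);
        }
    }

    public static void main(String[] args) {
        double[] in = {1.0, -0.0, Double.NaN};
        long[] raw = new long[in.length];
        long[] canonical = new long[in.length];
        rawBits(in, raw);
        canonicalBits(in, canonical);
        for (int i = 0; i < in.length; i++) {
            System.out.println(in[i] + " raw=0x" + Long.toHexString(raw[i])
                    + " canonical=0x" + Long.toHexString(canonical[i]));
        }
    }
}
```

For non-NaN inputs both methods return identical bits, so the difference between the two benchmark groups comes from the loop shape and not from the values produced.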