On Tue, 12 May 2026 04:15:53 GMT, Jatin Bhateja <[email protected]> wrote:
>> Patch optimizes Float16 to integral conversion operations. Currently, its a
>> two step process where by first a Float16 value is
>> converted to a single precision floating point value followed by a
>> conversion to an integral value.
>>
>> x86 targets supporting AVX512-FP16 feature (Intel Sapphire Rapids+ and
>> upcoming AMD Zen6) provides direct instruction to convert a Float16 value to
>> integral value.
>>
>> Following are the performance numbers of micro benchmark included with the
>> patch on Granite Rapids with and without auto-vectorization.
>>
>> <img width="1125" height="636" alt="image"
>> src="https://github.com/user-attachments/assets/ca6e6757-1579-475f-8307-9454c7c025c1"
>> />
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>>
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Jatin Bhateja has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Review comments resolution

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4543:

> 4541:   __ bind(stub.entry());
> 4542:   __ subptr(rsp, 8);
> 4543:   __ movl(Address(rsp), src);

The src could also be a higher-bank register for APX, which could increase the stub size by another byte.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4568:

> 4566:
> 4567:   // Using the APX extended general purpose registers increases the instruction encoding size by 1 byte.
> 4568:   int max_size = 23 + (UseAPX ? 1 : 0);

This should be increased by 2 bytes here for APX.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5070:

> 5068: }
> 5069:
> 5070: void C2_MacroAssembler::vector_cast_float16_to_int_special_cases(XMMRegister dst, XMMRegister src, XMMRegister xtmp1,

This function is very similar to the existing vector_cast_float_to_int_special_cases_evex and vector_cast_double_to_int_special_cases_evex. It would be good to combine them into a single function that takes a parameter.
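
To illustrate the kind of parameterization meant here, below is a rough sketch written as a standalone scalar model, not as HotSpot code. The names `SrcKind`, `FpLayout`, `layout_for`, and `fixup_int_special_cases` are hypothetical, and the real helpers work on vector registers with type-specific compare instructions. The sketch assumes the raw conversion yields 0x80000000 for NaN and out-of-range inputs, and that the fix-up must produce 0 for NaN and saturate otherwise, as Java requires.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical selector for the source element type; the real code would
// more likely pass a BasicType.
enum class SrcKind { Float16, Float, Double };

// Bit-field masks of an IEEE 754 binary format.
struct FpLayout {
  uint64_t sign_mask;
  uint64_t exp_mask;
  uint64_t frac_mask;
};

// The only type-specific part: pick the masks for binary16/32/64.
inline FpLayout layout_for(SrcKind kind) {
  switch (kind) {
    case SrcKind::Float16:
      return {0x8000ULL, 0x7C00ULL, 0x03FFULL};
    case SrcKind::Float:
      return {0x80000000ULL, 0x7F800000ULL, 0x007FFFFFULL};
    case SrcKind::Double:
    default:
      return {0x8000000000000000ULL, 0x7FF0000000000000ULL,
              0x000FFFFFFFFFFFFFULL};
  }
}

// Shared fix-up. 'raw' models the result of a truncating convert, assumed to
// be 0x80000000 for NaN and out-of-range inputs. 'src_bits' holds the raw
// bits of the source value in the format named by 'kind'.
inline int32_t fixup_int_special_cases(int32_t raw, uint64_t src_bits,
                                       SrcKind kind) {
  if (raw != INT32_MIN) {
    return raw;  // ordinary in-range result, nothing to fix
  }
  const FpLayout l = layout_for(kind);
  const bool is_nan = (src_bits & l.exp_mask) == l.exp_mask &&
                      (src_bits & l.frac_mask) != 0;
  if (is_nan) {
    return 0;  // NaN converts to 0
  }
  const bool is_negative = (src_bits & l.sign_mask) != 0;
  return is_negative ? INT32_MIN : INT32_MAX;  // saturate by sign
}

// Small usage example: one routine serves all three source types.
inline void demo_checks() {
  assert(fixup_int_special_cases(INT32_MIN, 0x7E00ULL, SrcKind::Float16) == 0);
  assert(fixup_int_special_cases(INT32_MIN, 0x7F800000ULL, SrcKind::Float) ==
         INT32_MAX);
}
```

The point is only that the type-dependent piece shrinks to a small selection step, while the NaN and saturation handling is written once.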

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5092:

> 5090: }
> 5091:
> 5092: void C2_MacroAssembler::vector_cast_float16_to_long_special_cases_evex(XMMRegister dst, XMMRegister src, XMMRegister xtmp1,

This function is very similar to the existing vector_cast_float_to_long_special_cases_evex and vector_cast_double_to_long_special_cases_evex. It would be good to combine them into a single function that takes a parameter.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30928#discussion_r3365352480
PR Review Comment: https://git.openjdk.org/jdk/pull/30928#discussion_r3365381872
PR Review Comment: https://git.openjdk.org/jdk/pull/30928#discussion_r3365780560
PR Review Comment: https://git.openjdk.org/jdk/pull/30928#discussion_r3365779714
