Re: RFR: 8382523: Optimize Float16 to integral conversion operations for AVX512-FP16 targets [v2]

Sandhya Viswanathan Fri, 05 Jun 2026 16:30:56 -0700

On Tue, 12 May 2026 04:15:53 GMT, Jatin Bhateja <[email protected]> wrote:


>> This patch optimizes Float16 to integral conversion operations. Currently, it 
>> is a two-step process: a Float16 value is first converted to a 
>> single-precision floating-point value, which is then converted to an 
>> integral value.
>> 
>> x86 targets supporting the AVX512-FP16 feature (Intel Sapphire Rapids+ and 
>> upcoming AMD Zen6) provide direct instructions to convert a Float16 value to 
>> an integral value.
>> 
>> The following are the performance numbers for the microbenchmark included 
>> with the patch, measured on Granite Rapids with and without 
>> auto-vectorization.
>> 
>> [image: https://github.com/user-attachments/assets/ca6e6757-1579-475f-8307-9454c7c025c1]
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Jatin Bhateja has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Review comments resolution

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4543:

> 4541:   __ bind(stub.entry());
> 4542:   __ subptr(rsp, 8);
> 4543:   __ movl(Address(rsp), src);

With APX, src could also be a higher-bank register, which could increase the 
stub size by another byte.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4568:

> 4566: 
> 4567:   // Using the APX extended general purpose registers increases the 
> instruction encoding size by 1 byte.
> 4568:   int max_size = 23 + (UseAPX ? 1 : 0);

This should be increased by 2 bytes here for APX.
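
As a rough sketch of the arithmetic only, assuming two stub instructions can each grow by one byte when they use an APX extended register (the helper and constant names below are hypothetical and not part of the patch):

```cpp
// Base stub size in bytes, taken from the line under review.
constexpr int kBaseStubSize = 23;

// Assumption: one extra encoding byte per instruction that uses an
// extended (higher-bank) general purpose register.
constexpr int kExtraBytesPerExtendedRegInsn = 1;

// Assumption: two instructions in the stub may use such a register,
// including the movl that stores src.
constexpr int kInsnsThatMayUseExtendedReg = 2;

// Hypothetical helper mirroring the max_size computation.
constexpr int stub_max_size(bool use_apx) {
  return kBaseStubSize +
         (use_apx ? kInsnsThatMayUseExtendedReg * kExtraBytesPerExtendedRegInsn
                  : 0);
}

static_assert(stub_max_size(false) == 23, "no APX: base size only");
static_assert(stub_max_size(true) == 25, "APX: base size plus 2 bytes");
```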

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5070:

> 5068: }
> 5069: 
> 5070: void 
> C2_MacroAssembler::vector_cast_float16_to_int_special_cases(XMMRegister dst, 
> XMMRegister src, XMMRegister xtmp1,

This function is very similar to the existing 
vector_cast_float_to_int_special_cases_evex and 
vector_cast_double_to_int_special_cases_evex. It would be good to combine these 
with a parameter.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5092:

> 5090: }
> 5091: 
> 5092: void 
> C2_MacroAssembler::vector_cast_float16_to_long_special_cases_evex(XMMRegister 
> dst, XMMRegister src, XMMRegister xtmp1,

This function is very similar to the existing 
vector_cast_float_to_long_special_cases_evex and 
vector_cast_double_to_long_special_cases_evex. It would be good to combine 
these with a parameter.
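
To illustrate the idea of one routine parameterized on the element types, here is a scalar model of the special-case fix-up. It is not HotSpot code: the function name is hypothetical, float and double stand in for the source types, and it assumes the raw x86 conversion yields the "integer indefinite" value (the minimum of the destination type) for NaN and out-of-range inputs.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Scalar model (hypothetical name) of the fix-up applied after a raw
// floating-point to integer conversion. Dst is the integral result type,
// Src is the floating-point source type.
template <typename Dst, typename Src>
Dst fixup_special_cases(Src src, Dst raw) {
  // Assumed hardware marker for NaN or out-of-range sources.
  constexpr Dst indefinite = std::numeric_limits<Dst>::min();
  if (raw != indefinite) {
    return raw;  // ordinary lane, nothing to fix
  }
  if (std::isnan(src)) {
    return 0;  // Java semantics: NaN converts to 0
  }
  if (src > Src(0)) {
    return std::numeric_limits<Dst>::max();  // positive overflow saturates
  }
  return indefinite;  // negative overflow (or exact minimum) keeps the minimum
}
```

In this model, the int and long variants differ only in Dst and the float and double variants only in Src, which is the kind of duplication a single parameterized routine could remove.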

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/30928#discussion_r3365352480
PR Review Comment: https://git.openjdk.org/jdk/pull/30928#discussion_r3365381872
PR Review Comment: https://git.openjdk.org/jdk/pull/30928#discussion_r3365780560
PR Review Comment: https://git.openjdk.org/jdk/pull/30928#discussion_r3365779714
