Re: RFR: 8342103: C2 compiler support for Float16 type and associated operations

Sandhya Viswanathan Tue, 19 Nov 2024 18:51:41 -0800

On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:


> Hi All,
> 
> This patch adds C2 compiler support for various Float16 operations added by 
> [PR#22128](https://github.com/openjdk/jdk/pull/22128)
> 
> Following is the summary of changes included with this patch:-
> 
> 1. Detection of various Float16 operations through inline expansion or 
> pattern folding idealizations.
> 2. Float16 operations like add, sub, mul, div, max, and min are inferred 
> through pattern folding idealization.
> 3. Float16 SQRT and FMA operation are inferred through inline expansion and 
> their corresponding entry points are defined in the newly added Float16Math 
> class.
>       -    These intrinsics receive unwrapped short arguments encoding IEEE 
> 754 binary16 values.
> 5. New specialized IR nodes for Float16 operations, associated idealizations, 
> and constant folding routines.
> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please 
> refer to [FAQs 
> ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more 
> details.
> 7. Since Float16 uses short as its storage type, hence raw FP16 values are 
> always loaded into general purpose register, but FP16 ISA instructions 
> generally operate over floating point registers, therefore compiler injectes 
> reinterpretation IR before and after Float16 operation nodes to move short 
> value to floating point register and vice versa.
> 8. New idealization routines to optimize redundant reinterpretation chains. 
> HF2S + S2HF = HF
> 6. Auto-vectorization of newly supported scalar operations.
> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
> 9. Functional and Performance validation tests.
> 
> **Missing Pieces:-**
> **-  AARCH64 Backend.**
> 
> Kindly review and share your feedback.
> 
> Best Regards,
> Jatin

src/hotspot/cpu/x86/assembler_x86.cpp line 3481:

> 3479: void Assembler::vmovw(XMMRegister dst, Register src) {
> 3480:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 3481:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false".

src/hotspot/cpu/x86/assembler_x86.cpp line 3483:

> 3481:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);
> 3482:   attributes.set_is_evex_instruction();
> 3483:   int encode = vex_prefix_and_encode(dst->encoding(), 0, 
> src->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes);

I think we need to change this to:
int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), 
VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true);
Please note the last argument for APX encoding when src is in higher register 
bank.

src/hotspot/cpu/x86/assembler_x86.cpp line 3489:

> 3487: void Assembler::vmovw(Register dst, XMMRegister src) {
> 3488:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 3489:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false".

src/hotspot/cpu/x86/assembler_x86.cpp line 3491:

> 3489:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);
> 3490:   attributes.set_is_evex_instruction();
> 3491:   int encode = vex_prefix_and_encode(src->encoding(), 0, 
> dst->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes);

I think we need to change this to:
int encode = vex_prefix_and_encode(src->encoding(), 0, dst->encoding(), 
VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true);
Please note the last argument for APX encoding when dst is in higher register 
bank.

src/hotspot/cpu/x86/assembler_x86.cpp line 8464:

> 8462: void Assembler::evaddph(XMMRegister dst, XMMRegister nds, XMMRegister 
> src, int vector_len) {
> 8463:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8464:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false".

src/hotspot/cpu/x86/assembler_x86.cpp line 8483:

> 8481: void Assembler::evsubph(XMMRegister dst, XMMRegister nds, XMMRegister 
> src, int vector_len) {
> 8482:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8483:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8502:

> 8500: void Assembler::evmulph(XMMRegister dst, XMMRegister nds, XMMRegister 
> src, int vector_len) {
> 8501:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8502:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8521:

> 8519: void Assembler::evminph(XMMRegister dst, XMMRegister nds, XMMRegister 
> src, int vector_len) {
> 8520:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8521:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8540:

> 8538: void Assembler::evmaxph(XMMRegister dst, XMMRegister nds, XMMRegister 
> src, int vector_len) {
> 8539:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8540:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8559:

> 8557: void Assembler::evdivph(XMMRegister dst, XMMRegister nds, XMMRegister 
> src, int vector_len) {
> 8558:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8559:   InstructionAttr attributes(vector_len, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ true);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8576:

> 8574: }
> 8575: 
> 8576: void Assembler::evsqrtph(XMMRegister dst, XMMRegister src1, int 
> vector_len) {

A nitpick src1 could be src :).

src/hotspot/cpu/x86/assembler_x86.cpp line 8614:

> 8612: }
> 8613: 
> 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {

This should be vaddsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8616:

> 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {
> 8615:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8616:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8622:

> 8620: }
> 8621: 
> 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {

This should be vsubsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8624:

> 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {
> 8623:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8624:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8630:

> 8628: }
> 8629: 
> 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {

This should be vdivsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8632:

> 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {
> 8631:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8632:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8638:

> 8636: }
> 8637: 
> 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {

This should be vmulsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8640:

> 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {
> 8639:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8640:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8646:

> 8644: }
> 8645: 
> 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {

This should be vmaxsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8648:

> 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {
> 8647:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8648:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8654:

> 8652: }
> 8653: 
> 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {

This should be vminsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8656:

> 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister 
> src) {
> 8655:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8656:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/assembler_x86.cpp line 8662:

> 8660: }
> 8661: 
> 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) {

This should be vsqrtsh.

src/hotspot/cpu/x86/assembler_x86.cpp line 8664:

> 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) {
> 8663:   assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16");
> 8664:   InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ 
> false, /* no_mask_reg */ true, /* uses_vl */ false);

It will be good to have the second argument with comment as "/* vex_w */ false"

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3974:

> 3972:   generate_libm_stubs();
> 3973: 
> 3974:   StubRoutines::_fmod = generate_libmFmod(); // from 
> stubGenerator_x86_64_fmod.cpp

Good to retain the is_intrinsic_available checks.

src/hotspot/cpu/x86/x86.ad line 4518:

> 4516: #ifdef _LP64
> 4517: instruct ReplS_imm(vec dst, immH con, rRegI rtmp) %{
> 4518:   predicate(VM_Version::supports_avx512_fp16() && 
> Matcher::vector_element_basic_type(n) == T_SHORT);

I have a question about the predicate for ReplS_imm. What happens if the 
predicate is false? There doesn't seem to be any other instruct rule to cover 
that situation. Also I don't see any check in match rule supported on Replicate 
node.

src/hotspot/cpu/x86/x86.ad line 10895:

> 10893:   format %{ "esqrtsh $dst, $src" %}
> 10894:   ins_encode %{
> 10895:     int opcode = this->ideal_Opcode();

opcode is unused.

src/hotspot/cpu/x86/x86.ad line 10936:

> 10934:   ins_encode %{
> 10935:     int vlen_enc = vector_length_encoding(this);
> 10936:     int opcode = this->ideal_Opcode();

opcode unused later.

src/hotspot/cpu/x86/x86.ad line 10949:

> 10947:   ins_encode %{
> 10948:     int vlen_enc = vector_length_encoding(this);
> 10949:     int opcode = this->ideal_Opcode();

opcode unused later.

src/hotspot/cpu/x86/x86.ad line 10964:

> 10962:   match(Set dst (SubVHF src1 src2));
> 10963:   format %{ "evbinopfp16_reg $dst, $src1, $src2" %}
> 10964:   ins_cost(450);

Why ins_cost 450 here for reg version and 150 for mem version of binOps?  
Whereas sqrt above has 150 cost for both reg and mem version. Good to be 
consistent.

src/hotspot/cpu/x86/x86.ad line 11012:

> 11010:   effect(DEF dst);
> 11011:   format %{ "evfmaph_reg $dst, $src1, $src2\t# $dst = $dst * $src1 + 
> $src2 fma packedH" %}
> 11012:   ins_cost(450);

Good to be consistent with ins_cost for reg vs mem version.

src/hotspot/cpu/x86/x86.ad line 11015:

> 11013:   ins_encode %{
> 11014:     int vlen_enc = vector_length_encoding(this);
> 11015:     __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister, 
> $src1$$XMMRegister, vlen_enc);

Wondering if for auto vectorization the natural fma form is dst = dst + src1 * 
src2 i.e.
     match(Set dst (FmaVHF dst (Binary src1 src2)));
which then leads to fmadd231.

src/hotspot/share/adlc/output_h.cpp line 1298:

> 1296:         case Form::idealD: type = "Type::DOUBLE";   break;
> 1297:         case Form::idealL: type = "TypeLong::LONG"; break;
> 1298:         case Form::idealH: type = "Type::HALF_LONG"; break;

This should be Type::HALF_FLOAT

src/hotspot/share/classfile/vmSymbols.hpp line 143:

> 141:   template(java_util_DualPivotQuicksort,              
> "java/util/DualPivotQuicksort")             \
> 142:   template(jdk_internal_misc_Signal,                  
> "jdk/internal/misc/Signal")                 \
> 143:   template(jdk_internal_math_Float16Math,             
> "jdk/internal/math/Float16Math")            \

This seems to be leftover template.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870304
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843899813
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870852
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843902337
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843871328
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843906656
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843908957
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843910609
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843912897
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843914392
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843916999
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922125
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922490
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843923239
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843924299
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925126
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925319
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926551
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926789
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928252
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928447
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929519
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929686
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843930969
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843931641
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847403451
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847400518
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844234786
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844237825
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844238487
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844244532
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847443990
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847448109
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847470619
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847475384

Re: RFR: 8342103: C2 compiler support for Float16 type and associated operations

Reply via email to