On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
> Hi All, > > This patch adds C2 compiler support for various Float16 operations added by > [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or > pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred > through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and > their corresponding entry points are defined in the newly added Float16Math > class. > - These intrinsics receive unwrapped short arguments encoding IEEE > 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, > and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please > refer to [FAQs > ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more > details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are > always loaded into general purpose register, but FP16 ISA instructions > generally operate over floating point registers, therefore compiler injectes > reinterpretation IR before and after Float16 operation nodes to move short > value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. > HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin src/hotspot/cpu/x86/assembler_x86.cpp line 3481: > 3479: void Assembler::vmovw(XMMRegister dst, Register src) { > 3480: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 3481: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false". src/hotspot/cpu/x86/assembler_x86.cpp line 3483: > 3481: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); > 3482: attributes.set_is_evex_instruction(); > 3483: int encode = vex_prefix_and_encode(dst->encoding(), 0, > src->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes); I think we need to change this to: int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true); Please note the last argument for APX encoding when src is in higher register bank. src/hotspot/cpu/x86/assembler_x86.cpp line 3489: > 3487: void Assembler::vmovw(Register dst, XMMRegister src) { > 3488: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 3489: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false". src/hotspot/cpu/x86/assembler_x86.cpp line 3491: > 3489: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); > 3490: attributes.set_is_evex_instruction(); > 3491: int encode = vex_prefix_and_encode(src->encoding(), 0, > dst->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes); I think we need to change this to: int encode = vex_prefix_and_encode(src->encoding(), 0, dst->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true); Please note the last argument for APX encoding when dst is in higher register bank. src/hotspot/cpu/x86/assembler_x86.cpp line 8464: > 8462: void Assembler::evaddph(XMMRegister dst, XMMRegister nds, XMMRegister > src, int vector_len) { > 8463: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8464: InstructionAttr attributes(vector_len, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false". src/hotspot/cpu/x86/assembler_x86.cpp line 8483: > 8481: void Assembler::evsubph(XMMRegister dst, XMMRegister nds, XMMRegister > src, int vector_len) { > 8482: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8483: InstructionAttr attributes(vector_len, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8502: > 8500: void Assembler::evmulph(XMMRegister dst, XMMRegister nds, XMMRegister > src, int vector_len) { > 8501: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8502: InstructionAttr attributes(vector_len, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8521: > 8519: void Assembler::evminph(XMMRegister dst, XMMRegister nds, XMMRegister > src, int vector_len) { > 8520: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8521: InstructionAttr attributes(vector_len, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8540: > 8538: void Assembler::evmaxph(XMMRegister dst, XMMRegister nds, XMMRegister > src, int vector_len) { > 8539: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8540: InstructionAttr attributes(vector_len, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8559: > 8557: void Assembler::evdivph(XMMRegister dst, XMMRegister nds, XMMRegister > src, int vector_len) { > 8558: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8559: InstructionAttr attributes(vector_len, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8576: > 8574: } > 8575: > 8576: void Assembler::evsqrtph(XMMRegister dst, XMMRegister src1, int > vector_len) { A nitpick src1 could be src :). src/hotspot/cpu/x86/assembler_x86.cpp line 8614: > 8612: } > 8613: > 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { This should be vaddsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8616: > 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { > 8615: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8616: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8622: > 8620: } > 8621: > 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { This should be vsubsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8624: > 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { > 8623: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8624: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8630: > 8628: } > 8629: > 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { This should be vdivsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8632: > 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { > 8631: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8632: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8638: > 8636: } > 8637: > 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { This should be vmulsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8640: > 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { > 8639: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8640: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8646: > 8644: } > 8645: > 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { This should be vmaxsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8648: > 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { > 8647: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8648: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8654: > 8652: } > 8653: > 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { This should be vminsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8656: > 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister > src) { > 8655: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8656: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8662: > 8660: } > 8661: > 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) { This should be vsqrtsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8664: > 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) { > 8663: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8664: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ > false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3974: > 3972: generate_libm_stubs(); > 3973: > 3974: StubRoutines::_fmod = generate_libmFmod(); // from > stubGenerator_x86_64_fmod.cpp Good to retain the is_intrinsic_available checks. src/hotspot/cpu/x86/x86.ad line 4518: > 4516: #ifdef _LP64 > 4517: instruct ReplS_imm(vec dst, immH con, rRegI rtmp) %{ > 4518: predicate(VM_Version::supports_avx512_fp16() && > Matcher::vector_element_basic_type(n) == T_SHORT); I have a question about the predicate for ReplS_imm. What happens if the predicate is false? There doesn't seem to be any other instruct rule to cover that situation. Also I don't see any check in match rule supported on Replicate node. src/hotspot/cpu/x86/x86.ad line 10895: > 10893: format %{ "esqrtsh $dst, $src" %} > 10894: ins_encode %{ > 10895: int opcode = this->ideal_Opcode(); opcode is unused. src/hotspot/cpu/x86/x86.ad line 10936: > 10934: ins_encode %{ > 10935: int vlen_enc = vector_length_encoding(this); > 10936: int opcode = this->ideal_Opcode(); opcode unused later. src/hotspot/cpu/x86/x86.ad line 10949: > 10947: ins_encode %{ > 10948: int vlen_enc = vector_length_encoding(this); > 10949: int opcode = this->ideal_Opcode(); opcode unused later. src/hotspot/cpu/x86/x86.ad line 10964: > 10962: match(Set dst (SubVHF src1 src2)); > 10963: format %{ "evbinopfp16_reg $dst, $src1, $src2" %} > 10964: ins_cost(450); Why ins_cost 450 here for reg version and 150 for mem version of binOps? Whereas sqrt above has 150 cost for both reg and mem version. Good to be consistent. src/hotspot/cpu/x86/x86.ad line 11012: > 11010: effect(DEF dst); > 11011: format %{ "evfmaph_reg $dst, $src1, $src2\t# $dst = $dst * $src1 + > $src2 fma packedH" %} > 11012: ins_cost(450); Good to be consistent with ins_cost for reg vs mem version. src/hotspot/cpu/x86/x86.ad line 11015: > 11013: ins_encode %{ > 11014: int vlen_enc = vector_length_encoding(this); > 11015: __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister, > $src1$$XMMRegister, vlen_enc); Wondering if for auto vectorization the natural fma form is dst = dst + src1 * src2 i.e. match(Set dst (FmaVHF dst (Binary src1 src2))); which then leads to fmadd231. src/hotspot/share/adlc/output_h.cpp line 1298: > 1296: case Form::idealD: type = "Type::DOUBLE"; break; > 1297: case Form::idealL: type = "TypeLong::LONG"; break; > 1298: case Form::idealH: type = "Type::HALF_LONG"; break; This should be Type::HALF_FLOAT src/hotspot/share/classfile/vmSymbols.hpp line 143: > 141: template(java_util_DualPivotQuicksort, > "java/util/DualPivotQuicksort") \ > 142: template(jdk_internal_misc_Signal, > "jdk/internal/misc/Signal") \ > 143: template(jdk_internal_math_Float16Math, > "jdk/internal/math/Float16Math") \ This seems to be leftover template. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870304 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843899813 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870852 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843902337 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843871328 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843906656 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843908957 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843910609 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843912897 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843914392 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843916999 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922125 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922490 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843923239 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843924299 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925126 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925319 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926551 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926789 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928252 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928447 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929519 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929686 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843930969 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843931641 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847403451 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847400518 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844234786 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844237825 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844238487 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844244532 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847443990 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847448109 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847470619 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847475384