On Wed, Aug 4, 2021 at 7:28 PM Richard Biener <richard.guent...@gmail.com> wrote: > > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao....@intel.com> wrote: > > > > > > gcc/ChangeLog: > > > > > > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode. > > > * config/i386/i386.c (enum x86_64_reg_class): Add > > > X86_64_SSEHF_CLASS. > > > (merge_classes): Handle X86_64_SSEHF_CLASS. > > > (examine_argument): Ditto. > > > (construct_container): Ditto. > > > (classify_argument): Ditto, and set HFmode/HCmode to > > > X86_64_SSEHF_CLASS. > > > (function_value_32): Return _FLoat16/Complex Float16 by > > > %xmm0. > > > (function_value_64): Return _Float16/Complex Float16 by SSE > > > register. > > > (ix86_print_operand): Handle CONST_DOUBLE HFmode. > > > (ix86_secondary_reload): Require gpr as intermediate register > > > to store _Float16 from sse register when sse4 is not > > > available. > > > (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under > > > sse2. > > > (ix86_scalar_mode_supported_p): Ditto. > > > (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined. > > > * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode. > > > (VALID_INT_MODE_P): Add HFmode and HCmode. > > > * config/i386/i386.md (*pushhf_rex64): New define_insn. > > > (*pushhf): Ditto. > > > (*movhf_internal): Ditto. > > > * doc/extend.texi (Half-Precision Floating Point): Documemt > > > _Float16 for x86. > > > * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0) > > > which is used by extract_bit_field but not backends. > > > > [...] > > > > Ping, i'd like to ask for approval for the below codes which is > > related to generic part. > > > > start from .. > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c > > > index ff3b4449b37..775ee397836 100644 > > > --- a/gcc/emit-rtl.c > > > +++ b/gcc/emit-rtl.c > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode > > > imode, > > > fix them all. */ > > > if (omode == word_mode) > > > ; > > > + /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI > > > (reg:HF)) > > > + here. Though extract_bit_field is the culprit here, not the > > > backends. */ > > > + else if (known_gt (regsize, osize) && known_gt (osize, isize) > > > + && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode)) > > > + ; > > > /* ??? Similarly, e.g. with (subreg:DF (reg:TI)). Though > > > store_bit_field > > > is the culprit here, and not the backends. */ > > > else if (known_ge (osize, regsize) && known_ge (isize, osize)) > > > > and end here. > > So the main restriction otherwise in place is > > /* Subregs involving floating point modes are not allowed to > change size. Therefore (subreg:DI (reg:DF) 0) is fine, but > (subreg:SI (reg:DF) 0) isn't. */ > else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode)) > { > if (! (known_eq (isize, osize) > /* LRA can use subreg to store a floating point value in > an integer mode. Although the floating point and the > integer modes need the same number of hard registers, > the size of floating point mode can be less than the > integer mode. LRA also uses subregs for a register > should be used in different mode in on insn. */ > || lra_in_progress)) > return false; > > I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF)))
After debug, I find (subreg:SI (reg:HF)) is not really needed, it would be finally handled by below cut ----cut----- /* Find a correspondingly-sized integer field, so we can apply shifts and masks to it. */ scalar_int_mode int_mode; if (!int_mode_for_mode (tmode).exists (&int_mode)) /* If this fails, we should probably push op0 out to memory and then do a load. */ int_mode = int_mode_for_mode (mode).require (); target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize, bitnum, target, unsignedp, reverse); -----end---- and generate things like below cut ---cut---- (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) (insn 6 3 7 2 (parallel [ (set (reg:HI 86) (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0) (const_int -1 [0xffffffffffffffff]))) (clobber (reg:CC 17 flags)) ]) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11 -1 (nil)) (insn 7 6 11 2 (set (reg:HF 82 [ <retval> ]) (subreg:HF (reg:HI 86) 0)) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11 -1 (nil)) (insn 11 7 12 2 (set (reg/i:HF 20 xmm0) (reg:HF 82 [ <retval> ])) "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1 -1 (nil)) ----end--- The real problem is here, when validate_subreg doesn't allow subreg between integer mode and float mode with different sizes. It will hit gcc_assert in gen_lowpart ----cut----- /* Don't use LHS paradoxical subreg if explicit truncation is needed between the mode of the extraction (word_mode) and the target mode. Instead, create a temporary and use convert_move to set the target. */ if (REG_P (target) && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)) { target = gen_lowpart (ext_mode, target); if (partial_subreg_p (GET_MODE (spec_target), ext_mode)) spec_target_subreg = target; } ----end---- So how about changes like below, remove changes in validate_subreg and add some guard in extract_bit_field_using_extv. modified gcc/emit-rtl.c @@ -928,11 +928,6 @@ validate_subreg (machine_mode omode, machine_mode imode, fix them all. */ if (omode == word_mode) ; - /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF)) - here. Though extract_bit_field is the culprit here, not the backends. */ - else if (known_gt (regsize, osize) && known_gt (osize, isize) - && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode)) - ; /* ??? Similarly, e.g. with (subreg:DF (reg:TI)). Though store_bit_field is the culprit here, and not the backends. */ else if (known_ge (osize, regsize) && known_ge (isize, osize)) modified gcc/expmed.c @@ -1572,8 +1572,19 @@ extract_bit_field_using_extv (const extraction_insn *extv, rtx op0, between the mode of the extraction (word_mode) and the target mode. Instead, create a temporary and use convert_move to set the target. */ + machine_mode tmode = GET_MODE (target); if (REG_P (target) - && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)) + && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode) + /* When validate_subreg doesn't allow subreg between integer mode + and float mode with different size, It will hit gcc_assert in + gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is + not really needed, codes like below will be finally generated. + (set (reg:SI 1) + (and:SI (reg:DI 2) -1)) + (set (reg:SF 3) + (subreg:SF (reg:SI 1))) */ + && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode) + && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode))) { target = gen_lowpart (ext_mode, target); if (partial_subreg_p (GET_MODE (spec_target), ext_mode)) > to "work around" this restriction. Alternatively one could finally do away > with all the exceptions and simply allow all such subregs giving them > semantics as to intermediate same-size subregs to integer modes > if this definition issue is why we disallow them? > > That is, any float-mode source or destination subreg is interpreted as > wrapping the source operand (if float-mode) in a same size int subreg > and performing the subreg in an integer mode first if the destination > mode is a float mode? > > Also I detest that validate_subreg list things not allowed as opposed > to things allowed. Why are FLOAT_MODE special, but > fractional and accumulating modes not? The subreg documentation > also doesn't talk about cases not allowed. > > Richard. -- BR, Hongtao