On Thu, Aug 5, 2021 at 9:25 AM Hongtao Liu <crazy...@gmail.com> wrote: > > On Wed, Aug 4, 2021 at 7:28 PM Richard Biener > <richard.guent...@gmail.com> wrote: > > > > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao....@intel.com> wrote: > > > > > > > > gcc/ChangeLog: > > > > > > > > * config/i386/i386-modes.def (FLOAT_MODE): Define ieee HFmode. > > > > * config/i386/i386.c (enum x86_64_reg_class): Add > > > > X86_64_SSEHF_CLASS. > > > > (merge_classes): Handle X86_64_SSEHF_CLASS. > > > > (examine_argument): Ditto. > > > > (construct_container): Ditto. > > > > (classify_argument): Ditto, and set HFmode/HCmode to > > > > X86_64_SSEHF_CLASS. > > > > (function_value_32): Return _FLoat16/Complex Float16 by > > > > %xmm0. > > > > (function_value_64): Return _Float16/Complex Float16 by SSE > > > > register. > > > > (ix86_print_operand): Handle CONST_DOUBLE HFmode. > > > > (ix86_secondary_reload): Require gpr as intermediate register > > > > to store _Float16 from sse register when sse4 is not > > > > available. > > > > (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 under > > > > sse2. > > > > (ix86_scalar_mode_supported_p): Ditto. > > > > (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined. > > > > * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode. > > > > (VALID_INT_MODE_P): Add HFmode and HCmode. > > > > * config/i386/i386.md (*pushhf_rex64): New define_insn. > > > > (*pushhf): Ditto. > > > > (*movhf_internal): Ditto. > > > > * doc/extend.texi (Half-Precision Floating Point): Documemt > > > > _Float16 for x86. > > > > * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0) > > > > which is used by extract_bit_field but not backends. > > > > > > [...] > > > > > > Ping, i'd like to ask for approval for the below codes which is > > > related to generic part. > > > > > > start from .. > > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c > > > > index ff3b4449b37..775ee397836 100644 > > > > --- a/gcc/emit-rtl.c > > > > +++ b/gcc/emit-rtl.c > > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, machine_mode > > > > imode, > > > > fix them all. */ > > > > if (omode == word_mode) > > > > ; > > > > + /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI > > > > (reg:HF)) > > > > + here. Though extract_bit_field is the culprit here, not the > > > > backends. */ > > > > + else if (known_gt (regsize, osize) && known_gt (osize, isize) > > > > + && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode)) > > > > + ; > > > > /* ??? Similarly, e.g. with (subreg:DF (reg:TI)). Though > > > > store_bit_field > > > > is the culprit here, and not the backends. */ > > > > else if (known_ge (osize, regsize) && known_ge (isize, osize)) > > > > > > and end here. > > > > So the main restriction otherwise in place is > > > > /* Subregs involving floating point modes are not allowed to > > change size. Therefore (subreg:DI (reg:DF) 0) is fine, but > > (subreg:SI (reg:DF) 0) isn't. */ > > else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode)) > > { > > if (! (known_eq (isize, osize) > > /* LRA can use subreg to store a floating point value in > > an integer mode. Although the floating point and the > > integer modes need the same number of hard registers, > > the size of floating point mode can be less than the > > integer mode. LRA also uses subregs for a register > > should be used in different mode in on insn. */ > > || lra_in_progress)) > > return false; > > > > I'm not sure if it would be possible to do (subreg:SI (subreg:HI (reg:HF))) > > After debug, I find (subreg:SI (reg:HF)) is not really needed, it > would be finally handled by below cut > ----cut----- > /* Find a correspondingly-sized integer field, so we can apply > shifts and masks to it. */ > scalar_int_mode int_mode; > if (!int_mode_for_mode (tmode).exists (&int_mode)) > /* If this fails, we should probably push op0 out to memory and then > do a load. */ > int_mode = int_mode_for_mode (mode).require (); > > target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize, > bitnum, target, unsignedp, reverse); > -----end---- > > and generate things like below cut > > ---cut---- > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG) > (insn 6 3 7 2 (parallel [ > (set (reg:HI 86) > (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0) > (const_int -1 [0xffffffffffffffff]))) > (clobber (reg:CC 17 flags)) > ]) > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11 > -1 > (nil)) > (insn 7 6 11 2 (set (reg:HF 82 [ <retval> ]) > (subreg:HF (reg:HI 86) 0)) > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11 > -1 > (nil)) > (insn 11 7 12 2 (set (reg/i:HF 20 xmm0) > (reg:HF 82 [ <retval> ])) > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1 > -1 > (nil)) > ----end--- > > The real problem is here, when validate_subreg doesn't allow subreg > between integer mode and float mode with different sizes. It will hit > gcc_assert in gen_lowpart > > ----cut----- > /* Don't use LHS paradoxical subreg if explicit truncation is needed > between the mode of the extraction (word_mode) and the target > mode. Instead, create a temporary and use convert_move to set > the target. */ > if (REG_P (target) > && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)) > { > target = gen_lowpart (ext_mode, target); > if (partial_subreg_p (GET_MODE (spec_target), ext_mode)) > spec_target_subreg = target; > } > ----end---- > > So how about changes like below, remove changes in validate_subreg and > add some guard in extract_bit_field_using_extv. > > modified gcc/emit-rtl.c > @@ -928,11 +928,6 @@ validate_subreg (machine_mode omode, machine_mode imode, > fix them all. */ > if (omode == word_mode) > ; > - /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI (reg:HF)) > - here. Though extract_bit_field is the culprit here, not the backends. > */ > - else if (known_gt (regsize, osize) && known_gt (osize, isize) > - && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode)) > - ; > /* ??? Similarly, e.g. with (subreg:DF (reg:TI)). Though store_bit_field > is the culprit here, and not the backends. */ > else if (known_ge (osize, regsize) && known_ge (isize, osize)) > modified gcc/expmed.c > @@ -1572,8 +1572,19 @@ extract_bit_field_using_extv (const > extraction_insn *extv, rtx op0, > between the mode of the extraction (word_mode) and the target > mode. Instead, create a temporary and use convert_move to set > the target. */ > + machine_mode tmode = GET_MODE (target); > if (REG_P (target) > - && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)) > + && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
doesn't it simply mean that TRULY_NOOP_TRUNCATION_MODES_P may not be true for modes that are not handled by gen_lowpart? But it just wraps the truly_noop_truncation target hook which is fed the modes precision. In fact I wonder why we're using 'extv' to "extract" sth larger from sth smaller at all. Why do we arrive at something for get_best_reg_extraction_insn at all? > + /* When validate_subreg doesn't allow subreg between integer mode > + and float mode with different size, It will hit gcc_assert in > + gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) is > + not really needed, codes like below will be finally generated. > + (set (reg:SI 1) > + (and:SI (reg:DI 2) -1)) > + (set (reg:SF 3) > + (subreg:SF (reg:SI 1))) */ > + && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode) > + && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode))) > { > target = gen_lowpart (ext_mode, target); > if (partial_subreg_p (GET_MODE (spec_target), ext_mode)) > > > > > to "work around" this restriction. Alternatively one could finally do away > > with all the exceptions and simply allow all such subregs giving them > > semantics as to intermediate same-size subregs to integer modes > > if this definition issue is why we disallow them? > > > > That is, any float-mode source or destination subreg is interpreted as > > wrapping the source operand (if float-mode) in a same size int subreg > > and performing the subreg in an integer mode first if the destination > > mode is a float mode? > > > > Also I detest that validate_subreg list things not allowed as opposed > > to things allowed. Why are FLOAT_MODE special, but > > fractional and accumulating modes not? The subreg documentation > > also doesn't talk about cases not allowed. > > > > Richard. > > > > -- > BR, > Hongtao