Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

Richard Biener via Gcc-patches Thu, 05 Aug 2021 03:16:25 -0700

On Thu, Aug 5, 2021 at 11:43 AM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Thu, Aug 5, 2021 at 5:24 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Thu, Aug 5, 2021 at 9:25 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > >
> > > On Wed, Aug 4, 2021 at 7:28 PM Richard Biener
> > > <richard.guent...@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 4, 2021 at 4:39 AM Hongtao Liu <crazy...@gmail.com> wrote:
> > > > >
> > > > > On Mon, Aug 2, 2021 at 2:31 PM liuhongt <hongtao....@intel.com> wrote:
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * config/i386/i386-modes.def (FLOAT_MODE): Define ieee 
> > > > > > HFmode.
> > > > > >         * config/i386/i386.c (enum x86_64_reg_class): Add
> > > > > >         X86_64_SSEHF_CLASS.
> > > > > >         (merge_classes): Handle X86_64_SSEHF_CLASS.
> > > > > >         (examine_argument): Ditto.
> > > > > >         (construct_container): Ditto.
> > > > > >         (classify_argument): Ditto, and set HFmode/HCmode to
> > > > > >         X86_64_SSEHF_CLASS.
> > > > > >         (function_value_32): Return _FLoat16/Complex Float16 by
> > > > > >         %xmm0.
> > > > > >         (function_value_64): Return _Float16/Complex Float16 by SSE
> > > > > >         register.
> > > > > >         (ix86_print_operand): Handle CONST_DOUBLE HFmode.
> > > > > >         (ix86_secondary_reload): Require gpr as intermediate 
> > > > > > register
> > > > > >         to store _Float16 from sse register when sse4 is not
> > > > > >         available.
> > > > > >         (ix86_libgcc_floating_mode_supported_p): Enable _FLoat16 
> > > > > > under
> > > > > >         sse2.
> > > > > >         (ix86_scalar_mode_supported_p): Ditto.
> > > > > >         (TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P): Defined.
> > > > > >         * config/i386/i386.h (VALID_SSE2_REG_MODE): Add HFmode.
> > > > > >         (VALID_INT_MODE_P): Add HFmode and HCmode.
> > > > > >         * config/i386/i386.md (*pushhf_rex64): New define_insn.
> > > > > >         (*pushhf): Ditto.
> > > > > >         (*movhf_internal): Ditto.
> > > > > >         * doc/extend.texi (Half-Precision Floating Point): Documemt
> > > > > >         _Float16 for x86.
> > > > > >         * emit-rtl.c (validate_subreg): Allow (subreg:SI (reg:HF) 0)
> > > > > >         which is used by extract_bit_field but not backends.
> > > > > >
> > > > [...]
> > > > >
> > > > > Ping, i'd like to ask for approval for the below codes which is
> > > > > related to generic part.
> > > > >
> > > > > start from ..
> > > > > > diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
> > > > > > index ff3b4449b37..775ee397836 100644
> > > > > > --- a/gcc/emit-rtl.c
> > > > > > +++ b/gcc/emit-rtl.c
> > > > > > @@ -928,6 +928,11 @@ validate_subreg (machine_mode omode, 
> > > > > > machine_mode imode,
> > > > > >       fix them all.  */
> > > > > >    if (omode == word_mode)
> > > > > >      ;
> > > > > > +  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI 
> > > > > > (reg:HF))
> > > > > > +     here. Though extract_bit_field is the culprit here, not the 
> > > > > > backends.  */
> > > > > > +  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > > > > +          && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > > > > +    ;
> > > > > >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > > > > > store_bit_field
> > > > > >       is the culprit here, and not the backends.  */
> > > > > >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > > >
> > > > > and end here.
> > > >
> > > > So the main restriction otherwise in place is
> > > >
> > > >   /* Subregs involving floating point modes are not allowed to
> > > >      change size.  Therefore (subreg:DI (reg:DF) 0) is fine, but
> > > >      (subreg:SI (reg:DF) 0) isn't.  */
> > > >   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
> > > >     {
> > > >       if (! (known_eq (isize, osize)
> > > >              /* LRA can use subreg to store a floating point value in
> > > >                 an integer mode.  Although the floating point and the
> > > >                 integer modes need the same number of hard registers,
> > > >                 the size of floating point mode can be less than the
> > > >                 integer mode.  LRA also uses subregs for a register
> > > >                 should be used in different mode in on insn.  */
> > > >              || lra_in_progress))
> > > >         return false;
> > > >
> > > > I'm not sure if it would be possible to do (subreg:SI (subreg:HI 
> > > > (reg:HF)))
> > >
> > > After debug, I find (subreg:SI (reg:HF)) is not really needed, it
> > > would be finally handled by below cut
> > > ----cut-----
> > >   /* Find a correspondingly-sized integer field, so we can apply
> > >      shifts and masks to it.  */
> > >   scalar_int_mode int_mode;
> > >   if (!int_mode_for_mode (tmode).exists (&int_mode))
> > >     /* If this fails, we should probably push op0 out to memory and then
> > >        do a load.  */
> > >     int_mode = int_mode_for_mode (mode).require ();
> > >
> > >   target = extract_fixed_bit_field (int_mode, op0, op0_mode, bitsize,
> > >     bitnum, target, unsignedp, reverse);
> > > -----end----
> > >
> > > and generate things like below cut
> > >
> > > ---cut----
> > > (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
> > > (insn 6 3 7 2 (parallel [
> > >             (set (reg:HI 86)
> > >                 (and:HI (subreg:HI (reg/v:SI 83 [ a ]) 0)
> > >                     (const_int -1 [0xffffffffffffffff])))
> > >             (clobber (reg:CC 17 flags))
> > >         ]) 
> > > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> > > -1
> > >      (nil))
> > > (insn 7 6 11 2 (set (reg:HF 82 [ <retval> ])
> > >         (subreg:HF (reg:HI 86) 0))
> > > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
> > > -1
> > >      (nil))
> > > (insn 11 7 12 2 (set (reg/i:HF 20 xmm0)
> > >         (reg:HF 82 [ <retval> ]))
> > > "../../gcc/x86-gcc/independentfp16/gcc/testsuite/gcc.target/i386/float16-5.c":12:1
> > > -1
> > >      (nil))
> > > ----end---
> > >
> > > The real problem is here, when validate_subreg doesn't allow subreg
> > > between integer mode and float mode with different sizes. It will hit
> > > gcc_assert in gen_lowpart
> > >
> > > ----cut-----
> > >       /* Don't use LHS paradoxical subreg if explicit truncation is needed
> > > between the mode of the extraction (word_mode) and the target
> > > mode.  Instead, create a temporary and use convert_move to set
> > > the target.  */
> > >       if (REG_P (target)
> > >   && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> > > {
> > >   target = gen_lowpart (ext_mode, target);
> > >   if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
> > >     spec_target_subreg = target;
> > > }
> > > ----end----
> > >
> > > So how about changes like below, remove changes in validate_subreg and
> > > add some guard in extract_bit_field_using_extv.
> > >
> > > modified   gcc/emit-rtl.c
> > > @@ -928,11 +928,6 @@ validate_subreg (machine_mode omode, machine_mode 
> > > imode,
> > >       fix them all.  */
> > >    if (omode == word_mode)
> > >      ;
> > > -  /* ???Similarly like (subreg:DI (reg:SF), also allow (subreg:SI 
> > > (reg:HF))
> > > -     here. Though extract_bit_field is the culprit here, not the 
> > > backends.  */
> > > -  else if (known_gt (regsize, osize) && known_gt (osize, isize)
> > > -           && FLOAT_MODE_P (imode) && INTEGRAL_MODE_P (omode))
> > > -    ;
> > >    /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though 
> > > store_bit_field
> > >       is the culprit here, and not the backends.  */
> > >    else if (known_ge (osize, regsize) && known_ge (isize, osize))
> > > modified   gcc/expmed.c
> > > @@ -1572,8 +1572,19 @@ extract_bit_field_using_extv (const
> > > extraction_insn *extv, rtx op0,
> > >           between the mode of the extraction (word_mode) and the target
> > >           mode.  Instead, create a temporary and use convert_move to set
> > >           the target.  */
> > > +      machine_mode tmode = GET_MODE (target);
> > >        if (REG_P (target)
> > > -          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode))
> > > +          && TRULY_NOOP_TRUNCATION_MODES_P (GET_MODE (target), ext_mode)
> >
> > doesn't it simply mean that TRULY_NOOP_TRUNCATION_MODES_P may not be
> > true for modes that are not handled by gen_lowpart?  But it just wraps the
> > truly_noop_truncation target hook which is fed the modes precision.  In fact
> > I wonder why we're using 'extv' to "extract" sth larger from sth smaller at 
> > all.
> No, target is SFmode, and ext_mode is DImode, so it extracts sth
> smaller from larger.
> > Why do we arrive at something for get_best_reg_extraction_insn at all?
> >
> > > +          /* When validate_subreg doesn't allow subreg between integer 
> > > mode
> > > +             and float mode with different size, It will hit gcc_assert 
> > > in
> > > +             gen_lowpart_general. Also subreg like (subreg:DI (reg:SF)) 
> > > is
> > > +             not really needed, codes like below will be finally 
> > > generated.
> > > +             (set (reg:SI 1)
> > > +                  (and:SI (reg:DI 2) -1))
> > > +             (set (reg:SF 3)
> > > +                  (subreg:SF (reg:SI 1)))  */
> > > +          && FLOAT_MODE_P (tmode) && INTEGRAL_MODE_P (mode)
> > > +          && maybe_ne (GET_MODE_SIZE (tmode), GET_MODE_SIZE (mode)))
> > >          {
> > >            target = gen_lowpart (ext_mode, target);
> > >            if (partial_subreg_p (GET_MODE (spec_target), ext_mode))
> And somehow it tries to use paradoxical_subreg here, that's why we get
> something like (subreg:SI (reg:HF).


OK, I think sth is amiss here upthread.  insv/extv do look like they
are designed
to work on integer modes (but docs do not say anything about this here).
In fact the caller of extract_bit_field_using_extv is named
extract_integral_bit_field.  Of course nothing seems to check what kind of
modes we're dealing with, but we're for example happily doing
expand_shift in 'mode'.  In the extract_integral_bit_field call 'mode' is
some integer mode and op0 is HFmode?  From the above I get it's
the other way around?  In that case we should wrap the
call to extract_integral_bit_field, extracting in an integer mode with the
same size as 'mode' and then converting the result as (subreg:HF (reg:HI ...)).

> > >
> > >
> > >
> > > > to "work around" this restriction.  Alternatively one could finally do 
> > > > away
> > > > with all the exceptions and simply allow all such subregs giving them
> > > > semantics as to intermediate same-size subregs to integer modes
> > > > if this definition issue is why we disallow them?
> > > >
> > > > That is, any float-mode source or destination subreg is interpreted as
> > > > wrapping the source operand (if float-mode) in a same size int subreg
> > > > and performing the subreg in an integer mode first if the destination
> > > > mode is a float mode?
> > > >
> > > > Also I detest that validate_subreg list things not allowed as opposed
> > > > to things allowed.  Why are FLOAT_MODE special, but
> > > > fractional and accumulating modes not?  The subreg documentation
> > > > also doesn't talk about cases not allowed.
> > > >
> > > > Richard.
> > >
> > >
> > >
> > > --
> > > BR,
> > > Hongtao
>
>
>
> --
> BR,
> Hongtao

Re: [PATCH 2/6] [i386] Enable _Float16 type for TARGET_SSE2 and above.

Reply via email to