Hi,

Thanks for the comments, James. I've moved the patterns around and added
new comments to them. Hope this is OK.
Regards,
David Sherwood.

ChangeLog:

2015-12-01  David Sherwood  <david.sherw...@arm.com>

gcc/
    * config/aarch64/aarch64.md: New pattern.
    * config/aarch64/aarch64-simd.md: Likewise.
    * config/aarch64/iterators.md: New unspecs, iterators.
gcc/testsuite
    * gcc.target/aarch64/fmaxmin.c: New test.

> -----Original Message-----
> From: James Greenhalgh [mailto:james.greenha...@arm.com]
> Sent: 26 November 2015 16:53
> To: David Sherwood
> Cc: GCC Patches
> Subject: Re: [Patch 2/3][Aarch64] Add support for IEEE-conformant
> versions of scalar fmin* and fmax*
>
> On Thu, Nov 26, 2015 at 04:20:35PM -0000, David Sherwood wrote:
> > Hi,
> >
> > Here is the second patch of the fmin/fmax change, which adds the optabs
> > to the aarch64 backend.
> >
> > Tested:
> >
> > x86_64-linux: no regressions
> > aarch64-none-elf: no regressions
> >
> > Good to go?
> > David Sherwood.
>
> Could you also update the comment a few lines above the pattern you add
> in aarch64-simd.md? Unless I've misunderstood the point of this patch
> set, that looks to be out of date now:
>
> ;; FP Max/Min
> ;; Max/Min are introduced by idiom recognition by GCC's mid-end. An
> ;; expression like:
> ;;   a = (b < c) ? b : c;
> ;; is idiom-matched as MIN_EXPR<b,c> only if -ffinite-math-only is
> ;; enabled either explicitly or indirectly via -ffast-math.
> ;;
> ;; MIN_EXPR and MAX_EXPR eventually map to 'smin' and 'smax' in RTL.
> ;; The 'smax' and 'smin' RTL standard pattern names do not specify which
> ;; operand will be returned when both operands are zero (i.e. they may
> ;; not honour signed zeroes), or when either operand is NaN. Therefore
> ;; GCC only introduces MIN_EXPR/MAX_EXPR in fast math mode or when not
> ;; honouring NaNs.
>
> Either that, or reorder the patterns you add so the existing patterns
> that this comment pertains to are kept close to it, and add a new
> comment for your new pattern - explaining that it is the auto-vectorized
> form of the IEEE-754 fmax/fmin functions.
>
> Thanks,
> James
> >
> >
> >
> > ChangeLog:
> >
> > 2015-11-26  David Sherwood  <david.sherw...@arm.com>
> >
> > gcc/
> >     * config/aarch64/aarch64.md: New pattern.
> >     * config/aarch64/aarch64-simd.md: Likewise.
> >     * config/aarch64/iterators.md: New unspecs, iterators.
> > gcc/testsuite
> >     * gcc.target/aarch64/fmaxmin.c: New test.
> >
> > > -----Original Message-----
> > > From: Richard Biener [mailto:richard.guent...@gmail.com]
> > > Sent: 25 November 2015 12:39
> > > To: David Sherwood
> > > Cc: GCC Patches; Richard Sandiford
> > > Subject: Re: [PING][Patch] Add support for IEEE-conformant versions
> > > of scalar fmin* and fmax*
> > >
> > > On Mon, Nov 23, 2015 at 10:21 AM, David Sherwood
> > > <david.sherw...@arm.com> wrote:
> > > > Hi,
> > > >
> > > > This is part 1 of a reworked version of a patch I originally
> > > > submitted in August, rebased after Richard Sandiford's recent work
> > > > on the internal functions. This first patch adds the internal
> > > > function definitions and optabs that provide support for IEEE
> > > > fmax()/fmin() functions.
> > > >
> > > > Later patches will add the appropriate aarch64/aarch32 vector
> > > > instructions.
> > >
> > > Ok.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Tested:
> > > >
> > > > x86_64-linux: no regressions
> > > > aarch64-none-elf: no regressions
> > > > arm-none-eabi: no regressions
> > > >
> > > > Regards,
> > > > David Sherwood.
> > > >
> > > > ChangeLog:
> > > >
> > > > 2015-11-19  David Sherwood  <david.sherw...@arm.com>
> > > >
> > > > gcc/
> > > >     * optabs.def: Add new optabs fmax_optab/fmin_optab.
> > > >     * internal-fn.def: Add new fmax/fmin internal functions.
> > > >     * config/aarch64/aarch64.md: New pattern.
> > > >     * config/aarch64/aarch64-simd.md: Likewise.
> > > >     * config/aarch64/iterators.md: New unspecs, iterators.
> > > >     * config/arm/iterators.md: New iterators.
> > > >     * config/arm/unspecs.md: New unspecs.
> > > >     * config/arm/neon.md: New pattern.
> > > >     * config/arm/vfp.md: Likewise.
> > > >     * doc/md.texi: Add fmin and fmax patterns.
> > > > gcc/testsuite
> > > >     * gcc.target/aarch64/fmaxmin.c: New test.
> > > >     * gcc.target/arm/fmaxmin.c: New test.
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> > > >> Sent: 19 August 2015 13:35
> > > >> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
> > > >> Subject: Re: [PING][Patch] Add support for IEEE-conformant
> > > >> versions of scalar fmin* and fmax*
> > > >>
> > > >> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
> > > >> <richard.sandif...@arm.com> wrote:
> > > >> > Richard Biener <richard.guent...@gmail.com> writes:
> > > >> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
> > > >> >> <richard.sandif...@arm.com> wrote:
> > > >> >>> Richard Biener <richard.guent...@gmail.com> writes:
> > > >> >>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
> > > >> >>>> <richard.sandif...@arm.com> wrote:
> > > >> >>>>> Richard Biener <richard.guent...@gmail.com> writes:
> > > >> >>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
> > > >> >>>>>> <david.sherw...@arm.com> wrote:
> > > >> >>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
> > > >> >>>>>>>> <david.sherw...@arm.com> wrote:
> > > >> >>>>>>>> > Hi Richard,
> > > >> >>>>>>>> >
> > > >> >>>>>>>> > Thanks for the reply. I'd chosen to add new expressions
> > > >> >>>>>>>> > as this seemed more consistent with the existing
> > > >> >>>>>>>> > MAX_EXPR and MIN_EXPR tree codes. In addition it would
> > > >> >>>>>>>> > seem to provide more opportunities for optimisation
> > > >> >>>>>>>> > than a target-specific builtin implementation would.
> > > >> >>>>>>>> > I accept that optimisation opportunities will be more
> > > >> >>>>>>>> > limited for strict math compilation, but that it was
> > > >> >>>>>>>> > still worth having them. Also, if we did map it to
> > > >> >>>>>>>> > builtins then the scalar version would go through the
> > > >> >>>>>>>> > optabs and the vector version would go through the
> > > >> >>>>>>>> > target's builtin expansion, which doesn't seem very
> > > >> >>>>>>>> > consistent.
> > > >> >>>>>>>>
> > > >> >>>>>>>> On another note ISTR you can't associate
> > > >> >>>>>>>> STRICT_MIN/MAX_EXPR and thus you can't vectorize anyway?
> > > >> >>>>>>>> (strict IEEE behavior is about NaNs, correct?)
> > > >> >>>>>>> I thought for this particular case associativity wasn't an
> > > >> >>>>>>> issue? We're not doing any reductions here, just simply
> > > >> >>>>>>> performing max/min operations on each pair of elements in
> > > >> >>>>>>> the vectors. I thought for IEEE-compliant behaviour we
> > > >> >>>>>>> just need to ensure that for each pair of elements if one
> > > >> >>>>>>> element is a NaN we return the other one.
> > > >> >>>>>>
> > > >> >>>>>> Hmm, true. Ok, my comment still stands - I don't see that
> > > >> >>>>>> using a tree code is the best thing to do here. You can add
> > > >> >>>>>> fmin/max optabs and special expansion of BUILT_IN_FMIN/MAX
> > > >> >>>>>> and you can use a target builtin for the vectorized
> > > >> >>>>>> variant.
> > > >> >>>>>>
> > > >> >>>>>> The reason I am pushing against a new tree code is that
> > > >> >>>>>> we'd have an awful lot of similar codes when pushing other
> > > >> >>>>>> flag related IL specialities to actual IL constructs. And
> > > >> >>>>>> we still need to find a consistent way to do that.
> > > >> >>>>>
> > > >> >>>>> In this case though the new code is really the "native"
> > > >> >>>>> min/max operation for fp, rather than some weird
> > > >> >>>>> flag-dependent behaviour. Maybe it's a bit unfortunate that
> > > >> >>>>> the non-strict min/max fp operation got mapped to the
> > > >> >>>>> generic MIN_EXPR and MAX_EXPR when the non-strict version is
> > > >> >>>>> really the flag-related modification. The STRICT_* prefix is
> > > >> >>>>> forced by that and might make it seem like more of a special
> > > >> >>>>> case than it really is.
> > > >> >>>>
> > > >> >>>> In some sense. But the "strict" version already has a builtin
> > > >> >>>> (just no special expander in builtins.c). We usually don't
> > > >> >>>> add 1:1 tree codes for existing builtins (why have builtins
> > > >> >>>> at all then?).
> > > >> >>>
> > > >> >>> We still need the builtin to match the C function (and to
> > > >> >>> allow direct calls to __builtin_fmin, etc., which are
> > > >> >>> occasionally useful).
> > > >> >>>
> > > >> >>>>> If you're still not convinced, how about an internal
> > > >> >>>>> function instead of a built-in function, so that we can
> > > >> >>>>> continue to use optabs for all cases? I'd really like to
> > > >> >>>>> avoid forcing such a generic concept down to target-specific
> > > >> >>>>> builtins with target-specific expansion code, especially
> > > >> >>>>> when the same concept is exposed by target-independent code
> > > >> >>>>> for scalars.
> > > >> >>>>
> > > >> >>>> The target builtin is for the vectorized variant - not all
> > > >> >>>> targets might have that and we'd need to query the target
> > > >> >>>> about this. So using an IFN would mean adding a target hook
> > > >> >>>> for that query.
> > > >> >>>
> > > >> >>> No, the idea is that if we have a tree code or an internal
> > > >> >>> function, the decision about whether we have target support is
> > > >> >>> based on a query of the optabs (just like it is for scalar,
> > > >> >>> and for other vectorisable tree codes). No new hooks are
> > > >> >>> needed.
> > > >> >>>
> > > >> >>> The patch checked for target support that way.
> > > >> >>
> > > >> >> Fair enough. Still this means we should have tree codes for all
> > > >> >> builtins that eventually are vectorized? So why don't we have
> > > >> >> SIN_EXPR, POW_EXPR (ok, I did argue and have patches for that
> > > >> >> in the past), RINT_EXPR, SQRT_EXPR, etc?
> > > >> >
> > > >> > Yeah, it doesn't sound so bad to me :-) The choice of what's a
> > > >> > function in C and what's inherent is pretty arbitrary. E.g. % on
> > > >> > doubles could have implemented fmod() or remainder(). Casts from
> > > >> > double to int could have used the current rounding mode, but
> > > >> > instead they truncate and conversions using the rounding mode
> > > >> > need to go through something like (l)lrint(). Like you say,
> > > >> > pow() could have been an operator (and is in many languages),
> > > >> > but instead it's a function.
> > > >> >
> > > >> >> This patch starts to go down that route which is why I ask for
> > > >> >> the whole picture to be considered and hinted at the
> > > >> >> alternative implementation which follows existing practice. Add
> > > >> >> an expander in builtins.c, add an optab, and eventual support
> > > >> >> to vectorized_function.
> > > >> >>
> > > >> >> See for example ix86_builtin_vectorized_function which handles
> > > >> >> sqrt, floor, ceil, etc.
> > > >> >> and even FMA (we only fold FMA to FMA_EXPR if the target
> > > >> >> supports it for the scalar mode, so not sure if there is any
> > > >> >> x86 ISA where it has vectorized FMA but not scalar FMA).
> > > >> >
> > > >> > Yeah. TBH I'm really against doing that unless (a) there's good
> > > >> > reason to believe that the concept really is specific to one
> > > >> > target and wouldn't be implemented on others or (b) there really
> > > >> > is a function rather than an instruction underneath (usually the
> > > >> > case for sin, etc.). But (b) could also be handled by the optab
> > > >> > support library mechanism.
> > > >> >
> > > >> > Reasons against using target-specific builtins for operations
> > > >> > that have direct support in the ISA:
> > > >> >
> > > >> > 1. Like you say, in practice vector ops only tend to be
> > > >> >    supported if the associated scalar op is also supported.
> > > >> >    Sticking to this approach means that vector ops follow a
> > > >> >    different path from scalar ops whereas (for example) division
> > > >> >    follows the same path for both. It just seems confusing to
> > > >> >    have some floating-point optabs that support both scalar and
> > > >> >    vector operands and others that only support scalar operands.
> > > >> >
> > > >> > 2. Once converted to a target-specific function, the
> > > >> >    target-independent code has no idea what the function does or
> > > >> >    how expensive it is. We might start out with just one hook to
> > > >> >    convert a scalar operation to a target-dependent built-in
> > > >> >    function, but I bet over time we'll grow other hooks to query
> > > >> >    properties about the function, such as costs.
> > > >> >
> > > >> > 3. builtin_vectorized_function returns a decl rather than a
> > > >> >    call.
> > > >> >    If the target's vector API doesn't already have a built-in
> > > >> >    for the operation we need, with the exact types and arguments
> > > >> >    that we expect, the target needs to define one, presumably
> > > >> >    marked so that it isn't callable by input code.
> > > >> >
> > > >> >    E.g. on targets where FP conversion instructions allow an
> > > >> >    explicit rounding mode to be specified as an operand, it's
> > > >> >    reasonable for a target's vector API to expose that operand
> > > >> >    as a constant argument to the API function. There'd then be
> > > >> >    one API function for all vector-float-to-vector-float integer
> > > >> >    rounding operations, rather than one for vector rint(), one
> > > >> >    for vector ceil(), etc. (I'm thinking of System z
> > > >> >    instructions here, although I don't know offhand what the
> > > >> >    vector API is there.) IMO it doesn't make sense to force the
> > > >> >    target to define "fake" built-in functions for all those
> > > >> >    possibilities purely for the sake of the target hook. It's a
> > > >> >    lot of extra code, and it's extra code that would be
> > > >> >    duplicated on any target that needs to do this.
> > > >> >
> > > >> > IMO optabs are the best way for the target to tell the
> > > >> > target-independent code what it can do. If it supports sqrt on
> > > >> > df it defines sqrtdf and if it supports vector sqrt on v2df it
> > > >> > defines sqrtv2df. These patterns will often be a single
> > > >> > define_expand or define_insn template -- the vectorness often
> > > >> > comes "for free" in terms of writing the pattern.
> > > >> >
> > > >> >>>> > TBH though I'm not sure why an internal_fn value (or a
> > > >> >>>> > target-specific builtin enum value) is worse than a
> > > >> >>>> > tree-code value, unless the limit of the tree_code bitfield
> > > >> >>>> > is in sight (maybe it is).
> > > >> >>>>
> > > >> >>>> I think tree_code is 64bits now.
> > > >> >>>
> > > >> >>> Even better :-)
> > > >> >>
> > > >> >> Yes.
> > > >> >>
> > > >> >> I'm not against adding a corresponding tree code for all math
> > > >> >> builtin functions, we just have to decide whether this is the
> > > >> >> way to go (and of course support expanding those back to
> > > >> >> libcalls to libc/m rather than libgcc). There are also
> > > >> >> constraints on what kind of STRICT_FMIN_EXPR the compiler may
> > > >> >> generate as the target may not be able to expand the long
> > > >> >> double variant directly but needs a libcall but libm might not
> > > >> >> be linked or may not have support for it. That would be a new
> > > >> >> thing compared to libgcc providing a fallback for all other
> > > >> >> tree codes.
> > > >> >
> > > >> > True, but that doesn't seem too bad. The constraints would be
> > > >> > the same if we're operating on built-in functions rather than
> > > >> > codes. I suppose built-in functions make this more explicit, but
> > > >> > at the end of the day it's a costing decision. We should no more
> > > >> > be converting a cheap operation into an expensive libgcc
> > > >> > function than converting a cheap operation into an expensive
> > > >> > libm function, even if the libgcc conversion links.
> > > >> >
> > > >> > There's certainly precedent for introducing calls to things that
> > > >> > libgcc doesn't define. E.g. we already introduce calls to memcpy
> > > >> > in things like loop distribution, even though we don't provide a
> > > >> > fallback memcpy in libgcc.
> > > >>
> > > >> As an additional point for many math functions we have to support
> > > >> errno which means, like, BUILT_IN_SQRT can be rewritten to
> > > >> SQRT_EXPR only if -fno-math-errno is in effect.
> > > >> But then code has to handle both variants for things like
> > > >> constant folding and expression combining. That's very unfortunate
> > > >> and something we want to avoid (one reason the POW_EXPR thing
> > > >> didn't fly when I tried). STRICT_FMIN/MAX_EXPR is an example where
> > > >> this doesn't apply, of course (but I detest the name, just use
> > > >> FMIN/FMAX_EXPR?). Still you'd need to handle both, FMIN_EXPR and
> > > >> BUILT_IN_FMIN, in code doing analysis/transform.
> > > >>
> > > >> Richard.
> > > >>
> > > >>
> > > >> > Thanks,
> > > >> > Richard
fmaxmin2.patch
Description: Binary data