Hi,

Thanks for the comments, James. I've moved the patterns around and added
new comments to them. Hope this is OK.
Regards,
David Sherwood.

ChangeLog:

2015-12-01  David Sherwood  <david.sherw...@arm.com>

gcc/
    * config/aarch64/aarch64.md: New pattern.
    * config/aarch64/aarch64-simd.md: Likewise.
    * config/aarch64/iterators.md: New unspecs, iterators.
gcc/testsuite
    * gcc.target/aarch64/fmaxmin.c: New test.

> -----Original Message-----
> From: James Greenhalgh [mailto:james.greenha...@arm.com]
> Sent: 26 November 2015 16:53
> To: David Sherwood
> Cc: GCC Patches
> Subject: Re: [Patch 2/3][Aarch64] Add support for IEEE-conformant
> versions of scalar fmin* and fmax*
>
> On Thu, Nov 26, 2015 at 04:20:35PM -0000, David Sherwood wrote:
> > Hi,
> >
> > Here is the second patch of the fmin/fmax change, which adds the optabs
> > to the aarch64 backend.
> >
> > Tested:
> >
> > x86_64-linux: no regressions
> > aarch64-none-elf: no regressions
> >
> > Good to go?
> > David Sherwood.
>
> Could you also update the comment a few lines above the pattern you add
> in aarch64-simd.md? Unless I've misunderstood the point of this patch
> set, that looks to be out of date now:
>
> ;; FP Max/Min
> ;; Max/Min are introduced by idiom recognition by GCC's mid-end. An
> ;; expression like:
> ;;   a = (b < c) ? b : c;
> ;; is idiom-matched as MIN_EXPR<b,c> only if -ffinite-math-only is
> ;; enabled either explicitly or indirectly via -ffast-math.
> ;;
> ;; MIN_EXPR and MAX_EXPR eventually map to 'smin' and 'smax' in RTL.
> ;; The 'smax' and 'smin' RTL standard pattern names do not specify which
> ;; operand will be returned when both operands are zero (i.e. they may
> ;; not honour signed zeroes), or when either operand is NaN. Therefore
> ;; GCC only introduces MIN_EXPR/MAX_EXPR in fast math mode or when not
> ;; honouring NaNs.
>
> Either that, or reorder the patterns you add so the existing patterns
> that this comment pertains to are kept close to it, and add a new
> comment for your new pattern - explaining that it is the auto-vectorized
> form of the IEEE-754 fmax/fmin functions.
>
> Thanks,
> James
> >
> >
> >
> > ChangeLog:
> >
> > 2015-11-26  David Sherwood  <david.sherw...@arm.com>
> >
> > gcc/
> >     * config/aarch64/aarch64.md: New pattern.
> >     * config/aarch64/aarch64-simd.md: Likewise.
> >     * config/aarch64/iterators.md: New unspecs, iterators.
> > gcc/testsuite
> >     * gcc.target/aarch64/fmaxmin.c: New test.
> >
> > > -----Original Message-----
> > > From: Richard Biener [mailto:richard.guent...@gmail.com]
> > > Sent: 25 November 2015 12:39
> > > To: David Sherwood
> > > Cc: GCC Patches; Richard Sandiford
> > > Subject: Re: [PING][Patch] Add support for IEEE-conformant versions
> > > of scalar fmin* and fmax*
> > >
> > > On Mon, Nov 23, 2015 at 10:21 AM, David Sherwood
> > > <david.sherw...@arm.com> wrote:
> > > > Hi,
> > > >
> > > > This is part 1 of a reworked version of a patch I originally
> > > > submitted in August, rebased after Richard Sandiford's recent work
> > > > on the internal functions. This first patch adds the internal
> > > > function definitions and optabs that provide support for IEEE
> > > > fmax()/fmin() functions.
> > > >
> > > > Later patches will add the appropriate aarch64/aarch32 vector
> > > > instructions.
> > >
> > > Ok.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > > Tested:
> > > >
> > > > x86_64-linux: no regressions
> > > > aarch64-none-elf: no regressions
> > > > arm-none-eabi: no regressions
> > > >
> > > > Regards,
> > > > David Sherwood.
> > > >
> > > > ChangeLog:
> > > >
> > > > 2015-11-19  David Sherwood  <david.sherw...@arm.com>
> > > >
> > > > gcc/
> > > >     * optabs.def: Add new optabs fmax_optab/fmin_optab.
> > > >     * internal-fn.def: Add new fmax/fmin internal functions.
> > > >     * config/aarch64/aarch64.md: New pattern.
> > > >     * config/aarch64/aarch64-simd.md: Likewise.
> > > >     * config/aarch64/iterators.md: New unspecs, iterators.
> > > >     * config/arm/iterators.md: New iterators.
> > > >     * config/arm/unspecs.md: New unspecs.
> > > >     * config/arm/neon.md: New pattern.
> > > >     * config/arm/vfp.md: Likewise.
> > > >     * doc/md.texi: Add fmin and fmax patterns.
> > > > gcc/testsuite
> > > >     * gcc.target/aarch64/fmaxmin.c: New test.
> > > >     * gcc.target/arm/fmaxmin.c: New test.
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> > > >> Sent: 19 August 2015 13:35
> > > >> To: Richard Biener; David Sherwood; GCC Patches; Richard Sandiford
> > > >> Subject: Re: [PING][Patch] Add support for IEEE-conformant
> > > >> versions of scalar fmin* and fmax*
> > > >>
> > > >> On Wed, Aug 19, 2015 at 2:11 PM, Richard Sandiford
> > > >> <richard.sandif...@arm.com> wrote:
> > > >> > Richard Biener <richard.guent...@gmail.com> writes:
> > > >> >> On Wed, Aug 19, 2015 at 11:54 AM, Richard Sandiford
> > > >> >> <richard.sandif...@arm.com> wrote:
> > > >> >>> Richard Biener <richard.guent...@gmail.com> writes:
> > > >> >>>> On Tue, Aug 18, 2015 at 4:15 PM, Richard Sandiford
> > > >> >>>> <richard.sandif...@arm.com> wrote:
> > > >> >>>>> Richard Biener <richard.guent...@gmail.com> writes:
> > > >> >>>>>> On Tue, Aug 18, 2015 at 1:07 PM, David Sherwood
> > > >> >>>>>> <david.sherw...@arm.com> wrote:
> > > >> >>>>>>>> On Mon, Aug 17, 2015 at 11:29 AM, David Sherwood
> > > >> >>>>>>>> <david.sherw...@arm.com> wrote:
> > > >> >>>>>>>> > Hi Richard,
> > > >> >>>>>>>> >
> > > >> >>>>>>>> > Thanks for the reply. I'd chosen to add new expressions
> > > >> >>>>>>>> > as this seemed more consistent with the existing
> > > >> >>>>>>>> > MAX_EXPR and MIN_EXPR tree codes. In addition it would
> > > >> >>>>>>>> > seem to provide more opportunities for optimisation
> > > >> >>>>>>>> > than a target-specific builtin implementation would.
> > > >> >>>>>>>> > I accept that optimisation opportunities will be more
> > > >> >>>>>>>> > limited for strict math compilation, but that it was
> > > >> >>>>>>>> > still worth having them. Also, if we did map it to
> > > >> >>>>>>>> > builtins then the scalar version would go through the
> > > >> >>>>>>>> > optabs and the vector version would go through the
> > > >> >>>>>>>> > target's builtin expansion, which doesn't seem very
> > > >> >>>>>>>> > consistent.
> > > >> >>>>>>>>
> > > >> >>>>>>>> On another note ISTR you can't associate
> > > >> >>>>>>>> STRICT_MIN/MAX_EXPR and thus you can't vectorize anyway?
> > > >> >>>>>>>> (strict IEEE behavior is about NaNs, correct?)
> > > >> >>>>>>> I thought for this particular case associativity wasn't an
> > > >> >>>>>>> issue? We're not doing any reductions here, just simply
> > > >> >>>>>>> performing max/min operations on each pair of elements in
> > > >> >>>>>>> the vectors. I thought for IEEE-compliant behaviour we
> > > >> >>>>>>> just need to ensure that for each pair of elements if one
> > > >> >>>>>>> element is a NaN we return the other one.
> > > >> >>>>>>
> > > >> >>>>>> Hmm, true. Ok, my comment still stands - I don't see that
> > > >> >>>>>> using a tree code is the best thing to do here. You can add
> > > >> >>>>>> fmin/max optabs and special expansion of BUILT_IN_FMIN/MAX
> > > >> >>>>>> and you can use a target builtin for the vectorized
> > > >> >>>>>> variant.
> > > >> >>>>>>
> > > >> >>>>>> The reason I am pushing against a new tree code is that
> > > >> >>>>>> we'd have an awful lot of similar codes when pushing other
> > > >> >>>>>> flag related IL specialities to actual IL constructs. And
> > > >> >>>>>> we still need to find a consistent way to do that.
> > > >> >>>>>
> > > >> >>>>> In this case though the new code is really the "native"
> > > >> >>>>> min/max operation for fp, rather than some weird
> > > >> >>>>> flag-dependent behaviour. Maybe it's a bit unfortunate that
> > > >> >>>>> the non-strict min/max fp operation got mapped to the
> > > >> >>>>> generic MIN_EXPR and MAX_EXPR when the non-strict version is
> > > >> >>>>> really the flag-related modification. The STRICT_* prefix is
> > > >> >>>>> forced by that and might make it seem like more of a special
> > > >> >>>>> case than it really is.
> > > >> >>>>
> > > >> >>>> In some sense. But the "strict" version already has a builtin
> > > >> >>>> (just no special expander in builtins.c). We usually don't
> > > >> >>>> add 1:1 tree codes for existing builtins (why have builtins
> > > >> >>>> at all then?).
> > > >> >>>
> > > >> >>> We still need the builtin to match the C function (and to
> > > >> >>> allow direct calls to __builtin_fmin, etc., which are
> > > >> >>> occasionally useful).
> > > >> >>>
> > > >> >>>>> If you're still not convinced, how about an internal
> > > >> >>>>> function instead of a built-in function, so that we can
> > > >> >>>>> continue to use optabs for all cases? I'd really like to
> > > >> >>>>> avoid forcing such a generic concept down to target-specific
> > > >> >>>>> builtins with target-specific expansion code, especially
> > > >> >>>>> when the same concept is exposed by target-independent code
> > > >> >>>>> for scalars.
> > > >> >>>>
> > > >> >>>> The target builtin is for the vectorized variant - not all
> > > >> >>>> targets might have that and we'd need to query the target
> > > >> >>>> about this. So using an IFN would mean adding a target hook
> > > >> >>>> for that query.
> > > >> >>>
> > > >> >>> No, the idea is that if we have a tree code or an internal
> > > >> >>> function, the decision about whether we have target support is
> > > >> >>> based on a query of the optabs (just like it is for scalar,
> > > >> >>> and for other vectorisable tree codes). No new hooks are
> > > >> >>> needed.
> > > >> >>>
> > > >> >>> The patch checked for target support that way.
> > > >> >>
> > > >> >> Fair enough. Still this means we should have tree codes for all
> > > >> >> builtins that eventually are vectorized? So why don't we have
> > > >> >> SIN_EXPR, POW_EXPR (ok, I did argue and have patches for that
> > > >> >> in the past), RINT_EXPR, SQRT_EXPR, etc?
> > > >> >
> > > >> > Yeah, it doesn't sound so bad to me :-) The choice of what's a
> > > >> > function in C and what's inherent is pretty arbitrary. E.g. % on
> > > >> > doubles could have implemented fmod() or remainder(). Casts from
> > > >> > double to int could have used the current rounding mode, but
> > > >> > instead they truncate and conversions using the rounding mode
> > > >> > need to go through something like (l)lrint(). Like you say,
> > > >> > pow() could have been an operator (and is in many languages),
> > > >> > but instead it's a function.
> > > >> >
> > > >> >> This patch starts to go down that route which is why I ask for
> > > >> >> the whole picture to be considered and hinted at the
> > > >> >> alternative implementation which follows existing practice. Add
> > > >> >> an expander in builtins.c, add an optab, and eventual support
> > > >> >> to vectorized_function.
> > > >> >>
> > > >> >> See for example ix86_builtin_vectorized_function which handles
> > > >> >> sqrt, floor, ceil, etc.
> > > >> >> and even FMA (we only fold FMA to FMA_EXPR if the target
> > > >> >> supports it for the scalar mode, so not sure if there is any
> > > >> >> x86 ISA where it has vectorized FMA but not scalar FMA).
> > > >> >
> > > >> > Yeah. TBH I'm really against doing that unless (a) there's good
> > > >> > reason to believe that the concept really is specific to one
> > > >> > target and wouldn't be implemented on others or (b) there really
> > > >> > is a function rather than an instruction underneath (usually the
> > > >> > case for sin, etc.). But (b) could also be handled by the optab
> > > >> > support library mechanism.
> > > >> >
> > > >> > Reasons against using target-specific builtins for operations
> > > >> > that have direct support in the ISA:
> > > >> >
> > > >> > 1. Like you say, in practice vector ops only tend to be
> > > >> >    supported if the associated scalar op is also supported.
> > > >> >    Sticking to this approach means that vector ops follow a
> > > >> >    different path from scalar ops whereas (for example) division
> > > >> >    follows the same path for both. It just seems confusing to
> > > >> >    have some floating-point optabs that support both scalar and
> > > >> >    vector operands and others that only support scalar operands.
> > > >> >
> > > >> > 2. Once converted to a target-specific function, the
> > > >> >    target-independent code has no idea what the function does or
> > > >> >    how expensive it is. We might start out with just one hook to
> > > >> >    convert a scalar operation to a target-dependent built-in
> > > >> >    function, but I bet over time we'll grow other hooks to query
> > > >> >    properties about the function, such as costs.
> > > >> >
> > > >> > 3. builtin_vectorized_function returns a decl rather than a
> > > >> >    call.
> > > >> >    If the target's vector API doesn't already have a built-in
> > > >> >    for the operation we need, with the exact types and arguments
> > > >> >    that we expect, the target needs to define one, presumably
> > > >> >    marked so that it isn't callable by input code.
> > > >> >
> > > >> >    E.g. on targets where FP conversion instructions allow an
> > > >> >    explicit rounding mode to be specified as an operand, it's
> > > >> >    reasonable for a target's vector API to expose that operand
> > > >> >    as a constant argument to the API function. There'd then be
> > > >> >    one API function for all vector-float-to-vector-float integer
> > > >> >    rounding operations, rather than one for vector rint(), one
> > > >> >    for vector ceil(), etc. (I'm thinking of System z
> > > >> >    instructions here, although I don't know offhand what the
> > > >> >    vector API is there.) IMO it doesn't make sense to force the
> > > >> >    target to define "fake" built-in functions for all those
> > > >> >    possibilities purely for the sake of the target hook. It's a
> > > >> >    lot of extra code, and it's extra code that would be
> > > >> >    duplicated on any target that needs to do this.
> > > >> >
> > > >> > IMO optabs are the best way for the target to tell the
> > > >> > target-independent code what it can do. If it supports sqrt on
> > > >> > df it defines sqrtdf and if it supports vector sqrt on v2df it
> > > >> > defines sqrtv2df. These patterns will often be a single
> > > >> > define_expand or define_insn template -- the vectorness often
> > > >> > comes "for free" in terms of writing the pattern.
> > > >> >
> > > >> >>>> > TBH though I'm not sure why an internal_fn value (or a
> > > >> >>>> > target-specific builtin enum value) is worse than a
> > > >> >>>> > tree-code value, unless the limit of the tree_code bitfield
> > > >> >>>> > is in sight (maybe it is).
> > > >> >>>>
> > > >> >>>> I think tree_code is 64bits now.
> > > >> >>>
> > > >> >>> Even better :-)
> > > >> >>
> > > >> >> Yes.
> > > >> >>
> > > >> >> I'm not against adding a corresponding tree code for all math
> > > >> >> builtin functions, we just have to decide whether this is the
> > > >> >> way to go (and of course support expanding those back to
> > > >> >> libcalls to libc/m rather than libgcc). There are also
> > > >> >> constraints on what kind of STRICT_FMIN_EXPR the compiler may
> > > >> >> generate as the target may not be able to expand the long
> > > >> >> double variant directly but needs a libcall but libm might not
> > > >> >> be linked or may not have support for it. That would be a new
> > > >> >> thing compared to libgcc providing a fallback for all other
> > > >> >> tree codes.
> > > >> >
> > > >> > True, but that doesn't seem too bad. The constraints would be
> > > >> > the same if we're operating on built-in functions rather than
> > > >> > codes. I suppose built-in functions make this more explicit, but
> > > >> > at the end of the day it's a costing decision. We should no more
> > > >> > be converting a cheap operation into an expensive libgcc
> > > >> > function than converting a cheap operation into an expensive
> > > >> > libm function, even if the libgcc conversion links.
> > > >> >
> > > >> > There's certainly precedent for introducing calls to things that
> > > >> > libgcc doesn't define. E.g. we already introduce calls to memcpy
> > > >> > in things like loop distribution, even though we don't provide a
> > > >> > fallback memcpy in libgcc.
> > > >>
> > > >> As an additional point for many math functions we have to support
> > > >> errno which means, like, BUILT_IN_SQRT can be rewritten to
> > > >> SQRT_EXPR only if -fno-math-errno is in effect.
> > > >> But then code has to handle both variants for things like
> > > >> constant folding and expression combining. That's very unfortunate
> > > >> and something we want to avoid (one reason the POW_EXPR thing
> > > >> didn't fly when I tried). STRICT_FMIN/MAX_EXPR is an example where
> > > >> this doesn't apply, of course (but I detest the name, just use
> > > >> FMIN/FMAX_EXPR?). Still you'd need to handle both, FMIN_EXPR and
> > > >> BUILT_IN_FMIN, in code doing analysis/transform.
> > > >>
> > > >> Richard.
> > > >>
> > > >>
> > > >> > Thanks,
> > > >> > Richard
fmaxmin2.patch
Description: Binary data