On Wed, Oct 2, 2024 at 6:26 PM Victor Do Nascimento <victor.donascime...@arm.com> wrote: > > Given the categorization of math built-in functions as `ECF_CONST', > when if-converting their uses, their calls are not masked and are thus > called with an all-true predicate. > > This, however, is not appropriate where built-ins have library > equivalents, wherein they may exhibit highly architecture-specific > behaviors. For example, vectorized implementations may delegate the > computation of values outside a certain acceptable numerical range to > special (non-vectorized) routines which considerably slow down > computation. > > As numerical simulation programs often do bounds check on input values > prior to math calls, conditionally assigning default output values for > out-of-bounds input and skipping the math call altogether, these > fallback implementations should seldom be called in the execution of > vectorized code. If, however, we don't apply any masking to these > math functions, we end up effectively executing both if and else > branches for these values, leading to considerable performance > degradation on scientific workloads. > > We therefore invert the order of handling of math function calls in > `if_convertible_stmt_p' to prioritize the handling of their > library-provided implementations over the equivalent internal function. > > Regression tested on aarch64-none-linux-gnu & x86_64-linux-gnu w/ no > new regressions.
I think the patch is good - note I think there's even a bugzilla about this behavior. So as incremental improvement the patch is OK. I think we should further improve this, possibly with profile data, and the situation could be improved by handling the calls like mask stores where we put a if (mask != 0) before the store. Richard. > gcc/ChangeLog: > > * tree-if-conv.cc (if_convertible_stmt_p): Check for explicit > function declaration before IFN fallback. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/vect-fncall-mask-math.c: New. > --- > .../gcc.dg/vect/vect-fncall-mask-math.c | 33 +++++++++++++++++++ > gcc/tree-if-conv.cc | 18 +++++----- > 2 files changed, 42 insertions(+), 9 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c > b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c > new file mode 100644 > index 00000000000..15e22da2807 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/vect/vect-fncall-mask-math.c > @@ -0,0 +1,33 @@ > +/* Test the correct application of masking to autovectorized math function > calls. > + Test is currently set to xfail pending the release of the relevant lmvec > + support. */ > +/* { dg-do compile { target { aarch64*-*-* } } } */ > +/* { dg-additional-options "-march=armv8.2-a+sve -fdump-tree-ifcvt-raw > -Ofast" { target { aarch64*-*-* } } } */ > + > +#include <math.h> > + > +const int N = 20; > +const float lim = 101.0; > +const float cst = -1.0; > +float tot = 0.0; > + > +float b[20]; > +float a[20] = { [0 ... 9] = 1.7014118e39, /* If branch. */ > + [10 ... 19] = 100.0 }; /* Else branch. */ > + > +int main (void) > +{ > + #pragma omp simd > + for (int i = 0; i < N; i += 1) > + { > + if (a[i] > lim) > + b[i] = cst; > + else > + b[i] = expf (a[i]); > + tot += b[i]; > + } > + return (0); > +} > + > +/* { dg-final { scan-tree-dump-not { gimple_call <expf, _2, _1>} ifcvt { > xfail { aarch64*-*-* } } } } */ > +/* { dg-final { scan-tree-dump { gimple_call <.MASK_CALL, _2, expf, _1, > _30>} ifcvt { xfail { aarch64*-*-* } } } } */ > diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc > index 3b04d1e8d34..90c754a4814 100644 > --- a/gcc/tree-if-conv.cc > +++ b/gcc/tree-if-conv.cc > @@ -1133,15 +1133,6 @@ if_convertible_stmt_p (gimple *stmt, > vec<data_reference_p> refs) > > case GIMPLE_CALL: > { > - /* There are some IFN_s that are used to replace builtins but have the > - same semantics. Even if MASK_CALL cannot handle them > vectorable_call > - will insert the proper selection, so do not block conversion. */ > - int flags = gimple_call_flags (stmt); > - if ((flags & ECF_CONST) > - && !(flags & ECF_LOOPING_CONST_OR_PURE) > - && gimple_call_combined_fn (stmt) != CFN_LAST) > - return true; > - > tree fndecl = gimple_call_fndecl (stmt); > if (fndecl) > { > @@ -1160,6 +1151,15 @@ if_convertible_stmt_p (gimple *stmt, > vec<data_reference_p> refs) > } > } > > + /* There are some IFN_s that are used to replace builtins but have the > + same semantics. Even if MASK_CALL cannot handle them > vectorable_call > + will insert the proper selection, so do not block conversion. */ > + int flags = gimple_call_flags (stmt); > + if ((flags & ECF_CONST) > + && !(flags & ECF_LOOPING_CONST_OR_PURE) > + && gimple_call_combined_fn (stmt) != CFN_LAST) > + return true; > + > return false; > } > > -- > 2.34.1 >