On Tue, Aug 20, 2024 at 10:15:22AM +0200, Richard Biener wrote:
> On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchr...@linux.ibm.com> wrote:
> >
> > On Thu, Aug 08, 2024 at 02:06:44PM +0200, Richard Biener wrote:
> > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchr...@linux.ibm.com> wrote:
> > > >
> > > > On Mon, Aug 05, 2024 at 01:00:31PM +0200, Richard Biener wrote:
> > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchr...@linux.ibm.com> wrote:
> > > > > >
> > > > > > Do not convert floats to ints in multiple steps if trapping math is
> > > > > > enabled.  This might hide some inexact signals.
> > > > > >
> > > > > > Also use correct sign (the sign of the target integer type) for the
> > > > > > intermediate steps.  This only affects undefined behaviour (casting
> > > > > > floats to unsigned datatype where the float is negative).
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * tree-vect-stmts.cc (vectorizable_conversion): multi-step
> > > > > >           float to int conversion only without trapping math and
> > > > > >           correct sign.
> > > > > >
> > > > > > Signed-off-by: Juergen Christ <jchr...@linux.ibm.com>
> > > > > >
> > > > > > Bootstrapped and tested on x86 and s390.  Ok for trunk?
> > > > > >
> > > > > > ---
> > > > > >  gcc/tree-vect-stmts.cc | 8 +++++---
> > > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > index fdcda0d2abae..2ddd13383193 100644
> > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > >             break;
> > > > > >
> > > > > >           cvt_type
> > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > >
> > > > > But lhs_type should be a float type here.  The idea is that for a
> > > > > FLOAT_EXPR (int -> float) a signed integer type is the natural one
> > > > > to use; as it's 2x wider than the original RHS type, its signedness
> > > > > doesn't matter.  Note all float types should be !TYPE_UNSIGNED, so
> > > > > this hunk is a no-op but still less clear on the intent IMO.
> > > > >
> > > > > Please drop it.
> > > >
> > > > Will do.  Sorry about that.
> > > >
> > > > > >           cvt_type = get_same_sized_vectype (cvt_type, vectype_in);
> > > > > >           if (cvt_type == NULL_TREE)
> > > > > >             goto unsupported;
> > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo,
> > > > > >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode))
> > > > > >         goto unsupported;
> > > > > >
> > > > > > -      if (code == FIX_TRUNC_EXPR)
> > > > > > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > > > > >         {
> > > > > >           cvt_type
> > > > > > -           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > +           = build_nonstandard_integer_type (GET_MODE_BITSIZE (rhs_mode),
> > > > > > +                                             TYPE_UNSIGNED (lhs_type));
> > > > >
> > > > > Here it might be relevant for correctness - we have to choose between
> > > > > sfix and ufix for the float -> [u]int conversion.
> > > > >
> > > > > Do you have a testcase?  Shouldn't the exactness be independent of
> > > > > the integer type we convert to?
> > > >
> > > > I was looking at this little program which contains undefined behaviour:
> > > >
> > > > #include <stdio.h>
> > > >
> > > > __attribute__((noinline,noclone,noipa))
> > > > void
> > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
> > > >
> > > > void
> > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> > > > {
> > > >         out[0] = in[0];
> > > >         out[1] = in[1];
> > > >         out[2] = in[2];
> > > >         out[3] = in[3];
> > > > }
> > > >
> > > > int main()
> > > > {
> > > >         double in[] = {-1,-2,-3,-4};
> > > >         unsigned int out[4];
> > > >
> > > >         vec_pack_ufix_trunc_v2df (in, out);
> > > >         for (int i = 0; i < 4; ++i)
> > > >                 printf("out[%d] = %u\n", i, out[i]);
> > > >         return 0;
> > > > }
> > > >
> > > > On s390x, I get different results after vectorization:
> > > >
> > > > out[0] = 4294967295
> > > > out[1] = 4294967294
> > > > out[2] = 4294967293
> > > > out[3] = 4294967292
> > > >
> > > > than without vectorization:
> > > >
> > > > out[0] = 0
> > > > out[1] = 0
> > > > out[2] = 0
> > > > out[3] = 0
> > > >
> > > > Even if this is undefined behaviour, I think it would be nice to have
> > > > consistent results here.
> > > >
> > > > Also, while I added an expander to circumvent this problem in a
> > > > previous patch, reviewers requested to hide this behind trapping math.
> > > > Thus, I looked into this.
> > > >
> > > > Seeing the result from the CI for aarch64, I guess there are some
> > > > tests that actually expect this vectorization to always happen even
> > > > though it might not be safe w.r.t. trapping math.
> > >
> > > I do remember this was extensively discussed (but we might have missed
> > > something) and one argument indeed was that when it's undefined behavior
> > > we can do the vectorization given the actual values might be in-bound.
> >
> > Okay.  Would you be fine with the patch to only vectorize when
> > trapping math is disabled?  I still could take care of the rest on
> > s390 backend side by defining the appropriate expanders.  Still think
> > it is weird though that we might produce different results on
> > vectorization than without vectorization.  Yes, that is what
> > "undefined behaviour" is all about, but we can simply fix this here.
> > Nevertheless, how about just adding the trapping math check?
> 
> So to summarize - the problem is a different result when vectorizing
> when the scalar code invokes undefined behavior?  I think there is
> nothing to fix and we shouldn't pessimize code not invoking undefined
> behavior by adding a trapping math check.
> 
> Or did I misunderstand things?

The different results can still be dealt with in the backend.  The only
remaining question is whether vectorization of FIX_TRUNC_EXPR in
multiple steps should be guarded by the trapping math flag.  I think
it should, but according to the CI results, some architectures already
rely on the current behaviour.  So I am unsure whether we should add
the check.  What is your opinion on that?
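
To make the concern concrete, here is a minimal scalar sketch of what a
two-step conversion computes.  The helper names are hypothetical and this
is not the code the vectorizer emits; it only models "double -> 64-bit
signed integer -> 32-bit integer" in plain C.  It assumes a signed
intermediate type, and it relies on GCC's documented modulo behaviour for
narrowing to a signed type.

```c
#include <stdint.h>

/* Hypothetical helper, not GCC code: double -> uint32_t in two steps
   through a signed 64-bit intermediate.  */
static uint32_t
fix_trunc_two_step_u32 (double x)
{
  /* Step 1: floating point -> wide signed integer.  For inputs that fit
     in int64_t this is a valid conversion, so no "invalid" exception is
     raised here, even when x is negative or does not fit in 32 bits.  */
  int64_t wide = (int64_t) x;

  /* Step 2: pure integer narrowing, reduced modulo 2^32.  No floating
     point exception can be raised by this step.  */
  return (uint32_t) wide;
}

/* Hypothetical helper, not GCC code: same idea for a signed result.  */
static int32_t
fix_trunc_two_step_s32 (double x)
{
  int64_t wide = (int64_t) x;

  /* Implementation-defined in ISO C when the value does not fit; GCC
     documents reduction modulo 2^32 without raising a signal.  */
  return (int32_t) wide;
}
```

With these helpers, fix_trunc_two_step_u32 (-1.0) yields 4294967295, which
is the value the vectorized code printed in the testcase quoted above.  For
an input such as 1099511627776.0 (2^40), a direct double -> int conversion
is out of range and, under C Annex F, raises the "invalid" exception,
whereas the two-step variant just returns the low 32 bits.  That lost
signal is what a trapping math check would protect.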

Regards,

Juergen
