On Tue, Aug 20, 2024 at 3:35 PM Juergen Christ <jchr...@linux.ibm.com> wrote:
>
> Am Tue, Aug 20, 2024 at 02:51:02PM +0200 schrieb Richard Biener:
> > On Tue, Aug 20, 2024 at 11:16 AM Juergen Christ <jchr...@linux.ibm.com> 
> > wrote:
> > >
> > > Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener:
> > > > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchr...@linux.ibm.com> 
> > > > wrote:
> > > > >
> > > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener:
> > > > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ 
> > > > > > <jchr...@linux.ibm.com> wrote:
> > > > > > >
> > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener:
> > > > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ 
> > > > > > > > <jchr...@linux.ibm.com> wrote:
> > > > > > > > >
> > > > > > > > > Do not convert floats to ints in multiple step if trapping 
> > > > > > > > > math is
> > > > > > > > > enabled.  This might hide some inexact signals.
> > > > > > > > >
> > > > > > > > > Also use correct sign (the sign of the target integer type) 
> > > > > > > > > for the
> > > > > > > > > intermediate steps.  This only affects undefined behaviour 
> > > > > > > > > (casting
> > > > > > > > > floats to unsigned datatype where the float is negative).
> > > > > > > > >
> > > > > > > > > gcc/ChangeLog:
> > > > > > > > >
> > > > > > > > >         * tree-vect-stmts.cc (vectorizable_conversion): 
> > > > > > > > > multi-step
> > > > > > > > >           float to int conversion only with trapping math and 
> > > > > > > > > correct
> > > > > > > > >           sign.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Juergen Christ <jchr...@linux.ibm.com>
> > > > > > > > >
> > > > > > > > > Bootstrapped and tested on x84 and s390.  Ok for trunk?
> > > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  gcc/tree-vect-stmts.cc | 8 +++++---
> > > > > > > > >  1 file changed, 5 insertions(+), 3 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
> > > > > > > > > index fdcda0d2abae..2ddd13383193 100644
> > > > > > > > > --- a/gcc/tree-vect-stmts.cc
> > > > > > > > > +++ b/gcc/tree-vect-stmts.cc
> > > > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info 
> > > > > > > > > *vinfo,
> > > > > > > > >             break;
> > > > > > > > >
> > > > > > > > >           cvt_type
> > > > > > > > > -           = build_nonstandard_integer_type 
> > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > > > > +           = build_nonstandard_integer_type 
> > > > > > > > > (GET_MODE_BITSIZE (rhs_mode),
> > > > > > > > > +                                             TYPE_UNSIGNED 
> > > > > > > > > (lhs_type));
> > > > > > > >
> > > > > > > > But lhs_type should be a float type here, the idea that for a
> > > > > > > > FLOAT_EXPR (int -> float)
> > > > > > > > a signed integer type is the natural one to use - as it's 2x 
> > > > > > > > wider
> > > > > > > > than the original
> > > > > > > > RHS type it's signedness doesn't matter.  Note all float types 
> > > > > > > > should be
> > > > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on 
> > > > > > > > the intent IMO.
> > > > > > > >
> > > > > > > > Please drop it.
> > > > > > >
> > > > > > > Will do.  Sorry about that.
> > > > > > >
> > > > > > > > >           cvt_type = get_same_sized_vectype (cvt_type, 
> > > > > > > > > vectype_in);
> > > > > > > > >           if (cvt_type == NULL_TREE)
> > > > > > > > >             goto unsupported;
> > > > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info 
> > > > > > > > > *vinfo,
> > > > > > > > >        if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE 
> > > > > > > > > (rhs_mode))
> > > > > > > > >         goto unsupported;
> > > > > > > > >
> > > > > > > > > -      if (code == FIX_TRUNC_EXPR)
> > > > > > > > > +      if (code == FIX_TRUNC_EXPR && !flag_trapping_math)
> > > > > > > > >         {
> > > > > > > > >           cvt_type
> > > > > > > > > -           = build_nonstandard_integer_type 
> > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), 0);
> > > > > > > > > +           = build_nonstandard_integer_type 
> > > > > > > > > (GET_MODE_BITSIZE (rhs_mode),
> > > > > > > > > +                                             TYPE_UNSIGNED 
> > > > > > > > > (lhs_type));
> > > > > > > >
> > > > > > > > Here it might be relevant for correctness - we have to choose 
> > > > > > > > between
> > > > > > > > sfix and ufix for the float -> [u]int conversion.
> > > > > > > >
> > > > > > > > Do  you have a testcase?  Shouldn't the exactness be 
> > > > > > > > independent of the integer
> > > > > > > > type we convert to?
> > > > > > >
> > > > > > > I was looking at this little program which contains undefined 
> > > > > > > behaviour:
> > > > > > >
> > > > > > > #include <stdio.h>
> > > > > > >
> > > > > > > __attribute__((noinline,noclone,noipa))
> > > > > > > void
> > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out);
> > > > > > >
> > > > > > > void
> > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out)
> > > > > > > {
> > > > > > >         out[0] = in[0];
> > > > > > >         out[1] = in[1];
> > > > > > >         out[2] = in[2];
> > > > > > >         out[3] = in[3];
> > > > > > > }
> > > > > > >
> > > > > > > int main()
> > > > > > > {
> > > > > > >         double in[] = {-1,-2,-3,-4};
> > > > > > >         unsigned int out[4];
> > > > > > >
> > > > > > >         vec_pack_ufix_trunc_v2df (in, out);
> > > > > > >         for (int i = 0; i < 4; ++i)
> > > > > > >                 printf("out[%d] = %u\n", i, out[i]);
> > > > > > >         return 0;
> > > > > > > }
> > > > > > >
> > > > > > > On s390x, I get different results after vectorization:
> > > > > > >
> > > > > > > out[0] = 4294967295
> > > > > > > out[1] = 4294967294
> > > > > > > out[2] = 4294967293
> > > > > > > out[3] = 4294967292
> > > > > > >
> > > > > > > than without vectorization:
> > > > > > >
> > > > > > > out[0] = 0
> > > > > > > out[1] = 0
> > > > > > > out[2] = 0
> > > > > > > out[3] = 0
> > > > > > >
> > > > > > > Even if this is undefined behaviour, I think it would be nice to 
> > > > > > > have
> > > > > > > consistent results here.
> > > > > > >
> > > > > > > Also, while I added an expander to circumvent this problem in a
> > > > > > > previous patch, reviewers requested to hide this behind trapping 
> > > > > > > math.
> > > > > > > Thus, I looked into this.
> > > > > > >
> > > > > > > Seeing the result from the CI for aarch64, I guess there are some
> > > > > > > tests that actually expect this vectorization to always happen 
> > > > > > > even
> > > > > > > though it might not be save w.r.t. trapping math.
> > > > > >
> > > > > > I do remember this was extensively discussed (but we might have 
> > > > > > missed
> > > > > > something) and one argument indeed was that when it's undefined 
> > > > > > behavior
> > > > > > we can do the vectorization given the actual values might be 
> > > > > > in-bound.
> > > > >
> > > > > Okay.  Would you be fine with the patch to only vectorize when
> > > > > trapping math is disabled?  I still could take care of the rest on
> > > > > s390 backend side by defining the appropriate expanders.  Still think
> > > > > it is weird though that we might produce different results on
> > > > > vectorization than without vectorization.  Yes, that is what
> > > > > "undefined behaviour" is all about, but we can simply fix this here.
> > > > > Nevertheless, how about just adding the trapping math check?
> > > >
> > > > So to summarize - the problem is different result when vectorizing
> > > > when the scalar code invokes undefined behavior?  I think there is
> > > > nothing to fix and we shouldn't pessimize code not invoking undefined
> > > > behavior by adding a trapping math check.
> > > >
> > > > Or did I misunderstand things?
> > >
> > > The different results can still be delt with in the backend.  The only
> > > remaining part is the question if vectorization of FIX_TRUNC_EXPR in
> > > multiple steps should be guarded by the trapping math flag or not?  I
> > > think it should, but according to CI results, some architectures
> > > already rely on the current behaviour.  So I am unsure if we should
> > > add the flag or not.  What is your opinion on that?
> >
> > If we are adding a float conversion as intermediate step that would
> > definitely need to be guarded by !flag_trapping_math, say for
> > double -> float -> int, but I think we're doing double -> long -> int
> > for this and check ranges to make sure we are not missing a
> > truncation.
>
> I thought that was the problematic conversion.  At least I did not see
> a check there, but maybe I am wrong.  Then I will drop this patch
> completely.

So we're not doing any range analysis but for double -> int performed
as double -> long -> int when the double cannot be represented in int
the code invokes undefined behavior.  We're only considering intermediate
types where the final step is a truncation so we should be OK here.
The corner-case would be float->unsigned performed as
float->int->unsigned but I think that supportable_narrowing_operation
will reject the int->unsigned conversion.  I'm not sure we are
ever using a non vec_[un]pack_{u,s}fix_trunc optab aka
{u,s}fix_trunc directly for vectors (but x86 has patterns for this).
Doing float->unsigned via int would be wrong then (always, not
only with -ftrapping-math).

So, I think we're fine.

Richard.

>
> Thanks,
>
> Juergen

Reply via email to