On Tue, Aug 20, 2024 at 3:35 PM Juergen Christ <jchr...@linux.ibm.com> wrote: > > Am Tue, Aug 20, 2024 at 02:51:02PM +0200 schrieb Richard Biener: > > On Tue, Aug 20, 2024 at 11:16 AM Juergen Christ <jchr...@linux.ibm.com> > > wrote: > > > > > > Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener: > > > > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchr...@linux.ibm.com> > > > > wrote: > > > > > > > > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > > > > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ > > > > > > <jchr...@linux.ibm.com> wrote: > > > > > > > > > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ > > > > > > > > <jchr...@linux.ibm.com> wrote: > > > > > > > > > > > > > > > > > > Do not convert floats to ints in multiple step if trapping > > > > > > > > > math is > > > > > > > > > enabled. This might hide some inexact signals. > > > > > > > > > > > > > > > > > > Also use correct sign (the sign of the target integer type) > > > > > > > > > for the > > > > > > > > > intermediate steps. This only affects undefined behaviour > > > > > > > > > (casting > > > > > > > > > floats to unsigned datatype where the float is negative). > > > > > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): > > > > > > > > > multi-step > > > > > > > > > float to int conversion only with trapping math and > > > > > > > > > correct > > > > > > > > > sign. > > > > > > > > > > > > > > > > > > Signed-off-by: Juergen Christ <jchr...@linux.ibm.com> > > > > > > > > > > > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > > > > > > > > > > > > > --- > > > > > > > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > > > > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > > > > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > > > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > > > > > > --- a/gcc/tree-vect-stmts.cc > > > > > > > > > +++ b/gcc/tree-vect-stmts.cc > > > > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info > > > > > > > > > *vinfo, > > > > > > > > > break; > > > > > > > > > > > > > > > > > > cvt_type > > > > > > > > > - = build_nonstandard_integer_type > > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > > > + = build_nonstandard_integer_type > > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), > > > > > > > > > + TYPE_UNSIGNED > > > > > > > > > (lhs_type)); > > > > > > > > > > > > > > > > But lhs_type should be a float type here, the idea that for a > > > > > > > > FLOAT_EXPR (int -> float) > > > > > > > > a signed integer type is the natural one to use - as it's 2x > > > > > > > > wider > > > > > > > > than the original > > > > > > > > RHS type it's signedness doesn't matter. Note all float types > > > > > > > > should be > > > > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on > > > > > > > > the intent IMO. > > > > > > > > > > > > > > > > Please drop it. > > > > > > > > > > > > > > Will do. Sorry about that. > > > > > > > > > > > > > > > > cvt_type = get_same_sized_vectype (cvt_type, > > > > > > > > > vectype_in); > > > > > > > > > if (cvt_type == NULL_TREE) > > > > > > > > > goto unsupported; > > > > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info > > > > > > > > > *vinfo, > > > > > > > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE > > > > > > > > > (rhs_mode)) > > > > > > > > > goto unsupported; > > > > > > > > > > > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > > > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > > > > > > { > > > > > > > > > cvt_type > > > > > > > > > - = build_nonstandard_integer_type > > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), 0); > > > > > > > > > + = build_nonstandard_integer_type > > > > > > > > > (GET_MODE_BITSIZE (rhs_mode), > > > > > > > > > + TYPE_UNSIGNED > > > > > > > > > (lhs_type)); > > > > > > > > > > > > > > > > Here it might be relevant for correctness - we have to choose > > > > > > > > between > > > > > > > > sfix and ufix for the float -> [u]int conversion. > > > > > > > > > > > > > > > > Do you have a testcase? Shouldn't the exactness be > > > > > > > > independent of the integer > > > > > > > > type we convert to? > > > > > > > > > > > > > > I was looking at this little program which contains undefined > > > > > > > behaviour: > > > > > > > > > > > > > > #include <stdio.h> > > > > > > > > > > > > > > __attribute__((noinline,noclone,noipa)) > > > > > > > void > > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > > > > > > > > > > > void > > > > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > > > > > > { > > > > > > > out[0] = in[0]; > > > > > > > out[1] = in[1]; > > > > > > > out[2] = in[2]; > > > > > > > out[3] = in[3]; > > > > > > > } > > > > > > > > > > > > > > int main() > > > > > > > { > > > > > > > double in[] = {-1,-2,-3,-4}; > > > > > > > unsigned int out[4]; > > > > > > > > > > > > > > vec_pack_ufix_trunc_v2df (in, out); > > > > > > > for (int i = 0; i < 4; ++i) > > > > > > > printf("out[%d] = %u\n", i, out[i]); > > > > > > > return 0; > > > > > > > } > > > > > > > > > > > > > > On s390x, I get different results after vectorization: > > > > > > > > > > > > > > out[0] = 4294967295 > > > > > > > out[1] = 4294967294 > > > > > > > out[2] = 4294967293 > > > > > > > out[3] = 4294967292 > > > > > > > > > > > > > > than without vectorization: > > > > > > > > > > > > > > out[0] = 0 > > > > > > > out[1] = 0 > > > > > > > out[2] = 0 > > > > > > > out[3] = 0 > > > > > > > > > > > > > > Even if this is undefined behaviour, I think it would be nice to > > > > > > > have > > > > > > > consistent results here. > > > > > > > > > > > > > > Also, while I added an expander to circumvent this problem in a > > > > > > > previous patch, reviewers requested to hide this behind trapping > > > > > > > math. > > > > > > > Thus, I looked into this. > > > > > > > > > > > > > > Seeing the result from the CI for aarch64, I guess there are some > > > > > > > tests that actually expect this vectorization to always happen > > > > > > > even > > > > > > > though it might not be save w.r.t. trapping math. > > > > > > > > > > > > I do remember this was extensively discussed (but we might have > > > > > > missed > > > > > > something) and one argument indeed was that when it's undefined > > > > > > behavior > > > > > > we can do the vectorization given the actual values might be > > > > > > in-bound. > > > > > > > > > > Okay. Would you be fine with the patch to only vectorize when > > > > > trapping math is disabled? I still could take care of the rest on > > > > > s390 backend side by defining the appropriate expanders. Still think > > > > > it is weird though that we might produce different results on > > > > > vectorization than without vectorization. Yes, that is what > > > > > "undefined behaviour" is all about, but we can simply fix this here. > > > > > Nevertheless, how about just adding the trapping math check? > > > > > > > > So to summarize - the problem is different result when vectorizing > > > > when the scalar code invokes undefined behavior? I think there is > > > > nothing to fix and we shouldn't pessimize code not invoking undefined > > > > behavior by adding a trapping math check. > > > > > > > > Or did I misunderstand things? > > > > > > The different results can still be delt with in the backend. The only > > > remaining part is the question if vectorization of FIX_TRUNC_EXPR in > > > multiple steps should be guarded by the trapping math flag or not? I > > > think it should, but according to CI results, some architectures > > > already rely on the current behaviour. So I am unsure if we should > > > add the flag or not. What is your opinion on that? > > > > If we are adding a float conversion as intermediate step that would > > definitely need to be guarded by !flag_trapping_math, say for > > double -> float -> int, but I think we're doing double -> long -> int > > for this and check ranges to make sure we are not missing a > > truncation. > > I thought that was the problematic conversion. At least I did not see > a check there, but maybe I am wrong. Then I will drop this patch > completely.
So we're not doing any range analysis but for double -> int performed as double -> long -> int when the double cannot be represented in int the code invokes undefined behavior. We're only considering intermediate types where the final step is a truncation so we should be OK here. The corner-case would be float->unsigned performed as float->int->unsigned but I think that supportable_narrowing_operation will reject the int->unsigned conversion. I'm not sure we are ever using a non vec_[un]pack_{u,s}fix_trunc optab aka {u,s}fix_trunc directly for vectors (but x86 has patterns for this). Doing float->unsigned via int would be wrong then (always, not only with -ftrapping-math). So, I think we're fine. Richard. > > Thanks, > > Juergen