Am Tue, Aug 20, 2024 at 10:15:22AM +0200 schrieb Richard Biener: > On Fri, Aug 9, 2024 at 2:58 PM Juergen Christ <jchr...@linux.ibm.com> wrote: > > > > Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > > > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchr...@linux.ibm.com> > > > wrote: > > > > > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchr...@linux.ibm.com> > > > > > wrote: > > > > > > > > > > > > Do not convert floats to ints in multiple step if trapping math is > > > > > > enabled. This might hide some inexact signals. > > > > > > > > > > > > Also use correct sign (the sign of the target integer type) for the > > > > > > intermediate steps. This only affects undefined behaviour (casting > > > > > > floats to unsigned datatype where the float is negative). > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > > > > > float to int conversion only with trapping math and > > > > > > correct > > > > > > sign. > > > > > > > > > > > > Signed-off-by: Juergen Christ <jchr...@linux.ibm.com> > > > > > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > > > > > > > --- > > > > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > > > --- a/gcc/tree-vect-stmts.cc > > > > > > +++ b/gcc/tree-vect-stmts.cc > > > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > break; > > > > > > > > > > > > cvt_type > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE > > > > > > (rhs_mode), 0); > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE > > > > > > (rhs_mode), > > > > > > + TYPE_UNSIGNED > > > > > > (lhs_type)); > > > > > > > > > > But lhs_type should be a float type here, the idea that for a > > > > > FLOAT_EXPR (int -> float) > > > > > a signed integer type is the natural one to use - as it's 2x wider > > > > > than the original > > > > > RHS type it's signedness doesn't matter. Note all float types should > > > > > be > > > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the > > > > > intent IMO. > > > > > > > > > > Please drop it. > > > > > > > > Will do. Sorry about that. > > > > > > > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > > > > if (cvt_type == NULL_TREE) > > > > > > goto unsupported; > > > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > > > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > > > > > goto unsupported; > > > > > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > > > { > > > > > > cvt_type > > > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE > > > > > > (rhs_mode), 0); > > > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE > > > > > > (rhs_mode), > > > > > > + TYPE_UNSIGNED > > > > > > (lhs_type)); > > > > > > > > > > Here it might be relevant for correctness - we have to choose between > > > > > sfix and ufix for the float -> [u]int conversion. > > > > > > > > > > Do you have a testcase? Shouldn't the exactness be independent of > > > > > the integer > > > > > type we convert to? > > > > > > > > I was looking at this little program which contains undefined behaviour: > > > > > > > > #include <stdio.h> > > > > > > > > __attribute__((noinline,noclone,noipa)) > > > > void > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > > > > > void > > > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > > > { > > > > out[0] = in[0]; > > > > out[1] = in[1]; > > > > out[2] = in[2]; > > > > out[3] = in[3]; > > > > } > > > > > > > > int main() > > > > { > > > > double in[] = {-1,-2,-3,-4}; > > > > unsigned int out[4]; > > > > > > > > vec_pack_ufix_trunc_v2df (in, out); > > > > for (int i = 0; i < 4; ++i) > > > > printf("out[%d] = %u\n", i, out[i]); > > > > return 0; > > > > } > > > > > > > > On s390x, I get different results after vectorization: > > > > > > > > out[0] = 4294967295 > > > > out[1] = 4294967294 > > > > out[2] = 4294967293 > > > > out[3] = 4294967292 > > > > > > > > than without vectorization: > > > > > > > > out[0] = 0 > > > > out[1] = 0 > > > > out[2] = 0 > > > > out[3] = 0 > > > > > > > > Even if this is undefined behaviour, I think it would be nice to have > > > > consistent results here. > > > > > > > > Also, while I added an expander to circumvent this problem in a > > > > previous patch, reviewers requested to hide this behind trapping math. > > > > Thus, I looked into this. > > > > > > > > Seeing the result from the CI for aarch64, I guess there are some > > > > tests that actually expect this vectorization to always happen even > > > > though it might not be save w.r.t. trapping math. > > > > > > I do remember this was extensively discussed (but we might have missed > > > something) and one argument indeed was that when it's undefined behavior > > > we can do the vectorization given the actual values might be in-bound. > > > > Okay. Would you be fine with the patch to only vectorize when > > trapping math is disabled? I still could take care of the rest on > > s390 backend side by defining the appropriate expanders. Still think > > it is weird though that we might produce different results on > > vectorization than without vectorization. Yes, that is what > > "undefined behaviour" is all about, but we can simply fix this here. > > Nevertheless, how about just adding the trapping math check? > > So to summarize - the problem is different result when vectorizing > when the scalar code invokes undefined behavior? I think there is > nothing to fix and we shouldn't pessimize code not invoking undefined > behavior by adding a trapping math check. > > Or did I misunderstand things?
The different results can still be delt with in the backend. The only remaining part is the question if vectorization of FIX_TRUNC_EXPR in multiple steps should be guarded by the trapping math flag or not? I think it should, but according to CI results, some architectures already rely on the current behaviour. So I am unsure if we should add the flag or not. What is your opinion on that? Regards, Juergen