Am Thu, Aug 08, 2024 at 02:06:44PM +0200 schrieb Richard Biener: > On Mon, Aug 5, 2024 at 4:02 PM Juergen Christ <jchr...@linux.ibm.com> wrote: > > > > Am Mon, Aug 05, 2024 at 01:00:31PM +0200 schrieb Richard Biener: > > > On Fri, Aug 2, 2024 at 2:43 PM Juergen Christ <jchr...@linux.ibm.com> > > > wrote: > > > > > > > > Do not convert floats to ints in multiple step if trapping math is > > > > enabled. This might hide some inexact signals. > > > > > > > > Also use correct sign (the sign of the target integer type) for the > > > > intermediate steps. This only affects undefined behaviour (casting > > > > floats to unsigned datatype where the float is negative). > > > > > > > > gcc/ChangeLog: > > > > > > > > * tree-vect-stmts.cc (vectorizable_conversion): multi-step > > > > float to int conversion only with trapping math and correct > > > > sign. > > > > > > > > Signed-off-by: Juergen Christ <jchr...@linux.ibm.com> > > > > > > > > Bootstrapped and tested on x84 and s390. Ok for trunk? > > > > > > > > --- > > > > gcc/tree-vect-stmts.cc | 8 +++++--- > > > > 1 file changed, 5 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc > > > > index fdcda0d2abae..2ddd13383193 100644 > > > > --- a/gcc/tree-vect-stmts.cc > > > > +++ b/gcc/tree-vect-stmts.cc > > > > @@ -5448,7 +5448,8 @@ vectorizable_conversion (vec_info *vinfo, > > > > break; > > > > > > > > cvt_type > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE > > > > (rhs_mode), 0); > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE > > > > (rhs_mode), > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > But lhs_type should be a float type here, the idea that for a > > > FLOAT_EXPR (int -> float) > > > a signed integer type is the natural one to use - as it's 2x wider > > > than the original > > > RHS type it's signedness doesn't matter. Note all float types should be > > > !TYPE_UNSIGNED so this hunk is a no-op but still less clear on the intent > > > IMO. > > > > > > Please drop it. > > > > Will do. Sorry about that. > > > > > > cvt_type = get_same_sized_vectype (cvt_type, vectype_in); > > > > if (cvt_type == NULL_TREE) > > > > goto unsupported; > > > > @@ -5505,10 +5506,11 @@ vectorizable_conversion (vec_info *vinfo, > > > > if (GET_MODE_SIZE (lhs_mode) >= GET_MODE_SIZE (rhs_mode)) > > > > goto unsupported; > > > > > > > > - if (code == FIX_TRUNC_EXPR) > > > > + if (code == FIX_TRUNC_EXPR && !flag_trapping_math) > > > > { > > > > cvt_type > > > > - = build_nonstandard_integer_type (GET_MODE_BITSIZE > > > > (rhs_mode), 0); > > > > + = build_nonstandard_integer_type (GET_MODE_BITSIZE > > > > (rhs_mode), > > > > + TYPE_UNSIGNED (lhs_type)); > > > > > > Here it might be relevant for correctness - we have to choose between > > > sfix and ufix for the float -> [u]int conversion. > > > > > > Do you have a testcase? Shouldn't the exactness be independent of the > > > integer > > > type we convert to? > > > > I was looking at this little program which contains undefined behaviour: > > > > #include <stdio.h> > > > > __attribute__((noinline,noclone,noipa)) > > void > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out); > > > > void > > vec_pack_ufix_trunc_v2df (double *in, unsigned int *out) > > { > > out[0] = in[0]; > > out[1] = in[1]; > > out[2] = in[2]; > > out[3] = in[3]; > > } > > > > int main() > > { > > double in[] = {-1,-2,-3,-4}; > > unsigned int out[4]; > > > > vec_pack_ufix_trunc_v2df (in, out); > > for (int i = 0; i < 4; ++i) > > printf("out[%d] = %u\n", i, out[i]); > > return 0; > > } > > > > On s390x, I get different results after vectorization: > > > > out[0] = 4294967295 > > out[1] = 4294967294 > > out[2] = 4294967293 > > out[3] = 4294967292 > > > > than without vectorization: > > > > out[0] = 0 > > out[1] = 0 > > out[2] = 0 > > out[3] = 0 > > > > Even if this is undefined behaviour, I think it would be nice to have > > consistent results here. > > > > Also, while I added an expander to circumvent this problem in a > > previous patch, reviewers requested to hide this behind trapping math. > > Thus, I looked into this. > > > > Seeing the result from the CI for aarch64, I guess there are some > > tests that actually expect this vectorization to always happen even > > though it might not be save w.r.t. trapping math. > > I do remember this was extensively discussed (but we might have missed > something) and one argument indeed was that when it's undefined behavior > we can do the vectorization given the actual values might be in-bound.
Okay. Would you be fine with the patch to only vectorize when trapping math is disabled? I still could take care of the rest on s390 backend side by defining the appropriate expanders. Still think it is weird though that we might produce different results on vectorization than without vectorization. Yes, that is what "undefined behaviour" is all about, but we can simply fix this here. Nevertheless, how about just adding the trapping math check?