On Wed, 2 Aug 2023, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > On Tue, 1 Aug 2023, Richard Sandiford wrote:
> >
> >> Richard Sandiford <richard.sandif...@arm.com> writes:
> >> > Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> >> >> The following makes sure to limit the shift operand when vectorizing
> >> >> (short)((int)x >> 31) via (short)x >> 31 as the out-of-bounds shift
> >> >> operand otherwise invokes undefined behavior.  When we determine
> >> >> whether we can demote the operand we know we at most shift in the
> >> >> sign bit so we can adjust the shift amount.
> >> >>
> >> >> Note this has the possibility of un-CSEing common shift operands
> >> >> as there's no good way to share pattern stmts between patterns.
> >> >> We'd have to separately pattern-recognize the definition.
> >> >>
> >> >> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >> >>
> >> >> Not sure about LSHIFT_EXPR, it probably has the same issue but
> >> >> the fallback optimistic zero for out-of-range shifts is at least
> >> >> "correct".  Not sure we ever try to demote rotates (probably not).
> >> >
> >> > I guess you mean "correct" for x86?  But that's just a quirk of x86.
> >> > IMO the behaviour is equally wrong for LSHIFT_EXPR.
> >
> > I meant "correct" for the constant folding that evaluates out-of-bound
> > shifts as zero.
> >
> >> Sorry for the multiple messages.  Wanted to get something out quickly
> >> because I wasn't sure how long it would take me to write this...
> >>
> >> On rotates, for:
> >>
> >>   void
> >>   foo (unsigned short *restrict ptr)
> >>   {
> >>     for (int i = 0; i < 200; ++i)
> >>       {
> >>         unsigned int x = ptr[i] & 0xff0;
> >>         ptr[i] = (x << 1) | (x >> 31);
> >>       }
> >>   }
> >>
> >> we do get:
> >>
> >>   can narrow to unsigned:13 without loss of precision: _5 = x_12 r>> 31;
> >>
> >> although aarch64 doesn't provide rrotate patterns, so nothing actually
> >> comes of it.
> >
> > I think it's still correct that we only need unsigned:13 for the input,
> > we know the other bits are zero.  But of course when actually applying
> > this as documented
> >
> >   /* Record that STMT_INFO could be changed from operating on TYPE to
> >      operating on a type with the precision and sign given by PRECISION
> >      and SIGN respectively.
> >
> > the operation itself has to be altered (the above doesn't suggest
> > promoting/demoting the operands to TYPE is the only thing to do).
> >
> > So it seems the burden is on the consumers of the information?
>
> Yeah, textually that seems fair.  Not sure I was thinking of it in
> those terms at the time though. :)
>
> >> I think the handling of variable shifts is flawed for other reasons.
> >> Given:
> >>
> >>   void
> >>   uu (unsigned short *restrict ptr1, unsigned short *restrict ptr2)
> >>   {
> >>     for (int i = 0; i < 200; ++i)
> >>       ptr1[i] = ptr1[i] >> ptr2[i];
> >>   }
> >>
> >>   void
> >>   us (unsigned short *restrict ptr1, short *restrict ptr2)
> >>   {
> >>     for (int i = 0; i < 200; ++i)
> >>       ptr1[i] = ptr1[i] >> ptr2[i];
> >>   }
> >>
> >>   void
> >>   su (short *restrict ptr1, unsigned short *restrict ptr2)
> >>   {
> >>     for (int i = 0; i < 200; ++i)
> >>       ptr1[i] = ptr1[i] >> ptr2[i];
> >>   }
> >>
> >>   void
> >>   ss (short *restrict ptr1, short *restrict ptr2)
> >>   {
> >>     for (int i = 0; i < 200; ++i)
> >>       ptr1[i] = ptr1[i] >> ptr2[i];
> >>   }
> >>
> >> we only narrow uu and ss, due to:
> >>
> >>   /* Ignore codes that don't take uniform arguments.  */
> >>   if (!types_compatible_p (TREE_TYPE (op), type))
> >>     return;
> >
> > I suppose that's because we look at the shift operand at all here.
> > We could possibly use [0 .. precision-1] as the known range for it
> > and only give up if that doesn't fit 'type' (and otherwise simply
> > ignore the input range of the shift operands here).
> >
> >> in vect_determine_precisions_from_range.
> >> Maybe we should drop
> >> the shift handling from there and instead rely on
> >> vect_determine_precisions_from_users, extending:
> >>
> >>   if (TREE_CODE (shift) != INTEGER_CST
> >>       || !wi::ltu_p (wi::to_widest (shift), precision))
> >>     return;
> >>
> >> to handle ranges where the max is known to be < precision.
> >>
> >> There again, if masking is enough for right shifts and right rotates,
> >> maybe we should keep the current handling for them (with your fix)
> >> and skip the types_compatible_p check for those cases.
> >
> > I think it should be enough for left-shifts as well?  If we lshift
> > out like 0x100 << 9 so the lhs range is [0,0] the input range from
> > op0 will still make us use HImode.  I think we only ever get overly
> > conservative answers for left-shifts from this function?
>
> But if we have:
>
>   short x, y;
>   int z = (int) x << (int) y;
>
> and at runtime, x == 1, y == 16, (short) z should be 0 (no UB),
> whereas x << y would invoke UB and x << (y & 15) would be 1.
True, but we start with the range of the LHS, which in this case would
be of type 'int' and thus 1 << 16 and not zero.  You might call that
a failure of vect_determine_precisions_from_range of course, since it
makes it not exactly a forward propagation ...

> > Whatever works for RROTATE should also work for LROTATE.
>
> I think the same problem affects LROTATE.
>
> >> So:
> >>
> >> - restrict shift handling in vect_determine_precisions_from_range to
> >>   RSHIFT_EXPR and RROTATE_EXPR
> >>
> >> - remove the types_compatible_p restriction for those cases
> >>
> >> - extend vect_determine_precisions_from_users shift handling to check
> >>   for ranges on the shift amount
> >>
> >> Does that sound right?
> >
> > I'm not sure.  This all felt somewhat fragile when looking closer
> > (I was hoping you would eventually tackle it from the older
> > referenced bug) ... so my main idea was to perform incremental changes
> > where I have test coverage (as with the new wrong-code bug).
>
> Fair :)
>
> > Originally I completely disabled shift support but that regressed
> > the over-widen testcases a lot, which at least have widened shifts
> > by constants a lot.
> >
> > x86 has vector rotates only for AMD XOP (which is dead) plus
> > some for V1TImode AFAICS, but I think we pattern-match rotates
> > to shifts, so maybe the precision stuff is interesting for the
> > case where we match the pattern rotate sequence for widenings?
> >
> > So for the types_compatible_p issue, something along
> > the following?  We could also exempt the shift operand from
> > being covered by min_precision so the consumer would have
> > to make sure it can be represented (I think that's never going
> > to be an issue in practice until we get 256-bit integers vectorized).
> > It will have to fix up the shift operands anyway.
> >
> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> > index e4ab8c2d65b..cdeeaf98a47 100644
> > --- a/gcc/tree-vect-patterns.cc
> > +++ b/gcc/tree-vect-patterns.cc
> > @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
> >      }
> >    else if (TREE_CODE (op) == SSA_NAME)
> >      {
> > -      /* Ignore codes that don't take uniform arguments.  */
> > -      if (!types_compatible_p (TREE_TYPE (op), type))
> > +      /* Ignore codes that don't take uniform arguments.  For shifts
> > +         the shift amount is known to be in-range.  */
>
> I guess it's more "we can assume that the amount is in range"?

Yes.

> > +      if (code == LSHIFT_EXPR
> > +          || code == RSHIFT_EXPR
> > +          || code == LROTATE_EXPR
> > +          || code == RROTATE_EXPR)
> > +        {
> > +          min_value = wi::min (min_value, 0, sign);
> > +          max_value = wi::max (max_value, TYPE_PRECISION (type), sign);
>
> LGTM for right shifts.  Because of the above lshift thing, I think we
> need something like:
>
>   if (code == LSHIFT_EXPR || code == LROTATE_EXPR)
>     {
>       wide_int op_min_value, op_max_value;
>       if (!vect_get_range_info (op, &op_min_value, &op_max_value))
>         return;
>
>       /* We can ignore left shifts by negative amounts, which are UB.  */
>       min_value = wi::min (min_value, 0, sign);
>
>       /* Make sure the highest non-UB shift amount doesn't become UB.  */
>       op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type));
>       auto mask = wi::mask (TYPE_PRECISION (type), false,
>                             op_max_value.to_uhwi ());
>       max_value = wi::max (max_value, mask, sign);
>     }
>
> Does that look right?

As said, it looks overly conservative to me?
For example, with my patch, for

  void
  foo (signed char *v, int s)
  {
    if (s < 1 || s > 7)
      return;
    for (int i = 0; i < 1024; ++i)
      v[i] = v[i] << s;
  }

I get

  t.c:5:21: note:   _7 has range [0xffffc000, 0x3f80]
  t.c:5:21: note:   can narrow to signed:15 without loss of precision: _7 = _6 << s_12(D);
  t.c:5:21: note:   only the low 15 bits of _6 are significant
  t.c:5:21: note:   _6 has range [0xffffff80, 0x7f]
  ...
  t.c:5:21: note:   vect_recog_over_widening_pattern: detected: _7 = _6 << s_12(D);
  t.c:5:21: note:   demoting int to signed short
  t.c:5:21: note:   Splitting statement: _6 = (int) _5;
  t.c:5:21: note:   into pattern statements: patt_24 = (signed short) _5;
  t.c:5:21: note:   and: patt_23 = (int) patt_24;
  t.c:5:21: note:   created pattern stmt: patt_21 = patt_24 << patt_22;
  t.c:5:21: note:   over_widening pattern recognized: patt_18 = (int) patt_21;

and eventually

  patt_22 = (signed short) s_12(D);
  ...
  vect__5.7_28 = MEM <vector(16) signed char> [(signed char *)vectp_v.5_10];
  vect_patt_24.8_29 = [vec_unpack_lo_expr] vect__5.7_28;
  vect_patt_24.8_30 = [vec_unpack_hi_expr] vect__5.7_28;
  vect_patt_21.9_31 = vect_patt_24.8_29 << patt_22;
  vect_patt_21.9_32 = vect_patt_24.8_30 << patt_22;
  vect_patt_17.10_33 = VEC_PACK_TRUNC_EXPR <vect_patt_21.9_31, vect_patt_21.9_32>;
  MEM <vector(16) signed char> [(signed char *)vectp_v.11_34] = vect_patt_17.10_33;

which I think is correct.  If I upped the range of the shift operand
then the range for _7 would be larger and we wouldn't be able to
narrow the operation.  I suppose _from_users might back-propagate
the truncation to signed char and enable that though?

> Thanks, and sorry for the scope creep.

Well, if we can agree that the patch as originally posted is correct
(but incomplete) we can maybe still install it.

As seen above, I wonder if there's really an existing problem for
LSHIFT_EXPR.  I'm sure we mishandle rotates, but again on the
consumer side - in principle we could analyze them as lshift + rshift,
which means the existing behavior is probably also correct.
Richard.

> Richard
>
> > +        }
> > +      else if (types_compatible_p (TREE_TYPE (op), type))
> > +        {
> > +          wide_int op_min_value, op_max_value;
> > +          if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> > +            return;
> > +          min_value = wi::min (min_value, op_min_value, sign);
> > +          max_value = wi::max (max_value, op_max_value, sign);
> > +        }
> > +      else
> >          return;
> > -
> > -      wide_int op_min_value, op_max_value;
> > -      if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> > -        return;
> > -
> > -      min_value = wi::min (min_value, op_min_value, sign);
> > -      max_value = wi::max (max_value, op_max_value, sign);
> >      }
> >    else
> >      return;