On Wed, 2 Aug 2023, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > [...]
> >> >> in vect_determine_precisions_from_range.  Maybe we should drop
> >> >> the shift handling from there and instead rely on
> >> >> vect_determine_precisions_from_users, extending:
> >> >>
> >> >>   if (TREE_CODE (shift) != INTEGER_CST
> >> >>       || !wi::ltu_p (wi::to_widest (shift), precision))
> >> >>     return;
> >> >>
> >> >> to handle ranges where the max is known to be < precision.
> >> >>
> >> >> There again, if masking is enough for right shifts and right rotates,
> >> >> maybe we should keep the current handling for them (with your fix)
> >> >> and skip the types_compatible_p check for those cases.
> >> >
> >> > I think it should be enough for left-shifts as well?  If we lshift
> >> > out, like 0x100 << 9, so the lhs range is [0,0], the input range from
> >> > op0 will still make us use HImode.  I think we only ever get overly
> >> > conservative answers for left-shifts from this function?
> >>
> >> But if we have:
> >>
> >>   short x, y;
> >>   int z = (int) x << (int) y;
> >>
> >> and at runtime x == 1, y == 16, then (short) z should be 0 (no UB),
> >> whereas x << y would invoke UB and x << (y & 15) would be 1.
> >
> > True, but we start with the range of the LHS, which in this case
> > would be of type 'int' and thus 1 << 16 and not zero.  You
> > might call that a failure of vect_determine_precisions_from_range
> > of course, since it makes it not exactly a forward propagation ...
>
> Ah, right, sorry.  I should have done more checking.
>
> > [...]
> >> > Originally I completely disabled shift support, but that regressed
> >> > the over-widen testcases a lot, since those at least use widened
> >> > shifts by constants.
> >> >
> >> > x86 has vector rotates only for AMD XOP (which is dead) plus
> >> > some for V1TImode AFAICS, but I think we pattern-match rotates
> >> > to shifts, so maybe the precision stuff is interesting for the
> >> > case where we match the rotate pattern sequence for widenings?
> >> >
> >> > So for the types_compatible_p issue, something along the
> >> > following lines?  We could also exempt the shift operand from
> >> > being covered by min_precision so the consumer would have
> >> > to make sure it can be represented (I think that's never going
> >> > to be an issue in practice until we get 256-bit integers
> >> > vectorized).  It will have to fix up the shift operands anyway.
> >> >
> >> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> >> > index e4ab8c2d65b..cdeeaf98a47 100644
> >> > --- a/gcc/tree-vect-patterns.cc
> >> > +++ b/gcc/tree-vect-patterns.cc
> >> > @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
> >> >      }
> >> >    else if (TREE_CODE (op) == SSA_NAME)
> >> >      {
> >> > -      /* Ignore codes that don't take uniform arguments.  */
> >> > -      if (!types_compatible_p (TREE_TYPE (op), type))
> >> > +      /* Ignore codes that don't take uniform arguments.  For shifts
> >> > +         the shift amount is known to be in-range.  */
> >>
> >> I guess it's more "we can assume that the amount is in range"?
> >
> > Yes.
> >
> >> > +      if (code == LSHIFT_EXPR
> >> > +          || code == RSHIFT_EXPR
> >> > +          || code == LROTATE_EXPR
> >> > +          || code == RROTATE_EXPR)
> >> > +        {
> >> > +          min_value = wi::min (min_value, 0, sign);
> >> > +          max_value = wi::max (max_value, TYPE_PRECISION (type), sign);
> >>
> >> LGTM for right shifts.
> >> Because of the above lshift thing, I think we need something like:
> >>
> >>   if (code == LSHIFT_EXPR || code == LROTATE_EXPR)
> >>     {
> >>       wide_int op_min_value, op_max_value;
> >>       if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> >>         return;
> >>
> >>       /* We can ignore left shifts by negative amounts, which are UB.  */
> >>       min_value = wi::min (min_value, 0, sign);
> >>
> >>       /* Make sure the highest non-UB shift amount doesn't become UB.  */
> >>       op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type));
> >>       auto mask = wi::mask (op_max_value.to_uhwi (), false,
> >>                             TYPE_PRECISION (type));
> >>       max_value = wi::max (max_value, mask, sign);
> >>     }
> >>
> >> Does that look right?
> >
> > As said, it looks overly conservative to me.  For example, with my
> > patch, for
> >
> >   void foo (signed char *v, int s)
> >   {
> >     if (s < 1 || s > 7)
> >       return;
> >     for (int i = 0; i < 1024; ++i)
> >       v[i] = v[i] << s;
> >   }
> >
> > I get
> >
> >   t.c:5:21: note: _7 has range [0xffffc000, 0x3f80]
> >   t.c:5:21: note: can narrow to signed:15 without loss of precision: _7 = _6 << s_12(D);
> >   t.c:5:21: note: only the low 15 bits of _6 are significant
> >   t.c:5:21: note: _6 has range [0xffffff80, 0x7f]
> >   ...
> >   t.c:5:21: note: vect_recog_over_widening_pattern: detected: _7 = _6 << s_12(D);
> >   t.c:5:21: note: demoting int to signed short
> >   t.c:5:21: note: Splitting statement: _6 = (int) _5;
> >   t.c:5:21: note: into pattern statements: patt_24 = (signed short) _5;
> >   t.c:5:21: note: and: patt_23 = (int) patt_24;
> >   t.c:5:21: note: created pattern stmt: patt_21 = patt_24 << patt_22;
> >   t.c:5:21: note: over_widening pattern recognized: patt_18 = (int) patt_21;
> >
> > and eventually
> >
> >   patt_22 = (signed short) s_12(D);
> >   ...
> >   vect__5.7_28 = MEM <vector(16) signed char> [(signed char *)vectp_v.5_10];
> >   vect_patt_24.8_29 = [vec_unpack_lo_expr] vect__5.7_28;
> >   vect_patt_24.8_30 = [vec_unpack_hi_expr] vect__5.7_28;
> >   vect_patt_21.9_31 = vect_patt_24.8_29 << patt_22;
> >   vect_patt_21.9_32 = vect_patt_24.8_30 << patt_22;
> >   vect_patt_17.10_33 = VEC_PACK_TRUNC_EXPR <vect_patt_21.9_31, vect_patt_21.9_32>;
> >   MEM <vector(16) signed char> [(signed char *)vectp_v.11_34] = vect_patt_17.10_33;
> >
> > which I think is correct.
>
> Agreed.
>
> > If I were to up the range of the shift operand, the range for _7
> > would be larger and we wouldn't be able to shrink the range.  I
> > suppose the _from_users might back-propagate the truncation to
> > signed char and enable that though?
>
> Yeah, that's the mechanism I was getting confused with, but like
> you say, it isn't a factor for vect_determine_precisions_from_range.
>
> >> Thanks, and sorry for the scope creep.
> >
> > Well, if we can agree that the patch as originally posted is
> > correct (but incomplete), we can maybe still install it.
>
> Yeah, works for me.
>
> > As seen above, I wonder if there's really an existing problem
> > for LSHIFT_EXPR.
>
> No, maybe not.  For:
>
>   void foo (int *v, int s)
>   {
>     for (int i = 0; i < 1024; ++i)
>       v[i] = 2 << ((v[i] & 3) * 8 + 7);
>   }
>
> we get:
>
>   note: _8 has range [0x0, 0x1010100]
>   note: _7 has range [0x7, 0x1f]
>   note: can narrow to unsigned:25 without loss of precision: _8 = 2 << _7;
>
> which isn't true, but I don't think anything will come of it.
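[Aside, not part of the original exchange: in that example the shift amount _7 = (v[i] & 3) * 8 + 7 can only be 7, 15, 23 or 31.  A minimal host-side sketch (using unsigned arithmetic so the shift by 31 stays well-defined, with j standing in for v[i] & 3) shows why the result range nevertheless fits in 25 bits:

  #include <stdio.h>

  int main (void)
  {
    for (int j = 0; j < 4; ++j)
      {
        int amount = j * 8 + 7;              /* 7, 15, 23, 31 */
        unsigned int result = 2u << amount;  /* 2u << 31 wraps to 0 */
        printf ("2 << %d = %#x\n", amount, result);
      }
    return 0;
  }

This prints 0x100, 0x10000, 0x1000000 and 0, all representable in 25 bits, hence the "can narrow to unsigned:25" note; yet a shift amount of 31 cannot be performed at 25-bit precision, which is why the note "isn't true".]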
> With a bit of work, we could in principle realise that:
>
>   void foo (short *v, int s)
>   {
>     for (int i = 0; i < 1024; ++i)
>       v[i] = 2 << (((v[i] & 3) - 1) & 31);
>   }
>
> gives a shift range of [0, 2] ∪ {31}, and so the shift result is
> either 0, 2, 4 or 8 (range [0, 8]), but we don't compute that result
> range at present.
>
> So yeah, I still can't trigger an actual problem.
>
> > I'm sure we mishandle rotates, but again at the consumer side - in
> > principle we could analyze them as lshift + rshift, which means the
> > existing behavior is probably also correct.
>
> Yeah, please go ahead with the original patch.  And sorry for the
> distraction.  It's been a useful exercise for me at least...
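[Aside, not part of the original exchange: the "analyze them as lshift + rshift" remark can be made concrete with the standard branch-free rotate idiom; the helper name below is made up for illustration:

  /* Rotate a 32-bit value left by s.  The "& 31" masks keep both
     component shifts in range, even for s == 0.  */
  static inline unsigned int
  rotl32 (unsigned int x, unsigned int s)
  {
    return (x << (s & 31)) | (x >> ((32 - s) & 31));
  }

Because the rotate is the OR of a left shift and a right shift of the same unsigned value, any bit-width sufficient for both component shifts is also sufficient for the rotate, which is why the existing shift-based handling is plausibly conservative-but-correct for rotates as well.]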
Installed.  Indeed it was useful for me as well.

Richard.
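[Aside, not part of the original exchange: the short x/y example near the top of the thread can be checked with a tiny host program; the conversion of 65536 to short is implementation-defined in C, but wraps to 0 on GCC targets:

  #include <stdio.h>

  int main (void)
  {
    short x = 1, y = 16;
    int z = (int) x << (int) y;    /* well-defined in int: 65536 */
    short narrow = (short) z;      /* wraps to 0 */
    short masked = x << (y & 15);  /* shift by 0: gives 1 */
    printf ("narrow = %d, masked = %d\n", narrow, masked);
    return 0;
  }

narrow is 0 while masked is 1, so masking the shift amount does not preserve the truncated result for left shifts, which is exactly why masking alone was not considered enough for them.]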