Richard Biener <rguent...@suse.de> writes:
> [...]
>> >> in vect_determine_precisions_from_range.  Maybe we should drop
>> >> the shift handling from there and instead rely on
>> >> vect_determine_precisions_from_users, extending:
>> >>
>> >>       if (TREE_CODE (shift) != INTEGER_CST
>> >>           || !wi::ltu_p (wi::to_widest (shift), precision))
>> >>         return;
>> >>
>> >> to handle ranges where the max is known to be < precision.
>> >>
>> >> There again, if masking is enough for right shifts and right rotates,
>> >> maybe we should keep the current handling for them (with your fix)
>> >> and skip the types_compatible_p check for those cases.
>> >
>> > I think it should be enough for left-shifts as well?  If we shift
>> > out all the bits, like 0x100 << 9, so the lhs range is [0,0], the
>> > input range from op0 will still make us use HImode.  I think we only
>> > ever get overly conservative answers for left-shifts from this
>> > function?
>>
>> But if we have:
>>
>>   short x, y;
>>   int z = (int) x << (int) y;
>>
>> and at runtime, x == 1, y == 16, (short) z should be 0 (no UB),
>> whereas x << y would invoke UB and x << (y & 15) would be 1.
>
> True, but we start with the range of the LHS, which in this case
> would be of type 'int' and thus 1 << 16 and not zero.  You
> might call that a failure of vect_determine_precisions_from_range,
> of course, since it makes it not exactly a forward propagation ...
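[Editorial aside, not part of the original exchange: a standalone C
testcase making the example above concrete.  The values match the
scenario discussed, x == 1 and y == 16 with 32-bit int.]

  #include <stdio.h>

  int
  main (void)
  {
    short x = 1, y = 16;
    /* Well-defined: both operands promote to int, so the shift amount
       16 is less than the precision (32) of the shifted type.  */
    int z = (int) x << (int) y;
    /* Truncating the result back to short gives 0 ...  */
    printf ("%d\n", (short) z);         /* prints 0 */
    /* ... whereas doing the shift in 16 bits would be UB for x << y
       (amount == precision), and masking the amount gives 1 instead,
       so a narrowed left shift can't simply mask the shift count.  */
    printf ("%d\n", x << (y & 15));     /* prints 1 */
    return 0;
  }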
Ah, right, sorry.  I should have done more checking.

> [...]
>> > Originally I completely disabled shift support, but that regressed
>> > the over-widen testcases a lot, since they make heavy use of widened
>> > shifts by constants.
>> >
>> > x86 has vector rotates only for AMD XOP (which is dead) plus
>> > some for V1TImode AFAICS, but I think we pattern-match rotates
>> > to shifts, so maybe the precision stuff is interesting for the
>> > case where we match the pattern rotate sequence for widenings?
>> >
>> > So for the types_compatible_p issue, something along the lines of
>> > the following?  We could also exempt the shift operand from
>> > being covered by min_precision, so the consumer would have
>> > to make sure it can be represented (I think that's never going
>> > to be an issue in practice until we get 256-bit integers vectorized).
>> > It will have to fix up the shift operands anyway.
>> >
>> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
>> > index e4ab8c2d65b..cdeeaf98a47 100644
>> > --- a/gcc/tree-vect-patterns.cc
>> > +++ b/gcc/tree-vect-patterns.cc
>> > @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
>> >          }
>> >        else if (TREE_CODE (op) == SSA_NAME)
>> >          {
>> > -          /* Ignore codes that don't take uniform arguments.  */
>> > -          if (!types_compatible_p (TREE_TYPE (op), type))
>> > +          /* Ignore codes that don't take uniform arguments.  For shifts
>> > +             the shift amount is known to be in-range.  */
>>
>> I guess it's more "we can assume that the amount is in range"?
>
> Yes.
>
>> > +          if (code == LSHIFT_EXPR
>> > +              || code == RSHIFT_EXPR
>> > +              || code == LROTATE_EXPR
>> > +              || code == RROTATE_EXPR)
>> > +            {
>> > +              min_value = wi::min (min_value, 0, sign);
>> > +              max_value = wi::max (max_value, TYPE_PRECISION (type), sign);
>>
>> LGTM for right shifts.  Because of the above lshift thing, I think we
>> need something like:
>>
>>   if (code == LSHIFT_EXPR || code == LROTATE_EXPR)
>>     {
>>       wide_int op_min_value, op_max_value;
>>       if (!vect_get_range_info (op, &op_min_value, &op_max_value))
>>         return;
>>
>>       /* We can ignore left shifts by negative amounts, which are UB.  */
>>       min_value = wi::min (min_value, 0, sign);
>>
>>       /* Make sure the highest non-UB shift amount doesn't become UB.  */
>>       op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type));
>>       auto mask = wi::mask (TYPE_PRECISION (type), false,
>>                             op_max_value.to_uhwi ());
>>       max_value = wi::max (max_value, mask, sign);
>>     }
>>
>> Does that look right?
>
> As said, it looks overly conservative to me?  For example, with my
> patch, for
>
> void foo (signed char *v, int s)
> {
>   if (s < 1 || s > 7)
>     return;
>   for (int i = 0; i < 1024; ++i)
>     v[i] = v[i] << s;
> }
>
> I get
>
> t.c:5:21: note: _7 has range [0xffffc000, 0x3f80]
> t.c:5:21: note: can narrow to signed:15 without loss of precision: _7 = _6 << s_12(D);
> t.c:5:21: note: only the low 15 bits of _6 are significant
> t.c:5:21: note: _6 has range [0xffffff80, 0x7f]
> ...
> t.c:5:21: note: vect_recog_over_widening_pattern: detected: _7 = _6 << s_12(D);
> t.c:5:21: note: demoting int to signed short
> t.c:5:21: note: Splitting statement: _6 = (int) _5;
> t.c:5:21: note: into pattern statements: patt_24 = (signed short) _5;
> t.c:5:21: note: and: patt_23 = (int) patt_24;
> t.c:5:21: note: created pattern stmt: patt_21 = patt_24 << patt_22;
> t.c:5:21: note: over_widening pattern recognized: patt_18 = (int) patt_21;
>
> and eventually
>
>   patt_22 = (signed short) s_12(D);
> ...
>   vect__5.7_28 = MEM <vector(16) signed char> [(signed char *)vectp_v.5_10];
>   vect_patt_24.8_29 = [vec_unpack_lo_expr] vect__5.7_28;
>   vect_patt_24.8_30 = [vec_unpack_hi_expr] vect__5.7_28;
>   vect_patt_21.9_31 = vect_patt_24.8_29 << patt_22;
>   vect_patt_21.9_32 = vect_patt_24.8_30 << patt_22;
>   vect_patt_17.10_33 = VEC_PACK_TRUNC_EXPR <vect_patt_21.9_31, vect_patt_21.9_32>;
>   MEM <vector(16) signed char> [(signed char *)vectp_v.11_34] = vect_patt_17.10_33;
>
> which I think is correct.

Agreed.

> If I upped the range of the shift operand, then the range for _7
> would be larger and we wouldn't be able to shrink the range.  I
> suppose the _from_users side might back-propagate the truncation to
> signed char and enable that, though?

Yeah, that's the mechanism I was getting confused with, but like you
say, it isn't a factor for vect_determine_precisions_from_range.

>> Thanks, and sorry for the scope creep.
>
> Well, if we can agree that the patch as originally posted is correct
> (but incomplete), we can maybe still install it.

Yeah, works for me.

> As seen above, I wonder if there's really an existing problem
> for LSHIFT_EXPR.

No, maybe not.  For:

void foo (int *v, int s)
{
  for (int i = 0; i < 1024; ++i)
    v[i] = 2 << ((v[i] & 3) * 8 + 7);
}

we get:

note: _8 has range [0x0, 0x1010100]
note: _7 has range [0x7, 0x1f]
note: can narrow to unsigned:25 without loss of precision: _8 = 2 << _7;

which isn't true, but I don't think anything will come of it.

With a bit of work, we could in principle realise that:

void foo (short *v, int s)
{
  for (int i = 0; i < 1024; ++i)
    v[i] = 2 << (((v[i] & 3) - 1) & 31);
}

gives a shift range of [0, 2] ∪ {31}, and so the shift result is
either 0, 2, 4 or 8 (range [0, 8]), but we don't compute that result
range at present.

So yeah, I still can't trigger an actual problem.

> I'm sure we mishandle rotates, but again at the consumer side - in
> principle we could analyze them as lshift + rshift, which means the
> existing behavior is probably also correct.

Yeah, please go ahead with the original patch.  And sorry for the
distraction.  It's been a useful exercise for me at least...

Richard
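[Editorial appendix, not part of the original exchange: the last quoted
point — that a rotate can be analyzed as lshift + rshift — in code form.
A minimal sketch assuming a 16-bit element type; rotl16 is a
hypothetical helper, not anything from the patch.]

  #include <stdint.h>

  /* Rotate x left by s.  Masking both shift amounts with 15 keeps each
     individual shift in range even for s == 0, mirroring how a rotate
     can be treated as the OR of two in-range shifts.  */
  static inline uint16_t
  rotl16 (uint16_t x, unsigned int s)
  {
    return (uint16_t) ((x << (s & 15)) | (x >> ((16 - s) & 15)));
  }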