On Wed, 2 Aug 2023, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > [...]
> >> >> in vect_determine_precisions_from_range.  Maybe we should drop
> >> >> the shift handling from there and instead rely on
> >> >> vect_determine_precisions_from_users, extending:
> >> >>
> >> >>   if (TREE_CODE (shift) != INTEGER_CST
> >> >>       || !wi::ltu_p (wi::to_widest (shift), precision))
> >> >>     return;
> >> >>
> >> >> to handle ranges where the max is known to be < precision.
> >> >>
> >> >> There again, if masking is enough for right shifts and right rotates,
> >> >> maybe we should keep the current handling for them (with your fix)
> >> >> and skip the types_compatible_p check for those cases.
> >> >
> >> > I think it should be enough for left-shifts as well?  If we lshift
> >> > out, like 0x100 << 9, so the lhs range is [0,0], the input range from
> >> > op0 will still make us use HImode.  I think we only ever get overly
> >> > conservative answers for left-shifts from this function?
> >>
> >> But if we have:
> >>
> >>   short x, y;
> >>   int z = (int) x << (int) y;
> >>
> >> and at runtime x == 1, y == 16, then (short) z should be 0 (no UB),
> >> whereas x << y would invoke UB and x << (y & 15) would be 1.
> >
> > True, but we start with the range of the LHS, which in this case
> > would be of type 'int' and thus 1 << 16 and not zero.  You
> > might call that a failure of vect_determine_precisions_from_range
> > of course, since it makes it not exactly a forward propagation ...
>
> Ah, right, sorry.  I should have done more checking.
>
> > [...]
> >> > Originally I completely disabled shift support, but that regressed
> >> > the over-widen testcases a lot, since those at least use widened
> >> > shifts by constants.
> >> >
> >> > x86 has vector rotates only for AMD XOP (which is dead) plus
> >> > some for V1TImode AFAICS, but I think we pattern-match rotates
> >> > to shifts, so maybe the precision stuff is interesting for the
> >> > case where we match the rotate pattern sequence for widenings?
> >> >
> >> > So for the types_compatible_p issue, something along the
> >> > following lines?  We could also exempt the shift operand from
> >> > being covered by min_precision so the consumer would have
> >> > to make sure it can be represented (I think that's never going
> >> > to be an issue in practice until we get 256-bit integers
> >> > vectorized).  It will have to fix up the shift operands anyway.
> >> >
> >> > diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
> >> > index e4ab8c2d65b..cdeeaf98a47 100644
> >> > --- a/gcc/tree-vect-patterns.cc
> >> > +++ b/gcc/tree-vect-patterns.cc
> >> > @@ -6378,16 +6378,26 @@ vect_determine_precisions_from_range (stmt_vec_info stmt_info, gassign *stmt)
> >> >      }
> >> >    else if (TREE_CODE (op) == SSA_NAME)
> >> >      {
> >> > -      /* Ignore codes that don't take uniform arguments.  */
> >> > -      if (!types_compatible_p (TREE_TYPE (op), type))
> >> > +      /* Ignore codes that don't take uniform arguments.  For shifts
> >> > +         the shift amount is known to be in-range.  */
> >>
> >> I guess it's more "we can assume that the amount is in range"?
> >
> > Yes.
> >
> >> > +      if (code == LSHIFT_EXPR
> >> > +          || code == RSHIFT_EXPR
> >> > +          || code == LROTATE_EXPR
> >> > +          || code == RROTATE_EXPR)
> >> > +        {
> >> > +          min_value = wi::min (min_value, 0, sign);
> >> > +          max_value = wi::max (max_value, TYPE_PRECISION (type), sign);
> >>
> >> LGTM for right shifts.
> >> Because of the above lshift thing, I think we need something like:
> >>
> >>   if (code == LSHIFT_EXPR || code == LROTATE_EXPR)
> >>     {
> >>       wide_int op_min_value, op_max_value;
> >>       if (!vect_get_range_info (op, &op_min_value, &op_max_value))
> >>         return;
> >>
> >>       /* We can ignore left shifts by negative amounts, which are UB.  */
> >>       min_value = wi::min (min_value, 0, sign);
> >>
> >>       /* Make sure the highest non-UB shift amount doesn't become UB.  */
> >>       op_max_value = wi::umin (op_max_value, TYPE_PRECISION (type));
> >>       auto mask = wi::mask (op_max_value.to_uhwi (), false,
> >>                             TYPE_PRECISION (type));
> >>       max_value = wi::max (max_value, mask, sign);
> >>     }
> >>
> >> Does that look right?
> >
> > As said, it looks overly conservative to me.  For example, with my
> > patch, for
> >
> >   void foo (signed char *v, int s)
> >   {
> >     if (s < 1 || s > 7)
> >       return;
> >     for (int i = 0; i < 1024; ++i)
> >       v[i] = v[i] << s;
> >   }
> >
> > I get
> >
> >   t.c:5:21: note: _7 has range [0xffffc000, 0x3f80]
> >   t.c:5:21: note: can narrow to signed:15 without loss of precision: _7 = _6 << s_12(D);
> >   t.c:5:21: note: only the low 15 bits of _6 are significant
> >   t.c:5:21: note: _6 has range [0xffffff80, 0x7f]
> >   ...
> >   t.c:5:21: note: vect_recog_over_widening_pattern: detected: _7 = _6 << s_12(D);
> >   t.c:5:21: note: demoting int to signed short
> >   t.c:5:21: note: Splitting statement: _6 = (int) _5;
> >   t.c:5:21: note: into pattern statements: patt_24 = (signed short) _5;
> >   t.c:5:21: note: and: patt_23 = (int) patt_24;
> >   t.c:5:21: note: created pattern stmt: patt_21 = patt_24 << patt_22;
> >   t.c:5:21: note: over_widening pattern recognized: patt_18 = (int) patt_21;
> >
> > and eventually
> >
> >   patt_22 = (signed short) s_12(D);
> >   ...
> >   vect__5.7_28 = MEM <vector(16) signed char> [(signed char *)vectp_v.5_10];
> >   vect_patt_24.8_29 = [vec_unpack_lo_expr] vect__5.7_28;
> >   vect_patt_24.8_30 = [vec_unpack_hi_expr] vect__5.7_28;
> >   vect_patt_21.9_31 = vect_patt_24.8_29 << patt_22;
> >   vect_patt_21.9_32 = vect_patt_24.8_30 << patt_22;
> >   vect_patt_17.10_33 = VEC_PACK_TRUNC_EXPR <vect_patt_21.9_31, vect_patt_21.9_32>;
> >   MEM <vector(16) signed char> [(signed char *)vectp_v.11_34] = vect_patt_17.10_33;
> >
> > which I think is correct.
>
> Agreed.
>
> > If I were to up the range of the shift operand, the range for _7
> > would be larger and we wouldn't be able to shrink the range.  I
> > suppose the _from_users might back-propagate the truncation to
> > signed char and enable that though?
>
> Yeah, that's the mechanism I was getting confused with, but like
> you say, it isn't a factor for vect_determine_precisions_from_range.
>
> >> Thanks, and sorry for the scope creep.
> >
> > Well, if we can agree that the patch as originally posted is
> > correct (but incomplete), we can maybe still install it.
>
> Yeah, works for me.
>
> > As seen above, I wonder if there's really an existing problem
> > for LSHIFT_EXPR.
>
> No, maybe not.  For:
>
>   void foo (int *v, int s)
>   {
>     for (int i = 0; i < 1024; ++i)
>       v[i] = 2 << ((v[i] & 3) * 8 + 7);
>   }
>
> we get:
>
>   note: _8 has range [0x0, 0x1010100]
>   note: _7 has range [0x7, 0x1f]
>   note: can narrow to unsigned:25 without loss of precision: _8 = 2 << _7;
>
> which isn't true, but I don't think anything will come of it.
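[Aside, not part of the original exchange: in that example the shift amount _7 = (v[i] & 3) * 8 + 7 can only be 7, 15, 23 or 31.  A minimal host-side sketch (using unsigned arithmetic so the shift by 31 stays well-defined, with j standing in for v[i] & 3) shows why the result range nevertheless fits in 25 bits:

  #include <stdio.h>

  int main (void)
  {
    for (int j = 0; j < 4; ++j)
      {
        int amount = j * 8 + 7;              /* 7, 15, 23, 31 */
        unsigned int result = 2u << amount;  /* 2u << 31 wraps to 0 */
        printf ("2 << %d = %#x\n", amount, result);
      }
    return 0;
  }

This prints 0x100, 0x10000, 0x1000000 and 0, all representable in 25 bits, hence the "can narrow to unsigned:25" note; yet a shift amount of 31 cannot be performed at 25-bit precision, which is why the note "isn't true".]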
> With a bit of work, we could in principle realise that:
>
>   void foo (short *v, int s)
>   {
>     for (int i = 0; i < 1024; ++i)
>       v[i] = 2 << (((v[i] & 3) - 1) & 31);
>   }
>
> gives a shift range of [0, 2] ∪ {31}, and so the shift result is
> either 0, 2, 4 or 8 (range [0, 8]), but we don't compute that result
> range at present.
>
> So yeah, I still can't trigger an actual problem.
>
> > I'm sure we mishandle rotates, but again at the consumer side - in
> > principle we could analyze them as lshift + rshift, which means the
> > existing behavior is probably also correct.
>
> Yeah, please go ahead with the original patch.  And sorry for the
> distraction.  It's been a useful exercise for me at least...
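[Aside, not part of the original exchange: the "analyze them as lshift + rshift" remark can be made concrete with the standard branch-free rotate idiom; the helper name below is made up for illustration:

  /* Rotate a 32-bit value left by s.  The "& 31" masks keep both
     component shifts in range, even for s == 0.  */
  static inline unsigned int
  rotl32 (unsigned int x, unsigned int s)
  {
    return (x << (s & 31)) | (x >> ((32 - s) & 31));
  }

Because the rotate is the OR of a left shift and a right shift of the same unsigned value, any bit-width sufficient for both component shifts is also sufficient for the rotate, which is why the existing shift-based handling is plausibly conservative-but-correct for rotates as well.]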
Installed.  Indeed it was useful for me as well.

Richard.
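[Aside, not part of the original exchange: the short x/y example near the top of the thread can be checked with a tiny host program; the conversion of 65536 to short is implementation-defined in C, but wraps to 0 on GCC targets:

  #include <stdio.h>

  int main (void)
  {
    short x = 1, y = 16;
    int z = (int) x << (int) y;    /* well-defined in int: 65536 */
    short narrow = (short) z;      /* wraps to 0 */
    short masked = x << (y & 15);  /* shift by 0: gives 1 */
    printf ("narrow = %d, masked = %d\n", narrow, masked);
    return 0;
  }

narrow is 0 while masked is 1, so masking the shift amount does not preserve the truncated result for left shifts, which is exactly why masking alone was not considered enough for them.]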