On Tue, 29 Oct 2013, Jakub Jelinek wrote:

> On Tue, Oct 29, 2013 at 01:11:53PM +0100, Richard Biener wrote:
> > > +/* Return number of known trailing zero bits in EXPR, or, if the value of
> > > +   EXPR is known to be zero, the precision of its type.  */
> > > +
> > > +int
> >
> > unsigned int?
>
> Ok.
>
> > > +    case PLUS_EXPR:
> > > +    case MINUS_EXPR:
> > > +    case BIT_IOR_EXPR:
> > > +    case BIT_XOR_EXPR:
> > > +    case MIN_EXPR:
> > > +    case MAX_EXPR:
> > > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > > +      ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> >
> > This first recurses, but if either one returns 0 you don't have
> > to (that cuts down the recursion from exponential to linear in
> > the common case?).  Thus, early out on ret == 0?
>
> Yeah, that is reasonable.  Usually we use this during expansion etc.,
> trees won't be arbitrarily large, and for SSA_NAMEs we don't recurse
> on definitions, so I think we are unlikely to ever see very large
> expressions there.  I've added a similar early out to the COND_EXPR
> case.
>
> > > +      return MIN (ret1, ret2);
> > > +    case POINTER_PLUS_EXPR:
> > > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > > +      ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> > > +      ret2 = MIN (ret2, prec);
> >
> > Why do you need that here but not elsewhere when processing
> > binary ops?
>
> Because POINTER_PLUS_EXPR (well, also shifts, but for those we
> don't call tree_ctz on the second argument) is the only
> binop we handle that uses different types for the operands;
> for the first argument we know it has the same precision as the result, but
> what if sizetype has bigger precision than the TYPE_PRECISION of pointers?
> I know it typically doesn't; I just wanted to make sure we never return
> an out-of-range value: the tree_ctz result should always be in [0, prec].
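For the binary operators quoted above, the rule under discussion is that the result's known trailing zeros are the MIN of the operands' known trailing zeros. A minimal standalone sketch of why that bound is safe (plain C with hypothetical helper names, not GCC's actual tree_ctz):

```c
#include <assert.h>

/* Conservative trailing-zero bound for a + b, a - b, a | b, a ^ b,
   MIN (a, b) and MAX (a, b): if both operands have at least N trailing
   zero bits, so does the result, hence MIN of the two operand bounds
   is safe.  (Hypothetical helper, not GCC code.)  */
static unsigned int
ctz_bound_binop (unsigned int ctz_a, unsigned int ctz_b)
{
  /* The early out suggested in the review is valid because the
     combined bound can never exceed the smaller operand bound.  */
  return ctz_a < ctz_b ? ctz_a : ctz_b;
}

/* Exact trailing-zero count of a nonzero value, for cross-checking
   the bound against concrete arithmetic.  */
static unsigned int
real_ctz (unsigned int x)
{
  unsigned int n = 0;
  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }
  return n;
}
```

For example, 8 and 24 both have three trailing zero bits, and 8 + 24 = 32 and 8 ^ 24 = 16 each have at least three, so the MIN bound holds while remaining conservative.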
Ah, indeed.  Maybe worth a comment.

> > > +      return MIN (ret1, ret2);
> > > +    case BIT_AND_EXPR:
> > > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > > +      ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> > > +      return MAX (ret1, ret2);
> > > +    case MULT_EXPR:
> > > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > > +      ret2 = tree_ctz (TREE_OPERAND (expr, 1));
> > > +      return MIN (ret1 + ret2, prec);
> > > +    case LSHIFT_EXPR:
> > > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > > +      if (host_integerp (TREE_OPERAND (expr, 1), 1)
> >
> > check that first before recursing for op0 - if op1 is negative
> > you simply return ret1, which looks wrong, too.
>
> If op1 is negative, then it is undefined behavior, so assuming
> in that case the same thing as when op1 is not constant (i.e.
> that we worst case shift left by 0 and thus don't increase the number
> of trailing zeros, or shift left by > 0 and increase it) is IMHO
> correct.  I don't think we treat left shift by a negative value as
> a right shift anywhere, do we?

We do during constant folding, I think.

> > > +	  && ((unsigned HOST_WIDE_INT) tree_low_cst (TREE_OPERAND (expr, 1), 1)
> > > +	      < (unsigned HOST_WIDE_INT) prec))
> >
> > This check is to avoid overflowing ret1 + ret2?  If so, why not add
> >
> > > +	{
> > > +	  ret2 = tree_low_cst (TREE_OPERAND (expr, 1), 1);
> >
> >    ret2 = MIN (ret2, prec);
> >
> > instead?
>
> Because ret{1,2} are int (well, with the change you've suggested, unsigned
> int), but tree_low_cst is UHWI, so if the shift count is say UINT_MAX + 1
> (yeah, surely undefined behavior), I just didn't want to lose any of the upper
> bits.  Sure, alternatively I could have a UHWI temporary variable,
> assign tree_low_cst to it and do the MIN on that temporary and prec, but
> I still think it is better to handle out-of-range constants as a variable
> shift count rather than as a shift by, say, prec - 1.

Ok.

Thanks,
Richard.
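The MULT_EXPR and LSHIFT_EXPR cases quoted above both accumulate trailing zeros additively, clamped to the type precision; an out-of-range shift count is demoted to the variable-count handling. A standalone sketch of those two rules (hypothetical helper names, not GCC code):

```c
#include <assert.h>

/* Trailing-zero bound for a multiply: the factors' trailing zeros
   add up in the product, clamped to the type precision PREC.
   (Standalone sketch, not GCC's MULT_EXPR case.)  */
static unsigned int
ctz_bound_mult (unsigned int ctz_a, unsigned int ctz_b, unsigned int prec)
{
  unsigned int sum = ctz_a + ctz_b;
  return sum < prec ? sum : prec;
}

/* Trailing-zero bound for a left shift by a constant count.  A count
   >= PREC is undefined behavior, so it is treated like a variable
   count: worst case the shift is by 0 and only the operand's own
   bound survives, as discussed in the review.  */
static unsigned int
ctz_bound_lshift (unsigned int ctz_a, unsigned long long shift,
                  unsigned int prec)
{
  if (shift >= prec)
    return ctz_a < prec ? ctz_a : prec;
  unsigned int sum = ctz_a + (unsigned int) shift;
  return sum < prec ? sum : prec;
}
```

For instance, 12 (two trailing zeros) times 16 (four) gives 192, which indeed has six; and a value with three known trailing zeros shifted left by 5 has eight, unless that exceeds the precision.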
> > > +    case TRUNC_DIV_EXPR:
> > > +    case CEIL_DIV_EXPR:
> > > +    case FLOOR_DIV_EXPR:
> > > +    case ROUND_DIV_EXPR:
> > > +    case EXACT_DIV_EXPR:
> > > +      if (TREE_CODE (TREE_OPERAND (expr, 1)) == INTEGER_CST)
> > > +	{
> > > +	  ret2 = tree_log2 (TREE_OPERAND (expr, 1));
> > > +	  if (ret2 >= 0 && tree_int_cst_sgn (TREE_OPERAND (expr, 1)) == 1)
> >
> > cheaper to test the sign first.
>
> Ok.
>
> > > +	    {
> > > +	      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > > +	      if (ret1 > ret2)
> > > +		return ret1 - ret2;
> > > +	    }
> > > +	}
> > > +      return 0;
> > > +    CASE_CONVERT:
> > > +      ret1 = tree_ctz (TREE_OPERAND (expr, 0));
> > > +      if (ret1 && ret1 == TYPE_PRECISION (TREE_TYPE (TREE_OPERAND (expr, 0))))
> > > +	ret1 = prec;
> > > +      return MIN (ret1, prec);
> > > +    case SAVE_EXPR:
> > > +      return tree_ctz (TREE_OPERAND (expr, 0));
> > > +    case COND_EXPR:
> > > +      ret1 = tree_ctz (TREE_OPERAND (expr, 1));
> > > +      ret2 = tree_ctz (TREE_OPERAND (expr, 2));
> > > +      return MIN (ret1, ret2);
> > > +    case COMPOUND_EXPR:
> > > +      return tree_ctz (TREE_OPERAND (expr, 1));
> > > +    case ADDR_EXPR:
> > > +      ret1 = get_object_alignment (TREE_OPERAND (expr, 0));
> >
> > Use get_pointer_alignment, this isn't a memory reference, so type
> > alignment rules don't apply.
>
> Ok.
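The division cases above only fire when the divisor is a positive power of two (tree_log2 yields its exponent k), and the dividend's known trailing zeros then shrink by k. A standalone sketch of that bound (hypothetical names, not GCC code):

```c
#include <assert.h>

/* Trailing-zero bound for division by 2^K, mirroring the *_DIV_EXPR
   cases: if the dividend is known to have more than K trailing zero
   bits, the quotient keeps the excess; otherwise nothing can be
   promised and the bound is 0.  (Sketch, not GCC code.)  */
static unsigned int
ctz_bound_div_pow2 (unsigned int ctz_a, unsigned int log2_divisor)
{
  if (ctz_a > log2_divisor)
    return ctz_a - log2_divisor;
  return 0;
}
```

For example, 64 (six trailing zeros) divided by 4 (k = 2) gives 16, which has four; with k larger than the dividend's bound, the quotient may be odd, so the bound drops to 0. The CASE_CONVERT handling quoted above follows the same conservative spirit: the incoming bound is clamped to the new precision, and a value known to be entirely zero keeps ctz == prec after the conversion.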