Re: [PATCH] fold_range_test like optimization on GIMPLE (PR tree-optimization/46309)

Jakub Jelinek Fri, 30 Sep 2011 04:11:45 -0700

On Fri, Sep 30, 2011 at 12:33:07PM +0200, Richard Guenther wrote:
> > +  low = build_int_cst (TREE_TYPE (exp), 0);
> > +  high = low;
> > +  in_p = 0;
> > +  strict_overflow_p = false;
> > +  is_bool = TREE_CODE (TREE_TYPE (exp)) == BOOLEAN_TYPE;
> 
> Effective boolean are also TYPE_PRECISION () == 1 types.  Remember
> we don't preserve conversions from BOOLEAN_TYPE to such types.


I can replace these with TYPE_PRECISION (TREE_TYPE (exp)) == 1;
checks if you prefer, though maybe it would need to also do
&& TYPE_UNSIGNED (TREE_TYPE (exp)), at least if different operands of
the | have different inner 1-bit signedness we could not merge them.

> > +       CASE_CONVERT:
> > +         is_bool |= TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE;
> 
> Likewise.  Though I wonder why !=?  Does it matter whether the extension
> sign- or zero-extends?

I think the |= is needed, this loop follows through not just casts, but
then also through the comparisons and again through casts.
It doesn't matter if the types or casts inside of the comparison argument
are bool or not (and likely they will not be), yet we don't want to stop
iterating because of that.
It wants to reconstruct ranges from what e.g. fold in fold_range_test
already created before, so stuff like
(int) (((unsigned) x - 64U) <= 31U)
for + [64, 95] range.

> > +                 if (integer_onep (range_binop (LT_EXPR, integer_type_node,
> > +                                                p->low, 0, q->low, 0)))
> 
> that's just
> 
> tem = fold_binary (LT_EXPR, boolean_type_node, p->low, q->low);
> if (tem && integer_onep (tem))
>   return -1;

Ok, can change that.

> which avoids building the LT_EXPR tree if it doesn't fold.  Similar below.
> (ISTR some integer_onep () variant that handles a NULL argument ...)

Couldn't find any.  integer_onep needs non-NULL, compare_tree_int too.

> > +  /* Try to merge ranges.  */
> > +  for (first = i; i < length; i++)
> > +    {
> > +      tree low = ranges[i].low;
> > +      tree high = ranges[i].high;
> > +      int in_p = ranges[i].in_p;
> > +      bool strict_overflow_p = ranges[i].strict_overflow_p;
> > +
> > +      for (j = i + 1; j < length; j++)
> > +       {
> 
> That looks quadratic - do we want to limit this with a --param, simply
> partitioning the array into quadratic chunks?

This isn't quadratic (except in the hypothetical case
where all merge_ranges calls would succeed, but then
build_range_test would fail).  In the likely case where if range merges
succeed then update_range_test succeeds too this is just linear (plus the
qsort before that which isn't linear though).
So perhaps:
+      if (j > i + 1
+         && update_range_test (ranges + i, ranges + i + 1, j - i - 1, opcode,
+                               ops, ranges[i].exp, in_p, low, high,
+                               strict_overflow_p))
+       {
+         i = j - 1;
+         any_changes = true;
+       }
could be
+      if (j > i + 1)
+       {
+         if (update_range_test (ranges + i, ranges + i + 1, j - i - 1, opcode,
+                               ops, ranges[i].exp, in_p, low, high,
+                               strict_overflow_p))
+           any_changes = true;
+         i = j - 1;
+       }
(then it isn't quadratic), or could try a few times before giving up:
+      if (j > i + 1)
+       {
+         if (update_range_test (ranges + i, ranges + i + 1, j - i - 1, opcode,
+                               ops, ranges[i].exp, in_p, low, high,
+                               strict_overflow_p))
+           {
+             any_changes = true;
+             i = j - 1;
+           }
+         else if (update_fail_count == 64)
+           i = j - 1;
+         else
+           update_fail_count = 0;
+       }
where int update_fail_count = 0; would be after the
      bool strict_overflow_p = ranges[i].strict_overflow_p;
line in the outer loop.

> > +         if (ranges[i].exp != ranges[j].exp)
> > +           break;
> 
> Or isn't it too bad because of this check?

The above limits the chunks to a particular SSA_NAME.  Within each
chunk for the same SSA_NAME, the ranges are sorted in a way that
merge_ranges will likely succeed, so it isn't quadratic unless
update_range_test fails (see above).

BTW, the second loop (the one that attempts to optimize
x == 1 || x == 3 into (x & ~2) == 1 etc. is quadratic, which is why
there is
  for (j = i + 1; j < length && j < i + 64; j++)
don't think it is a limit people will often run into and thus I don't
think it is worth adding a --param= for that.

> > +         if (!merge_ranges (&in_p, &low, &high, in_p, low, high,
> > +                            ranges[j].in_p, ranges[j].low, ranges[j].high))
> > +           break;
> 
> And this early out?  I suppose some comment on why together with
> the sorting this is of limited complexity would help.
> 
> > @@ -2447,6 +2864,9 @@ reassociate_bb (basic_block bb)
> >                  optimize_ops_list (rhs_code, &ops);
> >                }
> >
> > +             if (rhs_code == BIT_IOR_EXPR || rhs_code == BIT_AND_EXPR)
> 
> I think we want to check that the type is boolean here, so we don't setup
> the machinery needlessly?

It is boolean only in some testcases, the is_bool stuff discussed at the
beginning above was originally just an early return
  if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE)
    return;
before the loop, but it turned out that often the type of the | operands
is integer, with either bool casted to integer, or with the type of EQ_EXPR
etc. being integer instead of bool.

        Jakub

Re: [PATCH] fold_range_test like optimization on GIMPLE (PR tree-optimization/46309)

Reply via email to