On Fri, Sep 30, 2011 at 12:33:07PM +0200, Richard Guenther wrote: > > + low = build_int_cst (TREE_TYPE (exp), 0); > > + high = low; > > + in_p = 0; > > + strict_overflow_p = false; > > + is_bool = TREE_CODE (TREE_TYPE (exp)) == BOOLEAN_TYPE; > > Effective boolean are also TYPE_PRECISION () == 1 types. Remember > we don't preserve conversions from BOOLEAN_TYPE to such types.
I can replace these with TYPE_PRECISION (TREE_TYPE (exp)) == 1; checks if you prefer, though maybe it would need to also do && TYPE_UNSIGNED (TREE_TYPE (exp)), at least if different operands of the | have different inner 1-bit signedness we could not merge them. > > + CASE_CONVERT: > > + is_bool |= TREE_CODE (TREE_TYPE (arg0)) == BOOLEAN_TYPE; > > Likewise. Though I wonder why !=? Does it matter whether the extension > sign- or zero-extends? I think the |= is needed, this loop follows through not just casts, but then also through the comparisons and again through casts. It doesn't matter if the types or casts inside of the comparison argument are bool or not (and likely they will not be), yet we don't want to stop iterating because of that. It wants to reconstruct ranges from what e.g. fold in fold_range_test already created before, so stuff like (int) (((unsigned) x - 64U) <= 31U) for + [64, 95] range. > > + if (integer_onep (range_binop (LT_EXPR, integer_type_node, > > + p->low, 0, q->low, 0))) > > that's just > > tem = fold_binary (LT_EXPR, boolean_type_node, p->low, q->low); > if (tem && integer_onep (tem)) > return -1; Ok, can change that. > which avoids building the LT_EXPR tree if it doesn't fold. Similar below. > (ISTR some integer_onep () variant that handles a NULL argument ...) Couldn't find any. integer_onep needs non-NULL, compare_tree_int too. > > + /* Try to merge ranges. */ > > + for (first = i; i < length; i++) > > + { > > + tree low = ranges[i].low; > > + tree high = ranges[i].high; > > + int in_p = ranges[i].in_p; > > + bool strict_overflow_p = ranges[i].strict_overflow_p; > > + > > + for (j = i + 1; j < length; j++) > > + { > > That looks quadratic - do we want to limit this with a --param, simply > partitioning the array into quadratic chunks? This isn't quadratic (except in the hypothetical case where all merge_ranges calls would succeed, but then build_range_test would fail). In the likely case where if range merges succeed then update_range_test succeeds too this is just linear (plus the qsort before that which isn't linear though). So perhaps: + if (j > i + 1 + && update_range_test (ranges + i, ranges + i + 1, j - i - 1, opcode, + ops, ranges[i].exp, in_p, low, high, + strict_overflow_p)) + { + i = j - 1; + any_changes = true; + } could be + if (j > i + 1) + { + if (update_range_test (ranges + i, ranges + i + 1, j - i - 1, opcode, + ops, ranges[i].exp, in_p, low, high, + strict_overflow_p)) + any_changes = true; + i = j - 1; + } (then it isn't quadratic), or could try a few times before giving up: + if (j > i + 1) + { + if (update_range_test (ranges + i, ranges + i + 1, j - i - 1, opcode, + ops, ranges[i].exp, in_p, low, high, + strict_overflow_p)) + { + any_changes = true; + i = j - 1; + } + else if (update_fail_count == 64) + i = j - 1; + else + update_fail_count = 0; + } where int update_fail_count = 0; would be after the bool strict_overflow_p = ranges[i].strict_overflow_p; line in the outer loop. > > + if (ranges[i].exp != ranges[j].exp) > > + break; > > Or isn't it too bad because of this check? The above limits the chunks to a particular SSA_NAME. Within each chunk for the same SSA_NAME, the ranges are sorted in a way that merge_ranges will likely succeed, so it isn't quadratic unless update_range_test fails (see above). BTW, the second loop (the one that attempts to optimize x == 1 || x == 3 into (x & ~2) == 1 etc. is quadratic, which is why there is for (j = i + 1; j < length && j < i + 64; j++) don't think it is a limit people will often run into and thus I don't think it is worth adding a --param= for that. > > + if (!merge_ranges (&in_p, &low, &high, in_p, low, high, > > + ranges[j].in_p, ranges[j].low, ranges[j].high)) > > + break; > > And this early out? I suppose some comment on why together with > the sorting this is of limited complexity would help. > > > @@ -2447,6 +2864,9 @@ reassociate_bb (basic_block bb) > > optimize_ops_list (rhs_code, &ops); > > } > > > > + if (rhs_code == BIT_IOR_EXPR || rhs_code == BIT_AND_EXPR) > > I think we want to check that the type is boolean here, so we don't setup > the machinery needlessly? It is boolean only in some testcases, the is_bool stuff discussed at the beginning above was originally just an early return if (TREE_CODE (TREE_TYPE (exp)) != BOOLEAN_TYPE) return; before the loop, but it turned out that often the type of the | operands is integer, with either bool casted to integer, or with the type of EQ_EXPR etc. being integer instead of bool. Jakub