On Fri, 9 Feb 2024, Jakub Jelinek wrote: > Hi! > > The following patch is the optimization part of PR113774, where in > handle_cast we emit some conditionals which are always true and presumably > VRP would figure that out later and clean it up, except that instead > thread1 is invoked and threads everything through the conditions, so we end > up with really ugly code which is hard to be cleaned up later and then > run into PR113831 VN bug and miscompile stuff. > > handle_cast computes low and high as limb indexes, where idx < low > doesn't need any special treatment, just uses the operand's limb, > idx >= high cases all the bits in the limb are an extension (so, for > unsigned widening cast all those bits are 0, for signed widening cast > all those bits are equal to the in earlier code computed sign mask, > narrowing cast don't trigger this code) and then the idx == low && idx < > high case if it exists need special treatment (some bits are copied, others > extended, or all bits are copied but sign mask needs to be computed). > > The code already attempted to optimize away some unneeded casts, in the > first hunk below e.g. for the case like 257 -> 321 bit extension, where > low is 4 and high 5 and we use a loop handling the first 4 limbs (2 > iterations) with m_upwards_2limb 4 - no special handling is needed in the > loop, and the special handling is done on the first limb after the loop > and then the last limb after the loop gets the extension only, or > in the second hunk where can emit a single comparison instead of > 2 e.g. for the low == high case - that must be a zero extension from > multiple of limb bits, say 192 -> 328, or for the case where we know > the idx == low case happens in the other limb processed in the loop, not > the current one. > > But the testcase shows further cases where we always know some of the > comparisons can be folded to true/false, in particular there is > 255 -> 257 bit zero extension, so low 3, high 4, m_upwards_2limb 4. > The loop handles 2 limbs at the time and for the first limb we were > emitting idx < low ? operand[idx] : 0; but because idx goes from 0 > with step 2 2 iterations, idx < 3 is always true, so we can just > emit operand[idx]. This is handled in the first hunk. In addition > to fixing it (that is the " - m_first" part in there) I've rewritten > it using low to make it more readable. > > Similarly, in the other limb we were emitting > idx + 1 <= low ? (idx + 1 == low ? operand[idx] & 0x7ff....ff : operand[idx]) > : 0 > but idx + 1 <= 3 is always true in the loop, so all we should emit is > idx + 1 == low ? operand[idx] & 0x7ff....ff : operand[idx], > Unfortunately for the latter, when single_comparison is true, we emit > just one comparison, but the code which fills the branches will fill it > with the operand[idx] and 0 cases (for zero extension, for sign extension > similarly), not the operand[idx] (aka copy) and operand[idx] & 0x7ff....ff > (aka most significant limb of the narrower precision) cases. Instead > of making the code less readable by using single_comparison for that and > handling it in the code later differently I've chosen to just emit > a condition which will be always true and let cfg cleanup clean it up. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK. Richard. > 2024-02-09 Jakub Jelinek <ja...@redhat.com> > > PR tree-optimization/113774 > * gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Don't > emit any comparison if m_first and low + 1 is equal to > m_upwards_2limb, simplify condition for that. If not > single_comparison, not m_first and we can prove that the idx <= low > comparison will be always true, emit instead of idx <= low > comparison low <= low such that cfg cleanup will optimize it at > the end of the pass. > > * gcc.dg/torture/bitint-57.c: New test. > > --- gcc/gimple-lower-bitint.cc.jj 2024-02-08 12:49:40.435313811 +0100 > +++ gcc/gimple-lower-bitint.cc 2024-02-08 14:33:36.033220098 +0100 > @@ -1350,9 +1350,7 @@ bitint_large_huge::handle_cast (tree lhs > if (!tree_fits_uhwi_p (idx)) > { > if (m_upwards_2limb > - && (m_upwards_2limb * limb_prec > - <= ((unsigned) TYPE_PRECISION (rhs_type) > - - !TYPE_UNSIGNED (rhs_type)))) > + && low >= m_upwards_2limb - m_first) > { > rhs1 = handle_operand (rhs1, idx); > if (m_first) > @@ -1363,8 +1361,21 @@ bitint_large_huge::handle_cast (tree lhs > } > bool single_comparison > = low == high || (m_upwards_2limb && (low & 1) == m_first); > + tree idxc = idx; > + if (!single_comparison > + && m_upwards_2limb > + && !m_first > + && low + 1 == m_upwards_2limb) > + /* In this case we know that idx <= low always, > + so effectively we just needs a single comparison, > + idx < low or idx == low, but we'd need to emit different > + code for the 2 branches than single_comparison normally > + emits. So, instead of special-casing that, emit a > + low <= low comparison which cfg cleanup will clean up > + at the end of the pass. */ > + idxc = size_int (low); > g = gimple_build_cond (single_comparison ? LT_EXPR : LE_EXPR, > - idx, size_int (low), NULL_TREE, NULL_TREE); > + idxc, size_int (low), NULL_TREE, NULL_TREE); > edge edge_true_true, edge_true_false, edge_false; > if_then_if_then_else (g, (single_comparison ? NULL > : gimple_build_cond (EQ_EXPR, idx, > --- gcc/testsuite/gcc.dg/torture/bitint-57.c.jj 2024-02-08 > 14:45:35.033303393 +0100 > +++ gcc/testsuite/gcc.dg/torture/bitint-57.c 2024-02-07 18:50:10.759168537 > +0100 > @@ -0,0 +1,32 @@ > +/* PR tree-optimization/113774 */ > +/* { dg-do run { target bitint } } */ > +/* { dg-options "-std=c23 -pedantic-errors" } */ > +/* { dg-skip-if "" { ! run_expensive_tests } { "*" } { "-O0" "-O2" } } */ > +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */ > + > +#if __BITINT_MAXWIDTH__ >= 512 > +unsigned _BitInt(512) u; > +unsigned _BitInt(512) v; > + > +void > +foo (unsigned _BitInt(255) a, unsigned _BitInt(257) b, unsigned _BitInt(512) > *r) > +{ > + b += v; > + b |= a - b; > + unsigned _BitInt(512) c = b * 6; > + unsigned _BitInt(512) h = c >> u; > + *r = h; > +} > +#endif > + > +int > +main () > +{ > +#if __BITINT_MAXWIDTH__ >= 512 > + unsigned _BitInt(512) x; > + foo (0x10000000000000000wb, 0x10000000000000001wb, &x); > + if (x != > 0x1fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffawb) > + __builtin_abort (); > +#endif > + return 0; > +} > > Jakub > > -- Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)