On Fri, 9 Feb 2024, Jakub Jelinek wrote:

> Hi!
> 
> The following patch is the optimization part of PR113774, where in
> handle_cast we emit some conditionals which are always true and presumably
> VRP would figure that out later and clean it up, except that instead
> thread1 is invoked and threads everything through the conditions, so we end
> up with really ugly code which is hard to be cleaned up later and then
> run into PR113831 VN bug and miscompile stuff.
> 
> handle_cast computes low and high as limb indexes, where idx < low
> doesn't need any special treatment, just uses the operand's limb,
> idx >= high cases all the bits in the limb are an extension (so, for
> unsigned widening cast all those bits are 0, for signed widening cast
> all those bits are equal to the in earlier code computed sign mask,
> narrowing cast don't trigger this code) and then the idx == low && idx <
> high case if it exists need special treatment (some bits are copied, others
> extended, or all bits are copied but sign mask needs to be computed).
> 
> The code already attempted to optimize away some unneeded casts, in the
> first hunk below e.g. for the case like 257 -> 321 bit extension, where
> low is 4 and high 5 and we use a loop handling the first 4 limbs (2
> iterations) with m_upwards_2limb 4 - no special handling is needed in the
> loop, and the special handling is done on the first limb after the loop
> and then the last limb after the loop gets the extension only, or
> in the second hunk where can emit a single comparison instead of
> 2 e.g. for the low == high case - that must be a zero extension from
> multiple of limb bits, say 192 -> 328, or for the case where we know
> the idx == low case happens in the other limb processed in the loop, not
> the current one.
> 
> But the testcase shows further cases where we always know some of the
> comparisons can be folded to true/false, in particular there is
> 255 -> 257 bit zero extension, so low 3, high 4, m_upwards_2limb 4.
> The loop handles 2 limbs at the time and for the first limb we were
> emitting idx < low ? operand[idx] : 0; but because idx goes from 0
> with step 2 2 iterations, idx < 3 is always true, so we can just
> emit operand[idx].  This is handled in the first hunk.  In addition
> to fixing it (that is the " - m_first" part in there) I've rewritten
> it using low to make it more readable.
> 
> Similarly, in the other limb we were emitting
> idx + 1 <= low ? (idx + 1 == low ? operand[idx] & 0x7ff....ff : operand[idx]) 
> : 0
> but idx + 1 <= 3 is always true in the loop, so all we should emit is
> idx + 1 == low ? operand[idx] & 0x7ff....ff : operand[idx],
> Unfortunately for the latter, when single_comparison is true, we emit
> just one comparison, but the code which fills the branches will fill it
> with the operand[idx] and 0 cases (for zero extension, for sign extension
> similarly), not the operand[idx] (aka copy) and operand[idx] & 0x7ff....ff
> (aka most significant limb of the narrower precision) cases.  Instead
> of making the code less readable by using single_comparison for that and
> handling it in the code later differently I've chosen to just emit
> a condition which will be always true and let cfg cleanup clean it up.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2024-02-09  Jakub Jelinek  <ja...@redhat.com>
> 
>       PR tree-optimization/113774
>       * gimple-lower-bitint.cc (bitint_large_huge::handle_cast): Don't
>       emit any comparison if m_first and low + 1 is equal to
>       m_upwards_2limb, simplify condition for that.  If not
>       single_comparison, not m_first and we can prove that the idx <= low
>       comparison will be always true, emit instead of idx <= low
>       comparison low <= low such that cfg cleanup will optimize it at
>       the end of the pass.
> 
>       * gcc.dg/torture/bitint-57.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj     2024-02-08 12:49:40.435313811 +0100
> +++ gcc/gimple-lower-bitint.cc        2024-02-08 14:33:36.033220098 +0100
> @@ -1350,9 +1350,7 @@ bitint_large_huge::handle_cast (tree lhs
>        if (!tree_fits_uhwi_p (idx))
>       {
>         if (m_upwards_2limb
> -           && (m_upwards_2limb * limb_prec
> -               <= ((unsigned) TYPE_PRECISION (rhs_type)
> -                   - !TYPE_UNSIGNED (rhs_type))))
> +           && low >= m_upwards_2limb - m_first)
>           {
>             rhs1 = handle_operand (rhs1, idx);
>             if (m_first)
> @@ -1363,8 +1361,21 @@ bitint_large_huge::handle_cast (tree lhs
>           }
>         bool single_comparison
>           = low == high || (m_upwards_2limb && (low & 1) == m_first);
> +       tree idxc = idx;
> +       if (!single_comparison
> +           && m_upwards_2limb
> +           && !m_first
> +           && low + 1 == m_upwards_2limb)
> +         /* In this case we know that idx <= low always,
> +            so effectively we just needs a single comparison,
> +            idx < low or idx == low, but we'd need to emit different
> +            code for the 2 branches than single_comparison normally
> +            emits.  So, instead of special-casing that, emit a
> +            low <= low comparison which cfg cleanup will clean up
> +            at the end of the pass.  */
> +         idxc = size_int (low);
>         g = gimple_build_cond (single_comparison ? LT_EXPR : LE_EXPR,
> -                              idx, size_int (low), NULL_TREE, NULL_TREE);
> +                              idxc, size_int (low), NULL_TREE, NULL_TREE);
>         edge edge_true_true, edge_true_false, edge_false;
>         if_then_if_then_else (g, (single_comparison ? NULL
>                                   : gimple_build_cond (EQ_EXPR, idx,
> --- gcc/testsuite/gcc.dg/torture/bitint-57.c.jj       2024-02-08 
> 14:45:35.033303393 +0100
> +++ gcc/testsuite/gcc.dg/torture/bitint-57.c  2024-02-07 18:50:10.759168537 
> +0100
> @@ -0,0 +1,32 @@
> +/* PR tree-optimization/113774 */
> +/* { dg-do run { target bitint } } */
> +/* { dg-options "-std=c23 -pedantic-errors" } */
> +/* { dg-skip-if "" { ! run_expensive_tests }  { "*" } { "-O0" "-O2" } } */
> +/* { dg-skip-if "" { ! run_expensive_tests } { "-flto" } { "" } } */
> +
> +#if __BITINT_MAXWIDTH__ >= 512
> +unsigned _BitInt(512) u;
> +unsigned _BitInt(512) v;
> +
> +void
> +foo (unsigned _BitInt(255) a, unsigned _BitInt(257) b, unsigned _BitInt(512) 
> *r)
> +{
> +  b += v;
> +  b |= a - b;
> +  unsigned _BitInt(512) c = b * 6;
> +  unsigned _BitInt(512) h = c >> u;
> +  *r = h;
> +}
> +#endif
> +
> +int
> +main ()
> +{
> +#if __BITINT_MAXWIDTH__ >= 512
> +  unsigned _BitInt(512) x;
> +  foo (0x10000000000000000wb, 0x10000000000000001wb, &x);
> +  if (x != 
> 0x1fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffawb)
> +    __builtin_abort ();
> +#endif
> +  return 0;
> +}
> 
>       Jakub
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to