[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

ubizjak at gmail dot com via Gcc-bugs Wed, 05 Jun 2024 13:42:57 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600


Uroš Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #11 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Jonathan Wakely from comment #0)
> These two implementations of C++26 saturating addition
> (std::add_sat<unsigned>) have equivalent behaviour:
> 
> unsigned
> add_sat(unsigned x, unsigned y) noexcept
> {
>     unsigned z;
>     if (!__builtin_add_overflow(x, y, &z))
>           return z;
>     return -1u;
> }

[...]

> For -O3 on x86_64 GCC uses a branch for the first one:
> 
> add_sat(unsigned int, unsigned int):
>         add     edi, esi
>         jc      .L3
>         mov     eax, edi
>         ret
> .L3:
>         or      eax, -1
>         ret

If-conversion to cmove fails because of the "weird" compare arguments, a
consequence of the addsi3_cc_overflow_1 definition:

(insn 9 4 10 2 (parallel [
            (set (reg:CCC 17 flags)
                (compare:CCC (plus:SI (reg:SI 106)
                        (reg:SI 107))
                    (reg:SI 106)))
            (set (reg:SI 104)
                (plus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":7:12 477 {addsi3_cc_overflow_1}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))

The noce_try_cmove path fails in noce_emit_cmove:

Breakpoint 1, noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40,
code=LTU, cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
    vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
1774                return NULL_RTX;
(gdb) list
1766          /* Don't even try if the comparison operands are weird
1767             except that the target supports cbranchcc4.  */
1768          if (! general_operand (cmp_a, GET_MODE (cmp_a))
1769              || ! general_operand (cmp_b, GET_MODE (cmp_b)))
1770            {
1771              if (!have_cbranchcc4
1772                  || GET_MODE_CLASS (GET_MODE (cmp_a)) != MODE_CC
1773                  || cmp_b != const0_rtx)
1774                return NULL_RTX;
1775            }
1776
1777          target = emit_conditional_move (x, { code, cmp_a, cmp_b, VOIDmode },
1778                                          vtrue, vfalse, GET_MODE (x),
(gdb) bt
#0  noce_emit_cmove (if_info=0x7fffffffd750, x=0x7fffe9fe4e40, code=LTU,
cmp_a=0x7fffe9fe4a20, cmp_b=0x7fffe9feb9a8, vfalse=0x7fffe9fe49d8, 
    vtrue=0x7fffe9e09480, cc_cmp=0x0, rev_cc_cmp=0x0) at
../../git/gcc/gcc/ifcvt.cc:1774
#1  0x00000000020d995b in noce_try_cmove (if_info=0x7fffffffd750) at
../../git/gcc/gcc/ifcvt.cc:1884
#2  0x00000000020dec37 in noce_process_if_block (if_info=0x7fffffffd750) at
../../git/gcc/gcc/ifcvt.cc:4149
#3  0x00000000020e0248 in noce_find_if_block (test_bb=0x7fffe9fb5d80,
then_edge=0x7fffe9fd7cc0, else_edge=0x7fffe9fd7c60, pass=1)
    at ../../git/gcc/gcc/ifcvt.cc:4716
#4  0x00000000020e08e9 in find_if_header (test_bb=0x7fffe9fb5d80, pass=1) at
../../git/gcc/gcc/ifcvt.cc:4921
#5  0x00000000020e3255 in if_convert (after_combine=true) at
../../git/gcc/gcc/ifcvt.cc:6068

(gdb) p debug_rtx (cmp_a)
(plus:SI (reg:SI 106)
    (reg:SI 107))
$1 = void
(gdb) p debug_rtx (cmp_b)
(reg:SI 106)
$2 = void

The above cmp_a RTX fails the general_operand check.
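
For readers less familiar with the CCC form, here is a minimal C sketch of what
the compare above computes. It assumes the usual reading of the pattern: with
code LTU, cmp_a = x + y and cmp_b = x, the carry condition is "(x + y) <u x".
The function names are made up for illustration and are not GCC internals.

```c
#include <limits.h>
#include <stdbool.h>

/* Carry-out of x + y written the way the CCC compare expresses it:
   code LTU, first operand (x + y), second operand x.  The first
   operand is an arithmetic expression, not a plain register.  */
bool carry_via_compare(unsigned x, unsigned y)
{
    unsigned sum = x + y;   /* unsigned addition wraps modulo 2^N */
    return sum < x;         /* true exactly when the addition carried */
}

/* The same condition as reported by the builtin from the testcase.  */
bool carry_via_builtin(unsigned x, unsigned y)
{
    unsigned z;
    return __builtin_add_overflow(x, y, &z);
}

/* Saturating add built on the compare form; hypothetical helper,
   equivalent in behaviour to add_sat from comment #0.  */
unsigned add_sat_compare(unsigned x, unsigned y)
{
    return carry_via_compare(x, y) ? UINT_MAX : x + y;
}
```

The two carry predicates should agree for all inputs. The point is only that
the first compare operand is itself a PLUS expression, which is why it is not a
general_operand (a register, memory reference or constant).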

Please note that a similar testcase:

unsigned
sub_sat(unsigned x, unsigned y)
{
    unsigned z;
    return __builtin_sub_overflow(x, y, &z) ? 0 : z;
}

results in the expected:

        subl    %esi, %edi      # 52    [c=4 l=2]  *subsi_3/0
        movl    $0, %eax        # 53    [c=4 l=5]  *movsi_internal/0
        cmovnb  %edi, %eax      # 54    [c=4 l=3]  *movsicc_noc/0
        ret             # 50    [c=0 l=1]  simple_return_internal

due to:

(insn 9 4 10 2 (parallel [
            (set (reg:CC 17 flags)
                (compare:CC (reg:SI 106)
                    (reg:SI 107)))
            (set (reg:SI 104)
                (minus:SI (reg:SI 106)
                    (reg:SI 107)))
        ]) "sadd.c":28:12 416 {*subsi_3}
     (expr_list:REG_DEAD (reg:SI 107)
        (expr_list:REG_DEAD (reg:SI 106)
            (nil))))
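
By contrast, the borrow condition in the subtraction pattern compares the two
inputs directly. A rough C sketch, with illustrative names that do not come
from GCC, assuming the borrow test corresponds to "x <u y":

```c
#include <stdbool.h>

/* Borrow of x - y as the CC compare expresses it: both compare
   operands are the plain inputs, so each one is a register.  */
bool borrow_via_compare(unsigned x, unsigned y)
{
    return x < y;
}

/* Saturating subtract built on that compare; hypothetical helper,
   equivalent in behaviour to sub_sat above.  */
unsigned sub_sat_compare(unsigned x, unsigned y)
{
    return borrow_via_compare(x, y) ? 0u : x - y;
}
```

Here both compare operands pass general_operand, so noce_emit_cmove does not
bail out early.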

So, either the addsi3_cc_overflow_1 RTX is not correct, or noce_emit_cmove
should be improved to handle the above "weird" operand form.

Let's ask Jakub.
