[Bug c/99591] Improving __builtin_add_overflow performance on x86-64

jakub at gcc dot gnu.org via Gcc-bugs Wed, 01 Sep 2021 01:16:21 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99591


Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
But the user could have written:
int signed1_overflow (signed char a, signed char b)
{
  signed char r;
  return __builtin_add_overflow (a, b, &r);
}

int signed2_overflow (short a, short b)
{
  short r;
  return __builtin_add_overflow (a, b, &r);
}

int signed3_overflow (signed char a, signed char b)
{
  signed char r;
  return __builtin_add_overflow ((int) a, (int) b, &r);
}

int signed4_overflow (short a, short b)
{
  short r;
  return __builtin_add_overflow ((int) a, (int) b, &r);
}
and then the last two functions (signed3_overflow and signed4_overflow) behave the same in C and C++.
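The following is a small exhaustive check that makes the point above concrete. It assumes GCC or Clang, since both provide __builtin_add_overflow. The helpers add_narrow and add_promoted are hypothetical copies of signed1_overflow and signed3_overflow. The check asserts that both agree with the exact-arithmetic rule for all 65536 signed char operand pairs. The builtin is defined on the operands' mathematical values, so the explicit (int) casts should not change the result. They only change the form the compiler is handed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical copy of signed1_overflow: operands passed as signed char.  */
static int add_narrow (signed char a, signed char b)
{
  signed char r;
  return __builtin_add_overflow (a, b, &r);
}

/* Hypothetical copy of signed3_overflow: operands explicitly promoted.  */
static int add_promoted (signed char a, signed char b)
{
  signed char r;
  return __builtin_add_overflow ((int) a, (int) b, &r);
}

/* Walk every operand pair, assert that both forms match the exact rule
   (overflow iff the mathematical sum does not fit in signed char), and
   return how many pairs overflow.  */
static long count_signed_char_overflows (void)
{
  long overflows = 0;
  for (int a = SCHAR_MIN; a <= SCHAR_MAX; a++)
    for (int b = SCHAR_MIN; b <= SCHAR_MAX; b++)
      {
        int sum = a + b;  /* exact: cannot overflow int */
        int expected = sum < SCHAR_MIN || sum > SCHAR_MAX;
        assert (add_narrow ((signed char) a, (signed char) b) == expected);
        assert (add_promoted ((signed char) a, (signed char) b) == expected);
        overflows += expected;
      }
  return overflows;
}
```

count_signed_char_overflows () should return 16384: 8128 pairs overflow upwards and 8256 overflow downwards. The short variants behave the same way, but 2^32 pairs is too many for a quick exhaustive loop.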

So I think it would be better to optimize this at the RTL level (only once
we've decided which exact operation we are using). The problem, I think, is
that this kind of thing is usually optimized by combine, which doesn't trigger
here because the registers have multiple uses:
(insn 9 6 10 2 (set (reg:SI 92)
        (sign_extend:SI (reg/v:QI 88 [ a ]))) "pr99591.c":16:33 151 {extendqisi2}
     (nil))
(insn 10 9 11 2 (set (reg:SI 93)
        (sign_extend:SI (reg/v:QI 90 [ b ]))) "pr99591.c":16:33 151 {extendqisi2}
     (nil))
(insn 11 10 12 2 (set (reg:QI 86 [ _6+1 ])
        (const_int 0 [0])) "pr99591.c":16:33 77 {*movqi_internal}
     (nil))
(insn 12 11 13 2 (parallel [
            (set (reg:CCO 17 flags)
                (eq:CCO (plus:HI (sign_extend:HI (subreg:QI (reg:SI 92) 0))
                        (sign_extend:HI (subreg:QI (reg:SI 93) 0)))
                    (sign_extend:HI (plus:QI (subreg:QI (reg:SI 92) 0)
                            (subreg:QI (reg:SI 93) 0)))))
            (set (reg:QI 94)
                (plus:QI (subreg:QI (reg:SI 92) 0)
                    (subreg:QI (reg:SI 93) 0)))
        ]) "pr99591.c":16:33 238 {*addvqi4}
     (nil))
All of those uses are in the same insn (insn 12), but they are still multiple uses.
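In C terms, the flags condition in insn 12 (the *addvqi4 pattern) can be modelled roughly as follows. The sum is computed in HImode from sign-extended operands and again in QImode. The narrow sum is then sign-extended and compared with the wide one, and overflow is reported when the two differ. This is an illustrative sketch, not GCC's code. addv_qi_overflows and check_model_against_builtin are hypothetical names, and the model relies on GCC's documented modulo behaviour when converting an out-of-range int to signed char.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the *addvqi4 flags computation in insn 12.  */
static bool addv_qi_overflows (signed char a, signed char b)
{
  /* plus:HI of the sign_extend:HI operands: the range [-256, 254]
     always fits in short, so this sum is exact.  */
  short wide = (short) ((short) a + (short) b);

  /* plus:QI: the sum wrapped to 8 bits.  Converting an out-of-range int
     to signed char is implementation-defined in ISO C; GCC reduces it
     modulo 2^8.  */
  signed char narrow = (signed char) (a + b);

  /* sign_extend:HI of the narrow sum compared with the wide sum:
     they differ exactly when the QImode addition overflowed.  */
  return (short) narrow != wide;
}

/* Cross-check the model against the builtin for every operand pair.  */
static void check_model_against_builtin (void)
{
  for (int a = SCHAR_MIN; a <= SCHAR_MAX; a++)
    for (int b = SCHAR_MIN; b <= SCHAR_MAX; b++)
      {
        signed char r;
        bool builtin = __builtin_add_overflow ((signed char) a,
                                               (signed char) b, &r);
        assert (addv_qi_overflows ((signed char) a, (signed char) b)
                == builtin);
      }
}
```

The model makes the problem visible: both operands feed two different additions, which matches the multiple uses of regs 92 and 93 that keep combine from simplifying the sign extensions away.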
Another option is a GIMPLE optimization: notice that the arguments are
promoted, repeat part of the expand_arith_overflow analysis, and demote the
arguments if possible. Or maybe just always demote them and let
expand_arith_overflow promote them again if needed?
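The demotion idea depends on __builtin_add_overflow being defined on the operands' mathematical values. Because of that, operands whose value ranges fit in a narrower type can be narrowed without changing the result. The following is a much-simplified, signed-only sketch of such a test. min_signed_precision and can_demote_operands are hypothetical helpers, not GCC functions. The real expand_arith_overflow analysis also has to handle unsigned and mixed-signedness operands and the result type.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: smallest precision p (in bits) such that every
   value in [lo, hi] fits in a p-bit two's-complement signed integer.  */
static int min_signed_precision (long long lo, long long hi)
{
  assert (lo <= hi);
  for (int p = 1; p < 64; p++)
    {
      long long min = -(1LL << (p - 1));
      long long max = (1LL << (p - 1)) - 1;
      if (lo >= min && hi <= max)
        return p;
    }
  return 64;
}

/* Hypothetical demotion test: if both operands' value ranges fit in a
   signed type of precision PREC, the operands can be demoted to that
   type without changing the builtin's result, since their mathematical
   values are unchanged.  */
static bool can_demote_operands (long long lo_a, long long hi_a,
                                 long long lo_b, long long hi_b, int prec)
{
  return min_signed_precision (lo_a, hi_a) <= prec
         && min_signed_precision (lo_b, hi_b) <= prec;
}
```

In signed3_overflow, the promoted operands come from signed char. Their range is therefore [-128, 127], and can_demote_operands (-128, 127, -128, 127, 8) holds.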
