https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97997

            Bug ID: 97997
           Summary: Missed optimization: Multiply of extended integer
                    cannot overflow
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: matthijs at stdin dot nl
  Target Milestone: ---

When an integer is extended and then multiplied by another integer of the
original size, the resulting multiplication can never overflow. However, gcc
does not seem to realize this. Consider:

uint16_t calc_u(uint16_t x ) {
        return (uint32_t)x * 10 / 10;
}

If gcc would realize that x * 10 cannot overflow, it can optimize away the * 10
/ 10. However, it does not:

$ gcc-10 -Os -Wall -Wextra -pedantic foo.c && objdump -S --disassemble=calc_u
a.out
00000000000011a0 <calc_u>:
    11a0:       f3 0f 1e fa             endbr64 
    11a4:       0f b7 c7                movzwl %di,%eax
    11a7:       b9 0a 00 00 00          mov    $0xa,%ecx
    11ac:       31 d2                   xor    %edx,%edx
    11ae:       6b c0 0a                imul   $0xa,%eax,%eax
    11b1:       f7 f1                   div    %ecx
    11b3:       c3                      retq   

When doing the multiplication signed, this optimization *does* happen:

uint16_t calc_s(uint16_t x ) {
        return (int32_t)x * 10 / 10;
}

$ gcc-10 -Os -Wall -Wextra -pedantic foo.c  && objdump -S --disassemble=calc_s
a.out

0000000000001199 <calc_s>:
    1199:       f3 0f 1e fa             endbr64 
    119d:       89 f8                   mov    %edi,%eax
    119f:       c3                      retq   

Since signed overflow is undefined, gcc presumably assumes that the
multiplication does not overflow and optimizes this. This shows that the
machinery for this optimization exists and works and suggests that the only
thing missing in the unsigned case is realizing that the overflow cannot
happen.

The above uses 16/32bit numbers, but the same happens on 32/64bit (just not on
8/16 bit, because then things are integer-promoted and multiplication is always
signed). When using -O2 or -O3, the code generated for unsigned is different,
but still not fully optimized.

Maybe I'm missing some corner case of the C language that would make this
optimization incorrect, but I think it should be allowed.

The original code that triggered this report is:

#define ticks2us(t)   (uint32_t)((uint64_t)(t)*1000000 / TICKS_PER_SEC)

Which could be optimized to a single multiply or even bitshift rather than a
multiply and division for particular values of TICKS_PER_SEC, while staying
generally applicable (but slower) for other values.

I took a guess at the component, please correct that if needed.

$ gcc-10 --version
gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Reply via email to