https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97997
Bug ID: 97997
Summary: Missed optimization: Multiply of extended integer cannot overflow
Product: gcc
Version: 10.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: matthijs at stdin dot nl
Target Milestone: ---

When an integer is extended and then multiplied by another integer of the
original size, the resulting multiplication can never overflow. However, gcc
does not seem to realize this. Consider:

  uint16_t calc_u(uint16_t x) {
      return (uint32_t)x * 10 / 10;
  }

If gcc realized that x * 10 cannot overflow, it could optimize away the
* 10 / 10. However, it does not:

  $ gcc-10 -Os -Wall -Wextra -pedantic foo.c && objdump -S --disassemble=calc_u a.out
  00000000000011a0 <calc_u>:
      11a0: f3 0f 1e fa           endbr64
      11a4: 0f b7 c7              movzwl %di,%eax
      11a7: b9 0a 00 00 00        mov    $0xa,%ecx
      11ac: 31 d2                 xor    %edx,%edx
      11ae: 6b c0 0a              imul   $0xa,%eax,%eax
      11b1: f7 f1                 div    %ecx
      11b3: c3                    retq

When the multiplication is done signed, this optimization *does* happen:

  uint16_t calc_s(uint16_t x) {
      return (int32_t)x * 10 / 10;
  }

  $ gcc-10 -Os -Wall -Wextra -pedantic foo.c && objdump -S --disassemble=calc_s a.out
  0000000000001199 <calc_s>:
      1199: f3 0f 1e fa           endbr64
      119d: 89 f8                 mov    %edi,%eax
      119f: c3                    retq

Since signed overflow is undefined, gcc presumably assumes that the
multiplication does not overflow and optimizes accordingly. This shows that
the machinery for this optimization exists and works, and suggests that the
only thing missing in the unsigned case is realizing that the overflow cannot
happen.

The above uses 16/32-bit numbers, but the same happens with 32/64-bit (just
not with 8/16-bit, because there the operands are integer-promoted and the
multiplication is always signed). With -O2 or -O3, the code generated for the
unsigned version is different, but still not fully optimized.

Maybe I'm missing some corner case of the C language that would make this
optimization incorrect, but I think it should be allowed.
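The range argument behind the report can be checked directly: a zero-extended uint16_t is at most 65535, so multiplying by 10 in uint32_t tops out at 655350, far below UINT32_MAX, and (x * 10) / 10 must equal x for every input. A minimal sketch reusing the calc_u and calc_s functions from the report (the checker function name is mine):

```c
#include <stdint.h>

/* From the report: unsigned widening multiply, then divide. */
uint16_t calc_u(uint16_t x) { return (uint32_t)x * 10 / 10; }

/* Signed variant, which gcc does optimize down to a plain move. */
uint16_t calc_s(uint16_t x) { return (int32_t)x * 10 / 10; }

/* The widest possible product is 65535 * 10 = 655350, far below
   UINT32_MAX (4294967295), so the unsigned multiply cannot wrap and
   both functions are the identity.  Verify that exhaustively over
   all 65536 inputs. */
int both_are_identity(void) {
    for (uint32_t x = 0; x <= UINT16_MAX; x++) {
        if (calc_u((uint16_t)x) != (uint16_t)x) return 0;
        if (calc_s((uint16_t)x) != (uint16_t)x) return 0;
    }
    return 1;
}
```

The exhaustive loop is cheap here precisely because the narrow source type has only 65536 values; it demonstrates that folding the unsigned version to a move would be semantically safe.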
The original code that triggered this report is:

  #define ticks2us(t) (uint32_t)((uint64_t)(t)*1000000 / TICKS_PER_SEC)

which could be optimized to a single multiply or even a bit shift rather than
a multiply and a division for particular values of TICKS_PER_SEC, while
staying generally applicable (but slower) for other values.

I took a guess at the component, please correct that if needed.

$ gcc-10 --version
gcc-10 (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
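To make the strength-reduction claim concrete: the report does not give a value for TICKS_PER_SEC, so assume 32768 (a common RTC crystal rate, and a power of two) purely for illustration. Then 1000000 / 32768 shares a factor of 64, reducing the macro to one multiply and one shift; ticks2us_reduced below is my hand-reduced sketch of the code the reporter would like the compiler to emit on its own:

```c
#include <stdint.h>

/* Assumed example value; not from the report. */
#define TICKS_PER_SEC 32768

/* The macro from the report. */
#define ticks2us(t) (uint32_t)((uint64_t)(t) * 1000000 / TICKS_PER_SEC)

/* Cancel the common factor 64: t * 1000000 / 32768 == t * 15625 / 512,
   and dividing by 512 is a right shift by 9.  The floor results agree
   exactly because numerator and denominator are scaled by the same
   integer factor. */
static uint32_t ticks2us_reduced(uint32_t t) {
    return (uint32_t)(((uint64_t)t * 15625) >> 9);
}
```

For non-power-of-two rates the division does not disappear, which is why the report asks only that the widening-multiply case be recognized as overflow-free so such folds become legal.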