http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50717
             Bug #: 50717
           Summary: Silent code gen fault with incorrect widening of multiply
    Classification: Unclassified
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: mgret...@sourceware.org
              Host: x86_64-linux-gnu
            Target: arm-none-eabi

Created attachment 25483
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=25483
Executable test case.

The attached test case fails when compiled and executed as follows:

    arm-none-eabi-gcc -O2 gen_exec.c -o gen_exec.axf -fno-expensive-optimizations
    .../linaro-qemu/0.15.50/bin/qemu-arm ./gen_exec.axf

The two functions in the test case, f0a and f0b, are identical; they are just
compiled with -fexpensive-optimizations off (for f0a) and on (for f0b). The
code generation differences produce an incorrect result.

The attached file gen_exec_simple.c contains the individual f0b function for
compilation.

The attached tree dumps show the first difference between compiling
gen_exec_simple.c with and without -fexpensive-optimizations. The main
difference seems to be the following:

--- gen_exec_simple.c.135t.tailc.cheap      2011-10-13 15:02:50.000000000 +0100
+++ gen_exec_simple.c.135t.tailc.expensive  2011-10-13 15:03:15.000000000 +0100
@@ -3,6 +3,7 @@
 f0b (uint32_t * restrict arg1, uint64_t * restrict arg2, uint8_t * restrict arg3)
 {
+  <unnamed-unsigned:32> D.8363;
   void * D.8362;
   sizetype D.8361;
   void * D.8360;
@@ -67,7 +68,8 @@ f0b (uint32_t * restrict arg1, uint64_t
   D.8255_41 = MEM[base: D.8362_127, offset: 0B];
   D.8256_42 = D.8252_36 * D.8255_41;
   D.8257_43 = (uint64_t) D.8256_42;
-  D.8258_44 = D.8257_43 + temp_1_18;
+  D.8363_7 = (<unnamed-unsigned:32>) D.8245_16;
+  D.8258_44 = WIDEN_MULT_PLUS_EXPR <D.8255_41, D.8363_7, temp_1_18>;
   D.8259_45 = D.8258_44 >> 1;
   D.8260_46 = D.8259_45 >> 24;
   D.8272_57 = D.8251_31 | 1;

That is, a widening multiply/accumulate has been added to the tree. This
ultimately becomes a UMLAL in the output.

This widening multiply/accumulate is incorrect. It is trying to do the
following:

    result += ((((((arg3[idx] * arg1[idx]) + temp_1)/2))>>24) / (temp_2 | 1));

where arg3[idx] is a uint8_t, arg1[idx] is a uint32_t and temp_1 is a
uint64_t.

As written in C, the result of the multiply is truncated to a 32-bit value
and then added to the 64-bit value. The widening multiply/accumulate,
however, widens the inputs to 64 bits and does a 64-bit multiply before
adding the product to the 64-bit accumulator. The two produce different
results whenever the result of the multiply overflows 32 bits.

A bisect of the source leads me to believe that revision 177907 is
responsible:

http://gcc.gnu.org/viewcvs?view=revision&revision=177907
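
The difference between the truncating multiply the C source asks for and the widening multiply/accumulate can be sketched in a few lines of standalone C. This is only an illustration and is not taken from the attached test case: the function names mac_truncating, mac_widening and show_divergence, and the input values, are hypothetical and chosen so that the 32-bit product overflows.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Semantics of the C source: the product of a uint8_t and a uint32_t is
   held in 32 bits, so it wraps modulo 2^32 before it is zero-extended
   and added to the 64-bit accumulator. */
uint64_t mac_truncating(uint8_t a, uint32_t b, uint64_t acc)
{
    uint32_t product = a * b;            /* wraps on overflow */
    return (uint64_t)product + acc;
}

/* Semantics of a widening multiply/accumulate (UMLAL-style): both inputs
   are widened first, so the full 64-bit product reaches the accumulator. */
uint64_t mac_widening(uint8_t a, uint32_t b, uint64_t acc)
{
    return (uint64_t)a * (uint64_t)b + acc;
}

/* Hypothetical helper: prints both results for one overflowing input. */
void show_divergence(void)
{
    uint8_t a = 2;
    uint32_t b = 0x80000000u;            /* 2 * 2^31 = 2^32 needs 33 bits */
    uint64_t acc = 1;

    /* Sanity check that this input really triggers the divergence. */
    assert(mac_truncating(a, b, acc) != mac_widening(a, b, acc));

    printf("truncating: 0x%" PRIx64 "\n", mac_truncating(a, b, acc));
    printf("widening:   0x%" PRIx64 "\n", mac_widening(a, b, acc));
}
```

Calling show_divergence() from a main function prints 0x1 for the truncating form and 0x100000001 for the widening form. For inputs whose product fits in 32 bits the two functions agree, which is why the fault is silent for most data.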