https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121773
            Bug ID: 121773
           Summary: Combine over-simplifies a subreg write
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rearnsha at gcc dot gnu.org
                CC: segher at gcc dot gnu.org
  Target Milestone: ---
            Target: arm

The following testcase, compiled with -march=armv7-a+simd -mfpu=auto -marm -mfloat-abi=hard:

#include <arm_neon.h>

uint64x1_t foo()
{
  uint64x2_t v36 = vdupq_n_u64(0x2020000012345678);
  uint64x1_t v48 = vget_low_u64(v36);
  uint64x1_t v50 = vadd_u64(v48, v48);
  return vpadal_u32(v50, vdup_n_u32(0));
}

is miscompiled to:

        vldr.64 d16, .L2        @ int
        vmov.i32 d17, #0        @ v2si
        vpadal.u32 d16, d17
        vmov    r0, r1, d16     @ int
        bx      lr
.L2:
        .word   0
        .word   1077936128

The constant loaded from .L2 is 0x4040000000000000 (low word 0, high word 1077936128 = 0x40400000), whereas the correct result is 0x404000002468acf0: the low word of the addition has been lost.

Prior to combine, we have:

(insn 21 20 7 2 (set (reg:DI 101 [ _5 ])
        (const_int 0 [0])) "/home/rearnsha/gnusrc/gcc/master/gcc/config/arm/arm_neon.h":607:14 -1
     (nil))
(insn 7 21 8 2 (parallel [
            (set (reg:CC_C 80 cc)
                (compare:CC_C (plus:SI (reg:SI 104 [ _6 ])
                        (reg:SI 104 [ _6 ]))
                    (reg:SI 104 [ _6 ])))
            (set (subreg:SI (reg:DI 101 [ _5 ]) 0)
                (plus:SI (reg:SI 104 [ _6 ])
                    (reg:SI 104 [ _6 ])))
        ]) "/home/rearnsha/gnusrc/gcc/master/gcc/config/arm/arm_neon.h":607:14 17 {addsi3_compare_op1}
     (expr_list:REG_DEAD (reg:SI 104 [ _6 ])
        (nil)))
(insn 8 7 9 2 (set (subreg:SI (reg:DI 101 [ _5 ]) 4)
        (plus:SI (plus:SI (reg:SI 105 [ _6+4 ])
                (reg:SI 105 [ _6+4 ]))
            (ltu:SI (reg:CC_C 80 cc)
                (const_int 0 [0])))) "/home/rearnsha/gnusrc/gcc/master/gcc/config/arm/arm_neon.h":607:14 21 {addsi3_carryin}

That is:
  - insn 21 clears R101;
  - insn 7 writes the low part of R101 with an addition that also sets the carry flag (CC_C) on unsigned overflow;
  - insn 8 writes the high part of R101 with an add-with-carry that consumes that flag.

In this specific test, R104 and R105 are known constants (the low and high words of 0x2020000012345678, i.e. 0x12345678 and 0x20200000).
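To make the expected value concrete, here is a minimal scalar C sketch of foo() and of the 32-bit split that insns 7 and 8 perform. It assumes each one-lane NEON vector can be modelled as a plain uint64_t and the uint32x2_t operand as a two-element array; the function names foo_reference and split_double are hypothetical. Both functions yield 0x404000002468acf0 for the constants in this test.

```c
#include <stdint.h>

/* Scalar model of foo(). Assumption: one-lane NEON vectors are modelled
   as uint64_t, the uint32x2_t operand as a uint32_t[2]. */
static uint64_t foo_reference(void)
{
    uint64_t v48 = UINT64_C(0x2020000012345678); /* vget_low_u64(vdupq_n_u64(...)) */
    uint64_t v50 = v48 + v48;                    /* vadd_u64 */
    uint32_t zero[2] = {0, 0};                   /* vdup_n_u32(0) */
    /* vpadal_u32: accumulator plus the widened sum of the adjacent pair. */
    return v50 + ((uint64_t)zero[0] + (uint64_t)zero[1]);
}

/* Model of insns 7 and 8: the same 64-bit doubling done as two 32-bit adds.
   r104/r105 are the low/high words of v48. */
static uint64_t split_double(uint32_t r104, uint32_t r105)
{
    uint32_t lo = r104 + r104;         /* insn 7: low word (wraps mod 2^32) */
    uint32_t carry = lo < r104;        /* insn 7: CC_C, i.e. (r104 + r104) < r104 */
    uint32_t hi = r105 + r105 + carry; /* insn 8: add with carry-in */
    return ((uint64_t)hi << 32) | lo;
}
```

Calling split_double(0x12345678u, 0x20200000u) reproduces foo_reference(); the carry path only matters when the low word overflows, e.g. split_double(0x80000000u, 0u) gives 0x100000000.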
It appears that combine tries to merge insns 21 and 8:

Trying 21 -> 8:
   21: r101:DI=0
    8: r101:DI#4=0x40400000
Successfully matched this instruction:
(set (reg:DI 101 [ _5 ])
    (const_int 4629700416936869888 [0x4040000000000000]))

i.e. writing the whole of r101 with only the high part of the addition and zero in the low part. Somehow combine ignores the fact that this overwrites the intervening write of the low part by insn 7, which subsequently becomes dead code and is eliminated.
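The value-level effect of the bad combination can be modelled in C. This is a minimal sketch, not a description of combine's internals: it assumes little-endian word order (subreg byte offset 0 is the low word, 4 the high word) and models r101 as a uint64_t; set_subreg_si, before_combine and after_combine are hypothetical names. The point it illustrates is that insn 8 only replaces the high word, so folding insn 21's zero into it is valid only if nothing writes r101 in between, and insn 7 does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper modelling (set (subreg:SI (reg:DI) N) val):
   replace one 32-bit word of the 64-bit pseudo, keep the other. */
static void set_subreg_si(uint64_t *reg, int byte_offset, uint32_t val)
{
    assert(byte_offset == 0 || byte_offset == 4);
    unsigned shift = (unsigned)byte_offset * 8;
    *reg = (*reg & ~((uint64_t)UINT32_MAX << shift)) | ((uint64_t)val << shift);
}

/* Insns 21, 7 and 8 as they appear before combine. */
static uint64_t before_combine(uint32_t r104, uint32_t r105)
{
    uint64_t r101 = 0;                            /* insn 21 */
    uint32_t lo = r104 + r104;
    uint32_t carry = lo < r104;                   /* CC_C from insn 7 */
    set_subreg_si(&r101, 0, lo);                  /* insn 7 */
    set_subreg_si(&r101, 4, r105 + r105 + carry); /* insn 8 */
    return r101;
}

/* The matched 21 -> 8 combination: insn 21's zero is used as the
   untouched low word, giving a full-width write placed after insn 7. */
static uint64_t after_combine(uint32_t r104, uint32_t r105)
{
    uint64_t r101 = 0;             /* insn 21 deleted; this value is irrelevant */
    uint32_t lo = r104 + r104;
    uint32_t carry = lo < r104;
    set_subreg_si(&r101, 0, lo);   /* insn 7: its result is now dead */
    uint64_t folded = 0;           /* r101 as insn 21 left it */
    set_subreg_si(&folded, 4, r105 + r105 + carry);
    r101 = folded;                 /* new insn 8: (set (reg:DI 101) (const_int ...)) */
    return r101;
}
```

With r104 = 0x12345678 and r105 = 0x20200000, before_combine returns 0x404000002468acf0 while after_combine returns 0x4040000000000000, matching the constant in the successfully matched instruction and the .L2 literal.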