https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94941
Bug ID: 94941
Summary: Expansion of some internal fns can drop the lhs on the floor
Product: gcc
Version: 10.0
Status: UNCONFIRMED
Keywords: wrong-code
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: rsandifo at gcc dot gnu.org
Target Milestone: ---

internal-fn.c:expand_mask_load_optab_fn uses expand_insn to emit the load
instruction, but doesn't then test whether the coerced output operand is the
same as the target of the gcall.  It might not be, for example, in unoptimised
code, where the target of the gcall expands to a MEM rtx and the load insn
requires a REG destination.

We need the equivalent of the following check, which expand_while_optab_fn
already does:

  if (!rtx_equal_p (lhs_rtx, ops[0].value))
    emit_move_insn (lhs_rtx, ops[0].value);

This can be seen for AArch64 with the following test, compiled with
-O0 -march=armv8.2-a+sve:

----------------------------------------------------------
#include <arm_sve.h>

svfloat32_t
foo (float *ptr)
{
  svbool_t pg = svptrue_pat_b32 (SV_VL1);
  svfloat32_t res = svld1 (pg, ptr);
  return res;
}

int
main (void)
{
  svbool_t pg = svptrue_pat_b32 (SV_VL1);
  float x[1] = { 1 };
  if (svptest_any (pg, svcmpne (pg, foo (x), 1.0)))
    __builtin_abort ();
  return 0;
}
----------------------------------------------------------

We emit:

;; res_5 = .MASK_LOAD (ptr_4(D), 4B, _2);

(insn 9 8 10 (set (reg/f:DI 96)
        (mem/f/c:DI (plus:DI (reg/f:DI 87 virtual-stack-vars)
                (const_poly_int:DI [-40, -32])) [3 ptr+0 S8 A64])) "/tmp/foo.c":7:21 -1
     (nil))

(insn 10 9 0 (set (reg:VNx4SF 97)
        (unspec:VNx4SF [
                (reg:VNx4BI 92 [ _2 ])
                (mem:VNx4SF (reg/f:DI 96) [0 MEM <svfloat32_t> [(float *)ptr_4(D)]+0 S[16, 16] A8])
            ] UNSPEC_LD1_SVE)) "/tmp/foo.c":7:21 -1
     (nil))

but don't store reg 97 to the stack slot for "res".
Then the return statement loads from "res":

(insn 12 11 0 (set (reg:VNx4SF 93 [ _6 ])
        (unspec:VNx4SF [
                (subreg:VNx4BI (reg:VNx16BI 98) 0)
                (mem/c:VNx4SF (plus:DI (reg/f:DI 87 virtual-stack-vars)
                        (const_poly_int:DI [-32, -32])) [2 res+0 S[16, 16] A128])
            ] UNSPEC_PRED_X)) "/tmp/foo.c":8:10 -1
     (nil))

meaning that we return uninitialised stack contents.

The same problem affects expand_load_lanes_optab_fn and
expand_gather_load_optab_fn.  I think this problem has existed since the mask
load/store functions were introduced, but it was probably latent until GCC 10
because nothing would use them in unoptimised code.