https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109093
Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hjl.tools at gmail dot com

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
My patch just caused far more .DEFERRED_INITs to be optimized away for dead
variables (though, as can be seen on #c0, apparently not all).

What I see on the #c0 testcase looks like an x86 backend bug to me.
In func_2.constprop.0.isra.0 there is in the optimized dump a
  uint64_t * * * * const * l_2254[6];
variable, and the IL mentions it just in
  l_2254 = .DEFERRED_INIT (48, 2, &"l_2254"[0]);
and
  l_2254 ={v} {CLOBBER(eol)};
(the latter in 2 spots) statements.  Why the .DEFERRED_INIT hasn't been DSEd
is certainly a question.

Anyway, l_2254 has 128-bit alignment (supposedly due to ix86_local_alignment
and psABI requirements).
Expansion expands that .DEFERRED_INIT into:

(insn 23 22 24 5 (parallel [
            (set (reg:DI 162)
                (plus:DI (reg/f:DI 19 frame)
                    (const_int -48 [0xffffffffffffffd0])))
            (clobber (reg:CC 17 flags))
        ]) "runData/keep/in.16651.c":199:34 247 {*adddi_1}
     (nil))
(insn 24 23 25 5 (set (reg:V32QI 163)
        (const_vector:V32QI [
                (const_int 0 [0]) repeated x32
            ])) "runData/keep/in.16651.c":199:34 1823 {movv32qi_internal}
     (nil))
(insn 25 24 26 5 (set (mem/c:V16QI (reg:DI 162) [0 MEM <char[1:48]> [(void *)_157]+0 S16 A128])
        (vec_select:V16QI (reg:V32QI 163)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 2 [0x2])
                    (const_int 3 [0x3])
                    (const_int 4 [0x4])
                    (const_int 5 [0x5])
                    (const_int 6 [0x6])
                    (const_int 7 [0x7])
                    (const_int 8 [0x8])
                    (const_int 9 [0x9])
                    (const_int 10 [0xa])
                    (const_int 11 [0xb])
                    (const_int 12 [0xc])
                    (const_int 13 [0xd])
                    (const_int 14 [0xe])
                    (const_int 15 [0xf])
                ]))) "runData/keep/in.16651.c":199:34 4383 {vec_extract_lo_v32qi}
     (nil))
(insn 26 25 27 5 (set (mem/c:V16QI (plus:DI (reg:DI 162)
                (const_int 16 [0x10])) [0 MEM <char[1:48]> [(void *)_157]+16 S16 A128])
        (vec_select:V16QI (reg:V32QI 163)
            (parallel [
                    (const_int 16 [0x10])
                    (const_int 17 [0x11])
                    (const_int 18 [0x12])
                    (const_int 19 [0x13])
                    (const_int 20 [0x14])
                    (const_int 21 [0x15])
                    (const_int 22 [0x16])
                    (const_int 23 [0x17])
                    (const_int 24 [0x18])
                    (const_int 25 [0x19])
                    (const_int 26 [0x1a])
                    (const_int 27 [0x1b])
                    (const_int 28 [0x1c])
                    (const_int 29 [0x1d])
                    (const_int 30 [0x1e])
                    (const_int 31 [0x1f])
                ]))) "runData/keep/in.16651.c":199:34 4384 {vec_extract_hi_v32qi}
     (nil))
(insn 27 26 28 5 (set (mem/c:V16QI (plus:DI (reg:DI 162)
                (const_int 32 [0x20])) [0 MEM <char[1:48]> [(void *)_157]+32 S16 A128])
        (subreg:V16QI (reg:V32QI 163) 0)) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (nil))

The cmpelim dump still has:

(insn 279 6 25 4 (set (reg/f:DI 38 r10 [215])
        (plus:DI (reg/f:DI 7 sp)
            (const_int -48 [0xffffffffffffffd0]))) 241 {*leadi}
     (expr_list:REG_EQUIV (plus:DI (reg/f:DI 19 frame)
            (const_int -48 [0xffffffffffffffd0]))
        (nil)))
(insn 25 279 26 4 (set (reg:V16QI 21 xmm1 [orig:218 MEM <char[1:48]> [(void *)_157] ] [218])
        (const_vector:V16QI [
                (const_int 0 [0]) repeated x16
            ])) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (expr_list:REG_EQUIV (const_vector:V16QI [
                (const_int 0 [0]) repeated x16
            ])
        (nil)))
(insn 26 25 34 4 (set (reg:V16QI 20 xmm0 [orig:219 MEM <char[1:48]> [(void *)_157]+16 ] [219])
        (reg:V16QI 21 xmm1)) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (expr_list:REG_EQUIV (const_vector:V16QI [
                (const_int 0 [0]) repeated x16
            ])
        (nil)))

before the loop and

(insn 289 22 290 5 (set (mem/c:V16QI (reg/f:DI 38 r10 [215]) [0 MEM <char[1:48]> [(void *)_157]+0 S16 A128])
        (reg:V16QI 21 xmm1 [orig:218 MEM <char[1:48]> [(void *)_157] ] [218])) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (nil))
(insn 290 289 291 5 (set (mem/c:V16QI (plus:DI (reg/f:DI 38 r10 [215])
                (const_int 16 [0x10])) [0 MEM <char[1:48]> [(void *)_157]+16 S16 A128])
        (reg:V16QI 20 xmm0 [orig:219 MEM <char[1:48]> [(void *)_157]+16 ] [219])) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (nil))
(insn 291 290 29 5 (set (mem/c:V16QI (plus:DI (reg/f:DI 38 r10 [215])
                (const_int 32 [0x20])) [0 MEM <char[1:48]> [(void *)_157]+32 S16 A128])
        (reg:V16QI 20 xmm0 [orig:219 MEM <char[1:48]> [(void *)_157]+16 ] [219])) "runData/keep/in.16651.c":199:34 1824 {movv16qi_internal}
     (nil))

in the loop.

stack_alignment_needed is 128, but then pro_and_epilogue decides to do:

(insn/f 337 315 338 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
        (reg/f:DI 6 bp)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn/f 338 337 339 2 (set (reg/f:DI 6 bp)
        (reg/f:DI 7 sp)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn/f 339 338 340 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
        (reg:DI 41 r13)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn/f 340 339 341 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
        (reg:DI 40 r12)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn/f 341 340 342 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
        (reg:DI 3 bx)) "runData/keep/in.16651.c":157:16 -1
     (nil))
(insn 342 341 343 2 (set (mem/v:BLK (scratch:DI) [0 A8])
        (unspec:BLK [
                (mem/v:BLK (scratch:DI) [0 A8])
            ] UNSPEC_MEMORY_BLOCKAGE)) "runData/keep/in.16651.c":157:16 -1
     (nil))
...
(insn 279 6 25 3 (set (reg/f:DI 38 r10 [215])
        (plus:DI (reg/f:DI 7 sp)
            (const_int -48 [0xffffffffffffffd0]))) 241 {*leadi}
     (expr_list:REG_EQUIV (plus:DI (reg/f:DI 19 frame)
            (const_int -48 [0xffffffffffffffd0]))
        (nil)))

which ends up as:

        pushq   %rbp
.LCFI5:
        movq    %rsp, %rbp
.LCFI6:
        pushq   %r13
        pushq   %r12
        pushq   %rbx
...
        leaq    -48(%rsp), %r10
...
        vmovdqa %xmm1, (%r10)
        vmovdqa %xmm0, 16(%r10)
        movl    $8, %esi
        vmovdqa %xmm0, 32(%r10)

But this results in unaligned stores: %rsp on entry to x86_64 functions
should satisfy (%rsp & 15) == 8, so that %rbp is 16-byte aligned after the
pushq %rbp.  The function then does 3 more pushes (24 bytes) and allocates
l_2254 48 bytes below that, i.e. at %rbp - 72, so all 3 vector stores are
unaligned ((%r10 & 15) == 8).
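
The alignment arithmetic in the last paragraph can be replayed with a small C sketch. It is only an illustration of the address computation, not GCC code: the helper names l_2254_address and misalignment16 are hypothetical, and the stack pointer is modelled as a plain integer that satisfies the psABI entry condition quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of GCC): replays the prologue from the
   assembly above on a modelled stack pointer and returns the address
   that ends up in %r10, i.e. where l_2254 is placed.  */
static uint64_t l_2254_address(uint64_t rsp_on_entry)
{
    /* psABI entry condition: the return address has just been pushed,
       so (%rsp & 15) == 8.  */
    assert((rsp_on_entry & 15) == 8);

    uint64_t rsp = rsp_on_entry;
    rsp -= 8;                  /* pushq %rbp */
    uint64_t rbp = rsp;        /* movq %rsp, %rbp */
    assert((rbp & 15) == 0);   /* %rbp is 16-byte aligned here */

    rsp -= 3 * 8;              /* pushq %r13; pushq %r12; pushq %rbx */
    return rsp - 48;           /* leaq -48(%rsp), %r10  ==  %rbp - 72 */
}

/* Hypothetical helper: offset of addr from the previous 16-byte boundary;
   0 is what the vmovdqa stores would need.  */
static unsigned misalignment16(uint64_t addr)
{
    return (unsigned)(addr & 15);
}
```

For any entry value with (%rsp & 15) == 8, for example the made-up address 0x7ffffffde008, misalignment16(l_2254_address(...)) evaluates to 8 rather than 0, which matches the (%r10 & 15) == 8 conclusion: the odd number of 8-byte pushes after the frame pointer setup is what shifts the 48-byte slot off the 16-byte boundary.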