https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84101
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-*, i?86-*-*
           Priority|P3                          |P2
             Status|UNCONFIRMED                 |NEW
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2018-01-30
          Component|c                           |rtl-optimization
                 CC|                            |segher at gcc dot gnu.org
     Ever confirmed|0                           |1
   Target Milestone|---                         |7.4

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
.optimized:

pair (int num)
{
  struct uint64_pair_t D.1958;
  int _1;
  long unsigned int _2;
  int _3;
  long unsigned int _4;
  vector(2) long unsigned int _9;

  <bb 2> [local count: 1073741825]:
  _1 = num_5(D) << 1;
  _2 = (long unsigned int) _1;
  _3 = num_5(D) >> 1;
  _4 = (long unsigned int) _3;
  _9 = {_2, _4};
  MEM[(struct uint64_pair *)&D.1958] = _9;
  return D.1958;

}

there's (plenty?) of duplicates with the vectorizer making mistakes with
respect to ABI details which are not exposed at vectorization time.  Note
we don't spill at expansion time either:

(insn 10 9 11 2 (set (reg:V2DI 95)
        (vec_concat:V2DI (reg:DI 97)
            (reg:DI 99))) "t.c":15 -1
     (nil))
(insn 11 10 12 2 (set (reg:TI 92 [ D.1958 ])
        (subreg:TI (reg:V2DI 95) 0)) "t.c":15 -1
     (nil))
(insn 12 11 16 2 (set (reg:TI 93 [ <retval> ])
        (reg:TI 92 [ D.1958 ])) "t.c":15 -1
     (nil))
(insn 16 12 17 2 (set (reg/i:TI 0 ax)
        (reg:TI 93 [ <retval> ])) "t.c":16 -1
     (nil))
(insn 17 16 0 2 (use (reg/i:TI 0 ax)) "t.c":16 -1
     (nil))

but it's at LRA time the 'ax' TImode reg (register pair!) gets exposed.
From

(insn 10 9 16 2 (set (reg:V2DI 95)
        (vec_concat:V2DI (reg:DI 97)
            (reg:DI 99))) "t.c":15 3744 {vec_concatv2di}
     (expr_list:REG_DEAD (reg:DI 99)
        (expr_list:REG_DEAD (reg:DI 97)
            (nil))))
(insn 16 10 17 2 (set (reg/i:TI 0 ax)
        (subreg:TI (reg:V2DI 95) 0)) "t.c":16 84 {*movti_internal}
     (expr_list:REG_DEAD (reg:V2DI 95)
        (nil)))

we go to (after first spilling the DImode components):

(insn 22 9 24 2 (set (reg:DI 21 xmm0 [95])
        (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S8 A128]))
"t.c":15 85 {*movdi_internal}
     (nil))
(insn 24 22 10 2 (set (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S8 A128])
        (reg:DI 5 di [99])) "t.c":15 85 {*movdi_internal}
     (nil))
(insn 10 24 23 2 (set (reg:V2DI 21 xmm0 [95])
        (vec_concat:V2DI (reg:DI 21 xmm0 [95])
            (mem/c:DI (plus:DI (reg/f:DI 7 sp)
                    (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S8
A128]))) "t.c":15 3744 {vec_concatv2di}
     (nil))
(insn 23 10 16 2 (set (mem/c:V2DI (plus:DI (reg/f:DI 7 sp)
                (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S16 A128])
        (reg:V2DI 21 xmm0 [95])) "t.c":15 1255 {movv2di_internal}
     (nil))
(insn 16 23 17 2 (set (reg/i:TI 0 ax)
        (mem/c:TI (plus:DI (reg/f:DI 7 sp)
                (const_int -24 [0xffffffffffffffe8])) [3 %sfp+-16 S16 A128]))
"t.c":16 84 {*movti_internal}
     (nil))

This is really hard to avoid in the vectorizer given the decl we return
isn't a RESULT_DECL but a regular VAR_DECL so we have no idea it is
literally returned.
Note the RTL when not vectorizing isn't too different:

(insn 10 9 11 2 (set (reg:DI 97)
        (sign_extend:DI (reg:SI 96))) "t.c":13 -1
     (nil))
(insn 11 10 12 2 (set (subreg:DI (reg:TI 91 [ D.1958 ]) 8)
        (reg:DI 97)) "t.c":15 -1
     (nil))
(insn 12 11 16 2 (set (reg:TI 92 [ <retval> ])
        (reg:TI 91 [ D.1958 ])) "t.c":15 -1
     (nil))
(insn 16 12 17 2 (set (reg/i:TI 0 ax)
        (reg:TI 92 [ <retval> ])) "t.c":16 -1
     (nil))
(insn 17 16 0 2 (use (reg/i:TI 0 ax)) "t.c":16 -1
     (nil))

Here it is the subreg1 pass that exposes the register pair and lowers the
subreg:

(insn 10 9 11 2 (set (reg:DI 97)
        (sign_extend:DI (reg:SI 96))) "t.c":13 149 {*extendsidi2_rex64}
     (nil))
(insn 11 10 19 2 (set (reg:DI 100 [ D.1958+8 ])
        (reg:DI 97)) "t.c":15 85 {*movdi_internal}
     (nil))
(insn 19 11 20 2 (set (reg:DI 101 [ <retval> ])
        (reg:DI 99 [ D.1958 ])) "t.c":15 85 {*movdi_internal}
     (nil))
(insn 20 19 21 2 (set (reg:DI 102 [ <retval>+8 ])
        (reg:DI 100 [ D.1958+8 ])) "t.c":15 85 {*movdi_internal}
     (nil))
(insn 21 20 22 2 (set (reg:DI 0 ax)
        (reg:DI 101 [ <retval> ])) "t.c":16 85 {*movdi_internal}
     (nil))
(insn 22 21 17 2 (set (reg:DI 1 dx [+8 ])
        (reg:DI 102 [ <retval>+8 ])) "t.c":16 85 {*movdi_internal}
     (nil))
(insn 17 22 0 2 (use (reg/i:TI 0 ax)) "t.c":16 -1
     (nil))

I imagine it could be made to recognize the (subreg (vec_concat ..)) case as
well... but would that be a hack?
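
For readers without the attachment at hand, a C function of roughly the shape implied by the .optimized dump might look like the sketch below. It is a reconstruction from the dump, not the testcase filed with the bug: the struct tag and typedef follow the names visible in the dump, while the field names first and second are assumptions.

```c
#include <stdint.h>

/* Hypothetical reconstruction inferred from the .optimized dump.
   The field names are assumptions, not taken from the bug's testcase. */
struct uint64_pair {
  uint64_t first;   /* assumed field name */
  uint64_t second;  /* assumed field name */
};
typedef struct uint64_pair uint64_pair_t;

/* Two adjacent 64-bit stores into a local that is then returned by value.
   The local is an ordinary variable (a VAR_DECL in the dump), so nothing at
   the GIMPLE level marks it as the return value. */
uint64_pair_t pair(int num)
{
  uint64_pair_t p;
  p.first = num << 1;   /* _1 = num << 1;  _2 = (long unsigned int) _1; */
  p.second = num >> 1;  /* _3 = num >> 1;  _4 = (long unsigned int) _3; */
  return p;
}
```

Comparing the assembly for something like this with and without basic-block vectorization (for instance GCC's -fno-tree-slp-vectorize) is one way to see whether the V2DI build followed by the stack round trip into the ax/dx pair shows up; the exact output will depend on the compiler version and flags.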