The case comes from spec2006 403.gcc (or old GCC itself):

  for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
    {
      vd->e[i].mode = VOIDmode;
      vd->e[i].oldest_regno = i;
      vd->e[i].next_regno = INVALID_REGNUM;
    }

It is vectorized and only then completely peeled. Only after peeling
do all stored values become constant.

Currently expand_vec_perm_pblendv works as follows.
Let d.target, d.op0, d.op1 be the permutation parameters.
First we permute one operand (d.op1 or d.op0) and then blend it with
the other argument:

  d.target = shuffle(d.op1)            /* using expand_vec_perm_1 */
  d.target = pblend(d.op0, d.target)

(If d.op0 is equal to d.target this is buggy: the shuffle overwrites
d.op0 before the blend reads it.)

The patch makes it more accurate:

  tmp = gen_reg_rtx (vmode)
  tmp = shuffle(d.op1)                 /* using expand_vec_perm_1 */
  d.target = pblend(d.op0, tmp)

(Here d.op0 can be equal to d.target.)

Below are the dumps of the buggy case:

(183t.optimized)
...
  vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34, 35 }, { 0, 4, 0, 1 }>;
  vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, { 4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
...

(184r.expand)
...
(insn 205 204 206 (set (reg:V4SI 768)
        (const_vector:V4SI [
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
            ])) ../regrename.c:1171 -1
     (nil))
(insn 206 205 208 (set (reg:V4SI 769)
        (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3 S16 A128])) ../regrename.c:1171 -1
     (expr_list:REG_EQUAL (const_vector:V4SI [
                (const_int 32 [0x20])
                (const_int 33 [0x21])
                (const_int 34 [0x22])
                (const_int 35 [0x23])
            ])
        (nil)))
(insn 208 206 207 (set (reg:V4SI 770)
        (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
                (reg:V4SI 769))
            (parallel [
                    (const_int 0 [0])
                    (const_int 4 [0x4])
                    (const_int 1 [0x1])
                    (const_int 5 [0x5])
                ]))) ../regrename.c:1171 -1
     (nil))
(insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
        (vec_select:V4SI (reg:V4SI 770)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 2 [0x2])
                ]))) ../regrename.c:1171 -1
     (nil))
(insn 209 207 210 (set (reg:V4SI 771)
        (const_vector:V4SI [
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
            ])) ../regrename.c:1171 -1
     (nil))
(insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
        (vec_select:V4SI (reg:V4SI 771)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 3 [0x3])
                ]))) ../regrename.c:1171 -1
     (nil))
(insn 211 210 212 (set (reg:V8HI 772)
        (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
            (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
            (const_int 48 [0x30]))) ../regrename.c:1171 -1
     (nil))
...

On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>
>>> The patch fix pblendv expand.
>>> The bug was uncovered when permutation operands are constants.
>>> In this case we init target register for expand_vec_perm_1 with
>>> constant and then rewrite the target with constant for
>>> expand_vec_perm_pblend.
>>>
>>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>>> -flto -march=corei7.
>>>
>>> Bootstrap and make check passed.
>>>
>>> Is it ok?
>>
>> Please add a testcase.
>
> Also, it surprises me that we enter expand_vec_perm_pblendv with
> uninitialized (?) target. Does your patch only papers over a real
> problem up the call chain (hard to say without a testcase)?
>
> Uros.
>
>>
>>>
>>> Evgeny
>>>
>>> 2014-12-09  Evgeny Stupachenko  <evstu...@gmail.com>
>>>
>>> gcc/
>>>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>>         expand_vec_perm_1 target.
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index eafc15a..5a914ad 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d *d)
>>>      dcopy.op0 = dcopy.op1 = d->op1;
>>>    else
>>>      dcopy.op0 = dcopy.op1 = d->op0;
>>> +  dcopy.target = gen_reg_rtx (vmode);
>>>    dcopy.one_operand_p = true;
>>>
>>>    for (i = 0; i < nelt; ++i)
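
For illustration only, the aliasing problem in the shuffle-then-blend sequence can be modelled outside the compiler. The sketch below is not GCC code: plain int arrays stand in for V4SI registers, and the names shuffle, blend, pblendv_old and pblendv_new are hypothetical helpers made up for this example. It only assumes the behaviour described in this mail, namely that the old sequence writes the shuffled operand into the target before blending, while the patched sequence goes through a fresh temporary.

```c
#define NELT 4

/* One-operand permutation: dst[i] = src[sel[i]].
   dst must not alias src in this simplified model.  */
static void
shuffle (int *dst, const int *src, const int *sel)
{
  for (int i = 0; i < NELT; ++i)
    dst[i] = src[sel[i]];
}

/* Blend: element i comes from b if bit i of mask is set, else from a.
   Element-wise, so dst may alias a or b.  */
static void
blend (int *dst, const int *a, const int *b, unsigned mask)
{
  for (int i = 0; i < NELT; ++i)
    dst[i] = ((mask >> i) & 1) ? b[i] : a[i];
}

/* Model of the old sequence: the shuffle result lands in target.
   If target aliases op0, op0 is clobbered before the blend reads it.  */
static void
pblendv_old (int *target, const int *op0, const int *op1,
             const int *sel, unsigned mask)
{
  shuffle (target, op1, sel);
  blend (target, op0, target, mask);
}

/* Model of the patched sequence: the shuffle result goes to a fresh
   temporary, so target may safely alias op0.  */
static void
pblendv_new (int *target, const int *op0, const int *op1,
             const int *sel, unsigned mask)
{
  int tmp[NELT];

  shuffle (tmp, op1, sel);
  blend (target, op0, tmp, mask);
}
```

Using the values from the dump (op0 = { 0, 32, 0, 0 }, op1 = all ones, one-operand selector { 0, 1, 0, 3 }, blend mask selecting element 2) with target and op0 being the same array, pblendv_new gives { 0, 32, -1, 0 }, while pblendv_old gives all ones because op0 has already been overwritten.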