The case comes from spec2006 403.gcc (or old GCC itself).

  for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
    {
      vd->e[i].mode = VOIDmode;
      vd->e[i].oldest_regno = i;
      vd->e[i].next_regno = INVALID_REGNUM;
    }

It is vectorized and only then completely peeled.
Only after peeling all stored values become constant.

Currently expand_vec_perm_pblendv works as following:
Let d.target, d.op0, dop1 be permutation parameters.

First we permute an operand (d.op1 or d.op0) and then blend it with
other argument:

d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, d.target)
(if d.op0 equal to d.target this is buggy)

Patch make it more accurate:
tmp = gen_reg_rtx (vmode)
tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
d.target = pblend(d.op0, tmp)
(Here d.op0 can be equal to d.target)

Below is rtl dump of buggy case:

(183t.optimized)
...
vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34,
35 }, { 0, 4, 0, 1 }>;
vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, {
4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
...
(184r.expand)
...
(insn 205 204 206 (set (reg:V4SI 768)
        (const_vector:V4SI [
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
                (const_int 0 [0])
            ])) ../regrename.c:1171 -1
     (nil))

(insn 206 205 208 (set (reg:V4SI 769)
        (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3  S16
A128])) ../regrename.c:1171 -1
     (expr_list:REG_EQUAL (const_vector:V4SI [
                (const_int 32 [0x20])
                (const_int 33 [0x21])
                (const_int 34 [0x22])
                (const_int 35 [0x23])
            ])
        (nil)))

(insn 208 206 207 (set (reg:V4SI 770)
        (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
                (reg:V4SI 769))
            (parallel [
                    (const_int 0 [0])
                    (const_int 4 [0x4])
                    (const_int 1 [0x1])
                    (const_int 5 [0x5])
                ]))) ../regrename.c:1171 -1
     (nil))

(insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
        (vec_select:V4SI (reg:V4SI 770)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 2 [0x2])
                ]))) ../regrename.c:1171 -1
     (nil))

(insn 209 207 210 (set (reg:V4SI 771)
        (const_vector:V4SI [
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
                (const_int -1 [0xffffffffffffffff])
            ])) ../regrename.c:1171 -1
     (nil))

(insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
        (vec_select:V4SI (reg:V4SI 771)
            (parallel [
                    (const_int 0 [0])
                    (const_int 1 [0x1])
                    (const_int 0 [0])
                    (const_int 3 [0x3])
                ]))) ../regrename.c:1171 -1
     (nil))
(insn 211 210 212 (set (reg:V8HI 772)
        (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
            (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
            (const_int 48 [0x30]))) ../regrename.c:1171 -1
     (nil))
...

On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>
>>> The patch fix pblendv expand.
>>> The bug was uncovered when permutation operands are constants.
>>> In this case we init target register for expand_vec_perm_1 with
>>> constant and then rewrite the target with constant for
>>> expand_vec_perm_pblend.
>>>
>>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>>> -flto -march=corei7.
>>>
>>> Bootstrap and make check passed.
>>>
>>> Is it ok?
>>
>> Please add a testcase.
>
> Also, it surprises me that we enter expand_vec_perm_pblendv with
> uninitialized (?) target. Does your patch only papers over a real
> problem up the call chain (hard to say without a testcase)?
>
> Uros.
>
>>
>>>
>>> Evgeny
>>>
>>> 2014-12-09  Evgeny Stupachenko  <evstu...@gmail.com>
>>>
>>> gcc/
>>>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>>         expand_vec_perm_1 target.
>>>
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index eafc15a..5a914ad 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d 
>>> *d)
>>>      dcopy.op0 = dcopy.op1 = d->op1;
>>>    else
>>>      dcopy.op0 = dcopy.op1 = d->op0;
>>> +  dcopy.target = gen_reg_rtx (vmode);
>>>    dcopy.one_operand_p = true;
>>>
>>>    for (i = 0; i < nelt; ++i)

Reply via email to