I've tried to get smaller reproducer, however currently it is
complicated as several functions in GCC.
However patch is fixing spec2006 benchmark. Shouldn't that be enough
for regression testing?

Thanks,
Evgeny

On Tue, Dec 9, 2014 at 4:29 PM, Evgeny Stupachenko <evstu...@gmail.com> wrote:
> The case comes from spec2006 403.gcc (or old GCC itself).
>
>   for (i = 0; i < FIRST_PSEUDO_REGISTER; ++i)
>     {
>       vd->e[i].mode = VOIDmode;
>       vd->e[i].oldest_regno = i;
>       vd->e[i].next_regno = INVALID_REGNUM;
>     }
>
> It is vectorized and only then completely peeled.
> Only after peeling all stored values become constant.
>
> Currently expand_vec_perm_pblendv works as following:
> Let d.target, d.op0, dop1 be permutation parameters.
>
> First we permute an operand (d.op1 or d.op0) and then blend it with
> other argument:
>
> d.target = shuffle(d.op1) /* using expand_vec_perm_1 */
> d.target = pblend(d.op0, d.target)
> (if d.op0 equal to d.target this is buggy)
>
> Patch make it more accurate:
> tmp = gen_reg_rtx (vmode)
> tmp = shuffle(d.op1) /* using expand_vec_perm_1 */
> d.target = pblend(d.op0, tmp)
> (Here d.op0 can be equal to d.target)
>
> Below is rtl dump of buggy case:
>
> (183t.optimized)
> ...
> vect_shuffle3_low_470 = VEC_PERM_EXPR <{ 0, 0, 0, 0 }, { 32, 33, 34,
> 35 }, { 0, 4, 0, 1 }>;
> vect_shuffle3_high_469 = VEC_PERM_EXPR <vect_shuffle3_low_470, {
> 4294967295, 4294967295, 4294967295, 4294967295 }, { 0, 1, 4, 3 }>;
> ...
> (184r.expand)
> ...
> (insn 205 204 206 (set (reg:V4SI 768)
>         (const_vector:V4SI [
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>                 (const_int 0 [0])
>             ])) ../regrename.c:1171 -1
>      (nil))
>
> (insn 206 205 208 (set (reg:V4SI 769)
>         (mem/u/c:V4SI (symbol_ref/u:DI ("*.LC28") [flags 0x2]) [3  S16
> A128])) ../regrename.c:1171 -1
>      (expr_list:REG_EQUAL (const_vector:V4SI [
>                 (const_int 32 [0x20])
>                 (const_int 33 [0x21])
>                 (const_int 34 [0x22])
>                 (const_int 35 [0x23])
>             ])
>         (nil)))
>
> (insn 208 206 207 (set (reg:V4SI 770)
>         (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 768)
>                 (reg:V4SI 769))
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 4 [0x4])
>                     (const_int 1 [0x1])
>                     (const_int 5 [0x5])
>                 ]))) ../regrename.c:1171 -1
>      (nil))
>
> (insn 207 208 209 (set (reg:V4SI 464 [ D.15061 ])
>         (vec_select:V4SI (reg:V4SI 770)
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 1 [0x1])
>                     (const_int 0 [0])
>                     (const_int 2 [0x2])
>                 ]))) ../regrename.c:1171 -1
>      (nil))
>
> (insn 209 207 210 (set (reg:V4SI 771)
>         (const_vector:V4SI [
>                 (const_int -1 [0xffffffffffffffff])
>                 (const_int -1 [0xffffffffffffffff])
>                 (const_int -1 [0xffffffffffffffff])
>                 (const_int -1 [0xffffffffffffffff])
>             ])) ../regrename.c:1171 -1
>      (nil))
>
> (insn 210 209 211 (set (reg:V4SI 464 [ D.15061 ])
>         (vec_select:V4SI (reg:V4SI 771)
>             (parallel [
>                     (const_int 0 [0])
>                     (const_int 1 [0x1])
>                     (const_int 0 [0])
>                     (const_int 3 [0x3])
>                 ]))) ../regrename.c:1171 -1
>      (nil))
> (insn 211 210 212 (set (reg:V8HI 772)
>         (vec_merge:V8HI (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
>             (subreg:V8HI (reg:V4SI 464 [ D.15061 ]) 0)
>             (const_int 48 [0x30]))) ../regrename.c:1171 -1
>      (nil))
> ...
>
> On Tue, Dec 9, 2014 at 12:06 PM, Uros Bizjak <ubiz...@gmail.com> wrote:
>> On Tue, Dec 9, 2014 at 9:57 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>
>>>> The patch fix pblendv expand.
>>>> The bug was uncovered when permutation operands are constants.
>>>> In this case we init target register for expand_vec_perm_1 with
>>>> constant and then rewrite the target with constant for
>>>> expand_vec_perm_pblend.
>>>>
>>>> The patch fixes 403.gcc execution, compiled with -Ofast -funroll-loops
>>>> -flto -march=corei7.
>>>>
>>>> Bootstrap and make check passed.
>>>>
>>>> Is it ok?
>>>
>>> Please add a testcase.
>>
>> Also, it surprises me that we enter expand_vec_perm_pblendv with
>> uninitialized (?) target. Does your patch only papers over a real
>> problem up the call chain (hard to say without a testcase)?
>>
>> Uros.
>>
>>>
>>>>
>>>> Evgeny
>>>>
>>>> 2014-12-09  Evgeny Stupachenko  <evstu...@gmail.com>
>>>>
>>>> gcc/
>>>>         * config/i386/i386.c (expand_vec_perm_pblendv): Gen new rtx for
>>>>         expand_vec_perm_1 target.
>>>>
>>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>>> index eafc15a..5a914ad 100644
>>>> --- a/gcc/config/i386/i386.c
>>>> +++ b/gcc/config/i386/i386.c
>>>> @@ -47546,6 +47546,7 @@ expand_vec_perm_pblendv (struct expand_vec_perm_d 
>>>> *d)
>>>>      dcopy.op0 = dcopy.op1 = d->op1;
>>>>    else
>>>>      dcopy.op0 = dcopy.op1 = d->op0;
>>>> +  dcopy.target = gen_reg_rtx (vmode);
>>>>    dcopy.one_operand_p = true;
>>>>
>>>>    for (i = 0; i < nelt; ++i)

Reply via email to