https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106069

--- Comment #19 from Xionghu Luo (luoxhu at gcc dot gnu.org) <yinyuefengyi at 
gmail dot com> ---
(In reply to Xionghu Luo (luo...@gcc.gnu.org) from comment #15)
> In combine: vec_select(vec_concat and the followed vec_select are combined
> to a single extract instruction, which seems reasonable for both LE and BE?
> 
> R146:   0 1 2 3
> R141:   4 5 6 7
> R150:   2 6 3 7    // vec_select(vec_concat(r146:V4SI,r141:V4SI),[2 6 3 7])
> R151:   R150[3]    // vec_select(r150:V4SI,3)
> 
> => 
> 
> R151:   R141[3]   //  vec_select(r141:V4SI,3)
> 
>   
> 
> Trying 21 -> 24:
>    21: r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
>       REG_DEAD r146:V4SI
>       REG_DEAD r141:V4SI
>    24: {r151:SI=vec_select(r150:V4SI,parallel);clobber scratch;}
> Failed to match this instruction:
> (parallel [
>         (set (reg:SI 151)
>             (vec_select:SI (reg:V4SI 141)
>                 (parallel [
>                         (const_int 3 [0x3])
>                     ])))
>         (clobber (scratch:SI))
>         (set (reg:V4SI 150)
>             (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
>                     (reg:V4SI 141))
>                 (parallel [
>                         (const_int 2 [0x2])
>                         (const_int 6 [0x6])
>                         (const_int 3 [0x3])
>                         (const_int 7 [0x7])
>                     ])))
>     ])
> Failed to match this instruction:
> (parallel [
>         (set (reg:SI 151)
>             (vec_select:SI (reg:V4SI 141)
>                 (parallel [
>                         (const_int 3 [0x3])
>                     ])))
>         (set (reg:V4SI 150)
>             (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
>                     (reg:V4SI 141))
>                 (parallel [
>                         (const_int 2 [0x2])
>                         (const_int 6 [0x6])
>                         (const_int 3 [0x3])
>                         (const_int 7 [0x7])
>                     ])))
>     ])
> Successfully matched this instruction:
> (set (reg:V4SI 150)
>     (vec_select:V4SI (vec_concat:V8SI (reg:V4SI 146)
>             (reg:V4SI 141))
>         (parallel [
>                 (const_int 2 [0x2])
>                 (const_int 6 [0x6])
>                 (const_int 3 [0x3])
>                 (const_int 7 [0x7])
>             ])))
> Successfully matched this instruction:
> (set (reg:SI 151)
>     (vec_select:SI (reg:V4SI 141)
>         (parallel [
>                 (const_int 3 [0x3])
>             ])))
> allowing combination of insns 21 and 24
> original costs 4 + 4 = 8
> replacement costs 4 + 4 = 8
> modifying insn i2    21:
> r150:V4SI=vec_select(vec_concat(r146:V4SI,r141:V4SI),parallel)
>       REG_DEAD r146:V4SI
> deferring rescan insn with uid = 21.
> modifying insn i3    24: {r151:SI=vec_select(r141:V4SI,parallel);clobber
> scratch;}
>       REG_DEAD r141:V4SI
> deferring rescan insn with uid = 24.
> 
> 
> I guess the previous unspec implementation bypassed the LE + LE swap check,
> so now in split2, we should generate vextuwlx instead of vextuwrx on little
> endian?


This nested vec_select + vec_select + vec_concat optimization was introduced by
Uros in simplify-rtx.c for PR32661. Unfortunately it only works for Power BE
platforms. Disabling that piece of code could work, since the nested
vec_selects would then no longer be combined...
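
As a rough illustration of the index bookkeeping behind that fold (this is not
the simplify-rtx.c code; the function name and the "op0"/"op1" labels are made
up for this sketch, and it assumes the generic RTL numbering in which the first
vec_concat operand occupies lanes 0..3 and the second lanes 4..7, with no
endian adjustment):

```python
def fold_nested_select(concat_sel, outer_idx, nunits=4):
    """Resolve vec_select(vec_select(vec_concat(op0, op1), concat_sel),
    [outer_idx]) to a single (operand, lane) pair.

    Hypothetical helper: generic lane numbering only, no endian handling.
    """
    # Lane of the 2*nunits-wide concatenation picked by the inner select.
    lane = concat_sel[outer_idx]
    # Lanes below nunits come from the first operand, the rest from the second.
    operand = "op0" if lane < nunits else "op1"
    return operand, lane % nunits


# Insns 21 and 24 from the dump: op0 = r146, op1 = r141, selector [2 6 3 7],
# outer index 3.
print(fold_nested_select([2, 6, 3, 7], 3))
```

With op0 = r146 and op1 = r141 this resolves to lane 3 of op1, i.e. the
vec_select(r141:V4SI,3) that combine reports above.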

For Power LE, firstly:

Trying 21 -> 24:

 R146:   3 2 1 0
 R141:   7 6 5 4
 R150:   7 3 6 2    // vec_select(vec_concat(r146:V4SI,r141:V4SI),[2 6 3 7])
 R151:   R150[3]    // vec_select(r150:V4SI,3)

 => 

currently:
 R151:   R141[3]   //  vec_select(r141:V4SI,3)

But it should be:
 R151:   R146[3]   //  vec_select(r146:V4SI,3)

Which means the current result:

R151: R150[3] R141[3]
R153: R150[2] R146[3]
R155: R150[1] R141[2]
R157: R150[0] R146[2]

should instead become, after the first nested vec_select optimization:

R151: R150[3] R146[3]
R153: R150[2] R141[3]
R155: R150[1] R146[2]
R157: R150[0] R141[2]
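
The two tables differ only in which vec_concat operand each lane resolves to.
A small sketch of that (hypothetical helper names, generic lane numbering
assumed, not the actual GCC code) reproduces both by exchanging the two
operands:

```python
SEL = [2, 6, 3, 7]                      # selector of insn 21
DESTS = ["R151", "R153", "R155", "R157"]  # extract R150[3], [2], [1], [0]


def resolve(outer_idx, op0, op1, nunits=4):
    """Map R150[outer_idx] back to a lane of op0 or op1 (illustration only)."""
    lane = SEL[outer_idx]
    src = op0 if lane < nunits else op1
    return f"{src}[{lane % nunits}]"


def table(op0, op1):
    # DESTS[i] extracts R150[3 - i], as in the tables above.
    return {d: resolve(3 - i, op0, op1) for i, d in enumerate(DESTS)}


current = table("R146", "R141")   # operands as combine sees them today
swapped = table("R141", "R146")   # same fold with the two operands exchanged

for d in DESTS:
    print(d, current[d], "->", swapped[d])
```

The lane number within the chosen operand stays the same in both cases; only
the source register changes.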

A little-endian check plus a swap (swapping op00 and op01) could achieve that
result.  But secondly, there is another "nested vec_select" optimisation, which
produces R151=R165[3]:

Trying 21 -> 26:
...

R146 R165 R163 [7 3 6 2]
R151: R146[3]   =>  R165[3]  (this is wrong!)

R162, R163, R164 and R165 hold the input values R0, R1, R2 and R3.  The
vsx_extract_v4si_di_p9 index should be "0" instead of "3".

The correct result should be:

R151: R165[0]
R153: R164[0]
R155: R163[0]
R157: R162[0]


(insn 44 2 4 2 (set (reg:V4SI 162)
        (reg:V4SI 66 2 [ R0 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
     (expr_list:REG_DEAD (reg:V4SI 66 2 [ R0 ])
        (nil)))
(note 4 44 45 2 NOTE_INSN_DELETED)
(insn 45 4 5 2 (set (reg:V4SI 163)
        (reg:V4SI 67 3 [ R1 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
     (expr_list:REG_DEAD (reg:V4SI 67 3 [ R1 ])
        (nil)))
(note 5 45 46 2 NOTE_INSN_DELETED)
(insn 46 5 6 2 (set (reg:V4SI 164)
        (reg:V4SI 68 4 [ R2 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
     (expr_list:REG_DEAD (reg:V4SI 68 4 [ R2 ])
        (nil)))
(note 6 46 47 2 NOTE_INSN_DELETED)
(insn 47 6 7 2 (set (reg:V4SI 165)
        (reg:V4SI 69 5 [ R3 ])) "q.C":36:1 1157 {vsx_movv4si_64bit}
     (expr_list:REG_DEAD (reg:V4SI 69 5 [ R3 ])
        (nil)))
...
(insn 33 32 34 2 (parallel [
            (set (reg:DI 7 7)
                (zero_extend:DI (vec_select:SI (reg:V4SI 162)
                        (parallel [
                                (const_int 3 [0x3])
                            ]))))
            (clobber (scratch:SI))
        ]) "q.C":28:10 1396 {*vsx_extract_v4si_di_p9}
     (expr_list:REG_DEAD (reg:V4SI 162)
        (nil)))
(insn 34 33 35 2 (parallel [
            (set (reg:DI 6 6)
                (zero_extend:DI (vec_select:SI (reg:V4SI 163)
                        (parallel [
                                (const_int 3 [0x3])
                            ]))))
            (clobber (scratch:SI))
        ]) "q.C":28:10 1396 {*vsx_extract_v4si_di_p9}
     (expr_list:REG_DEAD (reg:V4SI 163)
        (nil)))
(insn 35 34 36 2 (parallel [
            (set (reg:DI 5 5)
                (zero_extend:DI (vec_select:SI (reg:V4SI 164)
                        (parallel [
                                (const_int 3 [0x3])
                            ]))))
            (clobber (scratch:SI))
        ]) "q.C":28:10 1396 {*vsx_extract_v4si_di_p9}
     (expr_list:REG_DEAD (reg:V4SI 164)
        (nil)))
(insn 36 35 37 2 (parallel [
            (set (reg:DI 4 4)
                (zero_extend:DI (vec_select:SI (reg:V4SI 165)
                        (parallel [
                                (const_int 3 [0x3])
                            ]))))
            (clobber (scratch:SI))
        ]) "q.C":28:10 1396 {*vsx_extract_v4si_di_p9}
     (expr_list:REG_DEAD (reg:V4SI 165)
        (nil)))



But it is not easy to change the index again... Is the analysis reasonable?
@Segher.
