98694] Fix incorrect optimization by cprop_hardreg.

Richard Sandiford via Gcc-patches Tue, 19 Jan 2021 08:10:44 -0800

Jakub Jelinek via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> On Tue, Jan 19, 2021 at 12:38:47PM +0000, Richard Sandiford via Gcc-patches 
> wrote:
>> > actually only the lower 16bits are needed, the original insn is like
>> >
>> > .294.r.ira
>> > (insn 69 68 70 13 (set (reg:HI 96 [ _52 ])
>> >         (subreg:HI (reg:DI 82 [ var_6.0_1 ]) 0)) "test.c":21:23 76
>> > {*movhi_internal}
>> >      (nil))
>> > (insn 78 75 82 13 (set (reg:V4HI 140 [ _283 ])
>> >         (vec_duplicate:V4HI (truncate:HI (subreg:SI (reg:HI 96 [ _52
>> > ]) 0)))) 1412 {*vec_dupv4hi}
>> >      (nil))
>> >
>> > .295r.reload
>> > (insn 69 68 70 13 (set (reg:HI 5 di [orig:96 _52 ] [96])
>> >         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "test.c":21:23 76
>> > {*movhi_internal}
>> >      (nil))
>> > (insn 489 75 78 13 (set (reg:SI 22 xmm2 [297])
>> >         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>> >      (nil))
>> > (insn 78 489 490 13 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
>> >         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297]))))
>> > 1412 {*vec_dupv4hi}
>> >      (nil))
>> >
>> > and insn 489 is created by lra/reload which seems ok for the sequence,
>> > but problemistic with considering the logic of hardreg_cprop.
>> 
>> It looks OK even with the regcprop behaviour though:
>> 
>> - insn 69 defines only the low 16 bits of di,
>> - insn 489 defines only the low 16 bits of xmm2, but copies bits 16-31
>>   too (with unknown contents)
>> - insn 78 uses only the low 16 bits of xmm2 (the unknown contents
>>   introduced by insn 489 are truncated away)
>> 
>> So where do bits 16-31 become significant?  What goes wrong if they're
>> not zero?
>
> The k0 register is initialized I believe with
> (insn 20 2 21 2 (set (reg:DI 68 k0 [orig:82 var_6.0_1 ] [82])
>         (mem/c:DI (symbol_ref:DI ("var_6") [flags 0x40]  <var_decl 
> 0x7f7babeaaf30 var_6>) [3 var_6+0 S8 A64])) "pr98694.C":21:10 74 
> {*movdi_internal}
>      (nil))
> and so it contains all 64-bits, and then the code sometimes uses all the
> bits, sometimes just the low 16-bits and sometimes low 32-bits of that
> value.
> (insn 69 68 70 12 (set (reg:HI 5 di [orig:96 _52 ] [96])
>         (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":27:23 76 
> {*movhi_internal}
>      (nil))
> (insn 74 73 75 12 (set (reg:SI 36 r8 [orig:149 _52 ] [149])
>         (zero_extend:SI (reg:HI 68 k0 [orig:82 var_6.0_1 ] [82]))) 144 
> {*zero_extendhisi2}
>      (nil))
> (insn 489 75 78 12 (set (reg:SI 22 xmm2 [297])
>         (reg:SI 5 di [orig:96 _52 ] [96])) 75 {*movsi_internal}
>      (nil))
> (insn 78 489 490 12 (set (reg:V4HI 20 xmm0 [orig:140 _283 ] [140])
>         (vec_duplicate:V4HI (truncate:HI (reg:SI 22 xmm2 [297])))) 1412 
> {*vec_dupv4hi}
>      (expr_list:REG_DEAD (reg:SI 22 xmm2 [297])
>         (nil)))
> are examples when it uses only the low 16 bits from that, and
> (insn 487 72 73 12 (set (reg:SI 1 dx [148])
>         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
>      (nil))
>
> (insn 85 84 491 13 (set (reg:SI 37 r9 [orig:86 _11 ] [86])
>         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) "pr98694.C":28:14 75 
> {*movsi_internal}
>      (nil))
>
> (insn 491 85 88 13 (set (reg:SI 3 bx [299])
>         (reg:SI 68 k0 [orig:82 var_6.0_1 ] [82])) 75 {*movsi_internal}
>      (nil))
> (insn 88 491 89 13 (set (reg:CCNO 17 flags)
>         (compare:CCNO (reg:SI 3 bx [299])
>             (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
>      (expr_list:REG_DEAD (reg:SI 3 bx [299])
>         (nil)))
>
> (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
>         (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 
> {*movsi_internal}
>      (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
>         (nil)))
> are examples where it uses low 32-bits from k0.
> So the
>  (insn 457 499 460 33 (set (reg:SI 39 r11 [orig:86 _11 ] [86])
> -        (reg:SI 37 r9 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 
> {*movsi_internal}
> -     (expr_list:REG_DEAD (reg:SI 37 r9 [orig:86 _11 ] [86])
> +        (reg:SI 22 xmm2 [orig:86 _11 ] [86])) "pr98694.C":35:36 75 
> {*movsi_internal}
> +     (expr_list:REG_DEAD (reg:SI 22 xmm2 [orig:86 _11 ] [86])
>          (nil)))
> cprop_hardreg change indeed looks bogus, while xmm2 has SImode, it holds
> only the low 16-bits of the value and has the upper bits undefined, while r9
> it is replacing had all of the low 32-bits well defined.


Ah, ok, thanks for the extra context.

So AIUI the problem when recording xmm2<-di isn't just:

 [A] partial_subreg_p (vd->e[sr].mode, GET_MODE (src))

but also that:

 [B] partial_subreg_p (vd->e[sr].mode, vd->e[vd->e[sr].oldest_regno].mode)

For example, all registers in this sequence can be part of the same chain:

    (set (reg:HI R1) (reg:HI R0))
    (set (reg:SI R2) (reg:SI R1)) // [A]
    (set (reg:DI R3) (reg:DI R2)) // [A]
    (set (reg:SI R4) (reg:SI R[0-3]))
    (set (reg:HI R5) (reg:HI R[0-4]))

But:

    (set (reg:SI R1) (reg:SI R0))
    (set (reg:HI R2) (reg:HI R1))
    (set (reg:SI R3) (reg:SI R2)) // [A] && [B]

is problematic because it dips below the precision of the oldest regno
and then increases again.

When this happens, I guess we have two choices:

(1) what the patch does: treat R3 as the start of a new chain.
(2) pretend that the copy occured in vd->e[sr].mode instead
    (i.e. copy vd->e[sr].mode to vd->e[dr].mode)

I guess (2) would need to be subject to REG_CAN_CHANGE_MODE_P.
Maybe the optimisation provided by (2) compared to (1) isn't common
enough to be worth the complication.

I think we should test [B] as well as [A] though.  The pass is set
up to do some quite elaborate mode changes and I think rejecting
[A] on its own would make some of the other code redundant.
It also feels like it should be a seperate “if” or “else if”,
with its own comment.

Thanks,
Richard

Re: [PATCH] [PR rtl/optimization/98694] Fix incorrect optimization by cprop_hardreg.

Reply via email to