> I did bootstrapping and ran the testsuite on x86(-64), aarch64, Power9
> and s390.  Everything looks good except two additional fails on x86
> where code actually looks worse.
> 
> gcc.target/i386/keylocker-encodekey128.c:
> 
> 17c17,18
> <       movaps  %xmm4, k2(%rip)
> ---
>>       pxor    %xmm0, %xmm0
>>       movaps  %xmm0, k2(%rip)
> 
> gcc.target/i386/keylocker-encodekey256.c:
> 
> 19c19,20
> <       movaps  %xmm4, k3(%rip)
> ---
>>       pxor    %xmm0, %xmm0
>>       movaps  %xmm0, k3(%rip)

Before the patch and after postreload we have:

(insn (set (reg:V2DI xmm0)
        (reg:V2DI xmm4))
     (expr_list:REG_DEAD (reg:V2DI 24 xmm4)
        (expr_list:REG_EQUIV (const_vector:V2DI [
                    (const_int 0 [0]) repeated x2
                ]))))
(insn (set (mem/c:V2DI (symbol_ref:DI ("k2")))
        (reg:V2DI xmm0)))

which is converted by cprop_hardreg to:

(insn (set (mem/c:V2DI (symbol_ref:DI ("k2")))
        (reg:V2DI xmm4)))
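
As a toy illustration of what cprop_hardreg does to this sequence (a simplified Python sketch under my own assumptions, not GCC's actual implementation): record reg-reg copies and rewrite later uses of the copy destination to the still-live source, which makes the intermediate move dead.

```python
# Toy hard-register copy propagation, loosely modeled on the
# cprop_hardreg transformation shown above.  Instructions are
# (opcode, dest, source) triples; "#0" marks an immediate zero.
# This is an illustrative sketch, not GCC's algorithm.

def cprop(insns):
    copy_of = {}  # dest register -> source register of latest reg-reg copy
    out = []
    for op, dst, src in insns:
        src = copy_of.get(src, src)  # read through a recorded copy
        if op == "mov":
            # dst is redefined: forget any records involving it
            copy_of = {d: s for d, s in copy_of.items()
                       if d != dst and s != dst}
            if not src.startswith("#"):
                copy_of[dst] = src   # remember the reg-reg copy
        out.append((op, dst, src))
    return out

# Before the patch: the copy from xmm4 is propagated into the store,
# mirroring the "movaps %xmm4, k2(%rip)" the test expects.
before = [("mov", "xmm0", "xmm4"), ("store", "k2", "xmm0")]
print(cprop(before))  # [('mov', 'xmm0', 'xmm4'), ('store', 'k2', 'xmm4')]

# With the change: xmm0 is set from an immediate, no reg-reg copy is
# recorded, so the store keeps reading the explicitly zeroed xmm0.
after = [("mov", "xmm0", "#0"), ("store", "k2", "xmm0")]
print(cprop(after))   # [('mov', 'xmm0', '#0'), ('store', 'k2', 'xmm0')]
```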

With the change there is:

(insn (set (reg:V2DI xmm0)
        (const_vector:V2DI [
                (const_int 0 [0]) repeated x2
            ])))
(insn (set (mem/c:V2DI (symbol_ref:DI ("k2")))
        (reg:V2DI xmm0)))

which is not simplified further because xmm0 needs to be explicitly
zeroed, while xmm4 is assumed to be zeroed by encodekey128 itself.  I'm
not familiar with this area, so I'm assuming this is correct, even
though I found "XMM4 through XMM6 are reserved for future usages and
software should not rely upon them being zeroed." online.

Even if xmm4 were zeroed explicitly, I guess that in this case the
simple costing of mov reg,reg vs. mov reg,imm (with the latter not
being more expensive) falls short?  cprop_hardreg could then actually
propagate the zeroed xmm4 into the next move.
The same mechanism could possibly even elide many such moves, which
would mean we'd unnecessarily emit many mov reg,0 instructions?  Hmm...
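
A toy extension of the same idea shows the kind of zero-value tracking that could elide such moves (again a hypothetical Python sketch, not anything cprop_hardreg actually implements): once one register is known to hold zero, a later mov reg,0 can be rewritten into a reg-reg copy from it, which ordinary copy propagation could then forward.

```python
# Hypothetical zero tracking, sketching the elision speculated about
# above: reuse a register already known to be zero instead of
# materializing another mov reg,0.  Not GCC's implementation; it only
# tracks a single known-zero register for simplicity.

def reuse_known_zero(insns):
    zero_reg = None  # a register currently known to hold 0, if any
    out = []
    for op, dst, src in insns:
        if op == "mov" and src == "#0":
            if zero_reg is not None and zero_reg != dst:
                src = zero_reg       # turn mov dst,0 into mov dst,zero_reg
            else:
                zero_reg = dst       # dst becomes the known-zero register
        elif op == "mov" and dst == zero_reg:
            zero_reg = None          # the known-zero register is clobbered
        out.append((op, dst, src))
    return out

# If xmm4 were zeroed explicitly first, the later zeroing of xmm0
# becomes a reg-reg copy that copy propagation could forward further.
insns = [("mov", "xmm4", "#0"), ("mov", "xmm0", "#0"),
         ("store", "k2", "xmm0")]
print(reuse_known_zero(insns))
# [('mov', 'xmm4', '#0'), ('mov', 'xmm0', 'xmm4'), ('store', 'k2', 'xmm0')]
```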
