> I did bootstrapping and ran the testsuite on x86(-64), aarch64, Power9
> and s390. Everything looks good except two additional fails on x86
> where code actually looks worse.
>
> gcc.target/i386/keylocker-encodekey128.c
>
> 17c17,18
> < 	movaps	%xmm4, k2(%rip)
> ---
> > 	pxor	%xmm0, %xmm0
> > 	movaps	%xmm0, k2(%rip)
>
> gcc.target/i386/keylocker-encodekey256.c:
>
> 19c19,20
> < 	movaps	%xmm4, k3(%rip)
> ---
> > 	pxor	%xmm0, %xmm0
> > 	movaps	%xmm0, k3(%rip)
Before the patch and after postreload we have:

  (insn (set (reg:V2DI xmm0) (reg:V2DI xmm4))
        (expr_list:REG_DEAD (reg:V2DI 24 xmm4)
           (expr_list:REG_EQUIV (const_vector:V2DI [
                       (const_int 0 [0]) repeated x2
                   ]))))
  (insn (set (mem/c:V2DI (symbol_ref:DI ("k2"))) (reg:V2DI xmm0)))

which is converted by cprop_hardreg to:

  (insn (set (mem/c:V2DI (symbol_ref:DI ("k2"))) (reg:V2DI xmm4)))

With the change there is:

  (insn (set (reg:V2DI xmm0) (const_vector:V2DI [
              (const_int 0 [0]) repeated x2
          ])))
  (insn (set (mem/c:V2DI (symbol_ref:DI ("k2"))) (reg:V2DI xmm0)))

which is not simplified further, because xmm0 needs to be explicitly zeroed while xmm4 is assumed to be zeroed by encodekey128. I'm not familiar with this extension, so I'm supposing this is correct, even though I found "XMM4 through XMM6 are reserved for future usages and software should not rely upon them being zeroed." online.

Even if xmm4 were zeroed explicitly, I guess in this case the simple costing of mov reg,reg vs. mov reg,imm (with the latter not being more expensive) falls short? cprop_hardreg can actually propagate the zeroed xmm4 into the next move. The same mechanism could possibly even elide many such moves, which would mean we'd unnecessarily emit many mov reg,0? Hmm...