On Fri, Dec 18, 2009 at 6:11 PM, Richard Henderson <r...@twiddle.net> wrote:
>> Also note that tcg_out_modrm will generate an unneeded prefix
>> for some registers. cf. the patch I sent to the list months ago.
>
> Huh. Didn't notice since the disassembler printed what I expected to see.
> Is fixing this at the same time a requirement for acceptance?
> I'd prefer to tackle that separately, since no doubt it affects every use of
> P_REXB.

I agree this change can be delayed.

>>> + tgen_arithi32(s, ARITH_AND, arg0, 0xff);
>>
>> Wouldn't movzbl be better?
>
> Handled inside tgen_arithi32:
>
>     } else if (c == ARITH_AND && val == 0xffu) {
>         /* movzbl */
>         tcg_out_modrm(s, 0xb6 | P_EXT | P_REXB, r0, r0);
>
> I didn't feel the need to replicate that.

Oops, I compared with my code, which has an explicit movzbl :)

>> Regarding the xor optimization, I tested it on my i7 and it was
>> (very) slightly slower running a 64-bit SPEC2k gcc.
>
> Huh. It used to be recommended. The partial word store used to stall the
> pipeline until the old value was ready, and the XOR was special-cased as a
> clear, which broke both the input dependency and also prevented a
> partial-register stall on the output.
>
> Actually, this recommendation is still present: Section 3.5.1.6 in the
> November 2009 revision of the Intel Optimization Reference Manual.
>
> If it's all the same, I'd prefer to keep what I have there. All other
> things being equal, the XOR is 2 bytes and the MOVZBL is 3.

I agree too. Anyway, my measurement is not representative enough to mean
anything. In that case I think shorter code is better, so let's go for XOR.

>>> +static void tcg_out_movcond(TCGContext *s, int cond, TCGArg arg0,
>>> + TCGArg arg1, TCGArg arg2, int const_arg2,
>>> + TCGArg arg3, TCGArg arg4, int rexw)
>>
>> Perhaps renaming arg0 to dest would make things slightly
>> more readable.
>
> Ok.
>
>> You should also add a note stating that arg3 != arg4.
>
> I don't believe that's true though. It's caught immediately when we emit
> the movcond opcode, but there's no check later once copy-propagation has
> been done within TCG.
>
> I check for that in the i386 and sparc backends, because dest==arg3 &&
> dest==arg4 would actually generate incorrect code. Here in the x86_64
> backend, where we always use cmov it doesn't generate incorrect code, merely
> inefficient.
>
> I could add an early out for that case, if you prefer.

No, you can leave it as is unless someone else objects.

Laurent
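
For reference, the size comparison behind the XOR/MOVZBL point above can be sketched as follows. This is only an illustration under stated assumptions: emit_xor_self and emit_movzbl_self are hypothetical helpers, not tcg functions, they encode only the register-to-same-register 32-bit forms, and they emit a REX prefix only when the register number actually requires one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of tcg): encode "xor %reg,%reg" on the
 * 32-bit register numbered 0..15 and return the number of bytes written. */
static size_t emit_xor_self(uint8_t *buf, int reg)
{
    size_t n = 0;

    if (reg >= 8) {
        buf[n++] = 0x45;        /* REX.R | REX.B to reach r8d..r15d */
    }
    buf[n++] = 0x31;            /* XOR r/m32, r32 */
    buf[n++] = (uint8_t)(0xc0 | ((reg & 7) << 3) | (reg & 7));
    return n;
}

/* Hypothetical helper (not part of tcg): encode a movzbl from the low
 * byte of a register into the same 32-bit register. */
static size_t emit_movzbl_self(uint8_t *buf, int reg)
{
    size_t n = 0;

    if (reg >= 8) {
        buf[n++] = 0x45;        /* REX.R | REX.B to reach r8..r15 */
    } else if (reg >= 4) {
        buf[n++] = 0x40;        /* bare REX: selects spl/bpl/sil/dil
                                   instead of ah/ch/dh/bh */
    }
    /* Registers 0..3 (al, cl, dl, bl) need no prefix at all. */
    buf[n++] = 0x0f;            /* two-byte opcode escape */
    buf[n++] = 0xb6;            /* MOVZX r32, r/m8 */
    buf[n++] = (uint8_t)(0xc0 | ((reg & 7) << 3) | (reg & 7));
    return n;
}
```

With these helpers, register 0 (%eax) gives 2 bytes for the XOR and 3 for the MOVZBL. For register 6 (%esi) the MOVZBL grows to 4 bytes because of the REX prefix, while the XOR stays at 2.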