On 2015-07-18 23:18, Aurelien Jarno wrote:
> On 2015-07-18 08:58, Richard Henderson wrote:
> > Enforce the invariant that 32-bit quantities are zero extended
> > in the register.  This avoids having to re-zero-extend at memory
> > accesses for 32-bit guests.
> >
> > Signed-off-by: Richard Henderson <r...@twiddle.net>
> > ---
> > Here's an alternative to the other things we've been considering.
> > We could even make this conditional on USER_ONLY if you like.
> >
> > This does in fact fix the mips test case.  Consider the fact that
> > memory operations are probably more common than truncations, and
> > it would seem that we have a net size win by forcing the truncate
> > over adding a byte for the ADDR32 (or 2 bytes for a zero-extend).
>
> I think we should go with your previous patch for 2.4, and think calmly
> about how to do that better for 2.5. It slightly increases the generated
> code, but only in bytes, not in number of instructions, so I don't think
> the performance impact is huge.
>
> > Indeed, for 2.5, we could look at dropping the existing zero-extend
> > from the softmmu path.  Also for 2.5, split trunc_shr into two parts,
>
> From a quick look, we need to move the address to new registers anyway,
> so not zero-extending will mean adding the REXW prefix.

Well, looking in more detail, we can move one instruction from the
fast-path to the slow-path. Here is typical TLB code for a store:

fast-path:
  mov    %rbp,%rdi
  mov    %rbp,%rsi
  shr    $0x7,%rdi
  and    $0xfffffffffffff003,%rsi
  and    $0x1fe0,%edi
  lea    0x4e68(%r14,%rdi,1),%rdi
  cmp    (%rdi),%rsi
  mov    %rbp,%rsi
  jne    0x7f45b8bcc800
  add    0x10(%rdi),%rsi
  mov    %ebx,(%rsi)

slow-path:
  mov    %r14,%rdi
  mov    %ebx,%edx
  mov    $0x22,%ecx
  lea    -0x156(%rip),%r8
  push   %r8
  jmpq   0x7f45cb337010

If we know that %rbp is properly zero-extended when needed, we can
change the end of the fast path into:

  cmp    (%rdi),%rsi
  jne    0x7f45b8bcc800
  mov    0x10(%rdi),%rsi
  mov    %ebx,(%rsi,%rbp,1)

However, that means that %rsi is no longer loaded with the address, so
we have to load it in the slow path. In the end it means moving one
instruction from the fast-path to the slow-path.

Now I have no idea what would really improve performance. A smaller
fast-path, so that there are fewer instructions to execute? Smaller code
in general, so that the caches are better used?

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
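
For readers who do not read the generated code fluently, here is a rough
C model of what the store fast-path above computes. It is only a sketch:
the tlb_entry layout, the constant names and tlb_lookup_store() are
hypothetical and are not the actual QEMU definitions. The page size
(4 KiB) and table size (256 entries) are inferred from the shift and
mask constants in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the listing: shr $0x7 / and $0x1fe0 selects one of
 * 256 entries of 32 bytes each, indexed by bits 12..19 of the address. */
#define PAGE_BITS 12
#define TLB_BITS  8
#define TLB_SIZE  (1u << TLB_BITS)

/* Hypothetical, simplified entry; the real structure has more fields. */
typedef struct {
    uint64_t  addr_write; /* guest page address allowed for stores */
    uintptr_t addend;     /* host address minus guest page address */
} tlb_entry;

/* Returns the host pointer on a hit, or NULL where the generated code
 * would branch to the slow-path. A real TLB marks unused entries with
 * an invalid tag; this sketch leaves that to the caller. */
static void *tlb_lookup_store(const tlb_entry *tlb, uint64_t addr,
                              unsigned size)
{
    assert(size != 0 && (size & (size - 1)) == 0); /* power of two */

    const tlb_entry *e = &tlb[(addr >> PAGE_BITS) & (TLB_SIZE - 1)];

    /* Keep the page bits plus the low alignment bits, so a misaligned
     * access never matches: 0xfffffffffffff003 for a 4-byte store. */
    uint64_t mask = ~(uint64_t)((1u << PAGE_BITS) - 1) | (size - 1);

    if (e->addr_write != (addr & mask)) {
        return NULL;
    }

    /* addend + addr: this is the (%rsi,%rbp,1) form, which is only
     * correct if the upper bits of the guest address are zero. */
    return (void *)(e->addend + (uintptr_t)addr);
}
```

The final addition is the part under discussion: it can be folded into
the addressing mode of the store only when the register holding the
guest address is known to be zero-extended.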