On Mon, Aug 26, 2013 at 03:34:15PM -0700, Richard Henderson wrote:
> On 08/26/2013 03:26 PM, Paolo Bonzini wrote:
> > Something that can be done on top of this patch: what about moving the
> > "-1" to helper_ret_*? It is common to pretty much all the targets
> > (except ARM has -2), and it would allow some simplifications.
>
> I suppose so, yes.
>
> >     li    rN, retaddr
> >     mtlr  rN
> >     b     st_trampoline[i]
> >
> > sequence instead of one of
> >
> >     li    rN, retaddr
> >     mtlr  rN
> >     bl    st_trampoline[i]
> >     b     retaddr
>
> This sort of thing is very difficult to evaluate, because of the
> cpu's return address prediction stack. I have so far avoided it.
>
> The only cpus that I believe can make good use of tail calls into
> the memory helpers are those with predicated stores and calls, i.e.
> arm and ia64.
>
On the other hand, calling the helper is the exception rather than the
rule (that's why the helper calls have been moved to the end of the TB),
so we should not focus too much on generating fast code, but rather on
generating small code, in order to use the caches (both TB and CPU
caches) more efficiently. Therefore even on x86, if we move the -1 to
the helper level, it should be possible to use a tail call for the
stores, something like:

    mov    %r14,%rdi
    mov    %ebx,%edx
    xor    %ecx,%ecx
    lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
    pushq  %r8
    jmpq   0x7f25526757a0

Instead of:

    mov    %r14,%rdi
    mov    %ebx,%edx
    xor    %ecx,%ecx
    lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
    callq  0x7f25526757a0
    jmpq   0x7f2541a6f69b

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net
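A rough way to put a number on the "small code" argument is to encode only the tails of the two x86-64 sequences above, since the mov/xor/lea prefix is identical in both. The script below is a back-of-the-envelope sketch, not QEMU code. It assumes both branches use the 5-byte rel32 forms (0xE8 call, 0xE9 jmp), that `pushq %r8` is the 2-byte REX.B form (0x41 0x50), and it derives the slow path's branch address from the lea (a 7-byte RIP-relative instruction whose next-instruction address is 0x7f2541a6f69a + 0x10f). It says nothing about the return-address prediction cost Richard raised.

```python
import struct

# Addresses taken from the disassembly in the mail.
HELPER = 0x7f25526757a0   # store helper target
RESUME = 0x7f2541a6f69b   # fast-path resume point
# Assumption: the branch tail starts right after the 7-byte lea,
# whose rip-relative base is 0x7f2541a6f69a + 0x10f.
BRANCH_START = 0x7f2541a6f69a + 0x10f


def rel32(opcode, at, target):
    """Encode a 5-byte call (0xE8) or jmp (0xE9) placed at address `at`."""
    disp = target - (at + 5)  # relative to the address of the next instruction
    if not -2**31 <= disp < 2**31:
        raise ValueError("target out of rel32 range")
    return bytes([opcode]) + struct.pack("<i", disp)


def call_then_jump(at):
    """Current tail: callq helper; jmpq resume."""
    call = rel32(0xE8, at, HELPER)
    jump = rel32(0xE9, at + len(call), RESUME)
    return call + jump


def push_then_jump(at):
    """Proposed tail call: pushq %r8; jmpq helper."""
    push = bytes([0x41, 0x50])  # REX.B prefix + push r64, selecting %r8
    jump = rel32(0xE9, at + len(push), HELPER)
    return push + jump


old = call_then_jump(BRANCH_START)
new = push_then_jump(BRANCH_START)
print("call + jmp:", len(old), "bytes", old.hex())
print("push + jmp:", len(new), "bytes", new.hex())
```

Under these assumptions the tail call saves 3 bytes per store slow path. The pushed %r8 becomes the helper's return address, which is presumably why the -1 has to move into the helper first: the same register can then serve as both the retaddr argument and the real resume address.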