On 08/27/2013 03:46 AM, Aurelien Jarno wrote:
> On the other hand calling the helper is the exception more than the
> rule (that's why they have been moved at the end of the TB), so we
> should not look too much at generating fast code, but rather small code
> in order to use the caches (both TB and CPU caches) more efficiently.
>
> Therefore even on x86, if we move the -1 at the helper level, it should
> be possible to use a tail call for the stores, something like:
>
>   mov    %r14,%rdi
>   mov    %ebx,%edx
>   xor    %ecx,%ecx
>   lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
>   pushq  %r8
>   jmpq   0x7f25526757a0
>
> Instead of:
>
>   mov    %r14,%rdi
>   mov    %ebx,%edx
>   xor    %ecx,%ecx
>   lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
>   callq  0x7f25526757a0
>   jmpq   0x7f2541a6f69b
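
As a rough illustration of the size argument in the quoted text, the sketch below counts the encoded bytes of the instructions that differ between the two sequences. It is not taken from the patch series. It assumes that both `callq` and `jmpq` use their 5-byte rel32 forms (the jump back is more than 127 bytes away, so the short form should not apply) and that `pushq %r8` needs a REX prefix. The names `PUSH_R8`, `CALL_REL32`, `JMP_REL32` and `slow_path_tail_sizes` are made up for this example.

```python
# Encoded sizes, in bytes, of the x86-64 instructions that differ between
# the two quoted sequences. The mov/mov/xor/lea argument setup is shared by
# both, so only the final instructions are compared.
# Assumption: rel32 forms are used for both callq and jmpq.
PUSH_R8 = 2      # pushq %r8: REX.B prefix (0x41) + opcode 0x50
CALL_REL32 = 5   # callq rel32: opcode 0xE8 + 4-byte displacement
JMP_REL32 = 5    # jmpq rel32: opcode 0xE9 + 4-byte displacement


def slow_path_tail_sizes():
    """Return (call_variant, tail_call_variant) sizes in bytes."""
    call_variant = CALL_REL32 + JMP_REL32  # callq helper; jmpq back to the TB
    tail_variant = PUSH_R8 + JMP_REL32     # pushq return address; jmpq helper
    return call_variant, tail_variant


call_size, tail_size = slow_path_tail_sizes()
print(f"call + jmp : {call_size} bytes")
print(f"push + jmp : {tail_size} bytes")
print(f"saved      : {call_size - tail_size} bytes per store slow path")
```

Under those assumptions the tail-call form is a few bytes shorter per store slow path, which is the kind of saving the cache argument above is about.
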

Fair enough.  I'll have a go at some follow-ups then.


r~