On Tue, Aug 27, 2013 at 07:53:56AM -0700, Richard Henderson wrote:
> On 08/27/2013 03:46 AM, Aurelien Jarno wrote:
> > On the other hand calling the helper is the exception more than the
> > rule (that's why they have been moved at the end of the TB), so we
> > should not look too much at generating fast code, but rather small code
> > in order to use the caches (both TB and CPU caches) more efficiently.
> >
> > Therefore even on x86, if we move the -1 at the helper level, it should
> > be possible to use a tail call for the stores, something like:
> >
> >   mov    %r14,%rdi
> >   mov    %ebx,%edx
> >   xor    %ecx,%ecx
> >   lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
> >   pushq  %r8
> >   jmpq   0x7f25526757a0
> >
> > Instead of:
> >
> >   mov    %r14,%rdi
> >   mov    %ebx,%edx
> >   xor    %ecx,%ecx
> >   lea    -0x10f(%rip),%r8        # 0x7f2541a6f69a
> >   callq  0x7f25526757a0
> >   jmpq   0x7f2541a6f69b
>
> Fair enough.  I'll have a go at some follow-ups then.
>
I think this can also be done later, as a second step. Do you want to
create a version 3, or should I just process the current pull request,
with you providing additional patches later?

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurel...@aurel32.net                 http://www.aurel32.net