I'm experimenting with ways to optimize wine (x86 target only), and I believe I can shrink wine's total text size by around 7% by outlining the lengthy prologues and epilogues required for ms_abi functions that make sysv_abi calls. In theory, fewer instruction cache misses will offset the extra 4 instructions per function and result in a net performance gain. However, I'm new to the gcc project and a novice x86 assembly programmer as well (I have been wanting to work on gcc for a while now!). In short, I want to:

1. Replace the prologue that pushes di, si and xmm6-15 with a single call to a global "ms_abi_push_regs" routine.
2. Replace the epilogue that pops these regs with a jmp to a global "ms_abi_pop_regs" routine.
3. Add the two routines somewhere so that they are linked into the output.

I have this working in a small-scale experiment (writing the ms_abi function in assembly), but I'm not certain how I would add these routines. Should I make them built-ins?

I have found the code that adds the clobber RTL instructions in ix86_expand_call() (gcc/config/i386/i386.c:25832), and I see that thread_prologue_and_epilogue_insns() (gcc/function.c) is where these clobbers are expanded into the prologue and epilogue, but I'm not sure what the cleanest way to convert this is. My first thought was to replace the clobber_reg() calls with something that adds a call insn instead. Or would it be better to do this in thread_prologue_and_epilogue_insns(), where prologue and epilogue generation belongs? That function, however, is shared by all targets. Any pointers would be greatly appreciated!

For reference, this is my 64-bit test case:

outline_test.h:
extern void my_sysv_func(void);
extern int __attribute__((ms_abi)) my_ms_abi_func(void);
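
For comparison, a C-only counterpart of this test case might look like the sketch below. The file name outline_test.c and the sysv_call_count counter are hypothetical, and my_sysv_func is defined here (with noinline) only to keep the sketch self-contained; in the real test it would live in another translation unit. It is meant for inspecting what the compiler emits, not for linking together with the assembly file.

```c
/* outline_test.c (hypothetical file name): C-only counterpart of the test. */

/* Hypothetical counter so a driver can check that the SysV callee ran. */
int sysv_call_count = 0;

/* Stand-in for the external SysV callee. noinline keeps the call visible
   to the ms_abi caller instead of letting the optimizer fold it away. */
__attribute__((noinline)) void my_sysv_func(void)
{
    sysv_call_count++;
}

/* An ms_abi function making a sysv_abi call: rsi, rdi and xmm6-15 are
   callee-saved under ms_abi but may be clobbered by the SysV callee. */
int __attribute__((ms_abi)) my_ms_abi_func(void)
{
    my_sysv_func();
    return 0;
}
```

Compiling this with something like `gcc -O2 -S outline_test.c` on x86-64 gives a listing to compare against the outlined version; the exact prologue and epilogue will depend on the gcc version and flags.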

outline_test_asm.s:
.global ms_abi_push_regs
.global ms_abi_pop_regs
.global my_ms_abi_func

ms_abi_push_regs:
    pop    %rax                 # take our own return address off the stack
    push   %rdi
    push   %rsi
    sub    $0xa8,%rsp
    movaps %xmm6,(%rsp)
    movaps %xmm7,0x10(%rsp)
    movaps %xmm8,0x20(%rsp)
    movaps %xmm9,0x30(%rsp)
    movaps %xmm10,0x40(%rsp)
    movaps %xmm11,0x50(%rsp)
    movaps %xmm12,0x60(%rsp)
    movaps %xmm13,0x70(%rsp)
    movaps %xmm14,0x80(%rsp)
    movaps %xmm15,0x90(%rsp)
    jmp    *%rax                # return to the caller; %rax holds the address itself

ms_abi_pop_regs:
    movaps (%rsp),%xmm6
    movaps 0x10(%rsp),%xmm7
    movaps 0x20(%rsp),%xmm8
    movaps 0x30(%rsp),%xmm9
    movaps 0x40(%rsp),%xmm10
    movaps 0x50(%rsp),%xmm11
    movaps 0x60(%rsp),%xmm12
    movaps 0x70(%rsp),%xmm13
    movaps 0x80(%rsp),%xmm14
    movaps 0x90(%rsp),%xmm15
    add    $0xa8,%rsp
    pop    %rsi
    pop    %rdi
    retq

my_ms_abi_func:
    callq ms_abi_push_regs
    callq my_sysv_func
    xor %eax, %eax
    jmp ms_abi_pop_regs
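
The frame arithmetic behind the `sub $0xa8,%rsp` above can be checked with a short sketch. All names here are hypothetical and exist only for illustration; the numbers come straight from the listing. The point is that the 8 bytes of padding keep %rsp 16-byte aligned, which both movaps and the following call to my_sysv_func need.

```c
/* Byte counts taken from the assembly listing; the names are hypothetical. */
enum {
    RET_ADDR_BYTES = 8,        /* return address of my_ms_abi_func's caller */
    GPR_SAVE_BYTES = 2 * 8,    /* push %rdi; push %rsi */
    XMM_SAVE_BYTES = 10 * 16,  /* xmm6..xmm15, one 16-byte movaps slot each */
    PAD_BYTES      = 8,        /* unused bytes at the top of the xmm area */
    SUB_BYTES      = XMM_SAVE_BYTES + PAD_BYTES
};

/* The listing subtracts 0xa8 from %rsp. */
_Static_assert(SUB_BYTES == 0xa8, "sub amount matches the listing");

/* %rsp is 16-byte aligned just before the call into my_ms_abi_func, so
   everything pushed since then must add up to a multiple of 16. */
_Static_assert((RET_ADDR_BYTES + GPR_SAVE_BYTES + SUB_BYTES) % 16 == 0,
               "stack stays 16-byte aligned after ms_abi_push_regs");

/* Offset from %rsp of the save slot for register xmmN (6 <= N <= 15). */
int xmm_slot_offset(int xmm_regno)
{
    return (xmm_regno - 6) * 16;
}
```

For example, xmm_slot_offset(15) gives 0x90, matching the last movaps in both routines.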

Thanks!
Daniel
