Due to ABI differences, when a 64-bit Microsoft function calls a System V function, it must consider RSI, RDI and XMM6-15 as clobbered. Saving these registers can cost as much as 109 bytes, and restoring them costs a similar amount. This patch set targets 64-bit Wine and aims to mitigate some of these costs by adding ms-->sysv save & restore stubs to libgcc, which are called from pro/epilogues rather than emitting the code inline. And since we're already tinkering with stubs, they will also manage the save/restore of up to 6 additional registers. Analysis of building Wine 64 demonstrates a reduction of .text by around 20%.
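
For anyone not familiar with the situation, here is a minimal sketch of the kind of call that is affected. The function and enum names are hypothetical and only for illustration; ms_abi and sysv_abi are the existing GCC function attributes on x86-64, and the macros fall back to nothing on other targets so the snippet still compiles there. The register counts simply restate the list above (RSI and RDI, plus XMM6 through XMM15).

```c
/* Sketch only: names below are hypothetical, not part of the patch set. */
#if defined(__x86_64__) && defined(__GNUC__)
# define MS_ABI   __attribute__((ms_abi, noinline))
# define SYSV_ABI __attribute__((sysv_abi, noinline))
#else
# define MS_ABI
# define SYSV_ABI
#endif

/* Registers that are callee-saved in the Microsoft ABI but call-clobbered
   in the System V ABI: RSI, RDI and XMM6-15.  */
enum {
  MS_SYSV_CLOBBERED_GPRS = 2,   /* RSI, RDI */
  MS_SYSV_CLOBBERED_SSE  = 10   /* XMM6 .. XMM15 */
};

/* A System V callee; it may freely overwrite the registers above.  */
SYSV_ABI double sysv_scale (double x, long n)
{
  return x * (double) n;
}

/* A Microsoft-ABI function calling it.  Its own callers expect RSI, RDI
   and XMM6-15 to be preserved, so its prologue and epilogue must save and
   restore them around the body.  */
MS_ABI double ms_entry (double x, long n)
{
  return sysv_scale (x, n) + 1.0;
}
```

Compiling something like this at -O2 for x86-64 and looking at the disassembly of ms_entry shows the inline saves and restores that the stubs are meant to replace.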

The basic theory is that a reduction of I-cache misses will offset the extra instructions required for implementation. And since there are only a handful of stubs that will be in memory, I'm using the larger mov instructions instead of push/pop to facilitate better parallelization. I have not yet produced actual performance data.

Here is a sample of some generated code:

Prologue:
   23c20:       48 8d 44 24 88          lea    -0x78(%rsp),%rax
   23c25:       48 81 ec 08 01 00 00    sub    $0x108,%rsp
   23c2c:       e8 1a 4b 03 00          callq  5874b <__savms64_15>

Epilogue (r10 stores the value to restore the stack pointer to):
   23c7c:       48 8d b4 24 90 00 00    lea    0x90(%rsp),%rsi
   23c83:       00
   23c84:       4c 8d 56 78             lea    0x78(%rsi),%r10
   23c88:       e9 5e 4b 03 00          jmpq   587eb <__resms64x_15>
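
To make the pointer arithmetic in the listing easier to follow, here is a small model of it in C. The struct and function names are hypothetical; the constants are copied from the disassembly. It assumes the stack pointer at the epilogue has the same value it had right after the prologue's sub, and it does not model what the stubs do internally.

```c
#include <stdint.h>

/* Hypothetical model of the listing above, not code from the patch.  */
typedef struct { uint64_t rax; uint64_t rsp; } prologue_state;
typedef struct { uint64_t rsi; uint64_t r10; } epilogue_state;

static prologue_state model_prologue (uint64_t rsp_on_entry)
{
  prologue_state s;
  s.rax = rsp_on_entry - 0x78;    /* lea -0x78(%rsp),%rax */
  s.rsp = rsp_on_entry - 0x108;   /* sub $0x108,%rsp      */
  return s;
}

static epilogue_state model_epilogue (uint64_t rsp_in_body)
{
  epilogue_state s;
  s.rsi = rsp_in_body + 0x90;     /* lea 0x90(%rsp),%rsi  */
  s.r10 = s.rsi + 0x78;           /* lea 0x78(%rsi),%r10  */
  return s;
}
```

Since 0x108 - 0x78 = 0x90, the base pointer handed to the restore stub in RSI is the same address that was handed to the save stub in RAX, and 0x90 + 0x78 = 0x108 means r10 ends up holding the stack pointer value from function entry.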

Forced stack realignment appears to have become the new normal for Wine 64, since many Windows programs violate the 16-byte stack alignment requirement but just so *happen* not to crash on Windows (and the claim is therefore made that Wine should behave as Windows happens to, despite the UB).

Prologue, stack realignment case:
   23c20:       55                      push   %rbp
   23c21:       48 89 e5                mov    %rsp,%rbp
   23c24:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
   23c28:       48 8d 44 24 90          lea    -0x70(%rsp),%rax
   23c2d:       48 81 ec 00 01 00 00    sub    $0x100,%rsp
   23c34:       e8 8e 43 03 00          callq  57fc7 <__savms64f_15>

Epilogue, stack realignment case:
   23c86:       48 8d b4 24 90 00 00    lea    0x90(%rsp),%rsi
   23c8d:       00
   23c8e:       e9 80 43 03 00          jmpq   58013 <__resms64fx_15>
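
The realignment prologue can be modeled the same way. Again the names are hypothetical and the constants come from the listing; this is only a sketch of the address arithmetic, assuming the stack pointer is unchanged between the end of the prologue and the epilogue.

```c
#include <stdint.h>

/* Hypothetical model of the realigning prologue, not code from the patch.  */
typedef struct { uint64_t rbp; uint64_t rax; uint64_t rsp; } realign_state;

static realign_state model_realigned_prologue (uint64_t rsp_on_entry)
{
  realign_state s;
  uint64_t sp = rsp_on_entry - 8;   /* push %rbp              */
  s.rbp = sp;                       /* mov  %rsp,%rbp         */
  sp &= ~(uint64_t) 0xf;            /* and  $-16,%rsp         */
  s.rax = sp - 0x70;                /* lea  -0x70(%rsp),%rax  */
  s.rsp = sp - 0x100;               /* sub  $0x100,%rsp       */
  return s;
}
```

Whatever the incoming alignment, the resulting stack pointer is a multiple of 16 and RAX is again rsp + 0x90, which matches the 0x90 offset used to set up RSI in the epilogue. No r10 is computed in this case because the original stack pointer can be recovered from the frame pointer.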

No additional regression tests fail with this patch set. I have tested about 12 builds of Wine (with varying optimizations & options) and no additional tests fail for those either. (Actually, there appears to be some type of regression prior to this patch set, because it magically fixes about 30 failed Wine tests that don't fail when building Wine with gcc-5.4.0.)

Outstanding issues:

1. My x86 assembly expertise is limited, so I would appreciate
   examination of my stubs & emitted code!
2. Regression tests have only been run on my old Phenom. I have not yet
   tested on an AVX CPU (which should use vmovaps instead of movaps).
3. My test program is inadequate (and is not included in this patch
   set) and needs a lot of cleanup.  During development it failed to
   trigger many of the optimization errors that I got when building
   Wine. I've been building 64-bit Wine and running Wine's tests in
   the meantime.
4. It would help to write a benchmarking program/script.
5. I haven't yet figured out how to get Wine building with -flto and I
   thus haven't tested how these changes affect it yet.
6. I'm not 100% certain yet, but the stubs __resms64f* (restore with
   hard frame pointer, but return to the function) don't appear to
   ever be used because enabling hard frame pointers disables sibling
   calls, which is what they're intended to facilitate.


 gcc/config/i386/i386.c         | 704 ++++++++++++++++++++++++++++++++++++++---
 gcc/config/i386/i386.h         |  22 +-
 gcc/config/i386/i386.opt       |   5 +
 gcc/config/i386/predicates.md  | 155 +++++++++
 gcc/config/i386/sse.md         |  46 +++
 gcc/doc/invoke.texi            |  11 +-
 libgcc/config.host             |   2 +-
 libgcc/config/i386/i386-asm.h  |  82 +++++
 libgcc/config/i386/resms64.S   |  63 ++++
 libgcc/config/i386/resms64f.S  |  59 ++++
 libgcc/config/i386/resms64fx.S |  61 ++++
 libgcc/config/i386/resms64x.S  |  65 ++++
 libgcc/config/i386/savms64.S   |  63 ++++
 libgcc/config/i386/savms64f.S  |  64 ++++
 libgcc/config/i386/t-msabi     |   7 +
 15 files changed, 1358 insertions(+), 51 deletions(-)


Changes in Version 2:

 * Added ChangeLogs (attached).
 * Changed option from -f to -m and moved from gcc/common.opt to
   gcc/config/i386/i386.opt.
 * Solved problem with uncombined SP modifications.
 * Optimization now works when hard frame pointers are used and stack
   realignment is not needed.
 * Added documentation to gcc/doc/invoke.texi.

Feedback and comments would be most appreciated!

Thanks,
Daniel





        * config/i386/i386.opt: Add option -moutline-msabi-xlogues.

        * config/i386/i386.h
        (x86_64_ms_sysv_extra_clobbered_registers): Change type to unsigned.
        (NUM_X86_64_MS_CLOBBERED_REGS): New macro.
        (struct machine_function): Add new members outline_ms_sysv,
        outline_ms_sysv_pad_in, outline_ms_sysv_pad_out and
        outline_ms_sysv_extra_regs.

        * config/i386/i386.c
        (enum xlogue_stub): New enum.
        (enum xlogue_stub_sets): New enum.
        (class xlogue_layout): New class.
        (struct ix86_frame): Add outlined_save_offset member, modify comments
        to detail stack layout when using out-of-line stubs.
        (ix86_target_string): Add -moutline-msabi-xlogues option.

        (stub_managed_regs): New static variable.
        (ix86_save_reg): Add new parameter ignore_outlined to optionally omit
        registers managed by out-of-line stub.
        (ix86_nsaved_regs): Modify to accommodate changes to ix86_save_reg.
        (ix86_nsaved_sseregs): Likewise.
        (ix86_emit_save_regs): Likewise.
        (ix86_emit_save_regs_using_mov): Likewise.
        (ix86_emit_save_sse_regs_using_mov): Likewise.
        (get_scratch_register_on_entry): Likewise.
        (ix86_compute_frame_layout): Modify to disable m->outline_ms_sysv when
        appropriate and compute frame layout for out-of-line stubs.
        (gen_frame_set): New function.
        (gen_frame_load): Likewise.
        (gen_frame_store): Likewise.
        (emit_msabi_outlined_save): Likewise.
        (ix86_expand_prologue): Modify to call emit_msabi_outlined_save when
        appropriate.
        (ix86_emit_leave): Add parameter rtx_insn *insn, allowing it to be used
        to only generate the notes.
        (emit_msabi_outlined_restore): New function.
        (ix86_expand_epilogue): Modify to call emit_msabi_outlined_restore when
        appropriate.
        (ix86_expand_call): Modify to enable m->outline_ms_sysv when
        appropriate.

        * config/i386/predicates.md
        (save_multiple): New predicate.
        (restore_multiple): Likewise.
        * config/i386/sse.md
        (save_multiple<mode>): New pattern.
        (save_multiple_realign<mode>): Likewise.
        (restore_multiple<mode>): Likewise.
        (restore_multiple_and_return<mode>): Likewise.
        (restore_multiple_leave_return<mode>): Likewise.
        * config.host: Add i386/t-msabi to i386/t-linux file list.
        * config/i386/i386-asm.h: New file.
        * config/i386/resms64.S: New file.
        * config/i386/resms64f.S: New file.
        * config/i386/resms64fx.S: New file.
        * config/i386/resms64x.S: New file.
        * config/i386/savms64.S: New file.
        * config/i386/savms64f.S: New file.
        * config/i386/t-msabi: New file.
