On Thu, Dec 18, 2014 at 3:09 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>> The Linux kernel never passes floating point arguments around, vararg
>>>>> functions or not. Hence no vector registers are ever used when calling a
>>>>> vararg function. But gcc still dutifully emits an "xor %eax,%eax" before
>>>>> each and every call of a vararg function. Since no callee use that for
>>>>> anything, these instructions are redundant.
>>>>>
>>>>> This patch adds the -mskip-rax-setup option to skip setting up RAX
>>>>> register when SSE is disabled and there are no variable arguments passed
>>>>> in vector registers. Since RAX register is used to avoid unnecessarily
>>>>> saving vector registers on stack when passing variable arguments, the
>>>>> impacts of this option are callees may waste some stack space, misbehave
>>>>> or jump to a random location. GCC 4.4 or newer don't those issues,
>>>>> regardless the RAX register value since they don't check the RAX register
>>>>> value when SSE is disabled, regardless the RAX register value:
>>>>>
>>>>> https://gcc.gnu.org/ml/gcc-patches/2008-09/msg00127.html
>>>>>
>>>>> I used it on kernel 3.17.7:
>>>>>
>>>>>     text    data     bss      dec     hex filename
>>>>> 11493571 2271232 5926912 19691715 12c78c3 vmlinux.skip-rax
>>>>> 11517879 2271232 5926912 19716023 12cd7b7 vmlinux.orig
>>>>>
>>>>> It removed 14309 redundant "xor %eax,%eax" instructions and saved about
>>>>> 27KB. I am currently running the new kernel without any problem. OK
>>>>> for trunk?
>>>>
>>>> How about skipping RAX setup unconditionally for !TARGET_SSE? Please
>>>> see ix86_conditional_register_usage, where SSE registers are squashed
>>>> for !TARGET_SSE, so it is not possible to use them even in the inline
>>>> asm.
>>>
>>> ... when -ffreestanding is in effect, of course.
>>
>> Ops, this is not the unconditional default kernel compile flag. It is
>> defined only for 32bit builds, where:
>>
>> # temporary until string.h is fixed
>> KBUILD_CFLAGS += -ffreestanding
>>
>> Yes, it looks to me that new option is the way to go.
>
> Is this an OK?

In principle, I'm OK with the patch approach, but let's wait for any
comments from the Linux people.

> Some really old gcc versions used an indirect jump based on the eax input
> and they didn't zero extend first. So with those compilers, you could
> actually jump to a random location.

You can enable it only when you can compile everything with a newer GCC.

Uros.
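
For reference, a minimal sketch of the kind of call discussed above. It is
not taken from the patch or from the kernel; sum_ints and demo are
hypothetical names used only for illustration. Under the x86-64 System V
calling convention the caller of a variadic function sets %al to an upper
bound on the number of vector registers used for the arguments, so with only
integer arguments a compiler would typically emit "xor %eax,%eax" before the
call to sum_ints below.

```c
#include <stdarg.h>

/* Hypothetical helper (not from the kernel or the patch): sums `count`
 * int arguments taken from the variable-argument list. */
long sum_ints(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);   /* integer arguments only, no FP */
    va_end(ap);
    return total;
}

/* Hypothetical caller: no floating point values are passed, so no vector
 * registers carry arguments and %al would be set to zero before the call. */
long demo(void)
{
    return sum_ints(3, 10, 20, 30);
}
```

Building this with "gcc -O2 -S" and comparing the output with and without
-mskip-rax-setup (together with -mno-sse) should show whether the xor before
the call is dropped; the exact output depends on the compiler version.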