On Thu, Jul 24, 2014 at 6:25 PM, Michel Dänzer <mic...@daenzer.net> wrote:
>
> Attached is fair.s from Debian gcc 4.8.3-5. Does that look better? I'm
> going to try reproducing the problem with a kernel built by that now.

This looks better. For roughly that same code sequence it does (ignoring
the debug line and cfi information):

        subq    $184, %rsp      #,
        movq    (%r12), %rax    # sd_22(D)->parent, sd_parent
        movl    %edi, -156(%rbp)        # this_cpu, %sfp
        movl    %ecx, -160(%rbp)        # idle, %sfp
        movq    %r8, -184(%rbp) # continue_balancing, %sfp
        movq    %rax, -176(%rbp)        # sd_parent, %sfp
        movq    $load_balance_mask, %rax        #, tcp_ptr__
    #APP
        add %gs:this_cpu_off, %rax      # this_cpu_off, tcp_ptr__
    #NO_APP

so it updates the stack pointer before any spills, and it also doesn't
spill that constant value.

I still have no idea why it does the 4-byte rep stosl/movsl thing, but
that's a whole separate guessing game and might have something to do
with the fact that you do CONFIG_CC_OPTIMIZE_FOR_SIZE and the 4-byte
form is one byte smaller.

I'm a big believer in not blowing up the I$ footprint, and I have to
admit to pushing that myself a few years ago, but gcc does some rather
bad things with '-Os', so it's not actually suggested for the kernel
any more. I wish there was some middle ground model that cared about
size, but not to exclusion of everything else. The string instructions
are not good for performance when it's a compile-time known small
size.

             Linus
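
For anyone who wants to look at the string-instruction question in isolation, here is a minimal C sketch of a clear and a copy whose size is known at compile time. It is not taken from the kernel or from fair.s: `struct small_state`, `clear_state` and `copy_state` are made-up names, and whether a given compiler emits `rep stos`/`rep movs` or plain stores for them depends on the gcc version, the target and the optimization flags.

```c
#include <string.h>

/* Hypothetical small structure (six unsigned longs, 48 bytes on x86-64),
 * standing in for any object whose size the compiler knows at build time. */
struct small_state {
	unsigned long words[6];
};

/* Fixed-size clear: the compiler is free to expand this inline, either as
 * a handful of stores or as a "rep stos" sequence, instead of calling
 * memset(). */
void clear_state(struct small_state *s)
{
	memset(s, 0, sizeof(*s));
}

/* Fixed-size copy via struct assignment: likewise a candidate for inline
 * moves or a "rep movs" sequence. */
void copy_state(struct small_state *dst, const struct small_state *src)
{
	*dst = *src;
}
```

Building this file once with `gcc -Os -S` and once with `gcc -O2 -S` and comparing the two assembly listings is one way to see which expansion the compiler picks in each mode.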