https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
            Bug ID: 66122
           Summary: Bad uninlining decisions
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vda.linux at googlemail dot com
  Target Milestone: ---

On a Linux kernel build, I found thousands of cases where functions which the programmer expects to be inlined aren't actually inlined.

The following script is used to find them:

    nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^ *1 ' | sort -rn

It actually finds functions which have the same name and size and occur more than once. There are a few false positives, but the vast majority of them are functions which were supposed to be inlined, but weren't:

    (Count) (size)           (name)
        473 000000000000000b t spin_unlock_irqrestore
        449 000000000000005f t rcu_read_unlock
        355 0000000000000009 t atomic_inc
        353 000000000000006e t rcu_read_lock
        350 0000000000000075 t rcu_read_lock_sched_held
        291 000000000000000b t spin_unlock
        266 0000000000000019 t arch_local_irq_restore
        215 000000000000000b t spin_lock
        180 0000000000000011 t kzalloc
        165 0000000000000012 t list_add_tail
        161 0000000000000019 t arch_local_save_flags
        153 0000000000000016 t test_and_set_bit
        134 000000000000000b t spin_unlock_irq
        134 0000000000000009 t atomic_dec
        130 000000000000000b t spin_unlock_bh
        122 0000000000000010 t brelse
        120 0000000000000016 t test_and_clear_bit
        120 000000000000000b t spin_lock_irq
        119 000000000000001e t get_dma_ops
        117 0000000000000053 t cpumask_next
        116 0000000000000036 t kref_get
        114 000000000000001a t schedule_work
        106 000000000000000b t spin_lock_bh
        103 0000000000000019 t arch_local_irq_disable
         98 0000000000000014 t atomic_dec_and_test
         83 0000000000000020 t sg_page
         81 0000000000000037 t cpumask_check
         79 0000000000000036 t pskb_may_pull
         72 0000000000000044 t perf_fetch_caller_regs
         70 000000000000002f t cpumask_next
         68 0000000000000036 t clk_prepare_enable
         65 0000000000000018 t pci_write_config_byte
         65 0000000000000013 t tasklet_schedule
         61 0000000000000023 t init_completion
         60 000000000000002b t trace_handle_return
         59 0000000000000043 t nlmsg_trim
         59 0000000000000019 t pci_read_config_dword
         59 000000000000000c t slow_down_io
        ...
        ...

Note the tiny sizes of some functions. Let's take a look at atomic_inc:

    static inline void atomic_inc(atomic_t *v)
    {
            asm volatile(LOCK_PREFIX "incl %0"
                         : "+m" (v->counter));
    }

You would imagine that this won't ever be deinlined, right? It's one assembly instruction. Well, it isn't always inlined. Here's the disassembly of vmlinux:

    ffffffff81003000 <atomic_inc>:
    ffffffff81003000:       55                      push   %rbp
    ffffffff81003001:       48 89 e5                mov    %rsp,%rbp
    ffffffff81003004:       f0 ff 07                lock incl (%rdi)
    ffffffff81003007:       5d                      pop    %rbp
    ffffffff81003008:       c3                      retq

This can be fixed using __always_inline, but kernel developers hesitate to slap thousands of __always_inline annotations everywhere. The mood is that this is the compiler's fault, and that it should not be accommodated but fixed.

This happens quite easily with -Os (IOW: with a CC_OPTIMIZE_FOR_SIZE=y kernel build), but -O2 is not immune either. I found a file which exhibits bad deinlining with both -O2 and -Os, and I'm going to attach it.
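
For illustration only, here is a minimal sketch of the __always_inline workaround mentioned above. It is not the attached test case. The names my_atomic_t, my_atomic_inc and bump_twice are hypothetical stand-ins, the macro definition is an approximation of how the kernel spells __always_inline for GCC, and the literal "lock; " prefix is assumed in place of the kernel's LOCK_PREFIX macro.

```c
/* Sketch only: my_atomic_t and my_atomic_inc are hypothetical stand-ins
 * for the kernel's atomic_t and atomic_inc. */

/* Approximately how the kernel defines __always_inline for GCC. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

typedef struct { int counter; } my_atomic_t;

static __always_inline void my_atomic_inc(my_atomic_t *v)
{
#if defined(__x86_64__) || defined(__i386__)
	/* One instruction, as in the report; "lock; " is assumed here
	 * in place of the kernel's LOCK_PREFIX macro. */
	__asm__ volatile("lock; incl %0" : "+m" (v->counter));
#else
	/* Fallback for non-x86 targets, using a GCC atomic builtin. */
	__atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
#endif
}

/* A caller: with the attribute, GCC must expand both increments here
 * instead of emitting an out-of-line copy of my_atomic_inc. */
int bump_twice(my_atomic_t *v)
{
	my_atomic_inc(v);
	my_atomic_inc(v);
	return v->counter;
}
```

Compiling this with -Os and inspecting the object file with nm should show no local my_atomic_inc symbol, whereas a plain "static inline" version is only a hint and may be emitted out of line.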