https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122

            Bug ID: 66122
           Summary: Bad uninlining decisions
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: vda.linux at googlemail dot com
  Target Milestone: ---

While building the Linux kernel, I found thousands of cases where functions
that the programmer expects to be inlined are not actually inlined.

The following script is used to find them:

nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^ *1 ' | sort -rn

It actually finds functions which have the same name and size, and which occur
more than once. Each occurrence is a separate out-of-line copy of a static
function. There are a few false positives, but the vast majority of them are
functions which were supposed to be inlined, but weren't:

(Count) (size)             (name)
    473 000000000000000b t spin_unlock_irqrestore
    449 000000000000005f t rcu_read_unlock
    355 0000000000000009 t atomic_inc
    353 000000000000006e t rcu_read_lock
    350 0000000000000075 t rcu_read_lock_sched_held
    291 000000000000000b t spin_unlock
    266 0000000000000019 t arch_local_irq_restore
    215 000000000000000b t spin_lock
    180 0000000000000011 t kzalloc
    165 0000000000000012 t list_add_tail
    161 0000000000000019 t arch_local_save_flags
    153 0000000000000016 t test_and_set_bit
    134 000000000000000b t spin_unlock_irq
    134 0000000000000009 t atomic_dec
    130 000000000000000b t spin_unlock_bh
    122 0000000000000010 t brelse
    120 0000000000000016 t test_and_clear_bit
    120 000000000000000b t spin_lock_irq
    119 000000000000001e t get_dma_ops
    117 0000000000000053 t cpumask_next
    116 0000000000000036 t kref_get
    114 000000000000001a t schedule_work
    106 000000000000000b t spin_lock_bh
    103 0000000000000019 t arch_local_irq_disable
     98 0000000000000014 t atomic_dec_and_test
     83 0000000000000020 t sg_page
     81 0000000000000037 t cpumask_check
     79 0000000000000036 t pskb_may_pull
     72 0000000000000044 t perf_fetch_caller_regs
     70 000000000000002f t cpumask_next
     68 0000000000000036 t clk_prepare_enable
     65 0000000000000018 t pci_write_config_byte
     65 0000000000000013 t tasklet_schedule
     61 0000000000000023 t init_completion
     60 000000000000002b t trace_handle_return
     59 0000000000000043 t nlmsg_trim
     59 0000000000000019 t pci_read_config_dword
     59 000000000000000c t slow_down_io
...
...

Note the tiny sizes of some functions. Let's take a look at atomic_inc:

static inline void atomic_inc(atomic_t *v)
{
        asm volatile(LOCK_PREFIX "incl %0"
                     : "+m" (v->counter));
}

You would imagine that this won't ever be deinlined, right? It's one assembly
instruction. Well, it isn't always inlined. Here's the disassembly of vmlinux:

ffffffff81003000 <atomic_inc>:
ffffffff81003000:       55                      push   %rbp
ffffffff81003001:       48 89 e5                mov    %rsp,%rbp
ffffffff81003004:       f0 ff 07                lock incl (%rdi)
ffffffff81003007:       5d                      pop    %rbp
ffffffff81003008:       c3                      retq

This can be fixed using __always_inline, but kernel developers hesitate to slap
thousands of __always_inline annotations everywhere. The mood is that this is
the compiler's fault and that it should not be accommodated, but fixed.
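
For illustration, here is a minimal sketch of the __always_inline workaround
mentioned above. It is not the kernel's code: demo_always_inline,
demo_atomic_t, demo_atomic_inc and demo_bump are hypothetical names, the macro
is assumed to expand to the GCC always_inline function attribute, and the
"lock incl" asm is swapped for the portable __atomic_fetch_add builtin so that
the example builds on non-x86 targets too.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's __always_inline macro;
 * assumes a compiler that supports the always_inline attribute (GCC, Clang). */
#define demo_always_inline inline __attribute__((__always_inline__))

/* Simplified stand-in for the kernel's atomic_t. */
typedef struct { int counter; } demo_atomic_t;

/* Portable substitute for the "lock incl" asm in the report:
 * the __atomic_fetch_add builtin atomically adds 1 to the counter. */
static demo_always_inline void demo_atomic_inc(demo_atomic_t *v)
{
        __atomic_fetch_add(&v->counter, 1, __ATOMIC_SEQ_CST);
}

/* Hypothetical caller: increments the counter n times. With the attribute,
 * the compiler is expected to expand demo_atomic_inc() in place here rather
 * than call an out-of-line copy. */
static void demo_bump(demo_atomic_t *v, int n)
{
        assert(n >= 0);
        for (int i = 0; i < n; i++)
                demo_atomic_inc(v);
}
```

Compiling a file like this with -Os or -O2 and running nm on the object should
show no separate demo_atomic_inc symbol, whereas with plain "static inline"
the compiler is free to emit one.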

This happens quite easily with -Os (IOW: with CC_OPTIMIZE_FOR_SIZE=y kernel
build), but -O2 is not immune either.

I found a file which exhibits an example of bad deinlining for both -O2 and -Os
and I'm going to attach it.
