On Fri, Apr 19, 2019 at 01:08:26PM -0700, Andrew Morton wrote:
> On Fri, 19 Apr 2019 23:03:43 +0300 Alexey Dobriyan <adobri...@gmail.com> wrote:
> 
> > Get "current_pt_regs" pointer right before usage.
> > 
> > Space savings on x86_64:
> > 
> >     add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-180 (-180)
> >     Function                           old     new   delta
> >     load_elf_binary                   5806    5626    -180 !!!
> 
> -256 bytes with my setup.
> 
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -704,12 +704,12 @@ static int load_elf_binary(struct linux_binprm *bprm)
> >     unsigned long start_code, end_code, start_data, end_data;
> >     unsigned long reloc_func_desc __maybe_unused = 0;
> >     int executable_stack = EXSTACK_DEFAULT;
> > -   struct pt_regs *regs = current_pt_regs();
> >     struct {
> >             struct elfhdr elf_ex;
> >             struct elfhdr interp_elf_ex;
> >     } *loc;
> >     struct arch_elf_state arch_state = INIT_ARCH_ELF_STATE;
> > +   struct pt_regs *regs;
> >  
> >     loc = kmalloc(sizeof(*loc), GFP_KERNEL);
> >     if (!loc) {
> > @@ -1159,6 +1159,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> >                             MAP_FIXED | MAP_PRIVATE, 0);
> >     }
> >  
> > +   regs = current_pt_regs();
> >  #ifdef ELF_PLAT_INIT
> >     /*
> >      * The ABI may specify that certain registers be set up in special
> 
> Why the heck does this make such a difference?

Good question. It looks like the compiler doesn't know that "current_pt_regs"
is a stable pointer (because it doesn't know that ->stack is), even though it
knows that "current" is a stable pointer. So it loads it at the very beginning
and then tries to carry it through a lot of code.
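
To make the pattern concrete, here is a minimal userspace sketch, not the
kernel code. Every name in it (fake_task, fake_regs, fake_current_pt_regs,
busy_work, setup_early, setup_late) is made up for illustration, and the
volatile qualifier on ->stack is only my stand-in for "the compiler cannot
treat this load as stable". Both functions compute the same thing; they
differ only in how long the pointer has to stay live:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pt_regs and struct task_struct. */
struct fake_regs { unsigned long ip, sp; };
struct fake_task { void *volatile stack; };

static struct fake_regs regs_storage;
static struct fake_task task0 = { .stack = &regs_storage };

/*
 * The volatile field forces a real load at the call site: the compiler
 * may not postpone it or redo it later on its own.
 */
static struct fake_regs *fake_current_pt_regs(void)
{
	return task0.stack;
}

/* Stand-in for the long body between the declaration and the first use. */
static unsigned long busy_work(unsigned long x)
{
	for (int i = 0; i < 16; i++)
		x = x * 31 + (unsigned long)i;
	return x;
}

/* Before: pointer fetched up front, so it is live across busy_work(). */
unsigned long setup_early(unsigned long entry)
{
	struct fake_regs *regs = fake_current_pt_regs();
	unsigned long v = busy_work(entry);

	assert(regs != NULL);
	regs->ip = v;
	regs->sp = entry;
	return regs->ip ^ regs->sp;
}

/* After: pointer fetched right before usage, so nothing is carried. */
unsigned long setup_late(unsigned long entry)
{
	unsigned long v = busy_work(entry);
	struct fake_regs *regs = fake_current_pt_regs();

	assert(regs != NULL);
	regs->ip = v;
	regs->sp = entry;
	return regs->ip ^ regs->sp;
}
```

In a function this small the early variant costs nothing measurable. The
point is only the shape of the live range; whether it turns into a spill
depends on how much register pressure the code in between creates.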

Here is what happens:

load_elf_binary()
                ...
        mov     rax,QWORD PTR gs:0x14c00
        mov     r13,QWORD PTR [rax+0x18]        # r13 = current->stack
        call    kmem_cache_alloc                # first kmalloc

                [980 bytes later!]

        # let's spill that sucker because we need a register
        # for "load_bias" calculations at
        #
        #       if (interpreter) {
        #               load_bias = ELF_ET_DYN_BASE;
        #               if (current->flags & PF_RANDOMIZE)
        #                       load_bias += arch_mmap_rnd();
        #               elf_flags |= elf_fixed;
        #       }
        mov     QWORD PTR [rsp+0x68],r13

If this is not _the_ root cause it is still eeeeh.

After the patch things become much simpler:

        mov     rax, QWORD PTR gs:0x14c00       # current
        mov     rdx, QWORD PTR [rax+0x18]       # current->stack
        mov     QWORD PTR [rdx+0x3fb8], 0       # fill pt_regs
                ...
        call    finalize_exec
