On Tue, May 05, 2020 at 11:13:53AM -0700, Nick Desaulniers wrote:
> On Tue, May 5, 2020 at 2:36 AM Peter Zijlstra <pet...@infradead.org> wrote:
> >
> >
> > HJ, Nick,
> >
> > Any chance any of you can see a way to make your respective compilers
> > not emit utter junk for this?
> >
> > On Mon, May 04, 2020 at 10:14:45PM +0200, Peter Zijlstra wrote:
> >
> > > https://godbolt.org/z/SDRG2q
> 
> Woah, a godbolt link! Now we're speaking the same language.  What were
> you expecting?

Given the output for x86-64 clang (trunk)

        bar:                                    # @bar
                movl    %edi, .L_x$local(%rip)
                retq
        ponies:                                 # @ponies
                movq    .Lfoo$local(%rip), %rax
                testq   %rax, %rax
                movl    $__static_call_nop, %ecx
                cmovneq %rax, %rcx
                jmpq    *%rcx                   # TAILCALL
        __static_call_nop:                      # @__static_call_nop
                retq
        _x:
        .L_x$local:
                .long   0                       # 0x0

        foo:
        .Lfoo$local:
                .zero   8


I was hoping for:

        bar:                                    # @bar
                movl    %edi, .L_x$local(%rip)
                retq
        ponies:                                 # @ponies
                movq    .Lfoo$local(%rip), %rax
                testq   %rax, %rax
                jz      1f
                jmpq    *%rax                   # TAILCALL
        1:
                retq

In the NULL case that avoids the indirect call (a possible retpoline)
and returns immediately.

So it does 2 things differently:

 - it realizes the NULL case is a constant and uses an
   immediate call and avoids the indirect call/jmp.

 - it realizes __static_call_nop() is a no-op and avoids the call
   entirely and does an immediate return.
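
For reference, the same contrast in plain C. This is only a sketch: hook_fn,
count_call, nop, dispatch_nop and dispatch_branch are made-up names, not the
code behind the godbolt link, and nothing here promises that a given compiler
turns the first form into the second.

```c
#include <stddef.h>

typedef void (*hook_fn)(int);          /* hypothetical hook signature */

int calls;                             /* accumulates arguments seen by the hook */

/* Example hook; exists only so that a real call is observable. */
void count_call(int v) { calls += v; }

/* Stand-in for __static_call_nop. */
static void nop(int v) { (void)v; }

/* What was written: replace NULL by a no-op, then always call indirectly. */
void dispatch_nop(hook_fn f, int v)
{
	if (!f)
		f = nop;
	f(v);
}

/* What was hoped for: return at once on NULL, indirect call otherwise. */
void dispatch_branch(hook_fn f, int v)
{
	if (f)
		f(v);
}
```

Both functions behave the same for every input; the difference is only in
whether the NULL path still goes through an indirect call.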

> Us to remove the conditional check that a volatile read
> wasn't NULL?

No, obviously the load is required, and the READ_ONCE() is so that the
compiler will not emit 2 different loads (just for giggles).

That is:

        tmp1 = name.func;
        if (tmp1) {
                tmp2 = name.func;
                tmp2(args);
        }

is a valid translation of:

        if (name.func)
                name.func(args);

and allows for a NULL dereference (as noted by Rasmus).
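
A userspace sketch of the single-load pattern, assuming a simplified stand-in
for READ_ONCE() (one load through a volatile-qualified lvalue). The names
op_fn, call_key, read_func_once, call_if_set and twice are hypothetical and
only illustrate the idea; this is not the kernel macro.

```c
#include <stddef.h>

typedef int (*op_fn)(int);             /* hypothetical callback type */

struct call_key { op_fn func; };       /* stand-in for the `name` object */

/* Example callback used to show a successful call. */
int twice(int v) { return 2 * v; }

/* Simplified take on READ_ONCE(): a single load through a volatile lvalue. */
static op_fn read_func_once(struct call_key *key)
{
	return *(op_fn volatile *)&key->func;
}

/*
 * Test and call the same snapshot, so a store of NULL that lands between
 * the check and the call cannot be picked up by a second load.
 * Returns fallback when no function is installed.
 */
int call_if_set(struct call_key *key, int arg, int fallback)
{
	op_fn func = read_func_once(key);

	if (!func)
		return fallback;
	return func(arg);
}
```

The point is that the value tested and the value called are the same local
copy; the compiler has no second access to name.func to get creative with.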

What I do want, per the above, is to avoid the indirect (tail) call.
Because indirect jmp/call are evil and expensive.

> I am simultaneously impressed
> and disgusted by this btw, cool stuff.

Yes, it's nasty, esp the casting of a function pointer like that is
gruesome.
