On Thu, Feb 23, 2017 at 06:14:45AM -0800, Arjan van de Ven wrote:
> On 2/23/2017 5:28 AM, Peter Zijlstra wrote:
> >
> >By using "UD0" for WARNs we remove the function call and its possible
> >__FILE__ and __LINE__ immediate arguments from the instruction stream.
> >
> >Total image size will not change much, what we win in the instruction
> >stream we'll loose because of the __bug_table entries. Still, saves on
> >I$ footprint and the total image size does go down a bit.
> 
> well I am a little sceptical; WARNs are rare so the code (other than the test)
> should be waaay out of line already (unlikely() and co).

There's only so much you can do in small functions. Sure it tries to
move the crud to the end, but at the end of the day, its still in the
same function.

> And I assume you're not removing the __FILE__ and __LINE__ info, since that 
> info
> is actually high value for us developers... so what are you actually saving?

OK, so going by my own numbers the total image size does not in fact go
down (it did earlier when I initially wrote this patch) :/

I think back when I wrote this I had refcount_t generate inline WARNs
and that generates a huge amount of junk all over the place. This very
much did clear much of that up.

And as said; there's only so much you can do in small functions.

Look at the below for example (frobbed the code to do WARN_ON instead of
WARN), depending on alignment the WARN code will be in the same
cacheline as 'normal' code.


0000000000000016 <refcount_add_not_zero>:
  16:   55                      push   %rbp
  17:   8b 16                   mov    (%rsi),%edx
  19:   41 83 c8 ff             or     $0xffffffff,%r8d
  1d:   48 89 e5                mov    %rsp,%rbp
  20:   85 d2                   test   %edx,%edx
  22:   74 25                   je     49 <refcount_add_not_zero+0x33>
  24:   83 fa ff                cmp    $0xffffffff,%edx
  27:   74 1c                   je     45 <refcount_add_not_zero+0x2f>
  29:   89 d1                   mov    %edx,%ecx
  2b:   89 d0                   mov    %edx,%eax
  2d:   01 f9                   add    %edi,%ecx
  2f:   41 0f 42 c8             cmovb  %r8d,%ecx
  33:   f0 0f b1 0e             lock cmpxchg %ecx,(%rsi)
  37:   39 c2                   cmp    %eax,%edx
  39:   74 04                   je     3f <refcount_add_not_zero+0x29>
  3b:   89 c2                   mov    %eax,%edx
  3d:   eb e1                   jmp    20 <refcount_add_not_zero+0xa>
  3f:   ff c1                   inc    %ecx
  41:   75 02                   jne    45 <refcount_add_not_zero+0x2f>
  43:   0f ff                   (bad)  
  45:   b0 01                   mov    $0x1,%al
  47:   eb 02                   jmp    4b <refcount_add_not_zero+0x35>
  49:   31 c0                   xor    %eax,%eax
  4b:   5d                      pop    %rbp
  4c:   c3                      retq  



0000000000000016 <refcount_add_not_zero>:
  16:   8b 16                   mov    (%rsi),%edx
  18:   41 83 c8 ff             or     $0xffffffff,%r8d
  1c:   85 d2                   test   %edx,%edx
  1e:   74 3b                   je     5b <refcount_add_not_zero+0x45>
  20:   83 fa ff                cmp    $0xffffffff,%edx
  23:   75 03                   jne    28 <refcount_add_not_zero+0x12>
  25:   b0 01                   mov    $0x1,%al
  27:   c3                      retq   
  28:   89 d1                   mov    %edx,%ecx
  2a:   89 d0                   mov    %edx,%eax
  2c:   01 f9                   add    %edi,%ecx
  2e:   41 0f 42 c8             cmovb  %r8d,%ecx
  32:   f0 0f b1 0e             lock cmpxchg %ecx,(%rsi)
  36:   39 c2                   cmp    %eax,%edx
  38:   74 04                   je     3e <refcount_add_not_zero+0x28>
  3a:   89 c2                   mov    %eax,%edx
  3c:   eb de                   jmp    1c <refcount_add_not_zero+0x6>
  3e:   ff c1                   inc    %ecx
  40:   75 e3                   jne    25 <refcount_add_not_zero+0xf>
  42:   55                      push   %rbp
  43:   be 3f 00 00 00          mov    $0x3f,%esi
  48:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        4b: R_X86_64_32S        .rodata.str1.1
  4f:   48 89 e5                mov    %rsp,%rbp
  52:   e8 00 00 00 00          callq  57 <refcount_add_not_zero+0x41>
                        53: R_X86_64_PC32       warn_slowpath_null-0x4
  57:   b0 01                   mov    $0x1,%al
  59:   5d                      pop    %rbp
  5a:   c3                      retq   
  5b:   31 c0                   xor    %eax,%eax
  5d:   c3                      retq   

Reply via email to