On Thu, Feb 23, 2017 at 06:14:45AM -0800, Arjan van de Ven wrote:
> On 2/23/2017 5:28 AM, Peter Zijlstra wrote:
> >
> >By using "UD0" for WARNs we remove the function call and its possible
> >__FILE__ and __LINE__ immediate arguments from the instruction stream.
> >
> >Total image size will not change much, what we win in the instruction
> >stream we'll loose because of the __bug_table entries. Still, saves on
> >I$ footprint and the total image size does go down a bit.
>
> well I am a little sceptical; WARNs are rare so the code (other than the test)
> should be waaay out of line already (unlikely() and co).

There's only so much you can do in small functions. Sure it tries to
move the crud to the end, but at the end of the day, it's still in the
same function.

> And I assume you're not removing the __FILE__ and __LINE__ info, since that info
> is actually high value for us developers... so what are you actually saving?

OK, so going by my own numbers the total image size does not in fact go
down (it did earlier when I initially wrote this patch) :/

I think back when I wrote this I had refcount_t generate inline WARNs
and that generates a huge amount of junk all over the place. This very
much did clear much of that up.

And as said; there's only so much you can do in small functions. Look at
the below for example (frobbed the code to do WARN_ON instead of WARN),
depending on alignment the WARN code will be in the same cacheline as
'normal' code.


0000000000000016 <refcount_add_not_zero>:
  16:   55                      push   %rbp
  17:   8b 16                   mov    (%rsi),%edx
  19:   41 83 c8 ff             or     $0xffffffff,%r8d
  1d:   48 89 e5                mov    %rsp,%rbp
  20:   85 d2                   test   %edx,%edx
  22:   74 25                   je     49 <refcount_add_not_zero+0x33>
  24:   83 fa ff                cmp    $0xffffffff,%edx
  27:   74 1c                   je     45 <refcount_add_not_zero+0x2f>
  29:   89 d1                   mov    %edx,%ecx
  2b:   89 d0                   mov    %edx,%eax
  2d:   01 f9                   add    %edi,%ecx
  2f:   41 0f 42 c8             cmovb  %r8d,%ecx
  33:   f0 0f b1 0e             lock cmpxchg %ecx,(%rsi)
  37:   39 c2                   cmp    %eax,%edx
  39:   74 04                   je     3f <refcount_add_not_zero+0x29>
  3b:   89 c2                   mov    %eax,%edx
  3d:   eb e1                   jmp    20 <refcount_add_not_zero+0xa>
  3f:   ff c1                   inc    %ecx
  41:   75 02                   jne    45 <refcount_add_not_zero+0x2f>
  43:   0f ff                   (bad)
  45:   b0 01                   mov    $0x1,%al
  47:   eb 02                   jmp    4b <refcount_add_not_zero+0x35>
  49:   31 c0                   xor    %eax,%eax
  4b:   5d                      pop    %rbp
  4c:   c3                      retq


0000000000000016 <refcount_add_not_zero>:
  16:   8b 16                   mov    (%rsi),%edx
  18:   41 83 c8 ff             or     $0xffffffff,%r8d
  1c:   85 d2                   test   %edx,%edx
  1e:   74 3b                   je     5b <refcount_add_not_zero+0x45>
  20:   83 fa ff                cmp    $0xffffffff,%edx
  23:   75 03                   jne    28 <refcount_add_not_zero+0x12>
  25:   b0 01                   mov    $0x1,%al
  27:   c3                      retq
  28:   89 d1                   mov    %edx,%ecx
  2a:   89 d0                   mov    %edx,%eax
  2c:   01 f9                   add    %edi,%ecx
  2e:   41 0f 42 c8             cmovb  %r8d,%ecx
  32:   f0 0f b1 0e             lock cmpxchg %ecx,(%rsi)
  36:   39 c2                   cmp    %eax,%edx
  38:   74 04                   je     3e <refcount_add_not_zero+0x28>
  3a:   89 c2                   mov    %eax,%edx
  3c:   eb de                   jmp    1c <refcount_add_not_zero+0x6>
  3e:   ff c1                   inc    %ecx
  40:   75 e3                   jne    25 <refcount_add_not_zero+0xf>
  42:   55                      push   %rbp
  43:   be 3f 00 00 00          mov    $0x3f,%esi
  48:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        4b: R_X86_64_32S        .rodata.str1.1
  4f:   48 89 e5                mov    %rsp,%rbp
  52:   e8 00 00 00 00          callq  57 <refcount_add_not_zero+0x41>
                        53: R_X86_64_PC32       warn_slowpath_null-0x4
  57:   b0 01                   mov    $0x1,%al
  59:   5d                      pop    %rbp
  5a:   c3                      retq
  5b:   31 c0                   xor    %eax,%eax
  5d:   c3                      retq
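
For reference, both listings appear to compile roughly the following
logic: bail out on zero, stay saturated at UINT_MAX, saturate on
overflow, retry the cmpxchg until it sticks, and warn once the counter
has saturated. The below is only a userspace sketch read back from the
disassembly, not the kernel source. It assumes C11 atomics in place of
the kernel's cmpxchg, and the names refcount_sketch_t,
refcount_add_not_zero_sketch, SKETCH_WARN_ON and warn_hits are all
hypothetical stand-ins.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for WARN_ON(): it only counts hits. The real macro
 * would either call warn_slowpath_null() or emit a trapping instruction.
 */
static unsigned int warn_hits;
#define SKETCH_WARN_ON(cond) do { if (cond) warn_hits++; } while (0)

/* Hypothetical userspace type, assumed here in place of refcount_t. */
typedef struct {
	atomic_uint refs;
} refcount_sketch_t;

/* Add i to the counter unless it is zero; saturate at UINT_MAX. */
static bool refcount_add_not_zero_sketch(unsigned int i, refcount_sketch_t *r)
{
	unsigned int new;
	unsigned int val = atomic_load_explicit(&r->refs, memory_order_relaxed);

	for (;;) {
		if (!val)
			return false;	/* the "test; je" at the top of the loop */

		if (val == UINT_MAX)
			return true;	/* already saturated, leave it alone */

		new = val + i;
		if (new < val)
			new = UINT_MAX;	/* the "add; cmovb" pair */

		/* On failure val is reloaded with the current value; retry. */
		if (atomic_compare_exchange_weak_explicit(&r->refs, &val, new,
							  memory_order_relaxed,
							  memory_order_relaxed))
			break;
	}

	/* The "inc %ecx; jne" pair: only warn when new == UINT_MAX. */
	SKETCH_WARN_ON(new == UINT_MAX);

	return true;
}
```

Everything up to the final check is the hot path; the only part the two
listings disagree on is what the warn branch expands to, a two byte trap
in one and the argument setup plus call in the other.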