On Thu, 06 Feb 2014 12:39:14 -0500
Steven Rostedt <rost...@goodmis.org> wrote:

> From: Steven Rostedt <srost...@redhat.com>
> 
> The functions that assign the contents for the perf software events are
> defined by the TRACE_EVENT() macros. Each event has its own unique
> way to assign data to its buffer. When you have over 500 events,
> that means there are over 500 functions assigning data uniquely for
> each event.
> 
> By making helper functions in the core kernel do the work instead,
> we can shrink the size of the kernel down a bit.
> 
> With a kernel configured with 707 events, the change in size was:
> 
>    text    data     bss     dec     hex filename
> 12959102        1913504 9785344 24657950        178401e /tmp/vmlinux
> 12917629        1913568 9785344 24616541        1779e5d /tmp/vmlinux.patched
> 
> That's a total of 41473 bytes, which comes down to roughly 59 bytes
> per event.
> 
> Note, most of the savings comes from moving the setup and final submit
> into helper functions, where the setup does the work and stores the
> data into a structure. That structure is then passed to the submit
> function, moving the setup of the parameters of perf_trace_buf_submit()
> out of the inlined code.
> 
> Link: http://lkml.kernel.org/r/20120810034708.589220...@goodmis.org
> 
> Cc: Peter Zijlstra <a.p.zijls...@chello.nl>
> Cc: Frederic Weisbecker <fweis...@gmail.com>

Peter, Frederic,

Can you give an ack to this? Peter, you pretty much gave your ack
before, except for one nit:

http://marc.info/?l=linux-kernel&m=134484533217124&w=2

> Signed-off-by: Steven Rostedt <rost...@goodmis.org>
> ---
>  include/linux/ftrace_event.h    | 17 ++++++++++++++
>  include/trace/ftrace.h          | 33 ++++++++++----------------
>  kernel/trace/trace_event_perf.c | 51 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 80 insertions(+), 21 deletions(-)
> 

> +
> +/**
> + * perf_trace_event_submit - submit from perf sw event
> + * @pe: perf event structure that holds all the necessary data
> + *
> + * This is a helper function that removes a lot of the setting up of
> + * the function parameters to call perf_trace_buf_submit() from the
> + * inlined code. Using the perf event structure @pe to store the
> + * information passed from perf_trace_event_setup() keeps the overhead
> + * of building the function call parameters out of the inlined functions.
> + */
> +void perf_trace_event_submit(struct perf_trace_event *pe)
> +{
> +     perf_trace_buf_submit(pe->entry, pe->entry_size, pe->rctx, pe->addr,
> +                           pe->count, &pe->regs, pe->head, pe->task);
> +}
> +EXPORT_SYMBOL_GPL(perf_trace_event_submit);
> +

You wanted perf_trace_buf_submit() to go away. Now I could do that,
but that would require all other users to use the new perf_trace_event
structure to pass in. The only reason I did that was because this
structure is set up in perf_trace_event_setup() which passes in only
the event_call and the pe structure. In the setup function, the pe
structure is assigned all the information required for
perf_trace_event_submit().
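
To make the pairing concrete, here's roughly the shape of the structure
and the setup side. This is a sketch, not the patch itself: the field
layout and the setup body are reconstructed from the submit helper above
and the disassembly below.

struct perf_trace_event {
	struct pt_regs		regs;
	u64			addr;
	u64			count;
	void			*entry;
	int			entry_size;
	int			rctx;
	struct hlist_head	*head;
	struct task_struct	*task;
};

/*
 * Sketch only: entry_size, addr, count and task are filled in by the
 * inlined caller before this is called.
 */
void *perf_trace_event_setup(struct ftrace_event_call *event_call,
			     struct perf_trace_event *pe)
{
	pe->head = this_cpu_ptr(event_call->perf_events);
	/* (the real check also has to handle the task-targeted case) */
	if (hlist_empty(pe->head))
		return NULL;

	perf_fetch_caller_regs(&pe->regs);
	pe->entry = perf_trace_buf_prepare(pe->entry_size,
					   event_call->event.type,
					   &pe->regs, &pe->rctx);
	return pe->entry;
}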

What this does is remove the function parameter setup from the
inlined tracepoint callers, which is quite a lot!
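
For example, with the helpers in place the generated perf probe boils
down to something like this (a hand-written sketch of the
sched_pi_setprio case; the real function comes from the TRACE_EVENT()
macros, and the entry struct name and the 0x24 size here are
illustrative, taken from the disassembly below):

static notrace void
perf_trace_sched_pi_setprio(void *__data, struct task_struct *tsk,
			    int newprio)
{
	struct ftrace_event_call *event_call = __data;
	struct ftrace_raw_sched_pi_setprio *entry;
	struct perf_trace_event pe;

	/* Cheap constant stores, visible in the "after" dump below */
	pe.entry_size = 0x24;
	pe.addr = 0;
	pe.count = 1;
	pe.task = NULL;

	entry = perf_trace_event_setup(event_call, &pe);
	if (!entry)
		return;

	/* Only the per-event TP_fast_assign() body stays inlined */
	memcpy(entry->comm, tsk->comm, TASK_COMM_LEN);
	entry->pid = tsk->pid;
	entry->oldprio = tsk->prio;
	entry->newprio = newprio;

	perf_trace_event_submit(&pe);
}

All the argument marshalling for perf_trace_buf_prepare() and
perf_tp_event() disappears from the inlined body; it collapses into
the two helper calls.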

This is what a perf tracepoint currently looks like:

0000000000000b44 <perf_trace_sched_pi_setprio>:
     b44:       55                      push   %rbp
     b45:       48 89 e5                mov    %rsp,%rbp
     b48:       41 56                   push   %r14
     b4a:       41 89 d6                mov    %edx,%r14d
     b4d:       41 55                   push   %r13
     b4f:       49 89 fd                mov    %rdi,%r13
     b52:       41 54                   push   %r12
     b54:       49 89 f4                mov    %rsi,%r12
     b57:       53                      push   %rbx
     b58:       48 81 ec c0 00 00 00    sub    $0xc0,%rsp
     b5f:       48 8b 9f 80 00 00 00    mov    0x80(%rdi),%rbx
     b66:       e8 00 00 00 00          callq  b6b <perf_trace_sched_pi_setprio+0x27>
                        b67: R_X86_64_PC32      debug_smp_processor_id-0x4
     b6b:       89 c0                   mov    %eax,%eax
     b6d:       48 03 1c c5 00 00 00    add    0x0(,%rax,8),%rbx
     b74:       00 
                        b71: R_X86_64_32S       __per_cpu_offset
     b75:       48 83 3b 00             cmpq   $0x0,(%rbx)
     b79:       0f 84 92 00 00 00       je     c11 <perf_trace_sched_pi_setprio+0xcd>
     b7f:       48 8d bd 38 ff ff ff    lea    -0xc8(%rbp),%rdi
     b86:       e8 ab fe ff ff          callq  a36 <perf_fetch_caller_regs>
     b8b:       41 8b 75 40             mov    0x40(%r13),%esi
     b8f:       48 8d 8d 34 ff ff ff    lea    -0xcc(%rbp),%rcx
     b96:       48 8d 95 38 ff ff ff    lea    -0xc8(%rbp),%rdx
     b9d:       bf 24 00 00 00          mov    $0x24,%edi
     ba2:       81 e6 ff ff 00 00       and    $0xffff,%esi
     ba8:       e8 00 00 00 00          callq  bad <perf_trace_sched_pi_setprio+0x69>
                        ba9: R_X86_64_PC32      perf_trace_buf_prepare-0x4
     bad:       48 85 c0                test   %rax,%rax
     bb0:       74 5f                   je     c11 <perf_trace_sched_pi_setprio+0xcd>
     bb2:       49 8b 94 24 b0 04 00    mov    0x4b0(%r12),%rdx
     bb9:       00 
     bba:       4c 8d 85 38 ff ff ff    lea    -0xc8(%rbp),%r8
     bc1:       49 89 d9                mov    %rbx,%r9
     bc4:       b9 24 00 00 00          mov    $0x24,%ecx
     bc9:       be 01 00 00 00          mov    $0x1,%esi
     bce:       31 ff                   xor    %edi,%edi
     bd0:       48 89 50 08             mov    %rdx,0x8(%rax)
     bd4:       49 8b 94 24 b8 04 00    mov    0x4b8(%r12),%rdx
     bdb:       00 
     bdc:       48 89 50 10             mov    %rdx,0x10(%rax)
     be0:       41 8b 94 24 0c 03 00    mov    0x30c(%r12),%edx
     be7:       00 
     be8:       89 50 18                mov    %edx,0x18(%rax)
     beb:       41 8b 54 24 50          mov    0x50(%r12),%edx
     bf0:       44 89 70 20             mov    %r14d,0x20(%rax)
     bf4:       89 50 1c                mov    %edx,0x1c(%rax)
     bf7:       8b 95 34 ff ff ff       mov    -0xcc(%rbp),%edx
     bfd:       48 c7 44 24 08 00 00    movq   $0x0,0x8(%rsp)
     c04:       00 00 
     c06:       89 14 24                mov    %edx,(%rsp)
     c09:       48 89 c2                mov    %rax,%rdx
     c0c:       e8 00 00 00 00          callq  c11 <perf_trace_sched_pi_setprio+0xcd>
                        c0d: R_X86_64_PC32      perf_tp_event-0x4
     c11:       48 81 c4 c0 00 00 00    add    $0xc0,%rsp
     c18:       5b                      pop    %rbx
     c19:       41 5c                   pop    %r12
     c1b:       41 5d                   pop    %r13
     c1d:       41 5e                   pop    %r14
     c1f:       5d                      pop    %rbp
     c20:       c3                      retq   


This is what it looks like after this patch:

0000000000000ab1 <perf_trace_sched_pi_setprio>:
     ab1:       55                      push   %rbp
     ab2:       48 89 e5                mov    %rsp,%rbp
     ab5:       41 54                   push   %r12
     ab7:       41 89 d4                mov    %edx,%r12d
     aba:       53                      push   %rbx
     abb:       48 89 f3                mov    %rsi,%rbx
     abe:       48 8d b5 08 ff ff ff    lea    -0xf8(%rbp),%rsi
     ac5:       48 81 ec f0 00 00 00    sub    $0xf0,%rsp
     acc:       48 c7 45 b8 00 00 00    movq   $0x0,-0x48(%rbp)
     ad3:       00 
     ad4:       c7 45 e8 01 00 00 00    movl   $0x1,-0x18(%rbp)
     adb:       c7 45 e0 24 00 00 00    movl   $0x24,-0x20(%rbp)
     ae2:       48 c7 45 d0 00 00 00    movq   $0x0,-0x30(%rbp)
     ae9:       00 
     aea:       48 c7 45 d8 01 00 00    movq   $0x1,-0x28(%rbp)
     af1:       00 
     af2:       e8 00 00 00 00          callq  af7 <perf_trace_sched_pi_setprio+0x46>
                        af3: R_X86_64_PC32      perf_trace_event_setup-0x4
     af7:       48 85 c0                test   %rax,%rax
     afa:       74 35                   je     b31 <perf_trace_sched_pi_setprio+0x80>
     afc:       48 8b 93 b0 04 00 00    mov    0x4b0(%rbx),%rdx
     b03:       48 8d bd 08 ff ff ff    lea    -0xf8(%rbp),%rdi
     b0a:       48 89 50 08             mov    %rdx,0x8(%rax)
     b0e:       48 8b 93 b8 04 00 00    mov    0x4b8(%rbx),%rdx
     b15:       48 89 50 10             mov    %rdx,0x10(%rax)
     b19:       8b 93 0c 03 00 00       mov    0x30c(%rbx),%edx
     b1f:       89 50 18                mov    %edx,0x18(%rax)
     b22:       8b 53 50                mov    0x50(%rbx),%edx
     b25:       44 89 60 20             mov    %r12d,0x20(%rax)
     b29:       89 50 1c                mov    %edx,0x1c(%rax)
     b2c:       e8 00 00 00 00          callq  b31 <perf_trace_sched_pi_setprio+0x80>
                        b2d: R_X86_64_PC32      perf_trace_event_submit-0x4
     b31:       48 81 c4 f0 00 00 00    add    $0xf0,%rsp
     b38:       5b                      pop    %rbx
     b39:       41 5c                   pop    %r12
     b3b:       5d                      pop    %rbp
     b3c:       c3                      retq   


Thus, perf_trace_event_submit() is not really just a wrapper function,
but the second half of a pair with perf_trace_event_setup().

-- Steve