On Mon, Jun 20, 2016 at 12:16:55PM -0600, David Ahern wrote:
> On 6/20/16 12:13 PM, Arnaldo Carvalho de Melo wrote:
> > 'perf cc' seems sensible, and has the added bonus of being one letter
> > shorter :-)
 
> perf is now a general front-end to a compiler?

Well, it has been for quite a while already. What we're talking about
here is having this:

  # cat filter.c 
  #include <uapi/linux/bpf.h>
  #define SEC(NAME) __attribute__((section(NAME), used))

  SEC("func=hrtimer_nanosleep rqtp->tv_nsec")
  int func(void *ctx, int err, long nsec)
  {
        return nsec > 1000;
  }
  char _license[] SEC("license") = "GPL";
  int _version SEC("version") = LINUX_VERSION_CODE;
  # perf trace -e nanosleep --event filter.c usleep 1
     0.063 ( 0.063 ms): usleep/8041 nanosleep(rqtp: 0x7fff62bead80) = 0
  # perf trace -e nanosleep --event filter.c usleep 2
     0.008 ( 0.008 ms): usleep/8325 nanosleep(rqtp: 0x7ffc2afdf3b0) ...
     0.008 (         ): perf_bpf_probe:func:(ffffffff811137d0) tv_nsec=2000)
     0.070 ( 0.070 ms): usleep/8325  ... [continued]: nanosleep()) = 0
  # 

The point is to not call the clang compiler under the hood every time,
i.e. to pre-build the .o file, which is then used when present.

What Wang did was to make that possible by adding this to ~/.perfconfig:

  # cat ~/.perfconfig 
  [llvm]
        dump-obj = true
  # 

This way, when we run we get:

  # trace -e nanosleep --event filter.c usleep 6
  LLVM: dumpping filter.o
     0.008 ( 0.008 ms): usleep/9189 nanosleep(rqtp: 0x7fff97a704d0) ...
     0.008 (         ): perf_bpf_probe:func:(ffffffff811137d0) tv_nsec=6000)
     0.070 ( 0.070 ms): usleep/9189  ... [continued]: nanosleep()) = 0
  #
  # file filter.o
  filter.o: ELF 64-bit LSB relocatable, no machine, version 1 (SYSV), not stripped
  # readelf -SW filter.o
  There are 7 section headers, starting at offset 0x148:

  Section Headers:
    [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
    [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
    [ 1] .strtab           STRTAB          0000000000000000 0000e8 00005a 00      0   0  1
    [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
    [ 3] func=hrtimer_nanosleep rqtp->tv_nsec PROGBITS        0000000000000000 000040 000028 00  AX  0   0  8
    [ 4] license           PROGBITS        0000000000000000 000068 000004 00  WA  0   0  1
    [ 5] version           PROGBITS        0000000000000000 00006c 000004 00  WA  0   0  4
    [ 6] .symtab           SYMTAB          0000000000000000 000070 000078 18      1   2  8
  Key to Flags:
    W (write), A (alloc), X (execute), M (merge), S (strings)
    I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
    O (extra OS processing required) o (OS specific), p (processor specific)
  #

The idea is to generate this .o file explicitly and then, when it is found
and somehow checked to match what is in filter.c, short-circuit the process,
bypassing the clang call and using filter.o directly.
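
As a rough sketch of what that check could look like (this is not perf
code; the helper name prebuilt_obj_usable() is made up for illustration,
and the "matches filter.c" test is assumed here to be a plain mtime
comparison, where a content hash stored alongside the object would be a
stricter alternative):

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of perf: decide whether a pre-built
 * object can be used instead of invoking clang on the source.
 *
 * Assumption: "matches the source" is approximated by "the object
 * exists, is non-empty, and is not older than the source file".
 */
bool prebuilt_obj_usable(const char *src_path, const char *obj_path)
{
	struct stat src, obj;

	if (stat(src_path, &src) != 0)
		return false;	/* no source to compare against */
	if (stat(obj_path, &obj) != 0)
		return false;	/* no pre-built object, clang is needed */
	if (obj.st_size == 0)
		return false;	/* empty object, treat as stale */

	/* Object is usable only if it is at least as new as the source. */
	return obj.st_mtime >= src.st_mtime;
}
```

A caller would try prebuilt_obj_usable("filter.c", "filter.o") first and
fall back to the clang path only when it returns false.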

This will remove the need for having clang in embedded systems, for instance,
and will speed up using eBPF scripts with perf.

- Arnaldo
