[v2: Addressed review comments. Fixed display problems and 
correctly compute IPC now. See patches for detailed changes.]
[v3: Merged with current Arnaldo perf/core and added acked-by.]

[Note: the respective kernel patches to report cycles are in
peterz's perf/core queue, but so far not in tip. The patchkit
can still be tested with the "fake cycles" debug patch added at
the end.]

The upcoming Skylake CPU has a new timed branch stack feature
that reports cycle counts for individual branches in the
last branch record (LBR).
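
As a rough illustration of how per-branch cycle counts can be turned into
per-basic-block costs, here is a minimal sketch. It assumes that records are
ordered newest first, that each record's cycle count covers the straight-line
code from the previous branch's target up to this branch, and that a count of
zero means "no timing available". The struct and function names are
hypothetical and are not taken from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified branch record; field names are assumptions. */
struct branch_rec {
	uint64_t from;    /* address of the branch instruction */
	uint64_t to;      /* address of the branch target */
	uint16_t cycles;  /* cycles since the previous recorded branch */
};

/* Hypothetical timed basic block derived from two adjacent records. */
struct timed_block {
	uint64_t start;   /* target of the previous (older) branch */
	uint64_t end;     /* address of the branch ending the block */
	uint16_t cycles;  /* cycles attributed to this block */
};

/*
 * Walk the records (assumed newest first) and emit one block per
 * pair of adjacent records. "out" must have room for nr - 1 entries.
 * Returns the number of blocks written.
 */
static size_t derive_blocks(const struct branch_rec *recs, size_t nr,
			    struct timed_block *out)
{
	size_t n = 0;

	for (size_t i = 0; i + 1 < nr; i++) {
		/* Assumption: zero cycles means no timing information. */
		if (recs[i].cycles == 0)
			continue;
		out[n].start  = recs[i + 1].to;
		out[n].end    = recs[i].from;
		out[n].cycles = recs[i].cycles;
		n++;
	}
	return n;
}
```

The oldest record has no predecessor in the stack, so under these assumptions
no block can be derived for it.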

This makes it possible to get fine-grained cost information for code,
and also to compute fine-grained IPC.

Available from
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git perf/skl-tools3

This patchkit adds support for this in the perf tools:
- Basic support for the cycles field like other branch fields
- Show cycles in the standard branch sort view (no IPC here,
  as IPC needs the instruction counts from annotation)
- Annotate cycles and IPC in the assembler annotate view
- Add branch support to top, so we can do live annotation.
- Misc support, like dumping it in perf report -D

Example output for annotate (with made-up numbers). The second column
is the IPC and the third is the average cycles for the basic block:

                   │    static int hex(char ch)
                   │    {
        0.12       │      push   %rbp
        0.12       │      mov    %rsp,%rbp
        0.12       │      sub    $0x20,%rsp
        0.12       │      mov    %edi,%eax
        0.12       │      mov    %al,-0x14(%rbp)
        0.12       │      mov    %fs:0x28,%rax
        0.12       │      mov    %rax,-0x8(%rbp)
        0.12       │      xor    %eax,%eax
                   │            if ((ch >= '0') && (ch <= '9'))
        0.12       │      cmpb   $0x2f,-0x14(%rbp)
 66.67  0.12   123 │    ↓ jle    31
        0.12       │      cmpb   $0x39,-0x14(%rbp)
        0.12   123 │    ↓ jg     31
                   │                    return ch - '0';
 22.22  0.12       │      movsbl -0x14(%rbp),%eax
        0.12       │      sub    $0x30,%eax
        0.12   123 │    ↓ jmp    60
                   │            if ((ch >= 'a') && (ch <= 'f'))
        0.06       │31:   cmpb   $0x60,-0x14(%rbp)
        0.06   123 │    ↓ jle    46
        0.06       │      cmpb   $0x66,-0x14(%rbp)
        0.06       │    ↓ jg     46
                   │                    return ch - 'a' + 10;
        0.06       │      movsbl -0x14(%rbp),%eax

Example output for branch view (again with fake data):

Overhead  Command  Source Shared Object  Source Symbol          Target Symbol          Basic Block Cycles
  30.08%  tcall    tcall                 [.] f1                 [.] f2                 123
  27.44%  tcall    tcall                 [.] f2                 [.] f1                 123
  15.60%  tcall    tcall                 [.] main               [.] f1                 123
  12.96%  tcall    tcall                 [.] f1                 [.] main               123
  12.86%  tcall    tcall                 [.] main               [.] main               123
   0.08%  tcall    [kernel.kallsyms]     [k] hrtimer_interrupt  [k] hrtimer_interrupt  123

IPC computation has a few limitations (see the comments in the respective
patches); in particular, it punts on overlapping basic blocks.
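
To make the IPC figure concrete, here is a minimal sketch of one way to
derive it for a single basic block: divide the number of instructions in the
block by the average cycle count observed for it. The accumulator struct and
helper names are hypothetical, and the sketch ignores overlapping blocks
entirely; see the patches for what the tools actually do.

```c
/* Hypothetical per-block accumulator; not the structure used in the patches. */
struct block_stat {
	unsigned long long cycles_sum; /* sum of cycle counts over all samples */
	unsigned long long samples;    /* number of samples seen for this block */
	unsigned int       insns;      /* instructions in the block (from disassembly) */
};

/* Add one observed cycle count for the block. */
static void block_account(struct block_stat *b, unsigned int cycles)
{
	b->cycles_sum += cycles;
	b->samples++;
}

/* Average cycles per execution of the block, or 0 if never sampled. */
static double block_avg_cycles(const struct block_stat *b)
{
	return b->samples ? (double)b->cycles_sum / (double)b->samples : 0.0;
}

/* Instructions per cycle for the block, or 0 if no cycles were recorded. */
static double block_ipc(const struct block_stat *b)
{
	double avg = block_avg_cycles(b);

	return avg > 0.0 ? (double)b->insns / avg : 0.0;
}
```

For example, a block of 10 instructions sampled twice with 20 and 30 cycles
averages 25 cycles, which gives an IPC of 0.4.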

The cycle and IPC annotation only works in the interactive annotate view.
It does not currently work in the scripted perf annotate, as that is missing
a lot of the infrastructure needed for per-instruction state.

It would be nice to add column headers to annotate.

So far no support in --branch-history or in perf script.
