This patch set sorts cache lines by all load operations that hit at any
cache level.  In the single cache line view, it shows the load
references for loads with cache hits and for loads with cache misses
separately.

This series is a follow-up to the old patch set "perf c2c: Sort
cacheline with LLC load" [1].  The old patch set tried to sort cache
lines by the load operations in the last level cache (LLC), but testing
showed that the trace data doesn't contain LLC events if the platform
isn't a NUMA system.  For this reason, this series refines the
implementation to sort on load hits at all cache levels.  With this
sorting it is reasonable to review the load and store operations
together: if a cache line is accessed by multiple threads, this hints
that the cache line may be subject to false sharing.
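
As an illustration only (this is not the code in builtin-c2c.c), the
sorting key can be sketched as below.  The struct and function names are
hypothetical; the fields are named after the report columns.  The sketch
assumes that the total load hit count of a cache line is the sum of the
FB, L1, L2, LclHit, LclHitm, RmtHit and RmtHitm columns, which agrees
with the sample output below (e.g. 849 + 2734 + 67 + 58 + 883 = 4591).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-cache-line counters, named after the report columns. */
struct cl_stats {
	uint64_t addr;		/* cache line base address */
	uint32_t fb;		/* loads served by the fill buffer */
	uint32_t l1;		/* loads that hit L1 */
	uint32_t l2;		/* loads that hit L2 */
	uint32_t lcl_hit;	/* loads that hit the local LLC */
	uint32_t lcl_hitm;	/* local LLC hits on a modified line */
	uint32_t rmt_hit;	/* loads that hit a remote cache */
	uint32_t rmt_hitm;	/* remote cache hits on a modified line */
};

/* Total load hits: sum over every cache level that served a load. */
static uint32_t total_load_hits(const struct cl_stats *s)
{
	return s->fb + s->l1 + s->l2 + s->lcl_hit + s->lcl_hitm +
	       s->rmt_hit + s->rmt_hitm;
}

/* qsort() comparator: cache lines with more load hits come first. */
static int cmp_load_hits_desc(const void *a, const void *b)
{
	uint32_t ha = total_load_hits(a);
	uint32_t hb = total_load_hits(b);

	return (ha < hb) - (ha > hb);
}

/* Sort an array of cache lines in place, hottest line first. */
static void sort_by_load_hits(struct cl_stats *lines, size_t n)
{
	assert(n == 0 || lines != NULL);
	qsort(lines, n, sizeof(*lines), cmp_load_hits_desc);
}
```

Summing over every level means the key does not depend on LLC or HITM
events being present in the trace data.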

This patch set applies cleanly on the perf/core branch with the latest
commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
evlist' record methods").  The changes have been tested on x86 and
Arm64; the test results are shown below.

Test results on x86:

  # perf c2c record -- false_sharing.exe 2
  # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table
  =================================================
  #
  #        ----------- Cacheline ----------  Load Hit  Load Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt       Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  ........  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0x556f25dff100     0    1895    75.73%      4591     7840     4591     3249     2633      616      849     2734       67        58      883         0        0         0         0
        1      0x556f25dff080     0       1    13.10%       794      794      794        0        0        0      164      486       28        20       96         0        0         0         0
        2      0x556f25dff0c0     0       1    10.01%       607      607      607        0        0        0      107        5        5       488        2         0        0         0         0

  =================================================
        Shared Cache Line Distribution Pareto
  =================================================
  #
  #        --  Load Refs --  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
  #   Num      Hit     Miss   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0     4591        0     2633      616      0x556f25dff100
    -------------------------------------------------------------
            20.52%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc1d         0      2200      1276      942         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            19.82%    0.00%   38.06%    0.00%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc16         0      2190      1130     1912         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.25%    0.00%   56.63%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc16         0      2173      1074     2329         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            18.23%    0.00%    0.00%    0.00%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc1d         0      2013      1220      837         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    3.11%   59.90%                 0x0     0       1    28079    28081:lock_th         0x556f25bfdc28         0         0         0      451         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%    2.20%   40.10%                 0x0     0       1    28079    28082:lock_th         0x556f25bfdc28         0         0         0      305         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            12.00%    0.00%    0.00%    0.00%                0x20     0       1    28079    28083:reader_thd      0x556f25bfdc73         0       159       107      551         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0
            11.17%    0.00%    0.00%    0.00%                0x20     0       1    28079    28084:reader_thd      0x556f25bfdc73         0       148       108      513         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   0

  [...]
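
To relate the "Offset" column in the Pareto output to the cache line
address in its header row, the sketch below splits a data address into
a cache line base and an offset within that line.  The helper names are
hypothetical and are not taken from perf; the 64-byte line size used in
the usage note is an assumption that is consistent with the 64-byte
aligned cache line addresses listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Base address of the cache line containing addr; line_size is in bytes. */
static uint64_t cl_base(uint64_t addr, uint64_t line_size)
{
	/* The mask trick below only works for a power-of-two line size. */
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	return addr & ~(line_size - 1);
}

/* Byte offset of addr within its cache line. */
static uint64_t cl_offset(uint64_t addr, uint64_t line_size)
{
	return addr - cl_base(addr, line_size);
}

/* Two accesses can only falsely share if they fall into the same line. */
static int same_cache_line(uint64_t a, uint64_t b, uint64_t line_size)
{
	return cl_base(a, line_size) == cl_base(b, line_size);
}
```

With a 64-byte line, the lock_th accesses at offset 0x0 and the
reader_thd accesses at offset 0x20 both map to the base address
0x556f25dff100: different threads touch different data in one line.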


Test results on Arm64 (Hisilicon D06).  Please note that the Arm SPE
data source patch set has not been merged into the mainline kernel yet
and work on a potential issue with store operations is still in
progress, so the final output might differ slightly.

  # perf c2c record -- false_sharing.exe 2
  # perf c2c report -d all --coalesce tid,pid,iaddr,dso --stdio

  [...]

  =================================================
             Shared Data Cache Line Table          
  =================================================
  #
  #        ----------- Cacheline ----------  Load Hit  Load Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit --  - RMT Load Hit --  --- Load Dram ----
  # Index             Address  Node  PA cnt       Pct     Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2    LclHit  LclHitm    RmtHit  RmtHitm       Lcl       Rmt
  # .....  ..................  ....  ......  ........  ........  .......  .......  .......  .......  .......  .......  .......  .......  ........  .......  ........  .......  ........  ........
  #
        0      0xaaaab4e8b100   N/A       0    35.04%    100447   104933   100447     4486     4486        0        0    11269        0     89178        0         0        0         0         0
        1      0xaaaab4e8af80   N/A       0    17.29%     49571    49571    49571        0        0        0        0    49571        0         0        0         0        0         0         0
        2      0xaaaab4e8afc0   N/A       0    16.72%     47922    47922    47922        0        0        0        0    47922        0         0        0         0        0         0         0
        3      0xaaaab4e8b080   N/A       0     8.94%     25641    67718    25641    42077    42077        0        0     4397        0     21244        0         0        0         0         0
        4      0xaaaab4e7a480   N/A       0     4.42%     12680    12680    12680        0        0        0        0    12680        0         0        0         0        0         0         0
        5      0xffffa2ffc980   N/A       0     2.62%      7511     7511     7511        0        0        0        0     7511        0         0        0         0        0         0         0
        6      0xffffa3ffe980   N/A       0     2.57%      7374     7374     7374        0        0        0        0     7374        0         0        0         0        0         0         0
        7      0xaaaab4e8b000   N/A       0     2.41%      6907     6907     6907        0        0        0        0     6907        0         0        0         0        0         0         0
        8      0xaaaab4e8b0c0   N/A       0     2.30%      6592     6592     6592        0        0        0        0     2822        0      3770        0         0        0         0         0
        9      0xffffa37fd980   N/A       0     2.24%      6408     6408     6408        0        0        0        0     6408        0         0        0         0        0         0         0
       10      0xffffb8d80980   N/A       0     2.18%      6254     6254     6254        0        0        0        0     6254        0         0        0         0        0         0         0
       11      0xffffb9d82980   N/A       0     1.31%      3763     9706     3763     5943     5943        0        0     3763        0         0        0         0        0         0         0
       12      0xffffb9581980   N/A       0     1.22%      3507    11484     3507     7977     7977        0        0     3507        0         0        0         0        0         0         0
       13      0xffffbad84980   N/A       0     0.33%       932     7766      932     6834     6834        0        0      932        0         0        0         0        0         0         0
       14      0xffffba583980   N/A       0     0.24%       700     6503      700     5803     5803        0        0      700        0         0        0         0        0         0         0
  
  =================================================
        Shared Cache Line Distribution Pareto      
  =================================================
  #
  #        --  Load Refs --  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                  Shared
  #   Num      Hit     Miss   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt               Symbol             Object                  Source:Line  Node
  # .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
  #
    -------------------------------------------------------------
        0   100447        0     4486        0      0xaaaab4e8b100
    -------------------------------------------------------------
            15.44%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79dd0         0         0         0    15508         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   1
            14.43%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79dd0         0         0         0    14499         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
            11.57%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79db8         0         0         0    11622         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
            11.38%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79dd0         0         0         0    11429         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   2
            10.57%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79dd0         0         0         0    10614         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   3
             9.69%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79db8         0         0         0     9731         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   1
             5.74%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79db8         0         0         0     5763         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   2
             4.84%    0.00%    0.00%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79db8         0         0         0     4866         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   3
             0.00%    0.00%   14.02%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79dbc         0         0         0      629         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   0
             0.00%    0.00%    6.44%    0.00%                 0x0   N/A       0    15046    15048:lock_th         0xaaaab4e79de0         0         0         0      289         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   0
             0.00%    0.00%   12.37%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79dbc         0         0         0      555         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   1
             0.00%    0.00%    6.46%    0.00%                 0x0   N/A       0    15046    15049:lock_th         0xaaaab4e79de0         0         0         0      290         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   1
             0.00%    0.00%   21.38%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79dbc         0         0         0      959         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   2
             0.00%    0.00%    9.61%    0.00%                 0x0   N/A       0    15046    15050:lock_th         0xaaaab4e79de0         0         0         0      431         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   2
             0.00%    0.00%   22.14%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79dbc         0         0         0      993         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145   3
             0.00%    0.00%    7.58%    0.00%                 0x0   N/A       0    15046    15051:lock_th         0xaaaab4e79de0         0         0         0      340         2  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146   3
             6.66%    0.00%    0.00%    0.00%                0x20   N/A       0    15046    15054:reader_thd      0xaaaab4e79e54         0         0         0     6687         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155   2
             3.76%    0.00%    0.00%    0.00%                0x28   N/A       0    15046    15052:reader_thd      0xaaaab4e79e80         0         0         0     3774         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:159   0
             3.54%    0.00%    0.00%    0.00%                0x28   N/A       0    15046    15055:reader_thd      0xaaaab4e79e80         0         0         0     3551         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:159   3
             2.39%    0.00%    0.00%    0.00%                0x30   N/A       0    15046    15053:reader_thd      0xaaaab4e79eac         0         0         0     2403         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:163   1


  [...]

Changes from v1:
* Changed from sorting on LLC to sorting on all loads with cache hits;
* Added patches 06/11, 07/11 for refactoring macros;
* Added patch 08/11 for refactoring node header, so that it can display
  "%loads" rather than "%hitms" in the header;
* Added patch 09/11 to add local pointers for pointing to output metrics
  string and sort string (Juri);
* Added warning in percent_hitm() for the display "all", which should
  never happen (Juri).

[1] https://lore.kernel.org/patchwork/cover/1321514/


Leo Yan (11):
  perf c2c: Add dimensions for total load hit
  perf c2c: Add dimensions for load hit
  perf c2c: Add dimensions for load miss
  perf c2c: Rename for shared cache line stats
  perf c2c: Refactor hist entry validation
  perf c2c: Refactor display filter macro
  perf c2c: Refactor node display macro
  perf c2c: Refactor node header
  perf c2c: Add local variables for output metrics
  perf c2c: Sort on all cache hit for load operations
  perf c2c: Update documentation for display option 'all'

 tools/perf/Documentation/perf-c2c.txt |  21 +-
 tools/perf/builtin-c2c.c              | 548 ++++++++++++++++++++++----
 2 files changed, 487 insertions(+), 82 deletions(-)

-- 
2.17.1
