Peter Zijlstra <pet...@infradead.org> writes: > On Wed, Sep 08, 2021 at 05:17:53PM +1000, Michael Ellerman wrote: >> Kajol Jain <kj...@linux.ibm.com> writes: > >> > diff --git a/include/uapi/linux/perf_event.h >> > b/include/uapi/linux/perf_event.h >> > index f92880a15645..030b3e990ac3 100644 >> > --- a/include/uapi/linux/perf_event.h >> > +++ b/include/uapi/linux/perf_event.h >> > @@ -1265,7 +1265,9 @@ union perf_mem_data_src { >> > #define PERF_MEM_LVLNUM_L2 0x02 /* L2 */ >> > #define PERF_MEM_LVLNUM_L3 0x03 /* L3 */ >> > #define PERF_MEM_LVLNUM_L4 0x04 /* L4 */ >> > -/* 5-0xa available */ >> > +#define PERF_MEM_LVLNUM_OC_L2 0x05 /* On Chip L2 */ >> > +#define PERF_MEM_LVLNUM_OC_L3 0x06 /* On Chip L3 */ >> >> The obvious use for 5 is for "L5" and so on. >> >> I'm not sure adding new levels is the best idea, because these don't fit >> neatly into the hierarchy, they are off to the side. >> >> >> I wonder if we should use the remote field. >> >> ie. for another core's L2 we set: >> >> mem_lvl = PERF_MEM_LVL_L2 >> mem_remote = 1 > > This mixes APIs (see below), IIUC the correct usage would be something > like: lvl_num=L2 remote=1
Aha, I was wondering how lvl and lvl_num were supposed to interact. >> Which would mean "remote L2", but not remote enough to be >> lvl = PERF_MEM_LVL_REM_CCE1. >> >> It would be printed by the existing tools/perf code as "Remote L2", vs >> "Remote cache (1 hop)", which seems OK. >> >> >> ie. we'd be able to express: >> >> Current core's L2: LVL_L2 >> Other core's L2: LVL_L2 | REMOTE >> Other chip's L2: LVL_REM_CCE1 | REMOTE >> >> And similarly for L3. >> >> I think that makes sense? Unless people think remote should be reserved >> to mean on another chip, though we already have REM_CCE1 for that. > > IIRC the PERF_MEM_LVL_* namespace is somewhat depricated in favour of > the newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields. Of > course, ABIs being what they are, we get to support both :/ But I'm not > sure mixing them is a great idea. OK. > Also, clearly this could use a comment... > > The 'new' composite doesnt have a hops field because the hardware that > nessecitated that change doesn't report it, but we could easily add a > field there. > > Suppose we add, mem_hops:3 (would 6 hops be too small?) and the > corresponding PERF_MEM_HOPS_{NA, 0..6} It's really 7 if we use remote && hop = 0 to mean the first hop. If we're wanting to use some of the hop levels to represent intra-chip/package hops then we could possibly use them all on a really big system. eg. you could imagine something like: L2 | - local L2 L2 | REMOTE | HOPS_0 - L2 of neighbour core L2 | REMOTE | HOPS_1 - L2 of near core on same chip (same 1/2 of chip) L2 | REMOTE | HOPS_2 - L2 of far core on same chip (other 1/2 of chip) L2 | REMOTE | HOPS_3 - L2 of sibling chip in same package L2 | REMOTE | HOPS_4 - L2 on separate package 1 hop away L2 | REMOTE | HOPS_5 - L2 on separate package 2 hops away L2 | REMOTE | HOPS_6 - L2 on separate package 3 hops away Whether it's useful to represent all those levels I'm not sure, but it's probably good if we have the ability. I guess I'm 50/50 on whether that's enough levels, or whether we want another bit to allow for future growth. > Then I suppose you can encode things like: > > L2 - local L2 > L2 | REMOTE - remote L2 at an unspecified distance (NA) > L2 | REMOTE | HOPS_0 - remote L2 on the same node > L2 | REMOTE | HOPS_1 - remote L2 on a node 1 removed > > Would that work? Yeah that looks good to me. cheers