L3 accesses

Michael Ellerman Thu, 09 Sep 2021 05:46:35 -0700

Peter Zijlstra <pet...@infradead.org> writes:
> On Wed, Sep 08, 2021 at 05:17:53PM +1000, Michael Ellerman wrote:
>> Kajol Jain <kj...@linux.ibm.com> writes:
>
>> > diff --git a/include/uapi/linux/perf_event.h 
>> > b/include/uapi/linux/perf_event.h
>> > index f92880a15645..030b3e990ac3 100644
>> > --- a/include/uapi/linux/perf_event.h
>> > +++ b/include/uapi/linux/perf_event.h
>> > @@ -1265,7 +1265,9 @@ union perf_mem_data_src {
>> >  #define PERF_MEM_LVLNUM_L2        0x02 /* L2 */
>> >  #define PERF_MEM_LVLNUM_L3        0x03 /* L3 */
>> >  #define PERF_MEM_LVLNUM_L4        0x04 /* L4 */
>> > -/* 5-0xa available */
>> > +#define PERF_MEM_LVLNUM_OC_L2     0x05 /* On Chip L2 */
>> > +#define PERF_MEM_LVLNUM_OC_L3     0x06 /* On Chip L3 */
>> 
>> The obvious use for 5 is for "L5" and so on.
>> 
>> I'm not sure adding new levels is the best idea, because these don't fit
>> neatly into the hierarchy, they are off to the side.
>> 
>> 
>> I wonder if we should use the remote field.
>> 
>> ie. for another core's L2 we set:
>> 
>>   mem_lvl = PERF_MEM_LVL_L2
>>   mem_remote = 1
>
> This mixes APIs (see below), IIUC the correct usage would be something
> like: lvl_num=L2 remote=1


Aha, I was wondering how lvl and lvl_num were supposed to interact.

>> Which would mean "remote L2", but not remote enough to be
>> lvl = PERF_MEM_LVL_REM_CCE1.
>> 
>> It would be printed by the existing tools/perf code as "Remote L2", vs
>> "Remote cache (1 hop)", which seems OK.
>> 
>> 
>> ie. we'd be able to express:
>> 
>>   Current core's L2: LVL_L2
>>   Other core's L2:   LVL_L2 | REMOTE
>>   Other chip's L2:   LVL_REM_CCE1 | REMOTE
>> 
>> And similarly for L3.
>> 
>> I think that makes sense? Unless people think remote should be reserved
>> to mean on another chip, though we already have REM_CCE1 for that.
>
> IIRC the PERF_MEM_LVL_* namespace is somewhat depricated in favour of
> the newer composite PERF_MEM_{LVLNUM_,REMOTE_,SNOOPX_} fields. Of
> course, ABIs being what they are, we get to support both :/ But I'm not
> sure mixing them is a great idea.

OK.

> Also, clearly this could use a comment...
>
> The 'new' composite doesnt have a hops field because the hardware that
> nessecitated that change doesn't report it, but we could easily add a
> field there.
>
> Suppose we add, mem_hops:3 (would 6 hops be too small?) and the
> corresponding PERF_MEM_HOPS_{NA, 0..6}

It's really 7 if we use remote && hop = 0 to mean the first hop.

If we're wanting to use some of the hop levels to represent
intra-chip/package hops then we could possibly use them all on a really
big system.

eg. you could imagine something like:

 L2 |                   - local L2
 L2 | REMOTE | HOPS_0   - L2 of neighbour core
 L2 | REMOTE | HOPS_1   - L2 of near core on same chip (same 1/2 of chip)
 L2 | REMOTE | HOPS_2   - L2 of far core on same chip (other 1/2 of chip)
 L2 | REMOTE | HOPS_3   - L2 of sibling chip in same package
 L2 | REMOTE | HOPS_4   - L2 on separate package 1 hop away
 L2 | REMOTE | HOPS_5   - L2 on separate package 2 hops away
 L2 | REMOTE | HOPS_6   - L2 on separate package 3 hops away


Whether it's useful to represent all those levels I'm not sure, but it's
probably good if we have the ability.

I guess I'm 50/50 on whether that's enough levels, or whether we want
another bit to allow for future growth.

> Then I suppose you can encode things like:
> 
>       L2                      - local L2
>       L2 | REMOTE             - remote L2 at an unspecified distance (NA)
>       L2 | REMOTE | HOPS_0    - remote L2 on the same node
>       L2 | REMOTE | HOPS_1    - remote L2 on a node 1 removed
> 
> Would that work?

Yeah that looks good to me.

cheers

Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses

Reply via email to