On Thu, Sep 29, 2022 at 5:56 AM James Clark <james.cl...@arm.com> wrote: > > > > On 29/09/2022 09:49, Athira Rajeev wrote: > > > > > >> On 28-Sep-2022, at 9:05 PM, James Clark <james.cl...@arm.com> wrote: > >> > >> > >> > > > > Hi James, > > > > Thanks for looking at the patch and sharing review comments. > > > >> On 13/09/2022 12:57, Athira Rajeev wrote: > >>> perf stat includes option to specify aggr_mode to display > >>> per-socket, per-core, per-die, per-node counter details. > >>> Also there is option -A ( AGGR_NONE, -no-aggr ), where the > >>> counter values are displayed for each cpu along with "CPU" > >>> value in one field of the output. > >>> > >>> Each of the aggregate mode uses the information fetched > >>> from "/sys/devices/system/cpu/cpuX/topology" like core_id, > >> > >> I thought that this wouldn't apply to the cpu field because cpu is > >> basically interchangeable as an index in cpumap, rather than anything > >> being read from the topology file. > > > > The cpu value is filled in this function: > > > > Function : aggr_cpu_id__cpu > > Code: util/cpumap.c > > > >> > >>> physical_package_id. Utility functions in "cpumap.c" fetches > >>> this information and populates the socket id, core id, cpu etc. > >>> If the platform does not expose the topology information, > >>> these values will be set to -1. Example, in case of powerpc, > >>> details like physical_package_id is restricted to be exposed > >>> in pSeries platform. So id.socket, id.core, id.cpu all will > >>> be set as -1. > >>> > >>> In case of displaying socket or die value, there is no check > >>> done in the "aggr_printout" function to see if it points to > >>> valid socket id or die. But for displaying "cpu" value, there > >>> is a check for "if (id.core > -1)". In case of powerpc pSeries > >>> where detail like physical_package_id is restricted to be > >>> exposed, id.core will be set to -1. Hence the column or field > >>> itself for CPU won't be displayed in the output. > >>> > >>> Result for per-socket: > >>> > >>> <<>> > >>> perf stat -e branches --per-socket -a true > >>> > >>> Performance counter stats for 'system wide': > >>> > >>> S-1 32 416,851 branches > >>> <<>> > >>> > >>> Here S has -1 in above result. But with -A option which also > >>> expects CPU in one column in the result, below is observed. > >>> > >>> <<>> > >>> /bin/perf stat -e instructions -A -a true > >>> > >>> Performance counter stats for 'system wide': > >>> > >>> 47,146 instructions > >>> 45,226 instructions > >>> 43,354 instructions > >>> 45,184 instructions > >>> <<>> > >>> > >>> If the cpu id value is pointing to -1 also, it makes sense > >>> to display the column in the output to replicate the behaviour > >>> or to be in precedence with other aggr options(like per-socket, > >>> per-core). Remove the check "id.core" so that CPU field gets > >>> displayed in the output. > >> > >> Why would you want to print -1 out? Seems like the if statement was a > >> good one to me, otherwise the output looks a bit broken to users. Are > >> the other aggregation modes even working if -1 is set for socket and > >> die? Maybe we need to not print -1 in those cases or exit earlier with a > >> failure. > >> > >> The -1 value has a specific internal meaning which is "to not > >> aggregate". It doesn't mean "not set". > > > > Currently, this check is done only for printing cpu value. > > For socket/die/core values, this check is not done. Pasting an > > example snippet from a powerpc system ( specifically from pseries platform > > where > > the value is set to -1 ) > > > > ./perf stat --per-core -a -C 1 true > > > > Performance counter stats for 'system wide': > > > > S-1-D-1-C-1 1 1.06 msec cpu-clock > > # 1.018 CPUs utilized > > S-1-D-1-C-1 1 2 context-switches > > # 1.879 K/sec > > S-1-D-1-C-1 1 0 cpu-migrations > > # 0.000 /sec > > > > Here though the value is -1, we are displaying it. Where as in case of cpu, > > the first column will be > > empty since we do a check before printing. > > > > Example: > > > > ./perf stat --per-core -A -C 1 true > > > > Performance counter stats for 'CPU(s) 1': > > > > 0.88 msec cpu-clock # 1.022 CPUs > > utilized > > 2 context-switches > > 0 cpu-migrations > > > > > > No sure, whether there are scripts out there, which consume the current > > format and > > not displaying -1 may break it. That is why we tried with change to remove > > check for cpu, similar to > > other modes like socket, die, core etc. > > I wouldn't worry about that because there are json and CSV modes which > are machine readable, and -1 is already not always displayed. If > anything this change here is also likely to break parsing by adding -1 > where it wasn't before. > > > > > Also perf code ie “aggr_cpu_id__empty” in util/cpumap.c initialises the > > values to -1 . I was checking to see where we are mapping -1 to “to not > > aggregate”. > > What I could find is AGGR_NONE ( which is for no-aggr ) has value as zero. > > > > Reference: defined in util/stat.h > > > > enum aggr_mode { > > AGGR_NONE, > > > > That enum is never written to any of the cpumap members, that defines > the mode of how to fill the cpu map instead. 0 is a valid value, for > example "CPU 0". -1 is used as a special case and shouldn't be displayed > IMO. > > Did you see my comment in the code below about the bad merge? Could that > not be related to your issue?
I'm suspicious of this too. In Claire's patch: case AGGR_NONE: - if (evsel->percore && !config->percore_show_thread) { - fprintf(config->output, "S%d-D%d-C%*d%s", - id.socket, - id.die, - config->csv_output ? 0 : -3, - id.core, config->csv_sep); - } else if (id.cpu.cpu > -1) { - fprintf(config->output, "CPU%*d%s", - config->csv_output ? 0 : -7, - id.cpu.cpu, config->csv_sep); + if (config->json_output) { + if (evsel->percore && !config->percore_show_thread) { + fprintf(config->output, "\"core\" : \"S%d-D%d-C%d\"", + id.socket, + id.die, + id.core); + } else if (id.core > -1) { + fprintf(config->output, "\"cpu\" : \"%d\", ", + id.cpu.cpu); + } + } else { + if (evsel->percore && !config->percore_show_thread) { + fprintf(config->output, "S%d-D%d-C%*d%s", + id.socket, + id.die, + config->csv_output ? 0 : -3, + id.core, config->csv_sep); + } else if (id.core > -1) { + fprintf(config->output, "CPU%*d%s", + config->csv_output ? 0 : -7, + id.cpu.cpu, config->csv_sep); + } } break; The old code was using "id.cpu.cpu > -1" while the new code is "id.core > -1". The value printed is id.cpu.cpu and so testing id.core makes less sense to me. Going back to the original patch: https://lore.kernel.org/lkml/20210811224317.1811618-1-cje...@google.com/ case AGGR_NONE: - if (evsel->percore && !config->percore_show_thread) { - fprintf(config->output, "S%d-D%d-C%*d%s", - id.socket, - id.die, - config->csv_output ? 0 : -3, - id.core, config->csv_sep); + if (config->json_output) { + if (evsel->percore && !config->percore_show_thread) { + fprintf(config->output, "\"core\" : \"S%d-D%d-C%d\"", + id.socket, + id.die, + id.core); + } else if (id.core > -1) { + fprintf(config->output, "\"cpu\" : \"%d\", ", + evsel__cpus(evsel)->map[id.core]); + } + } else { + if (evsel->percore && !config->percore_show_thread) { + fprintf(config->output, "S%d-D%d-C%*d%s", + id.socket, + id.die, + config->csv_output ? 0 : -3, + id.core, config->csv_sep); } else if (id.core > -1) { fprintf(config->output, "CPU%*d%s", config->csv_output ? 0 : -7, evsel__cpus(evsel)->map[id.core], config->csv_sep); - } + } + } + break; So testing the id.core isn't a bad index makes sense. However, we changed from core to CPU here: https://lore.kernel.org/all/20220105061351.120843-26-irog...@google.com/ and that was because of: https://lore.kernel.org/r/20220105061351.120843-25-irog...@google.com So I think the code needs to test CPU and not core. Whether that is addressing the Power test failures is another matter, as James said we may need a fix in the tests for that. Thanks, Ian > Or the one about fixing it in the test instead? Or failing early if the > topology can't be read? > > I'm still not convinced that any of the modes where -1 is printed are > even working properly so it might be best to fix that rather than just > the printout. > > > James, can you point me to reference for that meaning if I have missed > > anything. > > It's here: > > /** Identify where counts are aggregated, -1 implies not to aggregate. */ > struct aggr_cpu_id { > > > > > Thanks > > Athira > > > >> > >>> > >>> After the fix: > >>> > >>> <<>> > >>> perf stat -e instructions -A -a true > >>> > >>> Performance counter stats for 'system wide': > >>> > >>> CPU-1 64,034 instructions > >>> CPU-1 68,941 instructions > >>> CPU-1 59,418 instructions > >>> CPU-1 70,478 instructions > >>> CPU-1 65,201 instructions > >>> CPU-1 63,704 instructions > >>> <<>> > >>> > >>> This is caught while running "perf test" for > >>> "stat+json_output.sh" and "stat+csv_output.sh". > >> > >> Is it possible to fix the issue by making the tests cope with the lack > >> of the CPU id? > >> > >>> > >>> Reported-by: Disha Goel <disg...@linux.vnet.ibm.com> > >>> Signed-off-by: Athira Rajeev <atraj...@linux.vnet.ibm.com> > >>> --- > >>> tools/perf/util/stat-display.c | 6 ++---- > >>> 1 file changed, 2 insertions(+), 4 deletions(-) > >>> > >>> diff --git a/tools/perf/util/stat-display.c > >>> b/tools/perf/util/stat-display.c > >>> index b82844cb0ce7..1b751a730271 100644 > >>> --- a/tools/perf/util/stat-display.c > >>> +++ b/tools/perf/util/stat-display.c > >>> @@ -168,10 +168,9 @@ static void aggr_printout(struct perf_stat_config > >>> *config, > >>> id.socket, > >>> id.die, > >>> id.core); > >>> - } else if (id.core > -1) { > >>> + } else > >> > >> This should have been "id.cpu.cpu > -1". Looks like it was changed by > >> some kind of bad merge or rebase in df936cadfb because there is no > >> obvious justification for the change to .core in that commit. > > > >> > >>> fprintf(config->output, "\"cpu\" : \"%d\", ", > >>> id.cpu.cpu); > >>> - } > >>> } else { > >>> if (evsel->percore && !config->percore_show_thread) { > >>> fprintf(config->output, "S%d-D%d-C%*d%s", > >>> @@ -179,11 +178,10 @@ static void aggr_printout(struct perf_stat_config > >>> *config, > >>> id.die, > >>> config->csv_output ? 0 : -3, > >>> id.core, config->csv_sep); > >>> - } else if (id.core > -1) { > >>> + } else > >>> fprintf(config->output, "CPU%*d%s", > >>> config->csv_output ? 0 : -7, > >>> id.cpu.cpu, config->csv_sep); > >>> - } > >>> } > >>> break; > >>> case AGGR_THREAD: > >