> >> On Thu, Aug 6, 2015 at 1:25 PM, Liang, Kan <kan.li...@intel.com> wrote:
> >> >
> >> >> >> >> >> +static cpumask_t power_cstate_core_cpu_mask;
> >> >> >> >> >
> >> >> >> >> > That one typically does not need a cpumask.
> >> >> >> >> >
> >> >> >> >> You need to pick one CPU out of the multi-core. But it is
> >> >> >> >> for client parts thus there is only one socket. At least
> >> >> >> >> this is my understanding.
> >> >> >> >>
> >> >> >> >
> >> >> >> > CORE_C*_RESIDENCY are available for physical processor core.
> >> >> >> > So logical processor in same physical processor core share
> >> >> >> > the same counter.
> >> >> >> > I think we need the cpumask to identify the default logical
> >> >> >> > processor which do counting.
> >> >> >> >
> >> >> >> Did you restrict these events to system-wide mode only?
> >> >> >>
> >> >> Ok, so that means that your cpumask includes one HT per physical core.
> >> >> But then, the result is not the simple aggregation of all the N/2 CPUs.
> >> >
> >> > The counter counts per physical core. The result is the aggregation
> >> > of all HT cpus in same physical core.
> >>
> >> But then don't you need to divide by 2 to get a meaningful result?
> >
> > Rethink of it. I think I was unclear about the aggregation of all HT
> > cpus in same physical core.
> >
> > physical core Cstate should equal to min(logical core C-state).
> > So only all logical core enters C6-state, the physical core enters
> > C6-state, then CORE_C6_RESIDENCY counts.
> >
> > So if we only count on one logical core/HT for CORE_C6_RESIDENCY.
> > We don't need to divide by 2. The count result is the residency when
> > all logical core in C6 (some may deeper).
> >
> Ok and here you are assuming you are only measuring one logical CPU per
> physical core. If this is the case, then I think you are alright. But I wonder
> what you'd get when perf stat -a aggregates across all measured CPUs, i.e.,
> one CPU per core.

Just add them all together. I think we do the same thing for other PMUs
as well.

For uncore or rapl, we get a meaningful result by applying --per-socket.
Here we can use --per-core.

Thanks,
Kan
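
A toy sketch of the reasoning above, in case it helps. This is not driver
code: the per-tick C-state timelines, the array sizes and the function names
(core_cstate, core_c6_residency, sum_one_per_core) are all made up for
illustration. It only models "core C-state = min(logical C-state)" and then
sums one reading per physical core, which is what reading the counter on a
single HT per core amounts to.

```c
#include <assert.h>

#define NCORES   2   /* hypothetical: two physical cores */
#define NTHREADS 2   /* hypothetical: two HT siblings per core */
#define NTICKS   8   /* hypothetical: length of the sampled timeline */

/* Made-up C-state of each logical CPU per tick (0 = C0, 6 = C6, 7 = C7). */
static const int cstate[NCORES][NTHREADS][NTICKS] = {
	{ {0, 6, 6, 6, 7, 7, 0, 0},
	  {0, 0, 6, 6, 6, 7, 7, 0} },
	{ {6, 6, 6, 6, 0, 0, 0, 0},
	  {6, 6, 0, 0, 0, 0, 6, 6} },
};

/* The core's C-state is the shallowest state among its logical CPUs. */
static int core_cstate(int core, int tick)
{
	int min;

	assert(core >= 0 && core < NCORES);
	assert(tick >= 0 && tick < NTICKS);

	min = cstate[core][0][tick];
	for (int t = 1; t < NTHREADS; t++)
		if (cstate[core][t][tick] < min)
			min = cstate[core][t][tick];
	return min;
}

/* Ticks during which every logical CPU of the core is in C6 or deeper. */
static int core_c6_residency(int core)
{
	int ticks = 0;

	for (int i = 0; i < NTICKS; i++)
		if (core_cstate(core, i) >= 6)
			ticks++;
	return ticks;
}

/* System-wide aggregate: one reading per physical core, no division by 2. */
static int sum_one_per_core(void)
{
	int total = 0;

	for (int c = 0; c < NCORES; c++)
		total += core_c6_residency(c);
	return total;
}
```

With these made-up timelines, core 0 is in C6 or deeper for 4 ticks and
core 1 for 2 ticks, so the sum over one reading per core is 6. Reading the
same per-core value on both HT siblings and adding everything up would give
12, which is the double counting the "divide by 2" question was about.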