Hi Rui,

On 6/11/2024 2:00 PM, Zhang, Rui wrote:
>> @@ -345,9 +353,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>  	u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>>  	int bit, ret = 0;
>>  	struct rapl_pmu *rapl_pmu;
>> +	struct rapl_pmus *curr_rapl_pmus;
>>
>>  	/* only look at RAPL events */
>> -	if (event->attr.type != rapl_pmus->pmu.type)
>> +	if (event->attr.type == rapl_pmus->pmu.type)
>> +		curr_rapl_pmus = rapl_pmus;
>> +	else if (rapl_pmus_per_core && event->attr.type == rapl_pmus_per_core->pmu.type)
>> +		curr_rapl_pmus = rapl_pmus_per_core;
>> +	else
>>  		return -ENOENT;
>
> can we use container_of(event->pmu, struct rapl_pmus, pmu)?

Yes! That would be cleaner; I will add it in the next version.

>
>>
>>  	/* check only supported bits are set */
>> @@ -374,9 +387,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>  		return -EINVAL;
>>
>>  	/* must be done before validate_group */
>> -	rapl_pmu = cpu_to_rapl_pmu(event->cpu);
>> +	if (curr_rapl_pmus == rapl_pmus_per_core)
>> +		rapl_pmu = curr_rapl_pmus->rapl_pmu[topology_core_id(event->cpu)];
>> +	else
>> +		rapl_pmu = curr_rapl_pmus->rapl_pmu[get_rapl_pmu_idx(event->cpu)];
>> +
>>  	if (!rapl_pmu)
>>  		return -EINVAL;
>
> Current code has PERF_EV_CAP_READ_ACTIVE_PKG flag set.
> Can you help me understand why it does not affect the new per-core pmu?

Good question. I went back and looked through the code, and it turns out that we are not going through the code path that checks this flag and decides whether to run on the local CPU (the CPU on which perf is running) or on event->cpu. So having or not having this flag doesn't make a difference here. I did a small experiment to confirm this.

On a single-package system, any core should be able to read the energy-pkg RAPL MSR and return the value, so there should be no need for an SMP call to event->cpu. However, the ftrace below shows that only core 0 executes the PMU event callbacks, even though we launched perf stat for core 1.

--------------------------------------------------------------------------
root@shatadru:/sys/kernel/tracing# perf stat -C 1 -e power/energy-pkg/ -- dd if=/dev/zero of=/dev/null bs=1M count=100000
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 2.03295 s, 51.6 GB/s

 Performance counter stats for 'CPU(s) 1':

            231.59 Joules power/energy-pkg/

       2.033916467 seconds time elapsed

root@shatadru:/sys/kernel/tracing# echo 0 > tracing_on
root@shatadru:/sys/kernel/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 12/12   #P:192
#
#                                _-----=> irqs-off/BH-disabled
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| / _-=> migrate-disable
#                              |||| /     delay
#           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
#              | |         |   |||||     |         |
       perf-3309    [096] ...1.  3422.558183: rapl_get_attr_cpumask <-dev_attr_show
       perf-3309    [001] ...1.  3422.559436: rapl_pmu_event_init <-perf_try_init_event
       perf-3309    [001] ...1.  3422.559441: rapl_pmu_event_init <-perf_try_init_event
       perf-3309    [001] ...1.  3422.559449: rapl_pmu_event_init <-perf_try_init_event
       perf-3309    [001] ...1.  3422.559537: smp_call_function_single <-event_function_call   <-- SMP call to the event owner CPU (i.e. CPU0)
          <idle>-0  [000] d.h3.  3422.559544: rapl_pmu_event_add <-event_sched_in              <-- CPU# column changed to 0
          <idle>-0  [000] d.h4.  3422.559545: __rapl_pmu_event_start <-rapl_pmu_event_add
       perf-3309    [001] ...1.  3424.593398: smp_call_function_single <-event_function_call   <-- SMP call to the event owner CPU (i.e. CPU0)
          <idle>-0  [000] d.h3.  3424.593403: rapl_pmu_event_del <-event_sched_out             <-- CPU# column changed to 0
          <idle>-0  [000] d.h3.  3424.593403: rapl_pmu_event_stop <-rapl_pmu_event_del
          <idle>-0  [000] d.h4.  3424.593404: rapl_event_update.isra.0 <-rapl_pmu_event_stop
       perf-3309    [001] ...1.  3424.593514: smp_call_function_single <-event_function_call
--------------------------------------------------------------------------

So, since we always run the event on event->cpu, the per-core PMU is not affected by this flag. In the next version, I will enable this flag only for package-scope events. Separately, we will need to look into fixing this ineffective flag.

>
>> +
>>  	event->cpu = rapl_pmu->cpu;
>>  	event->pmu_private = rapl_pmu;
>>  	event->hw.event_base = rapl_msrs[bit].msr;
>> @@ -408,17 +426,38 @@ static struct attribute_group rapl_pmu_attr_group = {
>>  	.attrs = rapl_pmu_attrs,
>>  };
>>
>> +static ssize_t rapl_get_attr_per_core_cpumask(struct device *dev,
>> +					      struct device_attribute *attr, char *buf)
>> +{
>> +	return cpumap_print_to_pagebuf(true, buf, &rapl_pmus_per_core->cpumask);
>> +}
>> +
>> +static struct device_attribute dev_attr_per_core_cpumask = __ATTR(cpumask, 0444,
>> +								  rapl_get_attr_per_core_cpumask,
>> +								  NULL);
>
> DEVICE_ATTR

I was not able to use DEVICE_ATTR here. DEVICE_ATTR(_name, ...) always names the variable dev_attr_##_name, and the package PMU cpumask already uses DEVICE_ATTR(cpumask, ...), which creates dev_attr_cpumask. Since the per-core attribute must also be called "cpumask" in sysfs, a second DEVICE_ATTR(cpumask, ...) would clash with that variable, so I had to define "struct device_attribute dev_attr_per_core_cpumask" manually with __ATTR().

>
>> +
>> +static struct attribute *rapl_pmu_per_core_attrs[] = {
>> +	&dev_attr_per_core_cpumask.attr,
>> +	NULL,
>> +};
>> +
>> +static struct attribute_group rapl_pmu_per_core_attr_group = {
>> +	.attrs = rapl_pmu_per_core_attrs,
>> +};
>> +
>>  RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
>>  RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
>>  RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
>>  RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");
>>  RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05");
>> +RAPL_EVENT_ATTR_STR(energy-per-core, rapl_per_core, "event=0x06");
>
> energy-per-core is for a separate pmu, so the event id does not need to
> be 6.
> The same applies to PERF_RAPL_PERCORE.

Correct, I will fix this in the next version.

>
>>
>>  static struct rapl_model model_amd_hygon = {
>> -	.events		= BIT(PERF_RAPL_PKG),
>> +	.events		= BIT(PERF_RAPL_PKG) |
>> +			  BIT(PERF_RAPL_PERCORE),
>>  	.msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
>>  	.rapl_msrs      = amd_rapl_msrs,
>> +	.per_core = true,
>>  };
>
> can we use bit PERF_RAPL_PERCORE to check per_core pmu suppot?

Makes sense, will modify.

>
> Just FYI, arch/x86/events/intel/cstate.c handles package/module/core
> scope cstate pmus. It uses a different approach in the probing part,
> which IMO is clearer.

Yes, I went through it. I see that separate variables are used to mark the valid events for package and core scope, and a wrapper function around perf_msr_probe() is created. I will see whether that makes sense here as well.

Thanks for the review,
Dhananjay

>
> thanks,
> rui
>