On Thu, Jan 28, 2016 at 04:28:48PM +0100, Peter Zijlstra wrote:
> On Thu, Jan 28, 2016 at 10:03:15AM +0100, Borislav Petkov wrote:
>
> > +
> > +struct power_pmu {
> > +	raw_spinlock_t		lock;
>
> Now that the list is gone, what does this thing protect?
>

It protects the event count value before we measure it.

> > +	struct pmu		*pmu;
>
> This member seems superfluous, there's only the one possible value.
>

Currently there is only one. But there will be more power PMU types in
future processors; accumulated power is one of them.

> > +	local64_t		cpu_sw_pwr_ptsc;
> > +
> > +	/*
> > +	 * These two cpumasks are used for avoiding the allocations on the
> > +	 * CPU_STARTING phase because power_cpu_prepare() will be called with
> > +	 * IRQs disabled.
> > +	 */
> > +	cpumask_var_t		mask;
> > +	cpumask_var_t		tmp_mask;
> > +};
> > +
> > +static struct pmu pmu_class;
> > +
> > +/*
> > + * Accumulated power represents the sum of each compute unit's (CU) power
> > + * consumption. On any core of each CU we read the total accumulated power from
> > + * MSR_F15H_CU_PWR_ACCUMULATOR. cpu_mask represents CPU bit map of all cores
> > + * which are picked to measure the power for the CUs they belong to.
> > + */
> > +static cpumask_t cpu_mask;
> > +
> > +static DEFINE_PER_CPU(struct power_pmu *, amd_power_pmu);
> > +
> > +static u64 event_update(struct perf_event *event, struct power_pmu *pmu)
> > +{
>
> Is there ever a case where @pmu != __this_cpu_read(power_pmu) ?
>

It can only be called from pmu:{read, stop}, and those ensure that @pmu
is __this_cpu_read(amd_power_pmu). Is there any other case I missed?

> > +	struct hw_perf_event *hwc = &event->hw;
> > +	u64 prev_raw_count, new_raw_count, prev_ptsc, new_ptsc;
> > +	u64 delta, tdelta;
> > +
> > +again:
> > +	prev_raw_count = local64_read(&hwc->prev_count);
> > +	prev_ptsc = local64_read(&pmu->cpu_sw_pwr_ptsc);
> > +	rdmsrl(event->hw.event_base, new_raw_count);
>
> Is hw.event_base != MSR_F15H_CU_PWR_ACCUMULATOR possible?
>

Is there a case that I missed? Could you explain more?

> > +	rdmsrl(MSR_F15H_PTSC, new_ptsc);
>
>
> Also, I suspect this doesn't do what you expect it to do.
>
> We measure per-event PWR_ACC deltas, but per CPU PTSC values. These do
> not match when there's more than 1 event on the CPU.
>

OK, I see.
My intention was that the per-event count (event->count) should be the
PWR_ACC value divided by PTSC. But here we cannot use
local64_read(&hwc->prev_count) as the previous value of PWR_ACC before
the division by PTSC. Thanks for catching it.

> I would suggest adding a new struct to the hw_perf_event union with the
> two u64 deltas like:
>
> 	struct { /* amd_power */
> 		u64 pwr_acc;
> 		u64 ptsc;
> 	};
>
> And track these values per-event.
>

Thanks for the reminder.

Thanks,
Rui
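
For illustration only, here is a minimal userspace sketch of the per-event
tracking suggested above. It is not the driver code: the two MSRs are
replaced by plain variables (fake_pwr_acc, fake_ptsc), and every sketch_*
name is hypothetical. It only shows that keeping both previous values in
the event keeps the PWR_ACC and PTSC deltas matched when two events on the
same CPU start at different times.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two MSRs; a real driver would use rdmsrl(). */
static uint64_t fake_pwr_acc;
static uint64_t fake_ptsc;

/* Per-event state, modelled on the struct suggested in the review. */
struct sketch_event {
	uint64_t prev_pwr_acc;	/* last PWR_ACC value seen by this event */
	uint64_t prev_ptsc;	/* last PTSC value seen by this event */
	uint64_t acc_delta_sum;	/* accumulated PWR_ACC delta */
	uint64_t ptsc_delta_sum;	/* accumulated PTSC delta */
};

/* Simulate the hardware counters moving forward. */
static void sketch_hw_advance(uint64_t acc, uint64_t ptsc)
{
	fake_pwr_acc += acc;
	fake_ptsc += ptsc;
}

/* Snapshot both counters into the event when it starts. */
static void sketch_event_start(struct sketch_event *ev)
{
	ev->prev_pwr_acc = fake_pwr_acc;
	ev->prev_ptsc = fake_ptsc;
	ev->acc_delta_sum = 0;
	ev->ptsc_delta_sum = 0;
}

/* Both deltas are taken against this event's own snapshot. */
static void sketch_event_update(struct sketch_event *ev)
{
	uint64_t acc = fake_pwr_acc;
	uint64_t ptsc = fake_ptsc;

	ev->acc_delta_sum += acc - ev->prev_pwr_acc;
	ev->ptsc_delta_sum += ptsc - ev->prev_ptsc;
	ev->prev_pwr_acc = acc;
	ev->prev_ptsc = ptsc;
}

/* Two events on one "CPU"; b starts later than a. */
static void sketch_run_demo(struct sketch_event *a, struct sketch_event *b)
{
	fake_pwr_acc = 0;
	fake_ptsc = 0;

	sketch_event_start(a);
	sketch_hw_advance(100, 10);
	sketch_event_start(b);
	sketch_hw_advance(300, 20);

	sketch_event_update(a);
	sketch_event_update(b);
}
```

With a single per-CPU previous PTSC value, the update of one event would
move the baseline used by the other, so the second event's PTSC delta
would no longer cover the same interval as its PWR_ACC delta.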