Nicholas Piggin <npig...@gmail.com> writes:
> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>> During Live Partition Migration (LPM), it is observed that perf
>> counter values report zero after migration completes. However,
>> 'perf stat' with a workload continues to show counts post migration,
>> since the PMU gets disabled/enabled during sched switches. But in the
>> case of system/cpu-wide monitoring, zero counts are reported with
>> 'perf stat' after migration completion.
>>
>> Example:
>> ./perf stat -e r1001e -I 1000
>>            time             counts unit events
>>     1.001010437         22,137,414      r1001e
>>     2.002495447         15,455,821      r1001e
>> <<>>
>> As seen in the logs below, the counter values show zero
>> after migration is completed.
>> <<>>
>>    86.142535370    129,392,333,440      r1001e
>>    87.144714617                  0      r1001e
>>    88.146526636                  0      r1001e
>>    89.148085029                  0      r1001e
>
> This is the output without the patch? After the patch it keeps counting
> I suppose? And does the very large count go away too?
>
>>
>> Here the PMU is enabled at the start of the perf session and counter
>> values are read at intervals. Counters are only disabled at the end of
>> the session. The powerpc mobility code presently does not handle
>> disabling and re-enabling the PMU counters during partition migration.
>> Also, since the PMU register values are not saved/restored during
>> migration, PMU registers like Monitor Mode Control Register 0 (MMCR0)
>> and Monitor Mode Control Register 1 (MMCR1) will not contain the
>> values they were programmed with. Hence the PMU counters will not be
>> enabled correctly post migration.
>>
>> Fix this in the mobility code by disabling and re-enabling the PMU on
>> all CPUs before and after migration. The patch introduces two
>> functions, 'mobility_pmu_disable' and 'mobility_pmu_enable'.
>> mobility_pmu_disable() is called before the processor threads go into
>> suspend state, to disable the PMU counters; the disable is done only
>> if there are any active events running on that CPU.
>> mobility_pmu_enable() is called after the migration is done to
>> re-enable the PMU counters.
>>
>> Since the Performance Monitor Counters (PMCs) are not saved/restored
>> during LPM, the PMC value reads back as zero post migration while
>> 'event->hw.prev_count' holds a non-zero value. This breaks the update
>> of event->count, since we always accumulate
>> (PMC value - event->hw.prev_count) into event->count: if
>> event->hw.prev_count is greater than the PMC value, event->count goes
>> negative. To fix this, 'prev_count' also needs to be re-initialised
>> for all events while re-enabling them. Hence read off the existing
>> events and clear the PMC index (stored in event->hw.idx) for all
>> events in mobility_pmu_disable. This way, the event count settings get
>> re-initialised correctly in power_pmu_enable.
>>
>> Signed-off-by: Athira Rajeev <atraj...@linux.vnet.ibm.com>
>> [ Fixed compilation error reported by kernel test robot ]
>> Reported-by: kernel test robot <l...@intel.com>
>> ---
>> Changelog:
>> Change from v2 -> v3:
>> Addressed review comments from Nicholas Piggin.
>> - Removed the "migrate" field which was added in the initial patch to
>>   update the event count settings correctly in power_pmu_enable.
>>   Instead, read off existing events in mobility_pmu_disable before
>>   power_pmu_enable.
>> - Moved the mobility_pmu_disable/enable declarations from rtas.h to
>>   the perf event header file.
>>
>> Addressed review comments from Nathan.
>> - Moved the mobility function calls from stop_machine context out to
>>   pseries_migrate_partition.
>>   Also, this is now a per-cpu invocation.
>>
>> Change from v1 -> v2:
>> - Moved the mobility_pmu_enable and mobility_pmu_disable declarations
>>   under CONFIG_PPC_PERF_CTRS in rtas.h. Also included 'asm/rtas.h' in
>>   core-book3s to fix the compilation warning reported by kernel test
>>   robot.
>>
>>  arch/powerpc/include/asm/perf_event.h     |  8 +++++
>>  arch/powerpc/perf/core-book3s.c           | 39 +++++++++++++++++++++++
>>  arch/powerpc/platforms/pseries/mobility.c |  7 ++++
>>  3 files changed, 54 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/perf_event.h b/arch/powerpc/include/asm/perf_event.h
>> index 164e910bf654..88aab6cf840c 100644
>> --- a/arch/powerpc/include/asm/perf_event.h
>> +++ b/arch/powerpc/include/asm/perf_event.h
>> @@ -17,6 +17,14 @@ static inline bool is_sier_available(void) { return false; }
>>  static inline unsigned long get_pmcs_ext_regs(int idx) { return 0; }
>>  #endif
>>
>> +#ifdef CONFIG_PPC_PERF_CTRS
>> +void mobility_pmu_disable(void *unused);
>> +void mobility_pmu_enable(void *unused);
>> +#else
>> +static inline void mobility_pmu_disable(void *unused) { }
>> +static inline void mobility_pmu_enable(void *unused) { }
>> +#endif
>> +
>>  #ifdef CONFIG_FSL_EMB_PERF_EVENT
>>  #include <asm/perf_event_fsl_emb.h>
>>  #endif
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 73e62e9b179b..2e8c4c668fa3 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -1343,6 +1343,33 @@ static void power_pmu_disable(struct pmu *pmu)
>>  	local_irq_restore(flags);
>>  }
>>
>> +/*
>> + * Called from pseries_migrate_partition() function
>> + * before migration, from powerpc/mobility code.
>> + */
These are only needed if pseries is built, so they should be inside a
PSERIES ifdef. This function should handle iterating over CPUs; that
shouldn't be left up to the mobility.c code. And the names should be
something like pmu_start_migration() and pmu_finish_migration().

>> +void mobility_pmu_disable(void *unused)
>> +{
>> +	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
>> +	struct perf_event *event;
>> +
>> +	if (cpuhw->n_events != 0) {
>> +		int i;
>> +
>> +		power_pmu_disable(NULL);
>> +		/*
>> +		 * Read off any pre-existing events because the register
>> +		 * values may not be migrated.
>> +		 */
>> +		for (i = 0; i < cpuhw->n_events; ++i) {
>> +			event = cpuhw->event[i];
>> +			if (event->hw.idx) {
>> +				power_pmu_read(event);
>> +				event->hw.idx = 0;
>> +			}
>> +		}
>> +	}
>> +}
>> +
>>  /*
>>   * Re-enable all events if disable == 0.
>>   * If we were previously disabled and events were added, then
>> @@ -1515,6 +1542,18 @@ static void power_pmu_enable(struct pmu *pmu)
>>  	local_irq_restore(flags);
>>  }
>>
>> +/*
>> + * Called from pseries_migrate_partition() function
>> + * after migration, from powerpc/mobility code.
>> + */
>> +void mobility_pmu_enable(void *unused)
>> +{
>> +	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
>> +
>> +	cpuhw->n_added = cpuhw->n_events;
>> +	power_pmu_enable(NULL);
>> +}
>> +
>>  static int collect_events(struct perf_event *group, int max_count,
>>  			  struct perf_event *ctrs[], u64 *events,
>>  			  unsigned int *flags)
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index e83e0891272d..3e96485ccba4 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -22,6 +22,7 @@
>>  #include <linux/delay.h>
>>  #include <linux/slab.h>
>>  #include <linux/stringify.h>
>> +#include <linux/perf_event.h>
>>
>>  #include <asm/machdep.h>
>>  #include <asm/rtas.h>
>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>  	if (ret)
>>  		return ret;
>>
>> +	/* Disable PMU before suspend */
>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>
> Why was this moved out of stop machine and to an IPI?
>
> My concern would be, what are the other CPUs doing at this time? Is it
> possible they could take interrupts and schedule? Could that mess up
> the perf state here?

pseries_migrate_partition() is called directly from migration_store(),
which is the sysfs store function, which can be called concurrently by
different CPUs.

It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
from sys_rtas(), again with no locking.

So we could have two CPUs calling into here at the same time, which
might not crash, but is unlikely to work well.

I think the lack of locking might have been OK in the past because only
one CPU will successfully get the other CPUs to call do_join() in
pseries_suspend(). But I could be wrong.

Anyway, now that we're mutating the PMU state before suspending, we
need to be more careful. So I think we need a lock around the whole
sequence.

cheers
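
[Editor's sketch] To make the count arithmetic described in the quoted
commit message concrete, here is a small standalone model (illustrative
userspace C, not kernel code) of the accumulation that power_pmu_read()
effectively performs:

#include <stdio.h>
#include <stdint.h>

/*
 * Model of the update: event->count += (PMC value - event->hw.prev_count).
 */
static uint64_t accumulate(uint64_t count, uint64_t prev_count, uint64_t pmc)
{
        return count + (pmc - prev_count);
}

int main(void)
{
        /* Normal case: the PMC has advanced past prev_count. */
        printf("%llu\n", (unsigned long long)accumulate(0, 1000, 5000));

        /*
         * Post-LPM case: the PMC reads back as zero while prev_count
         * still holds its pre-migration value, so the unsigned delta
         * wraps to a huge value (a "negative" count). Re-initialising
         * prev_count, as the patch does, avoids this.
         */
        printf("%llu\n", (unsigned long long)accumulate(0, 5000, 0));
        return 0;
}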
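
[Editor's sketch] A minimal sketch of the "iterate over CPUs in the PMU
code" suggestion above, using the pmu_start_migration()/
pmu_finish_migration() names from that comment and the per-CPU helpers
from the patch; the wait=1 choice is an assumption, not from any posted
code:

#ifdef CONFIG_PPC_PSERIES
/*
 * Let the PMU code own the CPU iteration instead of mobility.c.
 * wait=1 ensures every CPU has quiesced its counters before the
 * caller proceeds to suspend.
 */
void pmu_start_migration(void)
{
        on_each_cpu(mobility_pmu_disable, NULL, 1);
}

void pmu_finish_migration(void)
{
        on_each_cpu(mobility_pmu_enable, NULL, 1);
}
#endif /* CONFIG_PPC_PSERIES */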
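
[Editor's sketch] And a minimal sketch of the locking suggested in the
closing paragraph; the mutex name, the -EBUSY return, and the
do_migrate_partition() helper (standing in for the existing function
body) are all hypothetical:

/* Hypothetical serialization of the migration path. */
static DEFINE_MUTEX(pseries_migration_mutex);

static int pseries_migrate_partition(u64 handle)
{
        int ret;

        /*
         * Only one migration sequence, and hence one PMU
         * disable/enable cycle, may be in flight at a time.
         */
        if (!mutex_trylock(&pseries_migration_mutex))
                return -EBUSY;

        ret = do_migrate_partition(handle);     /* existing body, elided */

        mutex_unlock(&pseries_migration_mutex);
        return ret;
}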