> On 29-Oct-2021, at 6:45 PM, Michael Ellerman <m...@ellerman.id.au> wrote:
>
> Nicholas Piggin <npig...@gmail.com> writes:
>> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>>> During Live Partition Migration (LPM), it is observed that perf
>>> counter values report zero after migration completes. 'perf stat'
>>> with a workload continues to show counts post migration, since the
>>> PMU gets disabled/enabled during sched switches. But in the case of
>>> system/cpu-wide monitoring, zero counts are reported with 'perf
>>> stat' after migration completion.
>>>
>>> Example:
>>>  ./perf stat -e r1001e -I 1000
>>>            time             counts unit events
>>>     1.001010437         22,137,414      r1001e
>>>     2.002495447         15,455,821      r1001e
>>> <<>> As seen in the logs below, the counter values show zero
>>> after migration is completed. <<>>
>>>    86.142535370    129,392,333,440      r1001e
>>>    87.144714617                  0      r1001e
>>>    88.146526636                  0      r1001e
>>>    89.148085029                  0      r1001e
>>
>> This is the output without the patch? After the patch it keeps counting
>> I suppose? And does the very large count go away too?
>>
>>>
>>> Here the PMU is enabled at the start of the perf session and
>>> counter values are read at intervals. Counters are only disabled
>>> at the end of the session. The powerpc mobility code presently
>>> does not handle disabling and re-enabling the PMU counters during
>>> partition migration. Also, since the PMU register values are not
>>> saved/restored during migration, PMU registers like Monitor Mode
>>> Control Register 0 (MMCR0) and Monitor Mode Control Register 1
>>> (MMCR1) will not contain the values they were programmed with.
>>> Hence the PMU counters will not be enabled correctly post
>>> migration.
>>>
>>> Fix this in the mobility code by disabling and re-enabling the PMU
>>> on all CPUs before and after migration. The patch introduces two
>>> functions, 'mobility_pmu_disable' and 'mobility_pmu_enable'.
>>> mobility_pmu_disable() is called before the processor threads go
>>> into suspend state, to disable the PMU counters; the disable is
>>> done only if there are active events running on that CPU.
>>> mobility_pmu_enable() is called after the migration is done, to
>>> enable the PMU counters again.
>>>
>>> Since the Performance Monitor Counters (PMCs) are not
>>> saved/restored during LPM, the PMC value reads as zero while
>>> 'event->hw.prev_count' still holds a non-zero value. This causes a
>>> problem when updating event->count, since we always accumulate
>>> (PMC value - event->hw.prev_count) into event->count. If
>>> event->hw.prev_count is greater than the PMC value, event->count
>>> becomes negative. To fix this, 'prev_count' also needs to be
>>> re-initialised for all events while enabling them again. Hence
>>> read off the existing events and clear the PMC index (stored in
>>> event->hw.idx) for all events in mobility_pmu_disable. This way,
>>> the event count settings get re-initialised correctly in
>>> power_pmu_enable.
>>>
>>> Signed-off-by: Athira Rajeev <atraj...@linux.vnet.ibm.com>
>>> [ Fixed compilation error reported by kernel test robot ]
>>> Reported-by: kernel test robot <l...@intel.com>
>>> ---
>>> Changelog:
>>> Change from v2 -> v3:
>>> Addressed review comments from Nicholas Piggin.
>>> - Removed the "migrate" field which was added in the initial
>>>   patch to address updating the event count settings correctly
>>>   in power_pmu_enable. Instead, read off existing events in
>>>   mobility_pmu_disable before power_pmu_enable.
>>> - Moved the mobility_pmu_disable/enable declarations from
>>>   rtas.h to the perf event header file.
>>>
>>> Addressed review comments from Nathan.
>>> - Moved the mobility function calls from stop_machine
>>>   context out to pseries_migrate_partition. This is also
>>>   now a per-cpu invocation.
>>>
>>> Change from v1 -> v2:
>>> - Moved the mobility_pmu_enable and mobility_pmu_disable
>>>   declarations under CONFIG_PPC_PERF_CTRS in rtas.h.
>>>   Also included 'asm/rtas.h' in core-book3s to fix the
>>>   compilation warning reported by kernel test robot.
>>>
>>>  arch/powerpc/include/asm/perf_event.h     |  8 +++++
>>>  arch/powerpc/perf/core-book3s.c           | 39 +++++++++++++++++++++++
>>>  arch/powerpc/platforms/pseries/mobility.c |  7 ++++
>>>  3 files changed, 54 insertions(+)
>>>
>>> diff --git a/arch/powerpc/include/asm/perf_event.h b/arch/powerpc/include/asm/perf_event.h
>>> index 164e910bf654..88aab6cf840c 100644
>>> --- a/arch/powerpc/include/asm/perf_event.h
>>> +++ b/arch/powerpc/include/asm/perf_event.h
>>> @@ -17,6 +17,14 @@ static inline bool is_sier_available(void) { return false; }
>>>  static inline unsigned long get_pmcs_ext_regs(int idx) { return 0; }
>>>  #endif
>>>
>>> +#ifdef CONFIG_PPC_PERF_CTRS
>>> +void mobility_pmu_disable(void *unused);
>>> +void mobility_pmu_enable(void *unused);
>>> +#else
>>> +static inline void mobility_pmu_disable(void *unused) { }
>>> +static inline void mobility_pmu_enable(void *unused) { }
>>> +#endif
>>> +
>>>  #ifdef CONFIG_FSL_EMB_PERF_EVENT
>>>  #include <asm/perf_event_fsl_emb.h>
>>>  #endif
>>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>>> index 73e62e9b179b..2e8c4c668fa3 100644
>>> --- a/arch/powerpc/perf/core-book3s.c
>>> +++ b/arch/powerpc/perf/core-book3s.c
>>> @@ -1343,6 +1343,33 @@ static void power_pmu_disable(struct pmu *pmu)
>>>  	local_irq_restore(flags);
>>>  }
>>>
>>> +/*
>>> + * Called from pseries_migrate_partition() function
>>> + * before migration, from powerpc/mobility code.
>>> + */
>
> These are only needed if pseries is built, so should be inside a PSERIES
> ifdef.

Sure mpe, will address this change in the next version.
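Something like the below, perhaps (a rough, untested sketch, assuming the
declarations stay in asm/perf_event.h):

/* Fall back to empty stubs unless both perf counters and pseries are built. */
#if defined(CONFIG_PPC_PERF_CTRS) && defined(CONFIG_PPC_PSERIES)
void mobility_pmu_disable(void *unused);
void mobility_pmu_enable(void *unused);
#else
static inline void mobility_pmu_disable(void *unused) { }
static inline void mobility_pmu_enable(void *unused) { }
#endif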
> This function should handle iterating over CPUs, that shouldn't be left
> up to the mobility.c code.
>
> And the names should be something like pmu_start_migration(),
> pmu_finish_migration().

Ok, will change.
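With that, mobility.c would only call pmu_start_migration() and
pmu_finish_migration(), and the CPU iteration would live in
core-book3s.c. A rough, untested sketch of that shape (the per-cpu
bodies would be today's mobility_pmu_disable/enable made static, and
passing wait=1 is only a guess):

void pmu_start_migration(void)
{
	/*
	 * Disable the PMU and read off events on every CPU before
	 * suspend; per-cpu work done by today's mobility_pmu_disable().
	 */
	on_each_cpu(mobility_pmu_disable, NULL, 1);
}

void pmu_finish_migration(void)
{
	/*
	 * Re-enable the counters on every CPU once migration is done;
	 * per-cpu work done by today's mobility_pmu_enable().
	 */
	on_each_cpu(mobility_pmu_enable, NULL, 1);
}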
>
>>> +void mobility_pmu_disable(void *unused)
>>> +{
>>> +	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
>>> +	struct perf_event *event;
>>> +
>>> +	if (cpuhw->n_events != 0) {
>>> +		int i;
>>> +
>>> +		power_pmu_disable(NULL);
>>> +		/*
>>> +		 * Read off any pre-existing events because the register
>>> +		 * values may not be migrated.
>>> +		 */
>>> +		for (i = 0; i < cpuhw->n_events; ++i) {
>>> +			event = cpuhw->event[i];
>>> +			if (event->hw.idx) {
>>> +				power_pmu_read(event);
>>> +				event->hw.idx = 0;
>>> +			}
>>> +		}
>>> +	}
>>> +}
>>> +
>>>  /*
>>>   * Re-enable all events if disable == 0.
>>>   * If we were previously disabled and events were added, then
>>> @@ -1515,6 +1542,18 @@ static void power_pmu_enable(struct pmu *pmu)
>>>  	local_irq_restore(flags);
>>>  }
>>>
>>> +/*
>>> + * Called from pseries_migrate_partition() function
>>> + * after migration, from powerpc/mobility code.
>>> + */
>>> +void mobility_pmu_enable(void *unused)
>>> +{
>>> +	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
>>> +
>>> +	cpuhw->n_added = cpuhw->n_events;
>>> +	power_pmu_enable(NULL);
>>> +}
>>> +
>>>  static int collect_events(struct perf_event *group, int max_count,
>>>  			  struct perf_event *ctrs[], u64 *events,
>>>  			  unsigned int *flags)
>>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>>> index e83e0891272d..3e96485ccba4 100644
>>> --- a/arch/powerpc/platforms/pseries/mobility.c
>>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>>> @@ -22,6 +22,7 @@
>>>  #include <linux/delay.h>
>>>  #include <linux/slab.h>
>>>  #include <linux/stringify.h>
>>> +#include <linux/perf_event.h>
>>>
>>>  #include <asm/machdep.h>
>>>  #include <asm/rtas.h>
>>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>>  	if (ret)
>>>  		return ret;
>>>
>>> +	/* Disable PMU before suspend */
>>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>>
>> Why was this moved out of stop_machine and to an IPI?
>>
>> My concern would be, what are the other CPUs doing at this time? Is it
>> possible they could take interrupts and schedule? Could that mess up the
>> perf state here?
>
> pseries_migrate_partition() is called directly from migration_store(),
> which is the sysfs store function, which can be called concurrently by
> different CPUs.
>
> It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
> from sys_rtas(), again with no locking.
>
> So we could have two CPUs calling into here at the same time, which
> might not crash, but is unlikely to work well.
>
> I think the lack of locking might have been OK in the past because only
> one CPU will successfully get the other CPUs to call do_join() in
> pseries_suspend(). But I could be wrong.
>
> Anyway, now that we're mutating the PMU state before suspending we need
> to be more careful. So I think we need a lock around the whole sequence.

Thanks for the review comments. The PMU callback invocations were made
from inside stop_machine (inside "do_join") in the initial version. They
were moved out to pseries_migrate_partition as an IPI so as to minimise
calls into other subsystems from the "do_join" context, as pointed out
by Nathan.

But if it is not safe to have a per-cpu invocation from
pseries_migrate_partition, should we handle this under the
interrupt-disable path in do_join itself (as in the initial version)?
Please share your thoughts/suggestions.

Thanks,
Athira

>
> cheers
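P.S. If the calls stay in pseries_migrate_partition(), I guess the
serialisation you describe would look roughly like the below (an
untested sketch; the mutex name is illustrative and the elided comments
stand for the existing code):

static DEFINE_MUTEX(pseries_migration_lock);	/* illustrative name */

static int pseries_migrate_partition(u64 handle)
{
	int ret;

	/* Serialise callers from the sysfs store and from sys_rtas(). */
	mutex_lock(&pseries_migration_lock);

	/* ... existing VASI wait and error handling ... */

	/* Disable PMU before suspend */
	on_each_cpu(&mobility_pmu_disable, NULL, 0);

	ret = pseries_suspend(handle);

	/* Enable PMU after migration */
	on_each_cpu(&mobility_pmu_enable, NULL, 0);

	/* ... existing post-migration fixups ... */

	mutex_unlock(&pseries_migration_lock);
	return ret;
}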