> On 21-Oct-2021, at 10:47 PM, Nathan Lynch <nath...@linux.ibm.com> wrote:
>
> Athira Rajeev <atraj...@linux.vnet.ibm.com> writes:
>> During Live Partition Migration (LPM), it is observed that perf
>> counter values reports zero post migration completion. However
>> 'perf stat' with workload continues to show counts post migration
>> since PMU gets disabled/enabled during sched switches. But incase
>> of system/cpu wide monitoring, zero counts were reported with 'perf
>> stat' after migration completion.
>>
>> Example:
>> ./perf stat -e r1001e -I 1000
>>            time             counts unit events
>>      1.001010437         22,137,414      r1001e
>>      2.002495447         15,455,821      r1001e
>> <<>> As seen in next below logs, the counter values shows zero
>>      after migration is completed.
>> <<>>
>>     86.142535370    129,392,333,440      r1001e
>>     87.144714617                  0      r1001e
>>     88.146526636                  0      r1001e
>>     89.148085029                  0      r1001e
>
> Confirmed in my environment:
>
>     51.099987985            300,338      cache-misses
>     52.101839374            296,586      cache-misses
>     53.116089796            263,150      cache-misses
>     54.117949249            232,290      cache-misses
>     55.602029375     68,700,421,711      cache-misses
>     56.610073969                  0      cache-misses
>     57.614732000                  0      cache-misses
>
> I wonder what it means that there is a very unlikely huge value before
> the counter stops working -- I believe your example has this phenomenon
> too.
>
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index e83e089..ff7a77c 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -476,6 +476,8 @@ static int do_join(void *arg)
>>  retry:
>>      /* Must ensure MSR.EE off for H_JOIN. */
>>      hard_irq_disable();
>> +    /* Disable PMU before suspend */
>> +    mobility_pmu_disable();
>>      hvrc = plpar_hcall_norets(H_JOIN);
>>
>>      switch (hvrc) {
>> @@ -530,6 +532,8 @@ static int do_join(void *arg)
>>       * reset the watchdog.
>>       */
>>      touch_nmi_watchdog();
>> +    /* Enable PMU after resuming */
>> +    mobility_pmu_enable();
>>      return ret;
>>  }
>
> We should minimize calls into other subsystems from this context (the
> callback function we've passed to stop_machine); it's fairly sensitive.
> Can this be moved out to pseries_migrate_partition() or similar?

Hi Nathan,

Thanks for the review. I will move the callbacks to pseries_migrate_partition() in the next version.

Athira