On Wed, Jun 12, 2019 at 10:17 AM Nathan Lynch <nath...@linux.ibm.com> wrote:
>
> It's common for the platform to replace the cache device nodes after a
> migration. Since the cacheinfo code is never informed about this, it
> never drops its references to the source system's cache nodes, causing
> it to wind up in an inconsistent state resulting in warnings and oopses
> as soon as CPU online/offline occurs after the migration, e.g.
>
>   cache for /cpus/l3-cache@3113(Unified) refers to cache for /cpus/l2-cache@200d(Unified)
>   WARNING: CPU: 15 PID: 86 at arch/powerpc/kernel/cacheinfo.c:176 release_cache+0x1bc/0x1d0
>   [...]
>   NIP [c00000000002d9bc] release_cache+0x1bc/0x1d0
>   LR [c00000000002d9b8] release_cache+0x1b8/0x1d0
>   Call Trace:
>   [c0000001fc99fa70] [c00000000002d9b8] release_cache+0x1b8/0x1d0 (unreliable)
>   [c0000001fc99fb10] [c00000000002ebf4] cacheinfo_cpu_offline+0x1c4/0x2c0
>   [c0000001fc99fbe0] [c00000000002ae58] unregister_cpu_online+0x1b8/0x260
>   [c0000001fc99fc40] [c000000000165a64] cpuhp_invoke_callback+0x114/0xf40
>   [c0000001fc99fcd0] [c000000000167450] cpuhp_thread_fun+0x270/0x310
>   [c0000001fc99fd40] [c0000000001a8bb8] smpboot_thread_fn+0x2c8/0x390
>   [c0000001fc99fdb0] [c0000000001a1cd8] kthread+0x1b8/0x1c0
>   [c0000001fc99fe20] [c00000000000c2d4] ret_from_kernel_thread+0x5c/0x68
>
> Using device tree notifiers won't work since we want to rebuild the
> hierarchy only after all the removals and additions have occurred and
> the device tree is in a consistent state. Call cacheinfo_teardown()
> before processing device tree updates, and rebuild the hierarchy
> afterward.
>
> Fixes: 410bccf97881 ("powerpc/pseries: Partition migration in the kernel")
> Signed-off-by: Nathan Lynch <nath...@linux.ibm.com>

Reviewed-by: Gautham R. Shenoy <e...@linux.vnet.ibm.com>

> ---
>  arch/powerpc/platforms/pseries/mobility.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index edc1ec408589..b8c8096907d4 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -23,6 +23,7 @@
>  #include <asm/machdep.h>
>  #include <asm/rtas.h>
>  #include "pseries.h"
> +#include "../../kernel/cacheinfo.h"
>
>  static struct kobject *mobility_kobj;
>
> @@ -345,11 +346,20 @@ void post_mobility_fixup(void)
>  	 */
>  	cpus_read_lock();
>
> +	/*
> +	 * It's common for the destination firmware to replace cache
> +	 * nodes. Release all of the cacheinfo hierarchy's references
> +	 * before updating the device tree.
> +	 */
> +	cacheinfo_teardown();
> +
>  	rc = pseries_devicetree_update(MIGRATION_SCOPE);
>  	if (rc)
>  		printk(KERN_ERR "Post-mobility device tree update "
>  			"failed: %d\n", rc);
>
> +	cacheinfo_rebuild();
> +
>  	cpus_read_unlock();
>
>  	/* Possibly switch to a new RFI flush type */
> --
> 2.20.1
>

--
Thanks and Regards
gautham.