Hi Juliet,

Juliet Kim <juli...@linux.vnet.ibm.com> writes:
> Fix extending start/stop topology update scope during LPM
> Commit 65b9fdadfc4d ("powerpc/pseries/mobility: Extend start/stop
> topology update scope") made the change to the duration that
> topology updates are suppressed during LPM to allow the complete
> device tree update which leaves the property update notifier
> unregistered until device tree update completes. This prevents
> topology update during LPM.
>
> Instead, use mutex_lock, which serializes LPM and PRRN operation
> in pseries_devicetree_update.

I think this is conflating two issues:

1. Insufficient serialization/ordering of handling PRRNs and LPM. E.g.
   we could migrate while processing a PRRN from the source system and
   end up with incorrect contents in the device tree on the destination
   if the LPM changes the same nodes. The OS is supposed to drain any
   outstanding PRRNs before proceeding with migration, which is a
   stronger requirement than simple serialization of device tree
   updates. If we don't impose this ordering already we should fix
   that.

2. The NUMA topology update processing. Generally speaking,
   start/stop_topology_update() enable/disable dt_update_callback(),
   which we use to update CPU-node assignments. Since we now know that
   doing that is Bad, it's sort of a happy accident that
   migration_store() was changed to re-register the notifier after
   updating the device tree, which is too late.

So I don't think we should try to "fix" this. Instead we should remove
the broken code (dt_update_callback -> dlpar_cpu_readd and so on).

Do you agree?

Thanks,
Nathan