Laurent Dufour <lduf...@linux.ibm.com> writes:
> When a partition is transferred, once it arrives at the destination node,
> the partition is active but much of its memory must be transferred from the
> start node.
>
> It depends on the activity in the partition, but the more CPU the partition
> has, the more memory to be transferred is likely to be. This causes latency
> when accessing pages that need to be transferred, and often, for large
> partitions, it triggers the NMI watchdog.

It also triggers warnings from other watchdogs and subsystems that have
soft latency requirements - softlockup, RCU, workqueue. The issue is
more general than the NMI watchdog.

> The NMI watchdog causes the CPU stack to dump where it appears to be
> stuck. In this case, it does not bring much information since it can happen
> during any memory access of the kernel.

When the site of a watchdog backtrace shows a thread stuck on a routine
memory access as opposed to something like a lock acquisition, that is
actually useful information that shouldn't be discarded. It tells us the
platform is failing to adequately virtualize partition memory. This
isn't a benign situation and it's likely to unacceptably affect real
workloads. The kernel is ideally situated to detect and warn about this.

> In addition, the NMI interrupt mechanism is not secure and can generate a
> dump system in the event that the interruption is taken while
> MSR[RI]=0.

This sounds like a general problem with that facility that isn't
specific to partition migration? Maybe it should be disabled altogether
until that can be fixed?

> Given how often hard lockups are detected when transferring large
> partitions, it seems best to disable the watchdog NMI until the memory
> transfer from the start node is complete.

At this time, I'm far from convinced. Disabling the watchdog is going to
make the underlying problems in the platform and/or network harder to
understand.