On Fri, Jun 20, 2014 at 10:28:13AM -0400, Boris Ostrovsky wrote:
> Commit 9c15a24b038f4d8da93a2bc2554731f8953a7c17 (x86/mce: Improve
> mcheck_init_device() error handling) unregisters (or never registers)
> MCE's hotplug notifier if an error is encountered.

Well, mcheck_init_device() did encounter errors before that commit too,
can you please go into detail on how exactly you're triggering this?
Which error are you talking about exactly?

Lemme guess: some xen special handling which baremetal doesn't need.

> Since unplugging a CPU would normally result in the notifier deleting
> MCE timer we are now left with the timer running if a CPU is removed on
> a system where mcheck_init_device() had failed.
>
> If we later hotplug this CPU back we add this timer again in
> mcheck_cpu_init(). Eventually the two timers start interfering with each
> other, causing soft lockups or system hangs.
>
> We should leave the notifier always on and, in fact, set it up early
> during the boot.

We do leave it always on - we only unregister it if we've encountered an
error.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--