At work we have some custom watchdog hardware that sends an NMI upon expiry. We've modified the kernel to panic when it receives the watchdog NMI. I've been trying the "stop_scheduler_on_panic" mode, and I've discovered that when my watchdog expires, the system gets completely wedged. After some digging, I found that multiple CPUs are getting the watchdog NMI and trying to panic concurrently. One of the CPUs wins, and the rest spin forever in this code:
        /*
         * We don't want multiple CPU's to panic at the same time, so we
         * use panic_cpu as a simple spinlock.  We have to keep checking
         * panic_cpu if we are spinning in case the panic on the first
         * CPU is canceled.
         */
        if (panic_cpu != PCPU_GET(cpuid))
                while (atomic_cmpset_int(&panic_cpu, NOCPU,
                    PCPU_GET(cpuid)) == 0)
                        while (panic_cpu != NOCPU)
                                ; /* nothing */

The system wedges when stop_cpus_hard() is called: it sends an NMI to each of the other CPUs and waits for them to acknowledge that they have stopped before returning. However, the hardware will not deliver an NMI to a CPU that is already handling an NMI, so the other CPUs that took a watchdog NMI and are spinning above never enter the stop NMI handler and never acknowledge that they are stopped.

I've been able to work around this with the following hideous hack, which marks each losing CPU as stopped for as long as it spins:

--- kern_shutdown.c 2012-08-17 10:25:02.000000000 -0400
+++ kern_shutdown.c 2012-11-15 17:04:10.000000000 -0500
@@ -658,11 +658,15 @@
          * panic_cpu if we are spinning in case the panic on the first
          * CPU is canceled.
          */
-        if (panic_cpu != PCPU_GET(cpuid))
+        if (panic_cpu != PCPU_GET(cpuid)) {
                 while (atomic_cmpset_int(&panic_cpu, NOCPU,
-                    PCPU_GET(cpuid)) == 0)
+                    PCPU_GET(cpuid)) == 0) {
+                        atomic_set_int(&stopped_cpus, PCPU_GET(cpumask));
                         while (panic_cpu != NOCPU)
                                 ; /* nothing */
+                }
+                atomic_clear_int(&stopped_cpus, PCPU_GET(cpumask));
+        }
 
         if (stop_scheduler_on_panic) {
                 if (panicstr == NULL && !kdb_active)

But I'm hoping that somebody has some ideas on a better way to fix this kind of problem.