On Wednesday, October 17, 2012 7:20:56 am Andriy Gapon wrote: > > _mtx_lock_spin has the following check in its retry loop: > if (i < 60000000 || kdb_active || panicstr != NULL) > DELAY(1); > else > _mtx_lock_spin_failed(m); > > Which means that in the (!kdb_active && panicstr == NULL) case we will make at > most 60000000 iterations and then call _mtx_lock_spin_failed (which proceeds > to > panic). When either kdb_active or panicstr is set, then we are going to loop > forever. > > I've done some digging through the lengthy history and many evolutions of the > code (unfortunately I haven't kept records during the research), and my > conclusion is that the kdb_active and panicstr checks were added at the quite > early era of FreeBSD SMP, where we didn't have a mechanism to stop/block other > CPUs when kdb or panic were entered. We didn't even prevent parallel > execution > of panic. > So the above code was a sort of defense where we hoped that "other" CPUs would > eventually stumble upon some held spinlock and would be captured there. Maybe > there was a specific set of spinlocks, which were supposed to help.
It wasn't so much as a way of hoping CPUs would stop so much as a way to prevent other CPUs from panic'ing while another CPU had already panic'd or was already in DDB making debugging harder. > Nowadays, we do try to stop other CPUs during panic and kdb activation and > there > are good chances that they are indeed stopped. In this circumstances, should > the main CPU be so unlucky as to run into the held spinlock, the above check > would do more harm than good - the main CPU would just spin there forever, > because a lock owner is also spinning in the stop loop and so can't release > the > lock. > Actually, this is only true for the kdb case. In the panic case we make a > check > earlier and simply ignore/skip/bust all the locks. That makes the panicstr > check in the code in question both harmless and useless. > > So I'd like to propose to remove those checks altogether. Or perhaps to > "reverse" them and immediately induce a (possibly secondary) panic if we ever > get to that wait loop and kdb_active || panicstr != NULL. > > What do you think? I think this sounds fine. I do think though that there are two behaviors. If for some reason you are not able to stop the other CPUs, you would rather them spin than trigger another panic while you are in DDB or writing out a crashdump. However, the CPU that is currently in the debugger or writing out a crashdump should probably bust all locks (code executed in debugger backends should generally avoid all locking at all, and depend on things like try locks where it gracefully fails if it must use locking. That would make the kdb_active case here irrelevant, and the panic case is already handled as you noted.) -- John Baldwin _______________________________________________ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"