On Fri, 24 May 2002, Archie Cobbs wrote:

> I'm trying to debug a mbuf corruption bug in the kernel. I've added
> an mbuf sanity check routine which calls panic() if anything is amiss
> with the mbuf free list, etc. This function runs at splimp() and if/when
> it calls panic() the cpl is still at splimp().
>
> My question is: does this guarantee that the mbuf free lists, etc. will
> not be modified between the time panic() is called and the time a core
> file is generated? For example, if an incoming packet causes a networking
> interrupt after panic() has been called but before the core file is
> written, will that interrupt be blocked when it calls splimp()?

No (apart from it being too late to block the interrupt after it has
occurred). panic() should run entirely at the ipl that it is called at,
or higher, and it should not undo any other interrupt disables (e.g. the
CPU interrupt (un)mask or the ICU or APIC interrupt masks on i386's),
since unmasking might cause various problems including corruption of
your data structures. However, panic() is too broken to actually keep
interrupts masked. It does a sync() very early, and sync() obviously
cannot work with interrupts masked, since it wanders off into normal
disk i/o code that depends on disk interrupts being enabled to work
(actually it is the wait for i/o to complete after the sync() that
depends on disk interrupts working).

But sync() in panic() usually does work in FreeBSD-[1-4]. The usual
mechanism for clobbering the interrupt masks so that it works is calling
tsleep(). tsleep() knows that it is in a panic, but still "helpfully"
enables interrupts. From the RELENG_4 version:

	if (cold || panicstr) {
		/*
		 * After a panic, or during autoconfiguration,
		 * just give interrupts a chance, then just return;
		 * don't run any other procs or panic below,
		 * in case this is the idle process and already asleep.
		 */
		splx(safepri);
		splx(s);
		return (0);
	}

You could try setting safepri to a priority that is actually safe
(0xffff on i386's). There may be other ipl-clobbering mechanisms though.

sync() in panic() tends to not work in -current, since things are locked
by mutexes and there is no kludge like the above to unlock them. The
usual failure is to panic recursively on hitting a non-recursive mutex
that is already held, usually the same one (in or near bremfree IIRC).
There is some chance of dump working for recursive panics, but data
structures may already have been clobbered.

panic() has two defenses against endless recursion: it turns off sync()
after the first entry to panic(), and it turns off dumping after the
first entry to doadump().
It has no defense against recursion in all the EVENTHANDLER_INVOKE()
shutdowns. All the event handlers are apparently supposed to have their
own defenses :-(.

> If this is not a valid assumption, is there an easy way to 'freeze'
> the mbuf free lists long enough to generate the core file when an
> inconsistency is found (other than adding the obvious hack)?

Not if removing RB_SYNC is the obvious hack :-). Removing everything
except the dump and the final EVENTHANDLER_INVOKE() in boot() should
help. (One event handler shutdown is still needed to reboot the system,
but it is after the dump so you don't care if it corrupts your
structures.) Maybe add code to splx() to check that the ipl is not
lowered below its value at the start of panic().

Bruce
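
A rough sketch of the splx() check suggested above, written as a
userland model rather than real kernel code. Everything in it is an
assumption made for illustration: the ipl is modelled as a bitmask of
blocked interrupt classes, SPLIMP_MASK is a made-up value, panic_cpl and
splx_violations are hypothetical names, and model_panic() and
model_tsleep() only imitate the parts of panic() and tsleep() discussed
in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a set bit means that interrupt class is blocked. */
typedef unsigned int intrmask_t;

#define SPLIMP_MASK 0x00f0u		/* made-up mask for network interrupts */

static intrmask_t cpl;			/* currently blocked classes */
static const char *panicstr;		/* non-NULL once panic has started */
static intrmask_t panic_cpl;		/* cpl recorded at the first panic */
static unsigned int splx_violations;	/* attempts to lower the ipl in a panic */

/* Block the SPLIMP_MASK classes and return the previous mask. */
static intrmask_t
splimp(void)
{
	intrmask_t old = cpl;

	cpl |= SPLIMP_MASK;
	return (old);
}

/* Block everything and return the previous mask. */
static intrmask_t
splhigh(void)
{
	intrmask_t old = cpl;

	cpl = ~0u;
	return (old);
}

/*
 * Restore a saved mask. After a panic, refuse to unblock anything that
 * was blocked when the panic started: count the attempt and clamp.
 */
static void
splx(intrmask_t ipl)
{
	if (panicstr != NULL && (panic_cpl & ~ipl) != 0) {
		splx_violations++;
		ipl |= panic_cpl;
	}
	cpl = ipl;
	assert(panicstr == NULL || (panic_cpl & ~cpl) == 0);
}

/* Record the panic message and the mask in effect, first entry only. */
static void
model_panic(const char *msg)
{
	if (panicstr == NULL) {
		panicstr = msg;
		panic_cpl = cpl;
	}
}

/* Imitates the quoted tsleep() path: splx(safepri), then splx(s). */
static void
model_tsleep(void)
{
	intrmask_t safepri = 0;		/* assumed "unsafe" value: blocks nothing */
	intrmask_t s = splhigh();

	if (panicstr != NULL)
		splx(safepri);
	splx(s);
}
```

In this model, calling splimp(), then model_panic(), then model_tsleep()
leaves cpl at SPLIMP_MASK and bumps splx_violations once, for the
splx(safepri) call. A real kernel version would presumably print
something or drop into the debugger instead of just counting.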