Re: Need help for kernel crash dump analysis

Xavier Galleri Fri, 12 Jan 2001 02:01:59 -0800

Thank you for your answer,

OK, let's make it a bit clearer!

I use a private scheme to interact with the 'ipintr' ISR. The following two routines are expected to be called either by our modified version of 'ip_input' at network SWI level, or at user level.

int my_global_ipl = 0;

void my_enter() {
    int s = splnet();
    /* We do not expect this routine to be reentrant, thus the following sanity check. */
    ASSERT(my_global_ipl == 0);
    my_global_ipl = s;
}

void my_exit() {
    int s = my_global_ipl;
    my_global_ipl = 0;
    splx(s);
}

The crashes I get are always due to the assertion failing in the 'ipintr' ISR. This *seems* to indicate that 'my_enter' is called at network SWI level after another execution flow has called 'my_enter' and has *NOT* yet called 'my_exit'! This seems strange given the 'splnet', and the only explanation I have found is that the first execution flow has fallen asleep somewhere in the kernel (which is not expected, of course!).

Now, if you've read my first mail, I was actually asking for help on how to dump the stack of an interrupted process with GDB when the kernel crash occurs in the context of an ISR. More generally, I would like to know how to dump the stack of *any* process at the time of the crash. That way, I would be able to see where my user-land daemon was in the kernel when the interrupt occurred.

Anyway, without this information, I am reduced to adding some traces along the path my process takes through my kernel code. This led me to surround calls to MALLOC with counters, as follows:

somewhere_else() {
    ...
    my_enter(); /* handle competition with network isr (especially ipintr) */
    ...
    some_counter++;
    MALLOC(buf,cast,size,M_DEVBUF,M_NOWAIT);
    some_other_counter++;
    ...
    my_exit();
    ...
}

All the crashes I got then show the following relation at the time of the crash:

( some_counter - some_other_counter == 1 )

which *seems* to indicate that my process was somehow preempted during the call to MALLOC.

My belief is that the FreeBSD kernel is (currently) a monolithic, non-preemptive, non-threaded UNIX kernel, which implies that:

- system-scope scheduling is still done at process level (no kernel threads yet)
- any process executing in the kernel cannot be preempted in favor of another process unless it either returns to user code or explicitly goes to sleep
- the only interlocking that must be done is with interrupts (when relevant), using the 'spl' family of routines

Is that correct?

Well, I am obviously tracking a bug in my own code, but I would greatly appreciate help, either with my GDB usage question or in the form of technical hints on where I should look to make progress in my investigation.

Thank you very much for your attention,

Rgds,

Xavier

Alfred Perlstein wrote:

* Xavier Galleri <[EMAIL PROTECTED]> [010111 11:27] wrote:

Hi everybody,

I have reached a point where I am wondering if a call to 'malloc' with 
the M_NOWAIT flag is not falling asleep !


M_NOWAIT shouldn't sleep.

In fact, I suspect that the interrupted context is somewhere during a 
call to 'malloc' (I increment a counter just before calling malloc and 
increment another just after and the difference is one!) while I have 
called 'splnet' beforehand, thus normally preventing competing with any 
network isr. I assume that this should never occur unless the code is 
somewhere calling 'sleep' and provokes a context switch.


if you add 1 to a variable the difference is expected to be one.

Is there anybody who can help on this ?


I'm not sure, you need to be more specific/clear.
