OK, let's make it a bit clearer !
I use a private scheme to interlock with the 'ipintr' isr. The following two routines are expected to be called either by our modified version of 'ip_input' at network SWI level, or at user level (i.e. in process context).
int my_global_ipl = 0;

void my_enter() {
    int s = splnet();
    /* We do not expect this routine to be reentrant, thus the following sanity check. */
    ASSERT(my_global_ipl == 0);
    my_global_ipl = s;
}

void my_exit() {
    int s = my_global_ipl;
    my_global_ipl = 0;
    splx(s);
}
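One limitation of the sanity check above is worth noting: it uses the saved level itself as the "somebody is inside" marker. Assuming 'splnet' can return 0 when called from base level, a second 'my_enter' would then go unnoticed. A minimal sketch of a variant with a separate busy flag and an owner tag follows. The names 'my_busy', 'my_owner', 'my_enter_tagged', 'my_exit_tagged' and 'fake_cpl' are hypothetical, and 'splnet'/'splx' are replaced by userland stand-ins so that the fragment compiles outside the kernel.

```c
#include <assert.h>

/* Userland stand-ins for the kernel's splnet()/splx(). These are
 * assumptions for illustration only, not the real implementations. */
int fake_cpl = 0;                       /* hypothetical current level */
int splnet(void) { int old = fake_cpl; fake_cpl |= 0x1; return old; }
void splx(int s) { fake_cpl = s; }

int my_global_ipl = 0;                  /* level saved by the last enter */
int my_busy = 0;                        /* 1 between enter and exit */
const char *my_owner = 0;               /* tag of the current holder */

/* Returns 0 on success, -1 if somebody is already inside.
 * In the kernel, the -1 branch is where the ASSERT would fire,
 * and my_owner would tell which caller never reached the exit. */
int my_enter_tagged(const char *who)
{
    int s = splnet();

    if (my_busy) {
        splx(s);                        /* restore the level we found */
        return -1;
    }
    my_busy = 1;
    my_owner = who;
    my_global_ipl = s;
    return 0;
}

void my_exit_tagged(void)
{
    int s = my_global_ipl;

    assert(my_busy);                    /* exit without a matching enter */
    my_busy = 0;
    my_owner = 0;
    my_global_ipl = 0;
    splx(s);
}
```

With such a tag, the owner recorded at the time of the assertion failure would at least name the code path that entered first, even without a stack dump of the interrupted process.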
The crashes I got are always due to the assertion failing in the 'ipintr' isr. This *seems* to indicate that 'my_enter' is called at network SWI level after another execution flow has called 'my_enter' itself and has *NOT* called 'my_exit' yet! This seems strange given the 'splnet', and the only explanation I have found is that the first execution flow has fallen asleep somewhere in the kernel (which is not expected, of course!).
Now, if you've read my first mail, I was actually asking for help on how to dump the stack of an interrupted process with GDB when the kernel crash occurs in the context of an isr. More generally, I would like to know how I could dump the stack of *any* process at the time of the crash. This way, I would be able to see where my user-land daemon was in the kernel when the interrupt occurred.
Anyway, without this information, I am reduced to adding some traps along the execution path of my process within my kernel code. This led me to surround calls to MALLOC with counters, as follows:
somewhere_else() {
    ...
    my_enter(); /* handle competition with network isr (especially ipintr) */
    ...
    some_counter++;
    MALLOC(buf,cast,size,M_DEVBUF,M_NOWAIT);
    some_other_counter++;
    ...
    my_exit();
    ...
}
Then, all the crashes I got show the following relation at the time of the crash:
( some_counter - some_other_counter == 1 )
which *seems* to indicate that my process has somehow been preempted during the call to MALLOC.
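A pair of counters only tells that execution stopped somewhere between the two increments. A slightly more general trap is a small ring of breadcrumbs that can be inspected from the crash dump. The sketch below is only an illustration: 'my_trace_ring', 'my_trace_record', 'my_trace_last', 'MY_TRACE' and the ring size are all hypothetical names and values, and the code is not interrupt-safe as written, so in the kernel it would have to be used inside the section protected by 'splnet'.

```c
#include <assert.h>

#define MY_TRACE_SLOTS 16               /* hypothetical ring size */

struct my_trace_entry {
    const char *file;                   /* source file of the checkpoint */
    int line;                           /* source line of the checkpoint */
};

struct my_trace_entry my_trace_ring[MY_TRACE_SLOTS];
unsigned int my_trace_next = 0;         /* total checkpoints recorded */

/* Store one checkpoint, overwriting the oldest slot when the ring is full. */
void my_trace_record(const char *file, int line)
{
    struct my_trace_entry *e = &my_trace_ring[my_trace_next % MY_TRACE_SLOTS];

    e->file = file;
    e->line = line;
    my_trace_next++;
}

/* Drop-in checkpoint: records the location where the macro is expanded. */
#define MY_TRACE() my_trace_record(__FILE__, __LINE__)

/* Most recent checkpoint, or a null pointer if nothing was recorded. */
const struct my_trace_entry *my_trace_last(void)
{
    if (my_trace_next == 0)
        return 0;
    return &my_trace_ring[(my_trace_next - 1) % MY_TRACE_SLOTS];
}
```

Placing MY_TRACE() before and after each suspect call (MALLOC here) would leave the last location reached by the process in 'my_trace_ring', which can be printed from GDB like any other global.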
My belief is that the FreeBSD kernel is (currently) a monolithic, non-preemptive, non-threaded UNIX kernel, which implies that:
- system-scope scheduling is still done at process level (no kernel threads yet)
- a process executing in the kernel cannot be preempted in favour of another process unless it either returns to user code or explicitly goes to sleep.
- the only interlocking that must be done is with interrupts (when relevant), using the 'spl' family of routines.
Well, I am obviously tracking a bug in my own code, but I would greatly appreciate help, either on my GDB usage question or in the form of technical hints on where I should look to make progress in my investigation.
Thank you very much for your attention,
Rgds,
Xavier
Alfred Perlstein wrote:
> * Xavier Galleri <[EMAIL PROTECTED]> [010111 11:27] wrote:
> > Hi everybody,
> > I have reached a point where I am wondering if a call to 'malloc' with
> > the M_NOWAIT flag is not falling asleep !
>
> M_NOWAIT shouldn't sleep.
>
> > In fact, I suspect that the interrupted context is somewhere during a
> > call to 'malloc' (I increment a counter just before calling malloc and
> > increment another just after and the difference is one !) while I have
> > called 'splnet' beforehand, thus normally preventing competing with any
> > network isr. I assume that this should never occur unless the code is
> > somewhere calling 'sleep' and provoke a context switch.
>
> if you add 1 to a variable the difference is expected to be one.
>
> > Is there anybody who can help on this ?
>
> I'm not sure, you need to be more specific/clear.