On 08-Dec-00 The Hermit Hacker wrote:
>
> Just upgraded the kernel, rebooted and it hung/panic'd with:
>
> panic: spin lock sched lock held by 0x0xc02a73el for > 5 seconds
> cpuid = 1; lapic.id = 01000000
> Debugger("panic")
>
> I have DDB enabled, and ctl-alt-esc doesn't break to the debugger, so it's
> totally hung here ...
>
> dual-cpu celeron, smp enabled ...
>
> Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
> Systems Administrator @ hub.org
> primary: [EMAIL PROTECTED]   secondary: scrappy@{freebsd|postgresql}.org

Yes. Something is broken with mutexes in the non-I386_CPU case (and thus for
SMP) in -current since the latest commit to i386/include/mutex.h. Of course,
you can revert that commit, but then your kernel won't compile. From the code
I've looked at so far, it looks like either a weird register allocation bug in
gcc or some other subtle problem with the register constraints.

In the specific case I am looking at, the mtx_exit() of Giant in STOPEVENT in
syscall2() failed to release an unrecursed, uncontested Giant on the fast path
and fell through to mtx_exit_hard(), which assumes that Giant is either
recursed or contested. When I disassembled the kernel, I saw that gcc looked up
curproc for the mtx_enter() operation (which executed ok as far as I can tell)
and then assumed it could leave that value cached in %edi _across_ the call to
the stopevent() function. My only guess is that %edi was clobbered during
stopevent(), causing the cmpxchgl to fail and throwing the code into
mtx_exit_hard() when it shouldn't have. :(

If anyone is an expert at register constraints, etc., please feel free to look
at the macros in src/sys/i386/include/mutex.h
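
To make the suspected failure mode concrete, here is a rough sketch of an
uncontested release done with a compare-and-swap. It is NOT the code from
src/sys/i386/include/mutex.h: it uses C11 atomics in place of the inline
cmpxchgl, and every name in it (sketch_mtx, sketch_enter, sketch_exit,
sketch_exit_hard, sketch_demo, the 0x1000/0x2000 owner values) is made up for
illustration. It only shows why a stale owner value sends a perfectly
ordinary release down the slow path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MTX_UNOWNED ((uintptr_t)0)

/* Hypothetical stand-in for a mutex: lock word holds the owner, or 0. */
struct sketch_mtx {
    _Atomic uintptr_t lock;
    int hard_exits;            /* how often the slow path was taken */
};

static void sketch_init(struct sketch_mtx *m)
{
    atomic_init(&m->lock, MTX_UNOWNED);
    m->hard_exits = 0;
}

/* Slow path stand-in. The real one assumes "recursed or contested". */
static void sketch_exit_hard(struct sketch_mtx *m)
{
    m->hard_exits++;
}

/* Fast-path acquire: swap UNOWNED -> self. Returns nonzero on success. */
static int sketch_enter(struct sketch_mtx *m, uintptr_t self)
{
    uintptr_t expected = MTX_UNOWNED;
    return atomic_compare_exchange_strong(&m->lock, &expected, self);
}

/* Fast-path release: swap self -> UNOWNED, else fall to the slow path. */
static void sketch_exit(struct sketch_mtx *m, uintptr_t self)
{
    uintptr_t expected = self;
    if (!atomic_compare_exchange_strong(&m->lock, &expected, MTX_UNOWNED))
        sketch_exit_hard(m);
}

/* Acquire with the right owner, release with a corrupted copy of it. */
int sketch_demo(void)
{
    struct sketch_mtx giant;
    uintptr_t cached = 0x1000;          /* "curproc" kept in a register */
    int ok;

    sketch_init(&giant);
    ok = sketch_enter(&giant, cached);
    assert(ok);
    (void)ok;

    cached = 0x2000;                    /* stands in for a clobbered %edi */
    sketch_exit(&giant, cached);        /* compare fails, lock stays held */
    return giant.hard_exits;
}
```

Calling sketch_demo() from a small main() shows the release landing in the
slow path even though the lock is neither recursed nor contested, which is
the same shape as what the disassembly suggests is happening to Giant.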
--
John Baldwin <[EMAIL PROTECTED]> -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
"Power Users Use the Power to Serve!" - http://www.FreeBSD.org/