Re: 7.2-release/amd64: panic, spin lock held too long

C. C. Tang Thu, 16 Jul 2009 00:06:30 -0700

Attilio Rao wrote:

2009/7/8 Dan Naumov <dan.nau...@gmail.com>:

On Wed, Jul 8, 2009 at 3:57 AM, Dan Naumov<dan.nau...@gmail.com> wrote:

On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao<atti...@freebsd.org> wrote:

2009/7/7 Dan Naumov <dan.nau...@gmail.com>:

On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao<atti...@freebsd.org> wrote:

2009/7/7 Dan Naumov <dan.nau...@gmail.com>:

I just got a panic followed by a reboot a few seconds after running
"portsnap update", /var/log/messages shows the following:


Jul  7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
Jul  7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock
1) held by 0xffffff00017d8370 (tid 100054) too long
Jul  7 03:49:38 atom kernel: panic: spin lock held too long
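For anyone collecting these reports, the message above can be picked apart mechanically. The following is only a sketch: the regular expression is inferred from this one log excerpt, and the field names (lock_addr, lock_name, owner, tid) are labels made up for the example, not kernel terminology.

```python
import re

# Pattern inferred from the single "held too long" message quoted above.
# The group names are illustrative labels chosen for this sketch.
PATTERN = re.compile(
    r"spin lock (?P<lock_addr>0x[0-9a-f]+) \((?P<lock_name>[^)]+)\) "
    r"held by (?P<owner>0x[0-9a-f]+) \(tid (?P<tid>\d+)\) too long"
)


def parse_spin_lock_message(line):
    """Return the fields of a 'held too long' message as a dict, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["tid"] = int(fields["tid"])
    return fields


# The message from the log above, rejoined onto one line
# (the mail client wrapped it after "sched lock").
line = (
    "Jul  7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 "
    "(sched lock 1) held by 0xffffff00017d8370 (tid 100054) too long"
)
info = parse_spin_lock_message(line)
print(info)
```

Lines that do not carry the full message, such as the bare "panic: spin lock held too long" line, simply yield None.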

That's a known bug, affecting -CURRENT as well.
The cpustop IPI is handled through an NMI, which means it can
interrupt a CPU at any moment, even while it is holding a spinlock,
violating one well-known FreeBSD rule.
That means a CPU can stop itself while its thread is holding the
sched lock spinlock, without ever releasing it (there is no way,
short of something highly hackish, to fix that).
In the meantime hardclock() wants to schedule something else to run and
gets stuck on the thread lock.
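The sequence described above can be modelled in a few lines. This is a single-threaded toy, not FreeBSD code: the SpinLock class, the spin limit, and the second tid (100001) are all invented for illustration. It only shows why a waiter with a bounded spin count ends in a "held too long" condition once the owner has been stopped without releasing.

```python
class SpinLockHeldTooLong(Exception):
    """Raised when a waiter exhausts its spin budget (stands in for the panic)."""


class SpinLock:
    """Toy spin lock that records its owner; not a real kernel primitive."""

    def __init__(self, name):
        self.name = name
        self.owner = None

    def try_acquire(self, tid):
        if self.owner is None:
            self.owner = tid
            return True
        return False

    def release(self, tid):
        if self.owner != tid:
            raise ValueError("lock released by a non-owner")
        self.owner = None


def spin_acquire(lock, tid, max_spins):
    """Spin up to max_spins times, then give up as the kernel message does."""
    for _ in range(max_spins):
        if lock.try_acquire(tid):
            return
    raise SpinLockHeldTooLong(
        f"spin lock ({lock.name}) held by tid {lock.owner} too long"
    )


def simulate(owner_stopped_by_nmi):
    sched_lock = SpinLock("sched lock 1")
    # A thread on one CPU takes the sched lock.
    spin_acquire(sched_lock, tid=100054, max_spins=1)
    # Normally it releases the lock shortly afterwards. If the cpustop NMI
    # arrives first, the CPU stops and the release never happens.
    if not owner_stopped_by_nmi:
        sched_lock.release(100054)
    # hardclock() on another CPU (hypothetical tid) now wants the same lock.
    try:
        spin_acquire(sched_lock, tid=100001, max_spins=1000)
        return "acquired"
    except SpinLockHeldTooLong as exc:
        return f"panic: {exc}"


print(simulate(owner_stopped_by_nmi=False))
print(simulate(owner_stopped_by_nmi=True))
```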

An ideal fix would involve not using an NMI to serve the cpustop, while
having a cheap way (one that does not make the common path too expensive)
to tell hardclock() to avoid scheduling while a cpustop is in flight.
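The second half of that idea, a cheap check that lets hardclock() skip scheduling while a stop is pending, might look roughly like this. It is purely a sketch of the control flow: the state dictionary and the key names cpustop_in_flight and sched_lock_owner are hypothetical, and nothing here reflects how the actual fix was or would be written.

```python
def hardclock(state):
    """Toy clock tick; returns what the tick did instead of really scheduling.

    state is a dict with two hypothetical keys:
      cpustop_in_flight - True while a stop request is pending
      sched_lock_owner  - tid holding the sched lock, or None
    """
    # The cheap check on the common path: a single flag test.
    if state["cpustop_in_flight"]:
        return "skipped"
    # Without the flag, a lock held by a stopped CPU would block us forever.
    if state["sched_lock_owner"] is not None:
        return "stuck"
    return "scheduled"


# Normal tick: nothing pending, lock free.
print(hardclock({"cpustop_in_flight": False, "sched_lock_owner": None}))
# Stop pending while a stopped CPU still owns the lock: the tick backs off.
print(hardclock({"cpustop_in_flight": True, "sched_lock_owner": 100054}))
```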

Thanks,
Attilio

Any idea if a fix is being worked on and how unlucky must one be to
run into this issue, should I expect it to happen again? Is it
basically completely random?

I'd like to work on that issue before BETA3 (and backport to
STABLE_7), I'm just time-constrained right now.
And yes, it is completely random.

Thanks,
Attilio

Ok, this is getting pretty bad, 23 hours later, I get the same kind of
panic, the only difference is that instead of "portsnap update", this
was triggered by "portsnap cron" which I have running between 3 and 4
am every day:

Jul  8 03:03:49 atom kernel: ssppiinn  lloocckk
00xxffffffffffffffff8800bb33eeddc400  ((sscchheedd  lloocck k1 )0 )h
ehledl db yb y 0x0xfffffffffff0f00001081735339760e 0( t(itdi d
10100006070)5 )t otoo ol olnogng
Jul  8 03:03:49 atom kernel: p
Jul  8 03:03:49 atom kernel: anic: spin lock held too long
Jul  8 03:03:49 atom kernel: cpuid = 0
Jul  8 03:03:49 atom kernel: Uptime: 23h2m38s
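The doubled characters at the start of that message are what one would expect if two CPUs wrote the same text to the console at the same time. That reading is an assumption, not something confirmed in this thread, and the later part of the message does not alternate as cleanly. A minimal sketch of strict character-by-character interleaving reproduces the opening words:

```python
from itertools import zip_longest


def interleave(a, b):
    """Merge two strings one character at a time, as two racing writers might."""
    return "".join(x + y for x, y in zip_longest(a, b, fillvalue=""))


print(interleave("spin lock", "spin lock"))
```

This yields the same "ssppiinn  lloocckk" that opens the log line above.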

I have now tried repeating the problem by running "stress --cpu 8 --io
8 --vm 4 --vm-bytes 1024M --timeout 600s --verbose" which pushed
system load into the 15.50 ballpark and simultaneously running
"portsnap fetch" and "portsnap update" but I couldn't manually trigger
the panic. It seems that this problem is indeed random (although it
baffles me why it is specifically portsnap that triggers it). I have now
disabled powerd to check whether that makes any difference to system
stability.


But is that happening at reboot time?

Thanks,
Attilio

I think I am having a similar problem on my Atom machine (FreeBSD 7.2-RELEASE-p1).

It does not happen at boot/reboot; it panics randomly.

And I found that it has remained stable for more than a month now, since I disabled powerd... (although I would like to have it enabled)


--
C.C.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
