On Wed, Jul 8, 2009 at 3:57 AM, Dan Naumov<dan.nau...@gmail.com> wrote:
> On Tue, Jul 7, 2009 at 4:27 AM, Attilio Rao<atti...@freebsd.org> wrote:
>> 2009/7/7 Dan Naumov <dan.nau...@gmail.com>:
>>> On Tue, Jul 7, 2009 at 4:18 AM, Attilio Rao<atti...@freebsd.org> wrote:
>>>> 2009/7/7 Dan Naumov <dan.nau...@gmail.com>:
>>>>> I just got a panic followed by a reboot a few seconds after running
>>>>> "portsnap update". /var/log/messages shows the following:
>>>>>
>>>>> Jul 7 03:49:38 atom syslogd: kernel boot file is /boot/kernel/kernel
>>>>> Jul 7 03:49:38 atom kernel: spin lock 0xffffffff80b3edc0 (sched lock
>>>>> 1) held by 0xffffff00017d8370 (tid 100054) too long
>>>>> Jul 7 03:49:38 atom kernel: panic: spin lock held too long
>>>>
>>>> That's a known bug, affecting -CURRENT as well.
>>>> The cpustop IPI is handled through an NMI, which means it could
>>>> interrupt a CPU at any moment, even while it is holding a spinlock,
>>>> violating one well-known FreeBSD rule.
>>>> That means that the CPU can stop itself while the thread is holding
>>>> the sched lock spinlock and never release it (there is no way, short
>>>> of something highly hackish, to fix that).
>>>> In the meantime hardclock() wants to schedule something else to run
>>>> and gets stuck on the thread lock.
>>>>
>>>> The ideal fix would involve not using an NMI for serving the cpustop
>>>> while having a cheap way (not making the common path too hard) to
>>>> tell hardclock() to avoid scheduling while a cpustop is in flight.
>>>>
>>>> Thanks,
>>>> Attilio
>>>
>>> Any idea if a fix is being worked on, and how unlucky must one be to
>>> run into this issue? Should I expect it to happen again? Is it
>>> basically completely random?
>>
>> I'd like to work on that issue before BETA3 (and backport to
>> STABLE_7), I'm just time-constrained right now.
>> It is completely random.
>>
>> Thanks,
>> Attilio
>
> Ok, this is getting pretty bad, 23 hours later, I get the same kind of
> panic, the only difference is that instead of "portsnap update", this
> was triggered by "portsnap cron" which I have running between 3 and 4
> am every day:
>
> Jul 8 03:03:49 atom kernel: ssppiinn lloocckk
> 00xxffffffffffffffff8800bb33eeddc400 ((sscchheedd lloocck k1 )0 )h
> ehledl db yb y 0x0xfffffffffff0f00001081735339760e 0( t(itdi d
> 10100006070)5 )t otoo ol olnogng
> Jul 8 03:03:49 atom kernel: p
> Jul 8 03:03:49 atom kernel: anic: spin lock held too long
> Jul 8 03:03:49 atom kernel: cpuid = 0
> Jul 8 03:03:49 atom kernel: Uptime: 23h2m38s
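
As a rough illustration of the failure mode described in the quoted
explanation (a CPU is stopped while its thread still holds the sched
lock, and another CPU spins on that lock until it gives up), here is a
toy Python model. It is not FreeBSD kernel code: ToySpinLock,
spin_acquire, simulate and the max_spins limit are hypothetical names
invented for this sketch, and the kernel's real timeout logic is more
involved.

```python
class ToySpinLock:
    """Toy stand-in for a spin lock; 'owner' is the tid holding it, or None."""

    def __init__(self):
        self.owner = None

    def try_acquire(self, tid):
        # Succeeds only if nobody currently holds the lock.
        if self.owner is None:
            self.owner = tid
            return True
        return False

    def release(self):
        self.owner = None


def spin_acquire(lock, tid, max_spins):
    """Spin up to max_spins times; return False if the lock never frees up."""
    for _ in range(max_spins):
        if lock.try_acquire(tid):
            return True
    return False


def simulate(holder_stopped, max_spins=1000):
    """Model one CPU holding the lock and a second CPU trying to take it."""
    sched_lock = ToySpinLock()

    # "CPU 1": a thread takes the sched lock (tid borrowed from the first log).
    sched_lock.try_acquire(tid=100054)

    # If the CPU is stopped right here, the lock is never released.
    if not holder_stopped:
        sched_lock.release()

    # "CPU 0": the clock handler wants the same lock in order to schedule.
    if spin_acquire(sched_lock, tid=1, max_spins=max_spins):
        return "acquired"
    return f"panic: spin lock held too long (tid {sched_lock.owner})"
```

Calling simulate(holder_stopped=False) models the normal path, where the
lock is released before the second CPU asks for it. With
holder_stopped=True the spinner exhausts its budget, which corresponds
to the panic message in the logs above.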

I have now tried to reproduce the problem by running "stress --cpu 8
--io 8 --vm 4 --vm-bytes 1024M --timeout 600s --verbose", which pushed
the system load to around 15.50, while simultaneously running "portsnap
fetch" and "portsnap update". I couldn't trigger the panic manually, so
it seems that this problem is indeed random (although it baffles me why
it is specifically portsnap that triggers it).

I have now disabled powerd to check whether that makes any difference
to system stability.

The only other things running on the system are: sshd, ntpd, smartd,
smbd/nmbd and a few instances of irssi in screens.

- Sincerely,
Dan Naumov