Guy Helmer wrote:
Guy Helmer wrote:
John Baldwin wrote:
On Thursday 26 February 2009 5:27:07 pm Guy Helmer wrote:
John Baldwin wrote:
On Thursday 26 February 2009 4:22:15 pm Guy Helmer wrote:
db> show sleepchain 23110
thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK
thread 100208 (pid 23092, kvoop) is on a run queue
db> show sleepchain 23092
thread 100208 (pid 23092, kvoop) is on a run queue
Ah, so this is normal (well, mostly) in that kvoop is simply on the run
queue waiting for a CPU. Can you find the thread pointer for kvoop and
check on things such as if it is pinned and if so to which CPU
(td_pinned will tell you the first, and td_sched->ts_cpu will tell
you the second with ULE).
(kgdb) print td->td_pinned
$2 = 0
Ok, not pinned.
From my captured ddb run:
cpuid = 3
curthread = 0xc5e2f000: pid 23090 "filter"
curpcb = 0xe6f90d90
fpcurthread = none
idlethread = 0xc442daf0: pid 11 "idle: cpu3"
APIC ID = 7
currentldt = 0x50
spin locks held:
At http://www.freebsd.org/~jhb/gdb/ you can find my kgdb scripts.
If you source gdb6 you can run 'runtds' which will show you what
each CPU is doing (more or less) in ps-style output.
I sure wish I could find the root cause of the hangs. On a hunch,
I tried setting "machdep.cpu_idle_hlt=0" on the amd64 machine, and
it has run 32 hours without a hang. It could just be coincidence,
though...
Ahhh, that actually could explain it perhaps. Do your CPUs support
C2 or higher sleep states for idle? You can try limiting it to only
C1 (or disable C1E in your BIOS if it has an option for that) to see
if that fixes it.
I don't think the CPUs support anything deeper than C1 - there is no
hw.acpi.cpu.cx_supported sysctl node, and hw.acpi.cpu.cx_lowest is
C1. C1-Enhanced was already disabled in the BIOS, at least on the
machine running amd64. 48 hours of runtime, and no hangs seen yet.
I did reboot it this morning to check the sleep settings in the BIOS.
Despite having machdep.cpu_idle_hlt=0, the machine wedged for 40 hours
over the weekend but came back to life by itself. Could this be lost
IPIs, or a bug in the scheduler?
To finish off this thread, after I disabled hyperthreading in the BIOS
on this machine (dual Nocona Xeons in a Supermicro X6DHR-8G) it was
stable for 96 hours. I applied rev 189023
(machdep.hyperthreading_allowed=0 disables HT cores at boot) to
7.1-release, set machdep.hyperthreading_allowed=0 in /boot/loader.conf,
re-enabled hyperthreading in the BIOS to verify the effect of r189023, and
the machine has been stable for 92 hours.
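
For reference, the loader tunable described above could be set with a
fragment like the one below. This is only a sketch: it assumes a kernel
that already includes r189023 (the message above applied that revision to
7.1-RELEASE by hand), and the quoted-value form is the usual loader.conf
convention rather than something shown in the original message.

```conf
# /boot/loader.conf
# Assumption: the running kernel includes r189023, so this tunable is
# honored at boot and the hyperthreaded logical CPUs are left disabled
# even with hyperthreading enabled in the BIOS.
machdep.hyperthreading_allowed="0"
```

The setting only takes effect at the next boot. Afterwards, running
"sysctl machdep.hyperthreading_allowed" should report 0 if the tunable was
picked up.
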
Guy
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable