Guy Helmer wrote:
Guy Helmer wrote:
John Baldwin wrote:
On Thursday 26 February 2009 5:27:07 pm Guy Helmer wrote:
John Baldwin wrote:
On Thursday 26 February 2009 4:22:15 pm Guy Helmer wrote:
db> show sleepchain 23110
thread 100181 (pid 23110, vmstat) blocked on sx "user map" XLOCK
thread 100208 (pid 23092, kvoop) is on a run queue
db> show sleepchain 23092
thread 100208 (pid 23092, kvoop) is on a run queue
Ah, so this is normal (well, mostly) in that kvoop is simply on the run queue waiting for a CPU. Can you find the thread pointer for kvoop and check whether it is pinned and, if so, to which CPU? (td_pinned will tell you the first, and td_sched->ts_cpu will tell you the second with ULE.)
(kgdb) print td->td_pinned
$2 = 0

Ok, not pinned.

From my captured ddb run:
cpuid        = 3
curthread    = 0xc5e2f000: pid 23090 "filter"
curpcb       = 0xe6f90d90
fpcurthread  = none
idlethread   = 0xc442daf0: pid 11 "idle: cpu3"
APIC ID      = 7
currentldt   = 0x50
spin locks held:

At http://www.freebsd.org/~jhb/gdb/ you can find my kgdb scripts. If you source gdb6 you can run 'runtds' which will show you what each CPU is doing (more or less) in ps-style output.

I sure wish I could find the root cause of the hangs. On a hunch, I tried setting "machdep.cpu_idle_hlt=0" on the amd64 machine, and it has run 32 hours without a hang. It could just be coincidence, though...

Ahhh, that actually could explain it perhaps. Do your CPUs support C2 or higher sleep states for idle? You can try limiting it to only C1 (or disable C1E in your BIOS if it has an option for that) to see if that fixes it.

I don't think the CPUs support anything lower than C1 - there is no hw.acpi.cpu.cx_supported sysctl node, and hw.acpi.cpu.cx_lowest is C1. C1-Enhanced was already disabled in the BIOS, at least on the machine running amd64. 48 hours of runtime, and no hangs seen yet. I did reboot it this morning to check the sleep settings in the BIOS.
Despite having machdep.cpu_idle_hlt=0, the machine wedged for 40 hours over the weekend but came back to life by itself. Could this be lost IPIs, or a bug in the scheduler?
To finish off this thread: after I disabled hyperthreading in the BIOS on this machine (dual Nocona Xeons in a Supermicro X6DHR-8G), it was stable for 96 hours. I then applied rev 189023 (machdep.hyperthreading_allowed=0 disables HT cores at boot) to 7.1-RELEASE, set machdep.hyperthreading_allowed=0 in /boot/loader.conf, and re-enabled hyperthreading in the BIOS to verify the effect of r189023. The machine has been stable for 92 hours since.
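
For anyone who wants to try the same workaround, the loader.conf entry would look roughly like the fragment below. This is only a sketch: it assumes a kernel that already includes r189023 (or an equivalent change), and the quoting is just the usual loader.conf convention rather than something specific to this tunable.

```sh
# /boot/loader.conf
# Assumes r189023 or later is in the running kernel; without it,
# this setting may not keep the HT logical CPUs offline at boot.
machdep.hyperthreading_allowed="0"
```

The setting takes effect on the next reboot, with hyperthreading left enabled in the BIOS.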

Guy

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
