On Sat, 6 Jan 2007, Frode Nordahl wrote:
> I am experiencing a rare livelock on four of my backend mail servers running
> 6.1-STABLE, 6.2-BETA2 and 6.2-RC1. They are running OpenLDAP slapd, postfix
> and UW-IMAPD.
>
> The servers can run for months without any problem, but nevertheless I have
> experienced this problem on multiple versions and different hardware
> configurations about 5 times since September/October 2006.
>
> The server responds to pings, but all other activity halts.
>
> On one occasion when one of the servers displayed this behaviour it managed
> to recover from the situation by itself after being gone for 20-30 minutes.

Recovery is a sign of possible livelock, but otherwise this description sounds
more like deadlock than livelock. Note that deadlock can be in a specific
subsystem, so other services may still keep running -- for example, interrupts
and the in-bound network stack generally have no interaction with the file
system, so a file system deadlock can leave ping and the keyboard working.

The first step in diagnosing both livelock and deadlock is to figure out what
the system is actually doing. I'd start out with the following DDB commands:

    show pcpu
    show allpcpu
    trace
    alltrace
    ps
    show lockedvnods
    show locks
    show alllocks

(The last two won't work unless you have WITNESS compiled in.) The fact that
you can get into the debugger and run debugging commands is a good sign; the
fact that the debugger breaks into the idle thread suggests that the system
has at least one idle CPU.
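
For reference, a minimal sketch of the kernel config additions that would make
"show locks" and "show alllocks" usable, assuming the config quoted in this
thread is otherwise kept as-is. These option names are the stock ones from
sys/conf/NOTES; the comments are only a summary. Note that WITNESS slows the
kernel noticeably, so whether it is acceptable on a production mail server is
a judgment call:

    options WITNESS           # Track lock order; needed by show locks / show alllocks.
    options WITNESS_SKIPSPIN  # Skip spin mutexes to reduce WITNESS overhead.
    options INVARIANTS        # Optional: extra kernel consistency checks.
    options INVARIANT_SUPPORT # Required by INVARIANTS.

With a kernel built this way, the commands listed above can all be run the
next time the machine hangs.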
Robert N M Watson
Computer Laboratory
University of Cambridge

> Typical hardware configuration:
> CPU   2x Xeon 3.06GHz or 1x Core2Duo 2.00GHz (SMP)
> RAM   4 GB
> DISK  Intel SRCU42X (amr) or Dell PERC 5/i (mfi)
>
> Kernel config:
> include GENERIC
> options KDB # Enable kernel debugger support.
> options BREAK_TO_DEBUGGER
> options DDB # Support DDB.
> options GDB # Support remote GDB.
> options QUOTA
> options SMP
>
> On the last crash I collected the following info from DDB:
>
> db> tr
> Tracing pid 11 tid 100005 td 0xc8f90780
> kdb_enter(c092f08b) at kdb_enter+0x2b
> siointr1(c9120800) at siointr1+0xce
> siointr(c9120800) at siointr+0x5e
> intr_execute_handlers(c8f864c8,e7b14c94,4,e7b14cd8,c0889503,...) at intr_execute_handlers+0xe1
> lapic_handle_intr(3d) at lapic_handle_intr+0x2e
> Xapic_isr1() at Xapic_isr1+0x33
> --- interrupt, eip = 0xc0b5b0e5, esp = 0xe7b14cd8, ebp = 0xe7b14cd8 ---
> acpi_cpu_c1(0,0,e7b14cf8,c8f90780,1,...) at acpi_cpu_c1+0x5
> acpi_cpu_idle(e7b14d10,c066a779,c8f8fa78,c066a6e4,e7b14d24,...) at acpi_cpu_idle+0x152
> cpu_idle(c8f8fa78,c066a6e4,e7b14d24,c066a465,0,...) at cpu_idle+0x28
> idle_proc(0,e7b14d38) at idle_proc+0x95
> fork_exit(c066a6e4,0,e7b14d38) at fork_exit+0x71
> fork_trampoline() at fork_trampoline+0x8
> --- trap 0x1, eip = 0, esp = 0xe7b14d6c, ebp = 0 ---
>
> db> show lockedbufs
> buf at 0xdd08cbd0
> b_flags = 0x20000000<vmio>
> b_error = 0, b_bufsize = 16384, b_bcount = 16384, b_resid = 0
> b_bufobj = (0xc937ed80), b_data = 0xdea14000, b_blkno = 14386688
> b_npages = 4, pages(OBJ, IDX, PA):
>   (0xc1045210, 0x1b70c0, 0xdbe35000),(0xc1045210, 0x1b70c1, 0xc17d6000),
>   (0xc1045210, 0x1b70c2, 0x582d7000),(0xc1045210, 0x1b70c3, 0x84498000)
>
> I have a crashdump or two available for further investigation.
>
> --
> Frode Nordahl
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"