On Fri, 13 Mar 2009, Nick Withers wrote:
Sorry for the original double-post, by the way; I'm not quite sure how that
happened...
I can reproduce this problem relatively easily (every 3 days, on
average). I meant to mention this before, too: it seems to happen a lot more
often on fxp than on rl.
I'm sorry to ask what is probably a very simple question, but is there
somewhere I should look for clues on debugging from a manually generated
dump? I tried "panic" after manually invoking the kernel debugger, but proved
highly inept at extracting from the dump the same information that "ps" /
"where" gave me in the live debugger.
If this is, in fact, a TCP input lock leak of some sort, then most likely some
particular property of a host your system talks to, or a network it runs over,
triggers this (presumably) unusual edge case -- perhaps a firewall that mucks
with TCP in a funny way, etc. Of course, it might be something completely
different -- the fact that everything is blocked on *tcp_sc_h and *tcp simply
means that something holding TCP locks hasn't released them, and this could
happen for a number of reasons.
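To make the "lock that never gets released" failure mode concrete, here is a small user-space analogy. It is a minimal sketch assuming POSIX threads; `proto_lock`, `handle_segment()`, `lock_is_free()` and `demo_lock_leak()` are hypothetical names, not FreeBSD kernel code, and the kernel uses its own mutex and rwlock primitives (the `_mtx_lock_sleep` and `_rw_rlock` frames in the traces quoted below).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a global protocol lock (e.g. a TCP or syncache lock). */
static pthread_mutex_t proto_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical input handler: it takes the lock, processes a segment and
 * should release the lock on every path. The odd_edge_case branch returns
 * early without unlocking, which leaks the lock.
 */
static int handle_segment(bool odd_edge_case)
{
    pthread_mutex_lock(&proto_lock);
    if (odd_edge_case)
        return -1; /* BUG: returns with proto_lock still held */

    /* ... normal processing would happen here ... */
    pthread_mutex_unlock(&proto_lock);
    return 0;
}

/* Returns true if nobody holds the lock right now; never blocks. */
static bool lock_is_free(void)
{
    if (pthread_mutex_trylock(&proto_lock) != 0)
        return false; /* held (EBUSY): a blocking lock call would sleep here */
    pthread_mutex_unlock(&proto_lock);
    return true;
}

/* Walks through the failure: the normal path leaves the lock free,
 * the edge case leaves it held for good. */
static void demo_lock_leak(void)
{
    int rc;

    rc = handle_segment(false); /* normal path: lock released */
    assert(rc == 0);
    assert(lock_is_free());

    rc = handle_segment(true);  /* edge case: lock leaked */
    assert(rc == -1);
    assert(!lock_is_free());    /* any other thread calling pthread_mutex_lock() now blocks indefinitely */
    (void)rc;
}
```

The point of the sketch is that the leak itself is silent: nothing fails at the moment the edge case fires, and the symptom only shows up later as unrelated threads piling up behind the lock, which is the hypothesis for the threads stuck in *tcp_sc_h and *tcp.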
Once you've acquired a crashdump, you can run crashinfo(8), which will produce
a summary of useful debugging information. There are some things that are a
bit easier to do in the run-time debugger, such as lock analysis, as the
run-time debugger is more up-close and personal with in-kernel data
structures; other things are easier in kgdb, which has complete source code
and C type access. I find kgdb works pretty well for everything but "show
me what locks are held". Many of our system monitoring tools, including ps
and portions of netstat, can actually be run on crashdumps to report the state
of the system at the time it crashed -- take a look at the -M and -N command
line arguments, which respectively allow you to point those tools at the
crashdump and at a kernel with debugging symbols (typically kernel.debug or
kernel.symbols) matching the kernel that was booted at the time of the crash.
Robert N M Watson
Computer Laboratory
University of Cambridge
Ta for your help!
Tracing PID 31 tid 100030 td 0xffffff00012016e0
sched_switch() at sched_switch+0xf1
mi_switch() at mi_switch+0x18f
turnstile_wait() at turnstile_wait+0x1cf
_mtx_lock_sleep() at _mtx_lock_sleep+0x76
syncache_lookup() at syncache_lookup+0x176
syncache_expand() at syncache_expand+0x38
tcp_input() at tcp_input+0xa7d
ip_input() at ip_input+0xa8
ether_demux() at ether_demux+0x1b9
ether_input() at ether_input+0x1bb
fxp_intr() at fxp_intr+0x233
ithread_loop() at ithread_loop+0x17f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
____
A "where" on a process stuck in "*tcp", in this case "[swi4: clock]",
gave a somewhat similar trace:
____
sched_switch() at sched_switch+0xf1
mi_switch() at mi_switch+0x18f
turnstile_wait() at turnstile_wait+0x1cf
_rw_rlock() at _rw_rlock+0x8c
ipfw_chk() at ipfw_chk+0x3ab2
ipfw_check_out() at ipfw_check_out+0xb1
pfil_run_hooks() at pfil_run_hooks+0x9c
ip_output() at ip_output+0x367
syncache_respond() at syncache_respond+0x2fd
syncache_timer() at syncache_timer+0x15a
(...)
____
In this particular case, the fxp0 card is in a lagg with rl0, but this
problem can be triggered with either card on its own...
The scheduler is SCHED_ULE.
I'm not too sure how to give more useful information than this, I'm
afraid. It's a custom kernel, too... Do I need to supply information on
what code actually exists at the relevant addresses (I'm not at all
clued in on how to do this... Sorry!)? Should I chuck WITNESS,
INVARIANTS et al. in?
I *think* every time this has been triggered there's been a "python2.5"
process in the "*tcp" state. This machine runs net-p2p/deluge and
generally has at least 100 TCP connections on the go at any given time.
Can anyone give me a clue as to what I might do to track this down?
Appreciate any pointers.
--
Nick Withers
email: n...@nickwithers.com
Web: http://www.nickwithers.com
Mobile: +61 414 397 446
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"