All,

Here is an update with more info. In addition to the lock order reversal, this is the third panic I have seen that looked like this ...

Tracing id 110 tid 100089 td 0xffffff012f3f0c80
kdb_enter() at kdb_enter+0x2f
panic() at panic+0x249
uma_dbg_free() at uma_dbg_free+0x188
uma_zfree_arg() at uma_zfree_arg+0x1b0
pf_purge_expired_states() at pf_purge_expired_states+0x41
pfsync_input at pfsync_input+xb35
pf_input() at ip_input+0x10f
netisr_processqueue() at netisr_processqueue+0x17
swi_net() at swi_net+0xa8
ithread_loop() at ithread_loop+0xd9
fork_exit() at fork_exit+0xc3
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb44f9d00, rbp = 0 ---
db> continue
boot() called on cpu#0
Uptime: 13h42m43s
Dumping 4864 MB
 16 32 ...

I was hoping to get a crash dump but unfortunately can't seem to get one to complete. In any case, this particular install is now toast. I'm surprised it lasted as long as it did considering all the punishment it took. I'm not a kernel hacker, but it would seem to me that something is up with pfsync. Should it matter that I am running an AMD64 kernel in SMP mode?

Matthew Grooms

-----Original Message-----
From: Grooms, Matthew
Sent: Mon 6/6/2005 6:54 PM
To: freebsd-stable@freebsd.org
Subject: 5.4-RELEASE lockups on amd64 SMP

My apologies. With the debug options listed in my previous post (which should have read 5.4, not 5.3), I got a lock order reversal. After a while, it panicked and spat out this ...

lock order reversal
 1st 0xffffffff80752ec0 pf task mtx (pf task mtx) @ contrib/pf/net/if_pfsync.c:1621
 2nd 0xffffffff8076e9f0 user map (user man) @ vm/vm_map.c:2998
KDB: stack backtrace:
witness_checkorder() at witness_checkorder+0x654
_sx_xlock() at _sx_xlock+0x51
vm_map_lookup() at vm_map_lookup+0x44
vm_fault() at vm_fault+0xba
trap() at trap+0x1c5
alltraps_with_regs_pushed() at alltraps_with_regs_pushed+0x5
pf_state_tree_lan_ext_RB_REMOVE() at pf_state_tree_lan_ext_RB_REMOVE+0x10c
pf_purge_expired_states() at pf_purge_expired_states+0xab
pfsync_input() at ip_input+0x10f
netisr_processqueue() at netisr_processqueue+0x17
swi_net() at swi_net+0xa8
ithread_loop() at ithread_loop+0xd9
fork_exit() at fork_exit+0xc3
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb44f9d00, rbp = 0 ---
KDB: enter: withness_ckeckorder
[thread pid 110 tid 100089]
Stopped at kdb_enter+0x2f: nop
db> panic blockable sleep lock (sleep mutex) tty @ kern/kern_event.c:1453
cpuid = 0
boot() called on cpu#0
Uptime: 10m40s
Dumping 4864 mB
 16 32 .........

After a reboot, I received another panic.

Tracing pid 603 tid 100140 td 0xffffff012efda500
kdb_enter() at kdb_enter+02f
panic() at panic+0x249
ffs_blkfree() at ffs_blkfree+0x483
indir_trunc() at indir_trunc+0x190
indir_trunc() at indir_trunc+0x1fb
handle_workitem_freeblocks() at handle_workitem_freeblocks+0x228
softdep_setup_freeblocks() at softdep_setup_freeblocks+0x730
ffs_truncate() at ffs_truncate+0x1c9
ffs_snapshot() at ffs_snapshot+0x717
ffs_omount() at ffs_omount+0x16e
vfs_domount() at vfs_domount+0x5a0
mount() at mount+0xd8
syscall() at syscall+0x1fb
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall(21, FreeBSD ELF64, mount), rip = 0800697580, rsp = 0x7fffffffec58, fbp = 0x515b10 ---

I am guessing this is related to background fsck processes being launched because it happened consistently until I disabled background fsck and performed one manually in single user mode.
Now I can boot normally into multi user mode. Not sure where to go from here except to watch the system and wait for more kernel debug output.

BTW: To answer a reply to my previous post, I have 6 em interfaces.

-Matthew

-----Original Message-----
From: Grooms, Matthew
Sent: Mon 6/6/2005 12:06 PM
To: freebsd-stable@freebsd.org
Subject: Debug help - 5.3 lockups on amd64 SMP

All,

I am experiencing lockups on a production 5.4 amd64 SMP system. It's lightly loaded and seems to last about 3-5 days before it stops responding to network or even console interaction. The system is acting as a firewall and runs a mostly stock kernel with IPV6 removed and SMP, PF, PFLOG, CARP and ALTQ added. The only other thing I can think to note is that tcpdump is running constantly on the pflog interface to coax human readable firewall logs out of pf.

I have an identical hot spare server with SMP disabled that has taken over flawlessly every time the live lock occurs, so I am willing to leave the primary in the production environment to do testing and gather debug info. I have added the following options to the primary fw kernel config ...

# Debug Options
makeoptions DEBUG=-g
options DDB
options KDB
options BREAK_TO_DEBUGGER
options INVARIANT_SUPPORT
options INVARIANTS
options WITNESS
options WITNESS_KDB
options WITNESS_SKIPSPIN

... and the following to the rc.conf ...

dumpdev="/dev/amrd0s1h"
dumpdir="/var/crash"

Will this do it or should I add anything else?

Thanks in advance,

-Matthew
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
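
For anyone comparing the backtraces in this thread, here is a minimal Python sketch (not part of the original report) that pulls frames of the form "func() at symbol+offset" out of pasted DDB output and lists frames where the two names differ. The helper names parse_frames and mismatched are made up for illustration, and a mismatch may only mean a hand-transcription slip rather than anything about the kernel. Frames typed without the "()" are not matched by this pattern.

```python
import re

# Matches one DDB frame such as "panic() at panic+0x249".
# The offset is captured loosely (\w+) so hand-typed offsets like "02f" still match.
FRAME_RE = re.compile(r"(\w+)\(\)\s+at\s+(\w+)\+(\w+)")

def parse_frames(text):
    """Return (function, symbol, offset) tuples for every frame found in text."""
    return FRAME_RE.findall(text)

def mismatched(frames):
    """Return frames whose function name differs from the symbol after 'at'."""
    return [f for f in frames if f[0] != f[1]]

# A few frames copied from the first trace in this thread.
sample = (
    "uma_zfree_arg() at uma_zfree_arg+0x1b0 "
    "pf_purge_expired_states() at pf_purge_expired_states+0x41 "
    "pf_input() at ip_input+0x10f "
    "swi_net() at swi_net+0xa8"
)

frames = parse_frames(sample)
for func, sym, off in frames:
    print(f"{func:28s} {sym}+{off}")
print("mismatched:", mismatched(frames))
```

The pattern works whether the trace is pasted one frame per line or run together on a single line, since it only relies on the "() at" separator.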