Hello,

Robert Watson wrote:

On Thu, 3 Nov 2005, Attila Nagy wrote:

Hello,

I have an "easily" (at least to me) reproduceable freeze with both 6-STABLE and 7-CURRENT on an amd64 SMP machine.

What I do is simply copy a lot of directories, files and symlinks with different uids from another machine to this one, using rsync.


This is probably a vnode lock leak or deadlock. As it looks like you have both the ability to get into the debugger and also a serial or other remote console, the output of:

  show allpcpu
  alltrace
  show lockedvnods
  show allocks

would be helpful. There are also "options DEBUG_LOCKS" which extends the debugging information available via "show lockedvnods" with information about where the lock was acquired, which can be quite helpful.

Robert N M Watson


With quotas enabled it freezes constantly, without quotas it doesn't.

I have MP_WATCHDOG enabled (the kernel config is the default GENERIC+SMP with debuggers and MP_WATCHDOG only) so after the freeze it drops me into the debugger in which I can see the following:

FreeBSD/amd64 (helen) (ttyv0)

login: Watchdog timer: 3
Watchdog timer: 2
Watchdog timer: 1
Watchdog timer: 0
Watchdog firing!
NMI ... going to debugger
[thread pid 157 tid 100143 ]
Stopped at      vinactive+0x83: jz      0x365245a9
db> bt
Tracing pid 157 tid 100143 td 0xffffff007b9c14c0
vinactive() at vinactive+0x83
vput() at vput+0x1cf
qsync() at qsync+0x260
ffs_sync() at ffs_sync+0x159
sync_fsync() at sync_fsync+0x191
VOP_FSYNC_APV() at VOP_FSYNC_APV+0x3d
sched_sync() at sched_sync+0x428
fork_exit() at fork_exit+0x86
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb2029d00, rbp = 0 ---
db> ps
 157 ffffff007b9489c0    0     0     0 0000204 [CPU 0] syncer
This is an HP machine, so I can access it via it's ILO port. But it mangles larger outputs, so alltrace will be truncated.

I will find a way to get a serial console attached to it, but until that...

I've recompiled the kernel with "options DEBUG_LOCKS", rebooted and started the rsync process.

This is what I see now:
login: Watchdog timer: 3
Watchdog timer: 2
Watchdog timer: 1
Watchdog timer: 0
Watchdog firing!
NMI ... going to debugger
[thread pid 14 tid 100010 ]
db> bt
Tracing pid 14 tid 100010 td 0xffffff007b94ebe0
acpi_cpu_c1() at acpi_cpu_c1+0x6
acpi_cpu_idle() at acpi_cpu_idle+0x174
idle_proc() at idle_proc+0x14e
fork_exit() at fork_exit+0x86
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb1e81d00, rbp = 0 ---
db> ps
[...]
   14 ffffff007b97b680    0     0     0 000020c [CPU 0] idle: cpu0
[...]
db> show allpcpu
cpuid        = 0
curthread    = 0xffffff007b94ebe0: pid 14 "idle: cpu0"
curpcb       = 0xffffffffb1e81d10
fpcurthread  = none
idlethread   = 0xffffff007b94ebe0: pid 14 "idle: cpu0"

cpuid        = 1
curthread    = 0xffffff007b94e000: pid 13 "idle: cpu1"
curpcb       = 0xffffffffb1e68d10
fpcurthread  = none
idlethread   = 0xffffff007b94e000: pid 13 "idle: cpu1"

cpuid        = 2
curthread    = 0xffffff007b9014c0: pid 157 "syncer"
curpcb       = 0xffffffffb20e8d10
fpcurthread  = none
idlethread   = 0xffffff007b94dbe0: pid 12 "idle: cpu2"

cpuid        = 3
curthread    = 0xffffff007b94d980: pid 11 "mp_watchdog cpu 3"
curpcb       = 0xffffffffb1e5ed10
fpcurthread  = none
idlethread   = 0xffffff007b94d980: pid 11 "mp_watchdog cpu 3"

db> show lockedvnods
Locked vnodes

0xffffff0061c05470: tag syncer, type VNON
    usecount 1, writecount 0, refcount 2 mountedhere 0
    flags ()
lock type syncer: EXCL (count 1) by thread 0xffffff007b9014c0 (pid 157)

0xffffff00418086a8: tag ufs, type VDIR
    usecount 1, writecount 0, refcount 90 mountedhere 0
    flags ()
     lock type ufs: EXCL (count 1) by thread 0xffffff005d435980 (pid 1180)
        ino 18983427, on dev da1s1d

0xffffff0015a1b6a8: tag ufs, type VLNK
    usecount 0, writecount 0, refcount 1 mountedhere 0
    flags (VI_DOOMED)
     lock type ufs: EXCL (count 1) by thread 0xffffff005d435980 (pid 1180)
        ino 19058341, on dev da1s1d

WARNING this is not complete!
db> alltrace
--- syscall (93, FreeBSD ELF64, select), rip = 0x800d4c14c, rsp = 0x7fffffffc2d8
, rbp = 0x1 ---

Tracing command rsync pid 1180 tid 100267 td 0xffffff005d435980
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
turnstile_wait() at turnstile_wait+0x272
_mtx_lock_sleep() at _mtx_lock_sleep+0xa6
msleep() at msleep+0x2a6
acquire() at acquire+0x76
debuglockmgr() at debuglockmgr+0x500
vop_stdlock() at vop_stdlock+0x3e
VOP_LOCK_APV() at VOP_LOCK_APV+0x49
VOP_LOCK_APV() at VOP_LOCK_APV+0x49
debug_vn_lock() at debug_vn_lock+0x8f
dqsync() at dqsync+0x95
dqrele() at dqrele+0x68
ufs_reclaim() at ufs_reclaim+0x100
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x3d
vgonel() at vgonel+0x12f
vnlru_free() at vnlru_free+0x29f
getnewvnode() at getnewvnode+0x2ca
ffs_vget() at ffs_vget+0xe9
ffs_valloc() at ffs_valloc+0x109
ufs_makeinode() at ufs_makeinode+0x6e
ufs_symlink() at ufs_symlink+0x3f
VOP_SYMLINK_APV() at VOP_SYMLINK_APV+0x65
kern_symlink() at kern_symlink+0x3d3
syscall() at syscall+0x642
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (57, FreeBSD ELF64, symlink), rip = 0x8006b93ec, rsp = 0x7fffffffb7d
8, rbp = 0xa64388 ---
Tracing command sh pid 1178 tid 100202 td 0xffffff00601f8be0
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
sleepq_wait_sig() at sleepq_wait_sig+0xe
cv_wait_sig() at cv_wait_sig+0x17f
kern_select() at kern_select+0xbd4
select() at select+0x3e
syscall() at syscall+0x642
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (93, FreeBSD ELF64, select), rip = 0x4165ac, rsp = 0x7fffffffe948, r
bp = 0x7fffffffe970 ---

Tracing command adjkerntz pid 271 tid 100195 td 0xffffff0062431000
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
sleepq_wait_sig() at sleepq_wait_sig+0xe
msleep() at msleep+0x21b
kern_sigsuspend() at kern_sigsuspend+0xb1
sigsuspend() at sigsuspend+0x40
syscall() at syscall+0x642
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (341, FreeBSD ELF64, sigsuspend), rip = 0x80067a18c, rsp = 0x7ffffff
fed28, rbp = 0xfffffffffffff1f0 ---
(null)() at 0x80067a18c
*** error reading from address fffffffffffff1f8 ***

I will try to provide a full alltrace output tomorrow.

Thanks,
--
Attila Nagy                                   e-mail: [EMAIL PROTECTED]
Adopt a directory on our free software         phone: +3630 306 6758
server! http://www.fsn.hu/?f=brick
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Reply via email to