On 11/10/2016 19:40, Andriy Gapon wrote:
> On 10/11/2016 19:55, Henri Hennebert wrote:
>> On 11/10/2016 18:33, Andriy Gapon wrote:
>>> On 10/11/2016 18:12, Henri Hennebert wrote:
>>>> On 11/10/2016 16:54, Andriy Gapon wrote:
>>>>> On 10/11/2016 17:20, Henri Hennebert wrote:
>>>>>> On 11/10/2016 15:00, Andriy Gapon wrote:
>>>>>>> Interesting. I cannot spot any suspicious thread that would hold the
>>>>>>> vnode lock. Could you please run kgdb (just like that, no arguments),
>>>>>>> then execute the 'bt' command and select a frame where _vn_lock is
>>>>>>> called with the 'fr N' command. Then please 'print *vp' and share the
>>>>>>> result.
>>>>>>
>>>>>> I think I missed something in your request:
>>>>>
>>>>> Oh, sorry! The very first step should be 'tid 101112' to switch to the
>>>>> correct context.
>>>>
>>>> (kgdb) fr 7
>>>> #7  0xffffffff8063c5b3 in _vn_lock (vp=<value optimized out>,
>>>>     flags=2121728, file=<value optimized out>,
>>>>     line=<value optimized out>) at vnode_if.h:859
>>>> 859     vnode_if.h: No such file or directory.
>>>>         in vnode_if.h
>>>> (kgdb) print *vp
>>>
>>> "value optimized out" - not good.
>>> I am not sure if this output is valid, because of the message above.
>>> Could you please try to navigate to nearby frames and see if vp itself
>>> has a valid value there. If you can find such a frame, please do
>>> 'print *vp' there.
>>
>> Does this seem better?
>
> Yes!
>
>> (kgdb) fr 8
>> #8  0xffffffff8062a5f7 in vget (vp=0xfffff80049c2c000, flags=2121728,
>>     td=0xfffff80009ba0500) at /usr/src/sys/kern/vfs_subr.c:2523
>> 2523            if ((error = vn_lock(vp, flags)) != 0) {
>> (kgdb) print *vp
>> $1 = {v_tag = 0xffffffff813be535 "zfs", v_op = 0xffffffff813d0f70,
>>   v_data = 0xfffff80049c1f420, v_mount = 0xfffff800093aa660,
>>   v_nmntvnodes = {tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049c2bb30},
>>   v_un = {vu_mount = 0x0, vu_socket = 0x0, vu_cdev = 0x0, vu_fifoinfo = 0x0},
>>   v_hashlist = {le_next = 0x0, le_prev = 0x0}, v_cache_src = {lh_first = 0x0},
>>   v_cache_dst = {tqh_first = 0xfffff800bfc8e3f0, tqh_last = 0xfffff800bfc8e410},
>>   v_cache_dd = 0x0,
>>   v_lock = {lock_object = {lo_name = 0xffffffff813be535 "zfs",
>>       lo_flags = 117112832, lo_data = 0, lo_witness = 0x0}, lk_lock = 23,
>>     lk_exslpfail = 0, lk_timo = 51, lk_pri = 96},
>>   v_interlock = {lock_object = {lo_name = 0xffffffff8099e9e0 "vnode interlock",
>>       lo_flags = 16973824, lo_data = 0, lo_witness = 0x0}, mtx_lock = 4},
>>   v_vnlock = 0xfffff80049c2c068,
>>   v_actfreelist = {tqe_next = 0xfffff80049c2c938, tqe_prev = 0xfffff80049ae9bd0},
>>   v_bufobj = {bo_lock = {lock_object = {lo_name = 0xffffffff8099e9f0 "bufobj interlock",
>>         lo_flags = 86179840, lo_data = 0, lo_witness = 0x0}, rw_lock = 1},
>>     bo_ops = 0xffffffff80c4bf70, bo_object = 0xfffff800b62e9c60,
>>     bo_synclist = {le_next = 0x0, le_prev = 0x0},
>>     bo_private = 0xfffff80049c2c000, __bo_vnode = 0xfffff80049c2c000,
>>     bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff80049c2c120},
>>       bv_root = {pt_root = 0}, bv_cnt = 0},
>>     bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfffff80049c2c140},
>>       bv_root = {pt_root = 0}, bv_cnt = 0},
>>     bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072},
>>   v_pollinfo = 0x0, v_label = 0x0, v_lockf = 0x0,
>>   v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfffff80049c2c188},
>>     rl_currdep = 0x0},
>>   v_cstart = 0, v_lasta = 0, v_lastw = 0, v_clen = 0,
>>   v_holdcnt = 9, v_usecount = 6, v_iflag = 512, v_vflag = 32,
>>   v_writecount = 0, v_hash = 4833984, v_type = VREG}
>> (kgdb)
>
> flags=2121728 = 0x206000 = LK_SHARED | LK_VNHELD | LK_NODDLKTREAT
> lk_lock = 23 = 0x17 = LK_ONE_SHARER | LK_EXCLUSIVE_WAITERS | LK_SHARED_WAITERS | LK_SHARE
>
> So, here's what we have here: this thread tries to get a shared lock on
> the vnode, the vnode is already locked in shared mode, but there is an
> exclusive waiter (or, perhaps, multiple waiters). So, this thread cannot
> get the lock because of the exclusive waiter. And I do not see an easy
> way to identify that waiter.
>
> In the procstat output that you provided earlier there was no other
> thread in vn_lock. Hmm, I see this:
>
>   procstat: sysctl: kern.proc.kstack: 14789: Device busy
>   procstat: sysctl: kern.proc.kstack: 82034: Device busy
>
> Could you please check what those two processes are (if they are still
> running)? Perhaps try procstat for each of the pids several times.
These two processes are the two instances of the innd daemon (the news
server), which is consistent with the directory /usr/local/news/bin.
[root@avoriaz ~]# procstat 14789
  PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN     EMUL          COMM
14789 29165 29165 29165     0   1 root     zfs       FreeBSD ELF64 innd
[root@avoriaz ~]# procstat 82034
  PID  PPID  PGID   SID  TSID THR LOGIN    WCHAN     EMUL          COMM
82034 29165 29165 29165     0   1 root     zfs       FreeBSD ELF64 innd
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM             FD T V FLAGS    REF  OFFSET PRO NAME
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM             FD T V FLAGS    REF  OFFSET PRO NAME
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM             FD T V FLAGS    REF  OFFSET PRO NAME
[root@avoriaz ~]# procstat -f 14789
procstat: kinfo_getfile(): Device busy
  PID COMM             FD T V FLAGS    REF  OFFSET PRO NAME
[root@avoriaz ~]# procstat -kk 14789
  PID    TID COMM             TDNAME           KSTACK
procstat: sysctl: kern.proc.kstack: 14789: Device busy
[root@avoriaz ~]# procstat -kk 14789
  PID    TID COMM             TDNAME           KSTACK
procstat: sysctl: kern.proc.kstack: 14789: Device busy
[root@avoriaz ~]# procstat -kk 14789
  PID    TID COMM             TDNAME           KSTACK
procstat: sysctl: kern.proc.kstack: 14789: Device busy
[root@avoriaz ~]# procstat -kk 14789
  PID    TID COMM             TDNAME           KSTACK
procstat: sysctl: kern.proc.kstack: 14789: Device busy
[root@avoriaz ~]#

Same result for 82034.

By the way, the news server seems OK:

[root@avoriaz ~]# ctlinnd mode
Server running
Allowing remote connections
Parameters c 14 i 50 (0) l 5000000 o 1010 t 300 H 2 T 60 X 0 normal specified
Not reserved
Readers follow enabled
Perl filtering enabled
[root@avoriaz ~]#

I can access the news server with Thunderbird. I can restart the news
server if you ask.

Henri

_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"