On 11/18/2016 13:30, Andriy Gapon wrote:
On 14/11/2016 14:00, Henri Hennebert wrote:
On 11/14/2016 12:45, Andriy Gapon wrote:
Okay. Luckily for us, it seems that 'm' is available in frame 5. It also
happens to be the first field of 'struct faultstate'. So, could you please go
to that frame and print '*m' and '*(struct faultstate *)m'?
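For context, 'struct faultstate' from sys/vm/vm_fault.c of that era looks
roughly like this (a sketch; the field order is inferred from the $2 print
further down, so treat the types as approximate):

struct faultstate {
	vm_page_t m;		/* first member, hence the cast idea:
				 * (struct faultstate *)&fs->m == &fs */
	vm_object_t object;
	vm_pindex_t pindex;
	vm_page_t first_m;
	vm_object_t first_object;
	vm_pindex_t first_pindex;
	vm_map_t map;
	vm_map_entry_t entry;
	int lookup_still_valid;
	struct vnode *vp;
};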
(kgdb) fr 4
#4  0xffffffff8089d1c1 in vm_page_busy_sleep (m=0xfffff800df68cd40,
    wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
753             msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xfffff800dc5d85b0, tqe_prev = 0xfffff800debf3bd0},
      s = {ss = {sle_next = 0xfffff800dc5d85b0}, pv = 0xfffff800debf3bd0},
      memguard = {p = 18446735281313646000, v = 18446735281353604048}},
  listq = {tqe_next = 0x0, tqe_prev = 0xfffff800dc5d85c0},
  object = 0xfffff800b62e9c60, pindex = 11, phys_addr = 3389358080,
  md = {pv_list = {tqh_first = 0x0, tqh_last = 0xfffff800df68cd78},
    pv_gen = 426, pat_mode = 6},
  wire_count = 0, busy_lock = 6, hold_count = 0, flags = 0,
  aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0',
  segind = 3 '\003', order = 13 '\r', pool = 0 '\0', act_count = 0 '\0',
  valid = 0 '\0', dirty = 0 '\0'}
If I interpret this correctly, the page is in the 'exclusive busy' state
(busy_lock = 6). Unfortunately, I can't tell much beyond that.
But I am confident that this is the root cause of the lock-up.
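For reference, a minimal sketch of how that busy_lock value decodes, assuming
the VPB_* bit layout from sys/vm/vm_page.h in the 10.x/11.x era:

#include <stdio.h>

/* Decode vm_page.busy_lock; the VPB_* values below are assumed to
 * match sys/vm/vm_page.h of that era. */
#define	VPB_BIT_SHARED		0x01
#define	VPB_BIT_EXCLUSIVE	0x02
#define	VPB_BIT_WAITERS		0x04

int
main(void)
{
	unsigned int busy_lock = 6;	/* the value printed above */

	if (busy_lock & VPB_BIT_EXCLUSIVE)
		printf("exclusive busy\n");
	if (busy_lock & VPB_BIT_SHARED)
		printf("shared busy, %u sharers\n", busy_lock >> 3);
	if (busy_lock & VPB_BIT_WAITERS)
		printf("other threads are sleeping on the page\n");
	return (0);
}

If those definitions hold, 6 = VPB_BIT_EXCLUSIVE | VPB_BIT_WAITERS: some
thread holds the page exclusive busy and at least one other thread (the
'vmpfw' sleeper here) is waiting for it to be released.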
(kgdb) print *(struct faultstate *)m
$2 = {m = 0xfffff800dc5d85b0, object = 0xfffff800debf3bd0, pindex = 0,
  first_m = 0xfffff800dc5d85c0, first_object = 0xfffff800b62e9c60,
  first_pindex = 11, map = 0xca058000, entry = 0x0,
  lookup_still_valid = -546779784, vp = 0x6000001aa}
(kgdb)
I was wrong on this one: 'm' is actually a pointer (the value of fs.m), not
the address of the faultstate, so the cast above is not meaningful. Maybe
'info reg' in frame 5 would give a clue about the value of 'fs', since the
caller may still hold a pointer to it in a callee-saved register.
(kgdb) fr 5
#5  0xffffffff8089dd4d in vm_page_sleep_if_busy (m=0xfffff800df68cd40,
    msg=0xffffffff809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
1086            vm_page_busy_sleep(m, msg);
(kgdb) info reg
rax            0x0                 0
rbx            0xfffff800b62e9c78  -8793036514184
rcx            0x0                 0
rdx            0x0                 0
rsi            0x0                 0
rdi            0x0                 0
rbp            0xfffffe0101836810  0xfffffe0101836810
rsp            0xfffffe01018367e0  0xfffffe01018367e0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0xfffff800b642aa00  -8793035200000
r13            0xfffff800df68cd40  -8792344834752
r14            0xfffff800b62e9c60  -8793036514208
r15            0xffffffff809c51bc  -2137239108
rip            0xffffffff8089dd4d  0xffffffff8089dd4d <vm_page_sleep_if_busy+285>
eflags         0x0                 0
cs             0x0                 0
ss             0x0                 0
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
I don't know what to do from here.
I am not sure how to proceed from here.
The only thing I can think of is a lock order reversal between the vnode lock
and the page busying quasi-lock. But examining the code I cannot spot it.
Another possibility is a leak of a busy page, but that is hard to debug.
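To make the lock order reversal idea concrete, here is a generic userland
sketch (an illustration only, not the actual kernel code paths; lock_a stands
in for the vnode lock and lock_b for the page busy quasi-lock):

#include <pthread.h>
#include <stdio.h>

/* Two threads acquiring two locks in opposite orders: if both get their
 * first lock before either gets its second, each sleeps forever on the
 * lock the other one holds. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;	/* "vnode lock" */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;	/* "page busy" */

static void *
locker_ab(void *arg)
{
	pthread_mutex_lock(&lock_a);	/* think: vn_lock(vp) */
	pthread_mutex_lock(&lock_b);	/* think: busy the page -- may hang */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return (NULL);
}

static void *
locker_ba(void *arg)
{
	pthread_mutex_lock(&lock_b);	/* think: busy the page */
	pthread_mutex_lock(&lock_a);	/* think: vn_lock(vp) -- may hang */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, locker_ab, NULL);
	pthread_create(&t2, NULL, locker_ba, NULL);
	pthread_join(t1, NULL);		/* may never return */
	pthread_join(t2, NULL);
	printf("no deadlock this time\n");
	return (0);
}

Note that WITNESS would normally flag a reversal between two tracked locks,
but as far as I know page busying is not a WITNESS-tracked lock, so a
reversal involving it would go unreported.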
How hard is it to reproduce the problem?
After 7 days all seems normal; only one copy of innd is running:
[root@avoriaz ~]# ps xa|grep inn
1193 - Is 0:01.40 /usr/local/news/bin/innd -r
13498 - IN 0:00.01 /usr/local/news/bin/innfeed
1194 v0- IW 0:00.00 /bin/sh /usr/local/news/bin/innwatch -i 60
I will try to stop and restart innd.
Everything continues to look good:
[root@avoriaz ~]# ps xa|grep inn
31673 - Ss 0:00.02 /usr/local/news/bin/innd
31694 - SN 0:00.01 /usr/local/news/bin/innfeed
31674 0 S 0:00.01 /bin/sh /usr/local/news/bin/innwatch -i 60
I think the only way to reproduce it is to wait until it occurs by itself...
One thing to note: the deadlock has occurred at least 5 times since
10.0-RELEASE, and always involving the directory /usr/local/news/bin.
Maybe Konstantin would have some ideas or suggestions.
Henri