On 11/18/2016 13:30, Andriy Gapon wrote:
On 14/11/2016 14:00, Henri Hennebert wrote:
On 11/14/2016 12:45, Andriy Gapon wrote:
Okay. Luckily for us, it seems that 'm' is available in frame 5. It also
happens to be the first field of 'struct faultstate'. So, could you please go
to that frame and print '*m' and '*(struct faultstate *)m'?
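For context, 'struct faultstate' from sys/vm/vm_fault.c of that era looks
roughly like this (a sketch; the field order is inferred from the $2 print
further down, so treat the types as approximate):

struct faultstate {
	vm_page_t m;		/* first member, hence the cast idea:
				 * (struct faultstate *)&fs->m == &fs */
	vm_object_t object;
	vm_pindex_t pindex;
	vm_page_t first_m;
	vm_object_t first_object;
	vm_pindex_t first_pindex;
	vm_map_t map;
	vm_map_entry_t entry;
	int lookup_still_valid;
	struct vnode *vp;
};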
(kgdb) fr 4
#4  0xffffffff8089d1c1 in vm_page_busy_sleep (m=0xfffff800df68cd40,
    wmesg=<value optimized out>) at /usr/src/sys/vm/vm_page.c:753
753             msleep(m, vm_page_lockptr(m), PVM | PDROP, wmesg, 0);
(kgdb) print *m
$1 = {plinks = {q = {tqe_next = 0xfffff800dc5d85b0, tqe_prev = 0xfffff800debf3bd0},
      s = {ss = {sle_next = 0xfffff800dc5d85b0}, pv = 0xfffff800debf3bd0},
      memguard = {p = 18446735281313646000, v = 18446735281353604048}},
  listq = {tqe_next = 0x0, tqe_prev = 0xfffff800dc5d85c0},
  object = 0xfffff800b62e9c60, pindex = 11, phys_addr = 3389358080,
  md = {pv_list = {tqh_first = 0x0, tqh_last = 0xfffff800df68cd78},
    pv_gen = 426, pat_mode = 6},
  wire_count = 0, busy_lock = 6, hold_count = 0, flags = 0,
  aflags = 2 '\002', oflags = 0 '\0', queue = 0 '\0', psind = 0 '\0',
  segind = 3 '\003', order = 13 '\r', pool = 0 '\0', act_count = 0 '\0',
  valid = 0 '\0', dirty = 0 '\0'}
If I interpret this correctly, the page is in the 'exclusive busy' state
(busy_lock = 6). Unfortunately, I can't tell much beyond that.
But I am confident that this is the root cause of the lock-up.
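For reference, a minimal sketch of how that busy_lock value decodes, assuming
the VPB_* bit layout from sys/vm/vm_page.h in the 10.x/11.x era:

#include <stdio.h>

/* Decode vm_page.busy_lock; the VPB_* values below are assumed to
 * match sys/vm/vm_page.h of that era. */
#define	VPB_BIT_SHARED		0x01
#define	VPB_BIT_EXCLUSIVE	0x02
#define	VPB_BIT_WAITERS		0x04

int
main(void)
{
	unsigned int busy_lock = 6;	/* the value printed above */

	if (busy_lock & VPB_BIT_EXCLUSIVE)
		printf("exclusive busy\n");
	if (busy_lock & VPB_BIT_SHARED)
		printf("shared busy, %u sharers\n", busy_lock >> 3);
	if (busy_lock & VPB_BIT_WAITERS)
		printf("other threads are sleeping on the page\n");
	return (0);
}

If those definitions hold, 6 = VPB_BIT_EXCLUSIVE | VPB_BIT_WAITERS: some
thread holds the page exclusive busy and at least one other thread (the
'vmpfw' sleeper here) is waiting for it to be released.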
(kgdb) print *(struct faultstate *)m
$2 = {m = 0xfffff800dc5d85b0, object = 0xfffff800debf3bd0, pindex = 0,
  first_m = 0xfffff800dc5d85c0, first_object = 0xfffff800b62e9c60,
  first_pindex = 11, map = 0xca058000, entry = 0x0,
  lookup_still_valid = -546779784, vp = 0x6000001aa}
(kgdb)
I was wrong on this one: 'm' is actually a pointer (the value of fs.m), not
the address of the faultstate, so the cast above is not meaningful. Maybe
'info reg' in frame 5 would give a clue about the value of 'fs', since the
caller may still hold a pointer to it in a callee-saved register.
(kgdb) fr 5
#5  0xffffffff8089dd4d in vm_page_sleep_if_busy (m=0xfffff800df68cd40,
    msg=0xffffffff809c51bc "vmpfw") at /usr/src/sys/vm/vm_page.c:1086
1086            vm_page_busy_sleep(m, msg);
(kgdb) info reg
rax            0x0                 0
rbx            0xfffff800b62e9c78  -8793036514184
rcx            0x0                 0
rdx            0x0                 0
rsi            0x0                 0
rdi            0x0                 0
rbp            0xfffffe0101836810  0xfffffe0101836810
rsp            0xfffffe01018367e0  0xfffffe01018367e0
r8             0x0                 0
r9             0x0                 0
r10            0x0                 0
r11            0x0                 0
r12            0xfffff800b642aa00  -8793035200000
r13            0xfffff800df68cd40  -8792344834752
r14            0xfffff800b62e9c60  -8793036514208
r15            0xffffffff809c51bc  -2137239108
rip            0xffffffff8089dd4d  0xffffffff8089dd4d <vm_page_sleep_if_busy+285>
eflags         0x0                 0
cs             0x0                 0
ss             0x0                 0
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
I don't know what to do from here.
I am not sure how to proceed from here.
The only thing I can think of is a lock order reversal between the vnode lock
and the page busying quasi-lock. But examining the code I cannot spot it.
Another possibility is a leak of a busy page, but that is hard to debug.
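To make the lock order reversal idea concrete, here is a generic userland
sketch (an illustration only, not the actual kernel code paths; lock_a stands
in for the vnode lock and lock_b for the page busy quasi-lock):

#include <pthread.h>
#include <stdio.h>

/* Two threads acquiring two locks in opposite orders: if both get their
 * first lock before either gets its second, each sleeps forever on the
 * lock the other one holds. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;	/* "vnode lock" */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;	/* "page busy" */

static void *
locker_ab(void *arg)
{
	pthread_mutex_lock(&lock_a);	/* think: vn_lock(vp) */
	pthread_mutex_lock(&lock_b);	/* think: busy the page -- may hang */
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
	return (NULL);
}

static void *
locker_ba(void *arg)
{
	pthread_mutex_lock(&lock_b);	/* think: busy the page */
	pthread_mutex_lock(&lock_a);	/* think: vn_lock(vp) -- may hang */
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
	return (NULL);
}

int
main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, locker_ab, NULL);
	pthread_create(&t2, NULL, locker_ba, NULL);
	pthread_join(t1, NULL);		/* may never return */
	pthread_join(t2, NULL);
	printf("no deadlock this time\n");
	return (0);
}

Note that WITNESS would normally flag a reversal between two tracked locks,
but as far as I know page busying is not a WITNESS-tracked lock, so a
reversal involving it would go unreported.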
How hard is it to reproduce the problem?
After 7 days all seems normal; only one copy of innd is running:
[root@avoriaz ~]# ps xa|grep inn
1193 - Is 0:01.40 /usr/local/news/bin/innd -r
13498 - IN 0:00.01 /usr/local/news/bin/innfeed
1194 v0- IW 0:00.00 /bin/sh /usr/local/news/bin/innwatch -i 60
I will try to stop and restart innd.
Everything continues to look good:
[root@avoriaz ~]# ps xa|grep inn
31673 - Ss 0:00.02 /usr/local/news/bin/innd
31694 - SN 0:00.01 /usr/local/news/bin/innfeed
31674 0 S 0:00.01 /bin/sh /usr/local/news/bin/innwatch -i 60
I think the only way to reproduce it is to wait until it occurs by itself...
One thing to note: the deadlock has occurred at least 5 times since
10.0-RELEASE, and always involving the directory /usr/local/news/bin.
Maybe Konstantin would have some ideas or suggestions.
Henri