In message <[EMAIL PROTECTED]>, Vadim Belman writes:
> wmesg=0xc0233171 "vmopar", timo=0) at ../../kern/kern_synch.c:467
...
>#8 0xc01dd606 in vm_fault (map=0xdc3e7e80, vaddr=712876032,
> fault_type=1 '\001', fault_flags=0) at ../../vm/vm_pager.h:130
If anyone is interested, here are a few further details from my
mailbox. The patch David included appears to have solved this
particular problem for us, but there is another similar problem
lurking within the NFS/VM system.
Ian
--------------------------------------------
The problem seems to originate with NFS's postop_attr information
that is returned with a read or write RPC. Within a vm_fault context,
the code cannot deal with vnode_pager_setsize() shrinking a vnode.
The workaround in the patch below stops the nfsm_postop_attr() macro
from ever shrinking a vnode. If the new size in the postop_attr
information is smaller, it just sets the nfsnode's n_attrstamp to 0
so that the wrong size is not used in the future. This change only
affects postop_attr attributes; the nfsm_loadattr() macro works as
normal.
The change is implemented by adding a new argument to nfs_loadattrcache()
called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never
reduce the vnode/nfsnode size; instead it zeros n_attrstamp.
-----------------------------------------------
Hmm. We used this patch for a while - it stopped those particular vmopar
hangs, but another kind of deadlock has emerged (which happens with or
without the patch).
It seems that vinvalbuf() locks the vnode's v_interlock before calling
vm_object_page_remove(). vm_object_page_remove() then locks a page, i.e.
vinvalbuf() [Lock v_interlock] ->
vm_object_page_remove() [Lock page]
If another process concurrently takes a vm_fault on the same vnode, it
locks the page and finishes with a vput(vp). vput() locks the
interlock, so it results in:
vm_fault() [Lock page] ->
vput() [Lock v_interlock]
This is a simple lock-ordering deadlock. Since vm_fault can keep the
page locked for a considerable amount of time with NFS, this deadlock
can happen quite easily. I'm not sure what to suggest as a solution,
but keeping the v_interlock locked across a tsleep seems wrong... Any
ideas? Traces below.
#12 0xc02140f0 in atkbd_isa_intr (unit=0) at ../../i386/isa/atkbd_isa.c:84
#13 0xc020eceb in wait ()
#14 0xc01e22d3 in _unlock_things (fs=0xca6f0ef0, dealloc=0)
at ../../vm/vm_fault.c:148
#15 0xc01e2b73 in vm_fault (map=0xca6d2ac0, vaddr=134766592,
fault_type=1 '\001', fault_flags=0) at ../../vm/vm_fault.c:745
#16 0xc0210252 in trap_pfault (frame=0xca6f0fbc, usermode=1, eva=134769544)
at ../../i386/i386/trap.c:816
#17 0xc020fda2 in trap (frame={tf_es = 39, tf_ds = 39, tf_edi = -1077946880,
tf_esi = 1, tf_ebp = -1077947052, tf_isp = -898691100,
tf_ebx = -1077946872, tf_edx = 4, tf_ecx = -1077947772, tf_eax = 2,
tf_trapno = 12, tf_err = 4, tf_eip = 134769544, tf_cs = 31,
tf_eflags = 66050, tf_esp = -1077947172, tf_ss = 39})
at ../../i386/i386/trap.c:358
#18 0x8086b88 in ?? ()
(kgdb) proc 1042
(kgdb) bt
#0 mi_switch () at ../../kern/kern_synch.c:825
#1 0xc0150b4d in tsleep (ident=0xc0598534, priority=4,
wmesg=0xc024d22a "vmopar", timo=0) at ../../kern/kern_synch.c:443
#2 0xc01eaec6 in vm_page_sleep (m=0xc0598534, msg=0xc024d22a "vmopar",
busy=0xc0598563 "") at ../../vm/vm_page.c:1052
#3 0xc01e9aff in vm_object_page_remove (object=0xca6bac1c, start=0, end=0,
clean_only=1) at ../../vm/vm_object.c:1335
#4 0xc0172a6a in vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80,
p=0xca6e5a40, slpflag=256, slptimeo=0) at ../../kern/vfs_subr.c:671
#5 0xc019541c in nfs_vinvalbuf (vp=0xca6bf700, flags=1, cred=0xc171ec80,
p=0xca6e5a40, intrflg=1) at ../../nfs/nfs_bio.c:978
#6 0xc01b6859 in nfs_open (ap=0xca6f3e2c) at ../../nfs/nfs_vnops.c:490
#7 0xc01796ae in vn_open (ndp=0xca6f3f00, fmode=1, cmode=1512)
at vnode_if.h:163
#8 0xc01760d9 in open (p=0xca6e5a40, uap=0xca6f3f94)
at ../../kern/vfs_syscalls.c:935
#9 0xc02108bf in syscall (frame={tf_es = 39, tf_ds = 39, tf_edi = 134725618,
tf_esi = -1077946896, tf_ebp = -1077946944, tf_isp = -898678812,
tf_ebx = -1077946956, tf_edx = -1077946588, tf_ecx = 134893176,
tf_eax = 5, tf_trapno = 12, tf_err = 2, tf_eip = 672042756, tf_cs = 31,
tf_eflags = 514, tf_esp = -1077949296, tf_ss = 39})
at ../../i386/i386/trap.c:1100
#10 0xc01ff11c in Xint0x80_syscall ()
#11 0x8049d39 in ?? ()