Interesting.  It's an overlapping same-process deadlock with mmap/write.
    This bug also hits NFS, though in a slightly different way.  It also
    occurs with mmap/write when two processes each mmap a file and then
    write() to the other process's file using their own map as the buffer.
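
    For readers who want to see the same-process access pattern concretely,
    here is a minimal user-space sketch, assuming a POSIX system.  The helper
    name write_from_own_mapping() and the file path in the usage note are
    hypothetical.  The sketch only reproduces the pattern (a write() whose
    source buffer is a mapping of the file being written); whether it
    deadlocks depends on the kernel, and current systems are expected to
    complete it normally.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file of `len` bytes, map it shared, and
 * write() the mapping back into the same file.  Returns the number of
 * bytes written, or -1 on error.
 */
ssize_t write_from_own_mapping(const char *path, size_t len)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* Give the file a size so the whole mapping is backed by it. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /*
     * The source buffer is the file's own mapping.  Its pages have not
     * been touched yet, so the kernel will likely have to fault them in
     * from this file while it is servicing a write() to this same file.
     */
    ssize_t n = write(fd, map, len);

    munmap(map, len);
    close(fd);
    return n;
}
```

    Calling write_from_own_mapping("/tmp/mmap_write_demo.bin", 65536) on a
    kernel without the bug should simply return 65536.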

    I see a three-stage solution:

    * We change the API for the VM pager *getpages() code.

        At the moment the caller busies all pages being passed to getpages()
        and expects the primary page (but not any of the others) to be
        returned busied.  I also believe that some of the code assumes that
        the page will not be unbusied at all for the duration of the
        operation (though vm_fault was hacked to handle the situation where
        it might have been).

        This API is screwing up NFS.  It would also make it very difficult
        to implement general VFS deadlock avoidance properly, or to properly
        fix the specific case being discussed in this thread.

        I recommend changing the API such that *ALL* passed pages are
        unbusied prior to return.  The caller of getpages() must then
        look the page up in the VM object again.  Always.  vm_fault already
        does this, in fact.  We would clean up the code and document it to
        this effect.

        This change would allow us to immediately fix the self-referential
        deadlocks and I think it would also allow me to fix a similar bug
        in NFS trivially.
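
        The proposed contract can be illustrated with a small user-space
        model.  This is only a sketch under stated assumptions, not kernel
        code: struct model_page, struct model_object, model_page_lookup(),
        model_getpages() and model_fault() are all hypothetical names, and
        the "I/O" is simulated by setting a flag.  The point it shows is
        that the pager unbusies every page it was handed, and the caller
        looks the page up again instead of trusting a pointer it held
        across the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define OBJ_PAGES 8

/* Hypothetical stand-ins for a VM page and a VM object. */
struct model_page {
    bool present;   /* page exists in the object */
    bool busy;      /* page is busied by someone */
    bool valid;     /* page contents have been read in */
};

struct model_object {
    struct model_page pages[OBJ_PAGES];
};

/* Look a page up by index; NULL if it is not in the object. */
struct model_page *model_page_lookup(struct model_object *obj, size_t pindex)
{
    if (pindex >= OBJ_PAGES || !obj->pages[pindex].present)
        return NULL;
    return &obj->pages[pindex];
}

/*
 * Model of the proposed getpages() contract: the caller passes in busied
 * pages, and ALL of them are unbusied before return, including the one
 * the caller actually wanted.
 */
int model_getpages(struct model_object *obj, size_t first, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        struct model_page *m = &obj->pages[first + i];
        assert(m->present && m->busy);  /* caller busied every page */
        m->valid = true;                /* pretend the read completed */
    }
    for (size_t i = 0; i < count; i++)
        obj->pages[first + i].busy = false;
    return 0;
}

/*
 * Model of a caller: busy a run of pages, call the pager, then look the
 * wanted page up AGAIN.  Returns the wanted page busied, or NULL.
 */
struct model_page *model_fault(struct model_object *obj, size_t first,
                               size_t count, size_t want)
{
    if (count == 0 || first >= OBJ_PAGES || count > OBJ_PAGES - first)
        return NULL;
    if (want < first || want - first >= count)
        return NULL;

    for (size_t i = 0; i < count; i++) {
        obj->pages[first + i].present = true;
        obj->pages[first + i].busy = true;
    }

    if (model_getpages(obj, first, count) != 0)
        return NULL;

    /* No pointer is kept across the call; the lookup is always redone. */
    struct model_page *m = model_page_lookup(obj, want);
    if (m == NULL || !m->valid)
        return NULL;    /* a real caller would retry the fault here */
    m->busy = true;
    return m;
}
```

        In this model, model_fault(&obj, 2, 3, 3) on a zeroed object returns
        page 3 busied and valid, while pages 2 and 4 come back valid but
        unbusied.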

    * We hack a fix to deal with the mmap/write case.

        A permanent vnode locking fix is many months away because core
        decided to ask Kirk to fix it, which was news to me at the time.
        However, I agree with the idea of having Kirk fix vnode locking.

        But since this sort of permanent fix is months away, we really need
        an interim solution to the mmap/write deadlock case.

        The easiest interim solution is to break write atomicity.  That is,
        unlock the vnode if the backing store of the uio being written is
        (A) vnode-pager-backed and (B) not all in-core.

        This will generally fix all known deadlock situations but at the
        cost of write atomicity in certain cases.  We can use the same hack
        that the pipe code uses and only guarantee write atomicity for small
        block sizes.  We would do this by wiring (and faulting, if
        necessary) the first N pages of the uio prior to locking the vnode.

        We cannot wire all the pages of the uio since the user may specify
        a very large buffer - megabytes or gigabytes.
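
        The "first N pages" rule can be sketched as a small calculation.
        This is a user-space illustration under stated assumptions, not the
        kernel change itself: struct prewire_plan and plan_prewire() are
        hypothetical names, and max_pages stands in for N.  It works out how
        many pages a user buffer spans, how many of them would be wired
        before the vnode is locked, and whether the write could still be
        treated as atomic (much as pipes only guarantee atomicity for writes
        of at most PIPE_BUF bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of planning one write(). */
struct prewire_plan {
    size_t pages_to_wire;   /* pages to wire before locking the vnode */
    bool   atomic;          /* true if the whole buffer fits in the wired set */
};

/*
 * base:      user address of the start of the buffer
 * len:       length of the buffer in bytes
 * page_size: VM page size in bytes (must be nonzero)
 * max_pages: the cap "N" on how many pages may be wired up front
 */
struct prewire_plan plan_prewire(uintptr_t base, size_t len,
                                 size_t page_size, size_t max_pages)
{
    struct prewire_plan plan = { 0, true };

    if (len == 0 || page_size == 0)
        return plan;    /* nothing to wire */

    /* Count the pages touched by bytes [base, base + len - 1]. */
    size_t offset = (size_t)(base % page_size);
    size_t spanned = (len - 1) / page_size + 1;
    if ((len - 1) % page_size + offset >= page_size)
        spanned++;      /* an unaligned buffer spills into one more page */

    plan.pages_to_wire = spanned < max_pages ? spanned : max_pages;
    plan.atomic = spanned <= max_pages;

    assert(plan.pages_to_wire <= max_pages);
    return plan;
}
```

        With 4096-byte pages and a cap of 16 pages, a 32-byte buffer that
        straddles a page boundary needs 2 wired pages and stays atomic,
        while a 1 MB page-aligned buffer spans 256 pages, so only the first
        16 would be wired and atomicity would be given up.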

    * We commit the permanent fix by generally fixing vnode locks and
      VFS layering.

        ... which may be 6 months away if Kirk agrees to do a complete
        rewrite of the vnode locking algorithms.

                                        -Matt
                                        Matthew Dillon 
                                        <dil...@backplane.com>


