:> VM lookup the page again. Always. vm_fault already does this,
:> in fact. We would clean up the code and document it to this effect.
:>
:> This change would allow us to immediately fix the self-referential
:> deadlocks and I think it would also allow me to fix a similar bug
:> in NFS trivially.
:
: I should point out here that the process of looking up the pages is a
:significant amount of the overhead of the routines involved. Although
:doing this for just one page is probably sufficiently in the noise as to
:not be a concern.
    It would be for only one page and, besides, vm_fault *already*
    re-looks up the page ( to see if the page was ripped out from under
    the caller ), so the overhead of the change would be very near zero.

:> The easiest interim solution is to break write atomicy. That is,
:> unlock the vnode if the backing store of the uio being written is
:> (A) vnode-pager-backed and (B) not all in-core.
:
: Uh, I don't think you can safely do that. I thought one of the reasons
:for locking a vnode for writes is so that the file metadata doesn't change
:underneath you while the write is in progress, but perhaps I'm wrong about
:that.
:
:-DG
:
:David Greenman

    The problem can be distilled into the fact that we currently hold an
    exclusive lock *through* a uiomove that may incur read I/O because the
    source pages are not entirely in core.  The problem does *not* occur
    when we block on meta-data I/O ( such as a BMAP operation ), since
    meta-data cannot be mmap'd.

    Under the current circumstances we already lose read atomicity on the
    source during the write(), but we do not lose write() atomicity.

    The simple solution is to give up or downgrade the lock on the
    destination when we block inside the uiomove.  We can pre-fault the
    first two pages of the uio to guarantee a minimum write-atomicity I/O
    size ( two pages, because an unaligned write of up to a page can span
    two of them; a rough sketch follows at the end of this message ).  I
    suppose this could be extended to pre-faulting the first N pages of
    the uio, where N is chosen to be reasonably large - like 64K - but we
    could not guarantee arbitrary write atomicity, because the user might
    decide to write a very large mmap'd buffer ( e.g. megabytes or
    gigabytes ) and obviously wiring that many pages just won't work.

    The more complex solution is to implement a separate range lock for
    I/O that is independent of the vnode lock ( also sketched at the end
    of this message ).  This solution would also require deadlock
    detection and restart handling.  Atomicity would be maintained from
    the point of view of the processes running on the machine, but not
    from the point of view of the physical storage.  Since write
    atomicity is already not maintained from the point of view of the
    physical storage, I don't think this would present a problem.  Due to
    the complexity, however, it could not be used as an interim solution;
    it would have to be a permanent solution for the programming time to
    be worth it.  Doing range-based deadlock detection and restart
    handling properly is not trivial - it is something that usually only
    databases need to do.

                                    -Matt
                                    Matthew Dillon
                                    <dil...@backplane.com>
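
    Here is a rough, userland-flavored sketch of the pre-faulting idea.
    The helper name, the PAGE_SIZE value, and the 64K cap are made up for
    illustration; a real kernel version would also have to wire the pages
    ( otherwise they could be paged out again before the uiomove runs )
    and validate the user addresses rather than just dereference them:

/*
 * Sketch only: touch one byte in each page of the source buffer before
 * the exclusive vnode lock is acquired, so that any page faults ( and
 * thus any read I/O on an mmap'd source ) happen here rather than in
 * the middle of the uiomove.  PREFAULT_MAX caps how much write
 * atomicity we promise; beyond that, wiring the pages does not scale
 * to multi-megabyte buffers.
 */
#include <stddef.h>

#define PAGE_SIZE       4096
#define PREFAULT_MAX    (64 * 1024)

static void
prefault_pages(const void *buf, size_t len)
{
        volatile const char *p = buf;
        size_t off;

        if (len > PREFAULT_MAX)
                len = PREFAULT_MAX;
        for (off = 0; off < len; off += PAGE_SIZE)
                (void)p[off];           /* fault the page in */
        if (len != 0)
                (void)p[len - 1];       /* end of range may sit in a page
                                           the loop missed when buf is not
                                           page aligned */
}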
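
    And a toy sketch of the range-lock idea, in userland C with pthreads
    standing in for the kernel's sleep/wakeup.  The structure and function
    names are invented, error handling is skipped, and the deadlock
    detection / restart handling described above is deliberately left out -
    that is the hard part:

#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * One of these would hang off each vnode.  A writer takes the range
 * covering its uiomove for the duration of the copy, so the vnode lock
 * itself no longer needs to be held across page faults on the source.
 */
struct rl_entry {
        off_t           start;
        off_t           end;            /* exclusive */
        struct rl_entry *next;
};

struct rangelock {
        pthread_mutex_t mtx;            /* assumed pthread_mutex_init'd */
        pthread_cond_t  cv;             /* assumed pthread_cond_init'd */
        struct rl_entry *held;          /* currently locked ranges */
};

static int
rl_overlap(const struct rl_entry *e, off_t start, off_t end)
{
        return (start < e->end && e->start < end);
}

/* Sleep until [start, end) conflicts with no held range, then record it. */
void
rangelock_lock(struct rangelock *rl, off_t start, off_t end)
{
        struct rl_entry *e, *new;

        new = malloc(sizeof(*new));     /* no error handling in a sketch */
        new->start = start;
        new->end = end;

        pthread_mutex_lock(&rl->mtx);
restart:
        for (e = rl->held; e != NULL; e = e->next) {
                if (rl_overlap(e, start, end)) {
                        pthread_cond_wait(&rl->cv, &rl->mtx);
                        goto restart;
                }
        }
        new->next = rl->held;
        rl->held = new;
        pthread_mutex_unlock(&rl->mtx);
}

/* Release [start, end) and wake anyone who may have been blocked on it. */
void
rangelock_unlock(struct rangelock *rl, off_t start, off_t end)
{
        struct rl_entry **ep, *e;

        pthread_mutex_lock(&rl->mtx);
        for (ep = &rl->held; (e = *ep) != NULL; ep = &e->next) {
                if (e->start == start && e->end == end) {
                        *ep = e->next;
                        free(e);
                        break;
                }
        }
        pthread_cond_broadcast(&rl->cv);
        pthread_mutex_unlock(&rl->mtx);
}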