:> VM lookup the page again. Always. vm_fault already does this,
:> in fact. We would clean up the code and document it to this effect.
:>
:> This change would allow us to immediately fix the self-referential
:> deadlocks and I think it would also allow me to fix a similar bug
:> in NFS trivially.
:
: I should point out here that the process of looking up the pages is a
:significant amount of the overhead of the routines involved. Although
:doing this for just one page is probably sufficiently in the noise as to
:not be a concern.
    It would be for only one page and, besides, vm_fault *already*
    re-looks up the page ( to see if the page was ripped out from under
    the caller ), so the overhead of the change would be very near zero.

:> The easiest interim solution is to break write atomicy. That is,
:> unlock the vnode if the backing store of the uio being written is
:> (A) vnode-pager-backed and (B) not all in-core.
:
: Uh, I don't think you can safely do that. I thought one of the reasons
:for locking a vnode for writes is so that the file metadata doesn't change
:underneath you while the write is in progress, but perhaps I'm wrong about
:that.
:
:-DG
:
:David Greenman

    The problem can be distilled into the fact that we currently hold an
    exclusive lock *through* a uiomove that may incur read I/O because the
    source pages are not entirely in core.  The problem does *not* occur
    when we block on meta-data I/O ( such as a BMAP operation ), since
    meta-data cannot be mmap'd.

    Under the current circumstances we already lose read atomicity on the
    source during the write(), but we do not lose write() atomicity.

    The simple solution is to give up or downgrade the lock on the
    destination when we block inside the uiomove.  We can pre-fault the
    first two pages of the uio to guarantee a minimum write-atomicity I/O
    size ( two pages, because an unaligned write of up to a page can span
    two of them; a rough sketch follows at the end of this message ).  I
    suppose this could be extended to pre-faulting the first N pages of
    the uio, where N is chosen to be reasonably large - like 64K - but we
    could not guarantee arbitrary write atomicity, because the user might
    decide to write a very large mmap'd buffer ( e.g. megabytes or
    gigabytes ) and obviously wiring that many pages just won't work.

    The more complex solution is to implement a separate range lock for
    I/O that is independent of the vnode lock ( also sketched at the end
    of this message ).  This solution would also require deadlock
    detection and restart handling.  Atomicity would be maintained from
    the point of view of the processes running on the machine, but not
    from the point of view of the physical storage.  Since write
    atomicity is already not maintained from the point of view of the
    physical storage, I don't think this would present a problem.  Due to
    the complexity, however, it could not be used as an interim solution;
    it would have to be a permanent solution for the programming time to
    be worth it.  Doing range-based deadlock detection and restart
    handling properly is not trivial - it is something that usually only
    databases need to do.

                                    -Matt
                                    Matthew Dillon
                                    <dil...@backplane.com>
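
    Here is a rough, userland-flavored sketch of the pre-faulting idea.
    The helper name, the PAGE_SIZE value, and the 64K cap are made up for
    illustration; a real kernel version would also have to wire the pages
    ( otherwise they could be paged out again before the uiomove runs )
    and validate the user addresses rather than just dereference them:

/*
 * Sketch only: touch one byte in each page of the source buffer before
 * the exclusive vnode lock is acquired, so that any page faults ( and
 * thus any read I/O on an mmap'd source ) happen here rather than in
 * the middle of the uiomove.  PREFAULT_MAX caps how much write
 * atomicity we promise; beyond that, wiring the pages does not scale
 * to multi-megabyte buffers.
 */
#include <stddef.h>

#define PAGE_SIZE       4096
#define PREFAULT_MAX    (64 * 1024)

static void
prefault_pages(const void *buf, size_t len)
{
        volatile const char *p = buf;
        size_t off;

        if (len > PREFAULT_MAX)
                len = PREFAULT_MAX;
        for (off = 0; off < len; off += PAGE_SIZE)
                (void)p[off];           /* fault the page in */
        if (len != 0)
                (void)p[len - 1];       /* end of range may sit in a page
                                           the loop missed when buf is not
                                           page aligned */
}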
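
    And a toy sketch of the range-lock idea, in userland C with pthreads
    standing in for the kernel's sleep/wakeup.  The structure and function
    names are invented, error handling is skipped, and the deadlock
    detection / restart handling described above is deliberately left out -
    that is the hard part:

#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * One of these would hang off each vnode.  A writer takes the range
 * covering its uiomove for the duration of the copy, so the vnode lock
 * itself no longer needs to be held across page faults on the source.
 */
struct rl_entry {
        off_t           start;
        off_t           end;            /* exclusive */
        struct rl_entry *next;
};

struct rangelock {
        pthread_mutex_t mtx;            /* assumed pthread_mutex_init'd */
        pthread_cond_t  cv;             /* assumed pthread_cond_init'd */
        struct rl_entry *held;          /* currently locked ranges */
};

static int
rl_overlap(const struct rl_entry *e, off_t start, off_t end)
{
        return (start < e->end && e->start < end);
}

/* Sleep until [start, end) conflicts with no held range, then record it. */
void
rangelock_lock(struct rangelock *rl, off_t start, off_t end)
{
        struct rl_entry *e, *new;

        new = malloc(sizeof(*new));     /* no error handling in a sketch */
        new->start = start;
        new->end = end;

        pthread_mutex_lock(&rl->mtx);
restart:
        for (e = rl->held; e != NULL; e = e->next) {
                if (rl_overlap(e, start, end)) {
                        pthread_cond_wait(&rl->cv, &rl->mtx);
                        goto restart;
                }
        }
        new->next = rl->held;
        rl->held = new;
        pthread_mutex_unlock(&rl->mtx);
}

/* Release [start, end) and wake anyone who may have been blocked on it. */
void
rangelock_unlock(struct rangelock *rl, off_t start, off_t end)
{
        struct rl_entry **ep, *e;

        pthread_mutex_lock(&rl->mtx);
        for (ep = &rl->held; (e = *ep) != NULL; ep = &e->next) {
                if (e->start == start && e->end == end) {
                        *ep = e->next;
                        free(e);
                        break;
                }
        }
        pthread_cond_broadcast(&rl->cv);
        pthread_mutex_unlock(&rl->mtx);
}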