On Wed, 2006-12-13 at 17:41 -0800, Andrew Morton wrote: > > Trond, please define precisely and completely and without reference to > > the existing implementation: what behaviour does NFS want?
Part of the behaviour is dictated by our needs for invalidate_inode_pages2(), part of the behaviour is dictated by the 2-stage NFS writeout mechanism. Starting with invalidate_inode_pages2(). The goal here is to have a function that is guaranteed to force all currently cached pages to be read in from the server afresh. In other words, we'd like to throw out the old mapping of the file, and start with a new one. However, we also need to guarantee that any modifications that a user may have made to the file are preserved. Throwing out a dirty page is forbidden unless the data has been written to disk first. What we're doing in try_to_release_page() in the current implementation of invalidate_inode_pages2() is therefore to fix races: the goal is to ensure that we write back any dirty data before the page is removed. This is made necessary by the fact that the VM may at any time override our call to unmap_mapping_range() and allow the page to be re-dirtied by the user. Next: about the 2-stage NFS writeout mechanism. A client may want to allow the server to cache writeback data for a while, so that it can flush several WRITE RPC calls to disk at the same time, maximising I/O efficiency in terms of elevator algorithms etc. It signals this to the server by means of the 'unstable' flag on the WRITE RPC call, which allows the former to just write the data into its pagecache. Before the client can actually free up the page it still needs to know that the data it wrote to the server has been flushed from the pagecache onto disk, and this is done by sending the COMMIT RPC request. The latter acts rather like fdatasync() does on local data: it guarantees that all file data within a specified range has been flushed to disk. Failure of the COMMIT request signals to the client that the server may have rebooted, and so the page needs to be written out again. Our second use of try_to_release_page() is therefore to ensure that the server has indeed flushed that dirty data from its pagecache to its disk before we allow the VM to release the page on the client. We don't send a COMMIT request for each and every page that is to be released, but we do ensure that at least one COMMIT has been sent that covers the data range represented by the page to be freed. Cheers Trond - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/