On Wed, 13 Dec 2006 19:48:34 +0100 Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> --- linux-2.6-git.orig/mm/truncate.c 2006-12-13 19:41:09.000000000 +0100 > +++ linux-2.6-git/mm/truncate.c 2006-12-13 19:42:56.000000000 +0100 > @@ -306,19 +306,14 @@ invalidate_complete_page2(struct address > if (PagePrivate(page) && !try_to_release_page(page, GFP_KERNEL)) > return 0; > > + test_clear_page_dirty(page); > write_lock_irq(&mapping->tree_lock); > - if (PageDirty(page)) > - goto failed; > - > BUG_ON(PagePrivate(page)); > __remove_from_page_cache(page); > write_unlock_irq(&mapping->tree_lock); > ClearPageUptodate(page); > page_cache_release(page); /* pagecache ref */ > return 1; > -failed: > - write_unlock_irq(&mapping->tree_lock); > - return 0; > } > > /** > @@ -350,7 +345,6 @@ int invalidate_inode_pages2_range(struct > for (i = 0; !ret && i < pagevec_count(&pvec); i++) { > struct page *page = pvec.pages[i]; > pgoff_t page_index; > - int was_dirty; > > lock_page(page); > if (page->mapping != mapping) { > @@ -386,12 +380,8 @@ int invalidate_inode_pages2_range(struct > PAGE_CACHE_SIZE, 0); > } > } > - was_dirty = test_clear_page_dirty(page); > - if (!invalidate_complete_page2(mapping, page)) { > - if (was_dirty) > - set_page_dirty(page); > + if (!invalidate_complete_page2(mapping, page)) > ret = -EIO; > - } > unlock_page(page); > } > pagevec_release(&pvec); a) we're now calling try_to_release_page() with a potentially-dirty page, whereas it was previously clean. I wouldn't expect ->releasepage() implementations to go looking at PG_Dirty, because that's not what they're suppoed to be interested in. But they might do, dunno. b) If invalidate_complete_page2() failed due to, say, dirty buffer_heads then we now have a clean page with dirty buffers. That is an illegal state and the page will leak permanently. I _think_ that's what the was_dirty logic is in there for: to preserve the correct page-vs-buffers dirtiness coherency. But I'd need to do some 2.5.x changelog-dumpster-diving to be sure. Sigh. I don't know what invalidate_inode_pages2() is *supposed to do*. What are its semantics wrt unfreeable page metadata? Against page dirtiness? It was written for direct-io and had one set of semantics for that, then NFS came along and seemed to require a slightly different set of semantics, even though the earlier semantics weren't really defined, leading to a belief that the present semantics are "wrong", without a definition of what semantics NFS actually desires. So let's start again. Trond, please define precisely and completely and without reference to the existing implementation: what behaviour does NFS want? - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/