On Sun, 24 Dec 2006 09:16:06 -0800 (PST) Linus Torvalds <[EMAIL PROTECTED]> wrote:
> On Sun, 24 Dec 2006, Andrei Popa wrote:
>
> > On Sun, 2006-12-24 at 04:31 -0800, Andrew Morton wrote:
> > > Andrei Popa <[EMAIL PROTECTED]> wrote:
> > > > /dev/sda7 on / type ext3 (rw,noatime,nobh)
> > > >
> > > > I don't have corruption. I tested twice.
> > >
> > > This is a surprising result.  Can you please retest ext3
> > > data=writeback,nobh?
> >
> > Yes, no corruption. Also tested only with data=writeback and had no
> > corruption.
>
> Ok, so it would seem to be writeback related _somehow_. However, most of
> the differences (I _thought_) in ext3 actually show up only if you have
> *both* "nobh" and "data=writeback", and as far as I can tell, just a
> simple "data=writeback" should still use the bog-standard
> "block_write_full_page()".
>
> Andrew?
>
> Although as far as I can see, then ext2 should work as-is too (since it
> too also just uses "block_write_full_page()" without anything fancy).

ext2 uses the multipage-bio assembly code for writeback whereas ext3
doesn't.  But ext3 doesn't use that code in data=ordered mode, of course.

Still, this:

--- a/fs/ext2/inode.c~a
+++ a/fs/ext2/inode.c
@@ -693,7 +693,7 @@ const struct address_space_operations ex
 	.commit_write		= generic_commit_write,
 	.bmap			= ext2_bmap,
 	.direct_IO		= ext2_direct_IO,
-	.writepages		= ext2_writepages,
+//	.writepages		= ext2_writepages,
 	.migratepage		= buffer_migrate_page,
 };
 
@@ -711,7 +711,7 @@ const struct address_space_operations ex
 	.commit_write		= nobh_commit_write,
 	.bmap			= ext2_bmap,
 	.direct_IO		= ext2_direct_IO,
-	.writepages		= ext2_writepages,
+//	.writepages		= ext2_writepages,
 	.migratepage		= buffer_migrate_page,
 };
 
_

will switch it off for ext2.

> Strange.
>
> How about this particularly stupid diff? (please test with something that
> _would_ cause corruption normally).
>
> It is _entirely_ untested, but what it tries to do is to simply serialize
> any writeback in progress with any process that tries to re-map a shared
> page into its address space and dirty it. I haven't tested it, and maybe
> it misses some case, but it looks like a good way to try to avoid races
> with marking pages dirty and the writeback phase ..
>
> 		Linus
> ---
> diff --git a/mm/memory.c b/mm/memory.c
> index 563792f..64ed10b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1544,6 +1544,7 @@ static int do_wp_page(struct mm_struct *mm, struct
> vm_area_struct *vma,
>  			if (!pte_same(*page_table, orig_pte))
>  				goto unlock;
>  		}
> +		wait_on_page_writeback(old_page);
>  		dirty_page = old_page;
>  		get_page(dirty_page);
>  		reuse = 1;
> @@ -2215,6 +2216,7 @@ retry:
>  				page_cache_release(new_page);
>  				return VM_FAULT_SIGBUS;
>  			}
> +			wait_on_page_writeback(new_page);
>  		}
>  	}

yup.  Also, we could perhaps lock the target page during pagefaults..
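
For anyone wanting to test along the lines Linus asks for ("something that _would_ cause corruption normally"), here is a rough user-space sketch of the shape such a check could take. It is not the test used in this thread, and it is not guaranteed to trigger the race. It dirties a MAP_SHARED mapping, starts writeback with msync(MS_ASYNC), re-dirties the same pages while that writeback may still be in flight, flushes with MS_SYNC, and then compares the file contents against what was written last. The function name count_bad_pages() and the two-pass pattern are my own assumptions. Note also that the pread() read-back may be served from the page cache, so a serious test would remount or otherwise drop caches before verifying.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): dirty npages pages through a
 * shared mapping of 'path', re-dirty them after an asynchronous flush has
 * been requested, then read the file back and count pages whose contents
 * differ from the last pattern written.
 * Returns the number of mismatching pages, or -1 on a setup error.
 */
long count_bad_pages(const char *path, size_t npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	if (psz <= 0 || npages == 0)
		return -1;
	size_t page = (size_t)psz;
	size_t len = npages * page;

	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) < 0) {
		close(fd);
		return -1;
	}

	unsigned char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/*
	 * Pass 0: dirty every page and request writeback without waiting.
	 * Pass 1: dirty the same pages again, then flush synchronously.
	 */
	for (int pass = 0; pass < 2; pass++) {
		for (size_t i = 0; i < npages; i++)
			memset(map + i * page, (int)((i + pass) & 0xff), page);
		if (msync(map, len, pass == 0 ? MS_ASYNC : MS_SYNC) < 0) {
			munmap(map, len);
			close(fd);
			return -1;
		}
	}
	munmap(map, len);

	/* Verify: each page should hold the pattern written in pass 1. */
	unsigned char *buf = malloc(page);
	unsigned char *want = malloc(page);
	long bad = -1;
	if (buf && want) {
		bad = 0;
		for (size_t i = 0; i < npages; i++) {
			memset(want, (int)((i + 1) & 0xff), page);
			ssize_t n = pread(fd, buf, page, (off_t)(i * page));
			if (n != (ssize_t)page || memcmp(buf, want, page) != 0)
				bad++;
		}
	}
	free(buf);
	free(want);
	close(fd);
	unlink(path);
	return bad;
}
```

Calling count_bad_pages() on a scratch file on the filesystem under test (ext2, or ext3 with the mount options discussed above) and getting a nonzero result would indicate pages that did not make it to the file intact. A zero result proves little on its own, since the window is small and the sketch makes no attempt to apply memory pressure.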