On Fri, 2006-12-29 at 02:48 -0800, Linus Torvalds wrote:
>
> On Fri, 29 Dec 2006, Linus Torvalds wrote:
> >
> > Hmm? I'd love it if somebody else wrote the patch and tested it, because
> > I'm getting sick and tired of this bug ;)
>
> Who the hell am I kidding? I haven't been able to sleep right for the last
> few days over this bug. It was really getting to me.
>
> And putting on the thinking cap, there's actually a fairly simple and
> nonintrusive patch. It still has a tiny tiny race (see the comment), but I
> bet nobody can really hit it in real life anyway, and I know several ways
> to fix it, so I'm not really _that_ worried about it.
>
> The patch is mostly a comment. The "real" meat of it is actually just a
> few lines.
>
> Can anybody get corruption with this thing applied? It goes on top of
> plain v2.6.20-rc2.

Tested with rtorrent and there is no corruption.

>
> Linus
>
> ----
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index b3a198c..ec01da1 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -862,17 +862,46 @@ int clear_page_dirty_for_io(struct page *page)
>  {
>  	struct address_space *mapping = page_mapping(page);
>
> -	if (!mapping)
> -		return TestClearPageDirty(page);
> -
> -	if (TestClearPageDirty(page)) {
> -		if (mapping_cap_account_dirty(mapping)) {
> -			page_mkclean(page);
> +	if (mapping && mapping_cap_account_dirty(mapping)) {
> +		/*
> +		 * Yes, Virginia, this is indeed insane.
> +		 *
> +		 * We use this sequence to make sure that
> +		 *  (a) we account for dirty stats properly
> +		 *  (b) we tell the low-level filesystem to
> +		 *      mark the whole page dirty if it was
> +		 *      dirty in a pagetable. Only to then
> +		 *  (c) clean the page again and return 1 to
> +		 *      cause the writeback.
> +		 *
> +		 * This way we avoid all nasty races with the
> +		 * dirty bit in multiple places and clearing
> +		 * them concurrently from different threads.
> +		 *
> +		 * Note! Normally the "set_page_dirty(page)"
> +		 * has no effect on the actual dirty bit - since
> +		 * that will already usually be set. But we
> +		 * need the side effects, and it can help us
> +		 * avoid races.
> +		 *
> +		 * We basically use the page "master dirty bit"
> +		 * as a serialization point for all the different
> +		 * threds doing their things.
> +		 *
> +		 * FIXME! We still have a race here: if somebody
> +		 * adds the page back to the page tables in
> +		 * between the "page_mkclean()" and the "TestClearPageDirty()",
> +		 * we might have it mapped without the dirty bit set.
> +		 */
> +		if (page_mkclean(page))
> +			set_page_dirty(page);
> +		if (TestClearPageDirty(page)) {
>  			dec_zone_page_state(page, NR_FILE_DIRTY);
> +			return 1;
>  		}
> -		return 1;
> +		return 0;
>  	}
> -	return 0;
> +	return TestClearPageDirty(page);
>  }
>  EXPORT_SYMBOL(clear_page_dirty_for_io);
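
For anyone following along, here is a rough user-space model of the (a)/(b)/(c) sequence from the comment in the patch. It is only a sketch, not kernel code: struct toy_page, the toy_* helpers and the nr_file_dirty counter are all made up for illustration, and it assumes a single page table entry per page and that set_page_dirty() bumps the dirty count only on a clean-to-dirty transition. It is single-threaded, so it cannot show the race in the FIXME.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page. Hypothetical; NOT the kernel structure. */
struct toy_page {
	bool has_mapping;	/* page_mapping(page) != NULL */
	bool accounted;		/* mapping_cap_account_dirty(mapping) */
	bool pte_dirty;		/* dirty bit in the (single) page table entry */
	bool page_dirty;	/* the page "master dirty bit" */
};

/* Models the NR_FILE_DIRTY counter. */
static long nr_file_dirty;

/* Models page_mkclean(): clear the pte dirty bit, report if it was set. */
static bool toy_page_mkclean(struct toy_page *p)
{
	bool was_dirty = p->pte_dirty;

	p->pte_dirty = false;
	return was_dirty;
}

/*
 * Models set_page_dirty(). Assumption: accounting happens only when the
 * master bit goes from clear to set.
 */
static void toy_set_page_dirty(struct toy_page *p)
{
	if (!p->page_dirty) {
		p->page_dirty = true;
		if (p->accounted)
			nr_file_dirty++;
	}
}

/* Models TestClearPageDirty(): clear the master bit, return the old value. */
static bool toy_test_clear_page_dirty(struct toy_page *p)
{
	bool was_dirty = p->page_dirty;

	p->page_dirty = false;
	return was_dirty;
}

/* Same control flow as the patched clear_page_dirty_for_io(). */
static int toy_clear_page_dirty_for_io(struct toy_page *p)
{
	if (p->has_mapping && p->accounted) {
		/* (b) fold a pte-only dirty state into the master bit */
		if (toy_page_mkclean(p))
			toy_set_page_dirty(p);
		/* (c) clean the page again; (a) keep the stats balanced */
		if (toy_test_clear_page_dirty(p)) {
			nr_file_dirty--;
			return 1;
		}
		return 0;
	}
	return toy_test_clear_page_dirty(p);
}
```

In this model a page that is dirty only in the page table (pte_dirty set, page_dirty clear) makes toy_clear_page_dirty_for_io() return 1 and leaves nr_file_dirty where it started. With the order in the removed lines of the diff, TestClearPageDirty() ran first and that case returned 0.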