This is a slightly different take on the fix for the deadlock in fuse with dirty balancing. David Chinner convinced me, that per-bdi counters are too expensive, and that it's not worth trying to account the number of pages under writeback, as they will be limited by the queue anyway.
---- From: Miklos Szeredi <[EMAIL PROTECTED]> Current behavior of balance_dirty_pages() is to try to start writeout into the specified queue for at least "write_chunk" number of pages. If "write_chunk" pages have been submitted, then return. However if there are less than "write_chunk" dirty pages for this queue, then it doesn't return, waiting for the global dirty+writeback counters to subside, but without doing any actual work. This is illogical behavior: it allows more dirtyings while there are dirty pages, but stops further dirtying completely if there are no more dirty pages. It also makes a deadlock possible when one filesystem is writing data through another, and the balance_dirty_pages() for the lower filesystem is stalling the writeback for the upper filesystem's data (*). So the exit condition should instead be: submitted at least "write_chunk" number of pages OR submitted ALL the dirty pages destined for this backing dev AND backing dev is not congested To do this, introduce a new counter in writeback_control, which counts the number of dirty pages encountered during writeback. This includes all dirty pages, even those which are already under writeback but have been dirtied again, and those which have been skipped due to having locked buffers. If this counter is zero after trying to submit some pages for writeback, and the backing dev is uncongested, then don't wait any more. After this, newly dirtied pages can quickly be written back to this backing dev. If there are globally no more pages to submit for writeback (nr_reclaimable == 0), then also don't wait for ever, only while this backing dev is congested. (*) For more info on this deadlock, see the following discussions: http://lkml.org/lkml/2007/3/1/9 http://lkml.org/lkml/2007/3/12/16 Signed-off-by: Miklos Szeredi <[EMAIL PROTECTED]> --- Index: linux/include/linux/writeback.h =================================================================== --- linux.orig/include/linux/writeback.h 2007-03-24 22:06:56.000000000 +0100 +++ linux/include/linux/writeback.h 2007-03-24 22:29:02.000000000 +0100 @@ -44,6 +44,7 @@ struct writeback_control { long nr_to_write; /* Write this many pages, and decrement this for each page written */ long pages_skipped; /* Pages which were not written */ + long nr_dirty; /* Number of dirty pages encountered */ /* * For a_ops->writepages(): is start or end are non-zero then this is Index: linux/mm/page-writeback.c =================================================================== --- linux.orig/mm/page-writeback.c 2007-03-24 22:06:56.000000000 +0100 +++ linux/mm/page-writeback.c 2007-03-24 22:29:02.000000000 +0100 @@ -207,7 +207,15 @@ static void balance_dirty_pages(struct a * written to the server's write cache, but has not yet * been flushed to permanent storage. */ - if (nr_reclaimable) { + if (!nr_reclaimable) { + /* + * If there's nothing more to write back and this queue + * is uncongested, then it is possible to quickly + * write out some more data, so let's not wait + */ + if (!bdi_write_congested(bdi)) + break; + } else { writeback_inodes(&wbc); get_dirty_limits(&background_thresh, &dirty_thresh, mapping); @@ -220,6 +228,14 @@ static void balance_dirty_pages(struct a pages_written += write_chunk - wbc.nr_to_write; if (pages_written >= write_chunk) break; /* We've done our duty */ + + /* + * If there are no more dirty pages for this backing + * backing dev, and the queue is not congested, then + * it is possible to quickly write out some more data + */ + if (!wbc.nr_dirty && !bdi_write_congested(bdi)) + break; } congestion_wait(WRITE, HZ/10); } @@ -619,6 +635,7 @@ retry: min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) { unsigned i; + wbc->nr_dirty += nr_pages; scanned = 1; for (i = 0; i < nr_pages; i++) { struct page *page = pvec.pages[i]; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/