This is inspired by Filipe's patch (55e3bd2e0c2e1). When submit_extent_page() in __extent_writepage_io() fails, Btrfs does not clear the writeback bit of the failed page, so the page falsely appears to be under writeback. Another task running sync() then hangs in filemap_fdatawait_range(), because it waits for writeback on that page to finish, and no I/O completion will ever clear the bit.
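The failure can be modelled in a few lines of user-space C. This is a hypothetical sketch, not Btrfs code: struct model_page and the model_* helpers are invented for illustration and keep only the PG_writeback and PG_error bits, and a submission failure is injected through the submit_ok flag rather than coming from a real submit_extent_page() error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct page: only the two flags
 * involved in this bug are modelled. */
struct model_page {
	bool writeback;	/* PG_writeback: I/O is believed to be in flight */
	bool error;	/* PG_error: the I/O could not be issued */
};

/* Like end_page_writeback(): clearing a flag that is not set is a bug.
 * The real helper also wakes tasks sleeping in wait_on_page_writeback(). */
static void model_end_page_writeback(struct model_page *page)
{
	assert(page->writeback);
	page->writeback = false;
}

/* Stand-in for the bio completion path (end_bio_extent_writepage), which
 * only runs for I/O that was actually submitted. */
static void model_io_complete(struct model_page *page)
{
	model_end_page_writeback(page);
}

/* Models the submission step of __extent_writepage_io(). submit_ok plays
 * the role of submit_extent_page() returning 0; apply_fix selects the
 * patched error path. */
static int model_writepage_io(struct model_page *page, bool submit_ok,
			      bool apply_fix)
{
	int ret = submit_ok ? 0 : -1;	/* hypothetical error value */

	page->writeback = true;		/* page is marked before submission */
	if (ret) {
		page->error = true;	/* SetPageError(page) */
		if (apply_fix)		/* the line added by this patch */
			model_end_page_writeback(page);
	}
	return ret;
}

/* Non-blocking analogue of wait_on_page_writeback(): reports whether a
 * waiter would still be asleep. Nothing ever clears the flag of a page
 * whose I/O was never submitted, so such a waiter would sleep forever. */
static bool model_waiter_would_block(const struct model_page *page)
{
	return page->writeback;
}
```

With apply_fix false, a failed submission leaves writeback set, so model_waiter_would_block() returns true: the model's version of the hang in filemap_fdatawait_range(). With apply_fix true it returns false. On a successful submission the flag stays set until model_io_complete() runs, just as a real page stays under writeback until its bio completes.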
CPU0                                  CPU1

__extent_writepage_io()
  ret = submit_extent_page() // fails
  if (ret)
    SetPageError(page)
    // the writeback bit is never cleared

                                      sync()
                                        ...
                                        filemap_fdatawait_range()
                                          wait_on_page_writeback(page);
                                          // waits forever on the false
                                          // under-writeback page

Signed-off-by: Takafumi Kubota <takafumi.kubota1...@sslab.ics.keio.ac.jp>
---
 fs/btrfs/extent_io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 1e67723..ef9793b 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3443,8 +3443,10 @@ static noinline_for_stack int __extent_writepage_io(struct inode *inode,
 					 bdev, &epd->bio, max_nr,
 					 end_bio_extent_writepage,
 					 0, 0, 0, false);
-		if (ret)
+		if (ret) {
 			SetPageError(page);
+			end_page_writeback(page);
+		}
 
 		cur = cur + iosize;
 		pg_offset += iosize;
-- 
1.9.3