On Thu, Sep 29, 2016 at 06:45:03PM +0800, Chao Yu wrote: > On 2016/9/29 4:19, Jaegeuk Kim wrote: > > On Tue, Sep 27, 2016 at 10:09:03AM +0800, Chao Yu wrote: > >> On 2016/9/27 9:39, Jaegeuk Kim wrote: > >>> On Tue, Sep 27, 2016 at 08:57:41AM +0800, Chao Yu wrote: > >>>> Hi Jaegeuk, > >>>> > >>>> On 2016/9/27 2:33, Jaegeuk Kim wrote: > >>>>> Hi Chao, > >>>>> > >>>>> On Tue, Sep 27, 2016 at 12:09:52AM +0800, Chao Yu wrote: > >>>>>> From: Chao Yu <yuch...@huawei.com> > >>>>>> > >>>>>> In sync_node_pages, we won't check and commit last merged pages in > >>>>>> private > >>>>>> bio cache of f2fs, as these pages were taged as writeback, someone who > >>>>>> is > >>>>>> waiting for writebacking of the page will be blocked until the cache > >>>>>> was > >>>>>> committed by someone else. > >>>>>> > >>>>>> We need to commit node type bio cache to avoid potential deadlock or > >>>>>> long > >>>>>> delay of waiting writeback. > >>>>>> > >>>>>> Signed-off-by: Chao Yu <yuch...@huawei.com> > >>>>>> --- > >>>>>> fs/f2fs/node.c | 11 +++++++++-- > >>>>>> 1 file changed, 9 insertions(+), 2 deletions(-) > >>>>>> > >>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c > >>>>>> index 9faddcd..f73f774 100644 > >>>>>> --- a/fs/f2fs/node.c > >>>>>> +++ b/fs/f2fs/node.c > >>>>>> @@ -1416,6 +1416,7 @@ int sync_node_pages(struct f2fs_sb_info *sbi, > >>>>>> struct writeback_control *wbc) > >>>>>> struct pagevec pvec; > >>>>>> int step = 0; > >>>>>> int nwritten = 0; > >>>>>> + int ret = 0; > >>>>>> > >>>>>> pagevec_init(&pvec, 0); > >>>>>> > >>>>>> @@ -1436,7 +1437,8 @@ next_step: > >>>>>> > >>>>>> if (unlikely(f2fs_cp_error(sbi))) { > >>>>>> pagevec_release(&pvec); > >>>>>> - return -EIO; > >>>>>> + ret = -EIO; > >>>>>> + goto out; > >>>>>> } > >>>>>> > >>>>>> /* > >>>>>> @@ -1487,6 +1489,8 @@ continue_unlock: > >>>>>> > >>>>>> if (NODE_MAPPING(sbi)->a_ops->writepage(page, > >>>>>> wbc)) > >>>>>> unlock_page(page); > >>>>>> + else > >>>>>> + nwritten++; > >>>>>> > >>>>>> if (--wbc->nr_to_write == 0) > >>>>>> break; > >>>>>> @@ -1504,7 +1508,10 @@ continue_unlock: > >>>>>> step++; > >>>>>> goto next_step; > >>>>>> } > >>>>>> - return nwritten; > >>>>>> +out: > >>>>>> + if (nwritten) > >>>>>> + f2fs_submit_merged_bio(sbi, NODE, WRITE); > >>>>> > >>>>> IIRC, we don't need to flush this, since f2fs_submit_merged_bio_cond() > >>>>> would > >>>>> handle this in f2fs_wait_on_page_writeback(). > >>>> > >>>> Yes, it covers all the cases in f2fs private codes, but there are still > >>>> some > >>>> codes in mm or fs directory, and they didn't use > >>>> f2fs_wait_on_page_writeback > >>>> when waiting page writeback. Such as do_writepages && filemap_fdatawait > >>>> in > >>>> __writeback_single_inode... > >>> > >>> The do_writepages() is okay, which will call f2fs_write_node_pages(). > >>> The __writeback_single_inode() won't do filemap_fdatawait() with > >>> WB_SYNC_ALL. > >>> We don't need to take care of truncation as well. > >>> > >>> Any missing one? > >> > >> Another is: while testing with first version of checkpoint error > >> injection, I > >> encounter below dump stack: > >> > >> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > >> mount D ffff8801c1bf7960 0 97685 97397 0x00080000 > >> ffff8801c1bf7960 ffff8801c1bf7930 ffff880175900000 ffff8801c1bf7980 > >> ffff8801c1bf8000 0000000000000000 7fffffffffffffff ffff88021f7be340 > >> ffffffff817c8880 ffff8801c1bf7978 ffffffff817c80a5 ffff880214f58fc0 > >> Call Trace: > >> [<ffffffff817c8880>] ? bit_wait+0x50/0x50 > >> [<ffffffff817c80a5>] schedule+0x35/0x80 > >> [<ffffffff817cb152>] schedule_timeout+0x292/0x3d0 > >> [<ffffffff81022ab5>] ? xen_clocksource_get_cycles+0x15/0x20 > >> [<ffffffff810eeb5c>] ? ktime_get+0x3c/0xb0 > >> [<ffffffff817c8880>] ? bit_wait+0x50/0x50 > >> [<ffffffff817c7906>] io_schedule_timeout+0xa6/0x110 > >> [<ffffffff817c889b>] bit_wait_io+0x1b/0x60 > >> [<ffffffff817c84e4>] __wait_on_bit+0x64/0x90 > >> [<ffffffff8117dcd4>] wait_on_page_bit+0xc4/0xd0 > >> [<ffffffff810bc4d0>] ? autoremove_wake_function+0x40/0x40 > >> [<ffffffff81190a29>] truncate_inode_pages_range+0x409/0x840 > >> [<ffffffff811a406d>] ? pcpu_free_area+0x13d/0x1a0 > >> [<ffffffff810bc025>] ? wake_up_bit+0x25/0x30 > >> [<ffffffff81190ecc>] truncate_inode_pages_final+0x4c/0x60 > >> [<ffffffffa025e9e8>] f2fs_evict_inode+0x48/0x390 [f2fs] > >> [<ffffffff812212f7>] evict+0xc7/0x1a0 > >> [<ffffffff81221f77>] iput+0x197/0x200 > >> [<ffffffffa0268242>] f2fs_fill_super+0xab2/0x1130 [f2fs] > >> [<ffffffff81209454>] mount_bdev+0x184/0x1c0 > >> [<ffffffffa0267790>] ? f2fs_commit_super+0x100/0x100 [f2fs] > >> [<ffffffffa02646a5>] f2fs_mount+0x15/0x20 [f2fs] > >> [<ffffffff81209e19>] mount_fs+0x39/0x160 > >> [<ffffffff81225e47>] vfs_kern_mount+0x67/0x110 > >> [<ffffffff812283bb>] do_mount+0x1bb/0xc80 > >> [<ffffffff81229163>] SyS_mount+0x83/0xd0 > >> [<ffffffff8100391e>] do_syscall_64+0x6e/0x170 > >> [<ffffffff817cc325>] entry_SYSCALL64_slow_path+0x25/0x25 > >> > >> Any thoughts? > > > > I think this should not happen normally, since f2fs_stop_checkpoint() calls > > f2fs_flush_merged_bios(). > > In write_end_io, f2fs_stop_checkpoint will not call f2fs_flush_merged_bios.
So, in write_node_page(), we call f2fs_submit_merged_bio() if f2fs_cp_error() is set. > One other problem here is it can cause latency during waiting writeback: > > In fsync() > 1. fsync_node_pages a/b/c pages is submitted, and d/e/f pages is still in bio > cache as we didn't commit the bio in the end. > 2. wait_on_node_pages_writeback will wait a/b/c pages for writebacking, then > submit bio with d/e/f pages and the wait until they have writebacked to > device. > > Here we need to submit cached bios at the end of > fsync_node_pages/sync_node_pages to let them being merged in block layer as > much > as possible, and also to avoid suffering more delay time due to double > submit&wait. I think this is more reasonable to me. ;) Thanks,