On 2020/5/10 3:03, Jaegeuk Kim wrote:
> On 05/09, Chao Yu wrote:
>> On 2020/5/9 0:10, Jaegeuk Kim wrote:
>>> Hi Sayali,
>>>
>>> In order to address the perf regression, how about this?
>>>
>>> >From 48418af635884803ffb35972df7958a2e6649322 Mon Sep 17 00:00:00 2001
>>> From: Jaegeuk Kim <jaeg...@kernel.org>
>>> Date: Fri, 8 May 2020 09:08:37 -0700
>>> Subject: [PATCH] f2fs: avoid double lock for cp_rwsem during checkpoint
>>>
>>> There could be a scenario where f2fs_sync_node_pages gets
>>> called during checkpoint, which in turn tries to flush
>>> inline data and calls iput(). This results in deadlock as
>>> iput() tries to hold cp_rwsem, which is already held at the
>>> beginning by checkpoint->block_operations().
>>>
>>> Call stack :
>>>
>>> Thread A                        Thread B
>>> f2fs_write_checkpoint()
>>> - block_operations(sbi)
>>>  - f2fs_lock_all(sbi);
>>>   - down_write(&sbi->cp_rwsem);
>>>
>>>                                 - open()
>>>                                  - igrab()
>>>                                 - write() write inline data
>>>                                 - unlink()
>>> - f2fs_sync_node_pages()
>>>  - if (is_inline_node(page))
>>>   - flush_inline_data()
>>>    - ilookup()
>>>      page = f2fs_pagecache_get_page()
>>>      if (!page)
>>>       goto iput_out;
>>>      iput_out:
>>>                                 -close()
>>>                                 -iput()
>>>        iput(inode);
>>>        - f2fs_evict_inode()
>>>         - f2fs_truncate_blocks()
>>>          - f2fs_lock_op()
>>>            - down_read(&sbi->cp_rwsem);
>>>
>>> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to
>>> inline_data")
>>> Signed-off-by: Sayali Lokhande <saya...@codeaurora.org>
>>> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
>>> ---
>>>  fs/f2fs/node.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>> index 1db8cabf727ef..626d7daca09de 100644
>>> --- a/fs/f2fs/node.c
>>> +++ b/fs/f2fs/node.c
>>> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>>>  				goto continue_unlock;
>>>  			}
>>>
>>> -			/* flush inline_data */
>>> -			if (is_inline_node(page)) {
>>> +			/* flush inline_data, if it's not sync path. */
>>> +			if (do_balance && is_inline_node(page)) {
>>
>> IIRC, this flow was designed to avoid running out of free space issue
>> during checkpoint:
>>
>> 2049d4fcb057 ("f2fs: avoid multiple node page writes due to inline_data")
>>
>> The scenario is:
>> 1. create fully node blocks
>> 2. flush node blocks
>> 3. write inline_data for all the node blocks again
>> 4. flush node blocks redundantly
>>
>> I guess this may cause failing one case of fstest.
>
> Yeah, actually I was hitting 204 failure, and thus, revised like this.
> Now, I don't see any regression in fstest.
>
>>From 8f1882acfb0a5fc43e5a2bbd576a8f3c681a7d2c Mon Sep 17 00:00:00 2001
> From: Sayali Lokhande <saya...@codeaurora.org>
> Date: Thu, 30 Apr 2020 16:28:29 +0530
> Subject: [PATCH] f2fs: Avoid double lock for cp_rwsem during checkpoint
>
> There could be a scenario where f2fs_sync_node_pages gets
> called during checkpoint, which in turn tries to flush
> inline data and calls iput(). This results in deadlock as
> iput() tries to hold cp_rwsem, which is already held at the
> beginning by checkpoint->block_operations().
>
> Call stack :
>
> Thread A                        Thread B
> f2fs_write_checkpoint()
> - block_operations(sbi)
>  - f2fs_lock_all(sbi);
>   - down_write(&sbi->cp_rwsem);
>
>                                 - open()
>                                  - igrab()
>                                 - write() write inline data
>                                 - unlink()
> - f2fs_sync_node_pages()
>  - if (is_inline_node(page))
>   - flush_inline_data()
>    - ilookup()
>      page = f2fs_pagecache_get_page()
>      if (!page)
>       goto iput_out;
>      iput_out:
>                                 -close()
>                                 -iput()
>        iput(inode);
>        - f2fs_evict_inode()
>         - f2fs_truncate_blocks()
>          - f2fs_lock_op()
>            - down_read(&sbi->cp_rwsem);
>
> Fixes: 2049d4fcb057 ("f2fs: avoid multiple node page writes due to
> inline_data")
> Signed-off-by: Sayali Lokhande <saya...@codeaurora.org>
> Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> ---
>  fs/f2fs/checkpoint.c |  9 ++++++++-
>  fs/f2fs/f2fs.h       |  4 ++--
>  fs/f2fs/node.c       | 10 +++++-----
>  3 files changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index d49f7a01d8a26..928aea4ff663d 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -1168,6 +1168,12 @@ static int block_operations(struct f2fs_sb_info *sbi)
>  	};
>  	int err = 0, cnt = 0;
>
> +	/*
> +	 * Let's flush node pages first to flush inline_data.
> +	 * We'll actually guarantee everything below under f2fs_lock_all.
> +	 */
> +	f2fs_sync_node_pages(sbi, &wbc, false, false, FS_CP_NODE_IO);

It is possible that a user writes a large amount of inline data between
f2fs_sync_node_pages() and f2fs_lock_all(); in that race, the no-space
issue can still occur.

Also, if there is a huge number of F2FS_DIRTY_IMETA inodes, after this
change we will flush the inode pages twice, which is unneeded:

f2fs_sync_node_pages()	--- flush dirty inode page
f2fs_lock_all()
...
f2fs_sync_inode_meta()	--- update dirty inode page
f2fs_sync_node_pages()	--- flush dirty inode page again.

Thanks,

> +
>  retry_flush_quotas:
>  	f2fs_lock_all(sbi);
>  	if (__need_flush_quota(sbi)) {
> @@ -1222,7 +1228,8 @@ static int block_operations(struct f2fs_sb_info *sbi)
>  	if (get_pages(sbi, F2FS_DIRTY_NODES)) {
>  		up_write(&sbi->node_write);
>  		atomic_inc(&sbi->wb_sync_req[NODE]);
> -		err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
> +		err = f2fs_sync_node_pages(sbi, &wbc, false,
> +						true, FS_CP_NODE_IO);
>  		atomic_dec(&sbi->wb_sync_req[NODE]);
>  		if (err) {
>  			up_write(&sbi->node_change);
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index d916540f12813..ac6ae42b9dd4e 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -3286,8 +3286,8 @@ int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi,
> struct inode *inode,
>  			struct writeback_control *wbc, bool atomic,
>  			unsigned int *seq_id);
>  int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> -				struct writeback_control *wbc,
> -				bool do_balance, enum iostat_type io_type);
> +			struct writeback_control *wbc,
> +			bool do_balance, bool sync, enum iostat_type io_type);
>  int f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount);
>  bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid);
>  void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid);
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 1db8cabf727ef..fd00a8c119088 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1808,8 +1808,8 @@ static bool flush_dirty_inode(struct page *page)
>  }
>
>  int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
> -				struct writeback_control *wbc,
> -				bool do_balance, enum iostat_type io_type)
> +			struct writeback_control *wbc,
> +			bool do_balance, bool sync, enum iostat_type io_type)
>  {
>  	pgoff_t index;
>  	struct pagevec pvec;
> @@ -1870,8 +1870,8 @@ int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
>  				goto continue_unlock;
>  			}
>
> -			/* flush inline_data */
> -			if (is_inline_node(page)) {
> +			/* flush inline_data, if it's async context. */
> +			if (!sync && is_inline_node(page)) {
>  				clear_inline_node(page);
>  				unlock_page(page);
>  				flush_inline_data(sbi, ino_of_node(page));
> @@ -1999,7 +1999,7 @@ static int f2fs_write_node_pages(struct address_space
> *mapping,
>
>  	diff = nr_pages_to_write(sbi, NODE, wbc);
>  	blk_start_plug(&plug);
> -	f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO);
> +	f2fs_sync_node_pages(sbi, wbc, true, false, FS_NODE_IO);
>  	blk_finish_plug(&plug);
>  	wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff);
>
>
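
To put a rough number on the F2FS_DIRTY_IMETA concern raised in the reply,
here is a toy userspace model. It is not kernel code: the toy_* names and
struct toy_inode are made up for illustration, and it assumes every inode
enters the checkpoint with both a dirty node page and dirty inode metadata.
Under that assumption, the extra f2fs_sync_node_pages() call before
f2fs_lock_all() doubles the number of inode page writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_INODES 64	/* arbitrary bound for this sketch */

/* Hypothetical stand-in for an inode and its node page. */
struct toy_inode {
	bool node_page_dirty;	/* node page needs writeback */
	bool imeta_dirty;	/* stands in for F2FS_DIRTY_IMETA */
};

/* Models f2fs_sync_node_pages(): write out every dirty node page. */
static size_t toy_sync_node_pages(struct toy_inode *inodes, size_t n)
{
	size_t writes = 0;

	for (size_t i = 0; i < n; i++) {
		if (inodes[i].node_page_dirty) {
			inodes[i].node_page_dirty = false;
			writes++;
		}
	}
	return writes;
}

/* Models f2fs_sync_inode_meta(): updating the inode page re-dirties it. */
static void toy_sync_inode_meta(struct toy_inode *inodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (inodes[i].imeta_dirty) {
			inodes[i].imeta_dirty = false;
			inodes[i].node_page_dirty = true;
		}
	}
}

/*
 * Count node page writes for one checkpoint over n inodes.
 * preflush == true models the proposed flush before f2fs_lock_all().
 */
size_t toy_checkpoint_writes(size_t n, bool preflush)
{
	struct toy_inode inodes[TOY_MAX_INODES];
	size_t writes = 0;

	assert(n <= TOY_MAX_INODES);
	for (size_t i = 0; i < n; i++) {
		inodes[i].node_page_dirty = true;
		inodes[i].imeta_dirty = true;
	}

	if (preflush)
		writes += toy_sync_node_pages(inodes, n);

	/* f2fs_lock_all() would be taken here in the real flow. */
	toy_sync_inode_meta(inodes, n);
	writes += toy_sync_node_pages(inodes, n);

	return writes;
}
```

In this model, toy_checkpoint_writes(n, false) performs n writes and
toy_checkpoint_writes(n, true) performs 2 * n. The real cost depends on how
many inode pages are already dirty before the checkpoint starts, which the
model does not try to capture.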