On 05/08, Chao Yu wrote: > On 2018/5/8 4:46, Jaegeuk Kim wrote: > > On 04/27, Chao Yu wrote: > >> On 2018/4/27 0:36, Jaegeuk Kim wrote: > >>> On 04/26, Chao Yu wrote: > >>>> On 2018/4/26 23:48, Jaegeuk Kim wrote: > >>>>> On 04/26, Chao Yu wrote: > >>>>>> Thread A Thread B > >>>>>> - f2fs_ioc_commit_atomic_write > >>>>>> - commit_inmem_pages > >>>>>> - f2fs_submit_merged_write_cond > >>>>>> : write data > >>>>>> - write_checkpoint > >>>>>> - do_checkpoint > >>>>>> : commit all node within CP > >>>>>> -> SPO > >>>>>> - f2fs_do_sync_file > >>>>>> - file_write_and_wait_range > >>>>>> : wait data writeback > >>>>>> > >>>>>> In above race condition, data/node can be flushed in reversed order > >>>>>> when > >>>>>> coming a checkpoint before f2fs_do_sync_file, after SPOR, it results in > >>>>>> atomic written data being corrupted. > >>>>> > >>>>> Wait, what is the problem here? Thread B could succeed checkpoint, > >>>>> there is > >>>>> no problem. If it fails, there is no fsync mark where we can recover > >>>>> it, so > >>>> > >>>> Node is flushed by checkpoint before data, with reversed order, that's > >>>> the problem. > >>> > >>> What do you mean? Data should be in disk, in order to proceed checkpoint. > >> > >> 1. thread A: commit_inmem_pages submit data into block layer, but haven't > >> waited > >> it writeback. > >> 2. thread A: commit_inmem_pages update related node. > >> 3. thread B: do checkpoint, flush all nodes to disk > > > > How about, in block_operations(), > > > > down_read_trylock(&F2FS_I(inode)->i_gc_rwsem[WRITE]); > > if (fail) > > wait_on_all_pages_writeback(F2FS_WB_DATA); > > else > > up_read(&F2FS_I(inode)->i_gc_rwsem[WRITE]); > > I sent one patch for that, could you check it? > > Adding wait_on_all_pages_writeback in block_operations() can make checkpoint() > wait pages writeback one more time, which break IO flow, so what's your > concern > here?
Performance. And I can see wait_on_all_pages_writeback() waits only for F2FS_WB_CP_DATA in checkpoint()? > > Thanks, > > > > > > >> 4. SPOR > >> > >> Then, atomic file becomes corrupted since nodes is flushed before data. > >> > >> Thanks, > >> > >>> > >>>> > >>>> Thanks, > >>>> > >>>>> we can just ignore the last written data as nothing. > >>>>> > >>>>>> > >>>>>> This patch adds f2fs_wait_on_page_writeback in __revoke_inmem_pages() > >>>>>> to > >>>>>> keep data and node of atomic file being flushed orderly. > >>>>>> > >>>>>> Signed-off-by: Chao Yu <yuch...@huawei.com> > >>>>>> --- > >>>>>> fs/f2fs/file.c | 4 ++++ > >>>>>> fs/f2fs/segment.c | 3 +++ > >>>>>> 2 files changed, 7 insertions(+) > >>>>>> > >>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > >>>>>> index be7578774a47..a352804af244 100644 > >>>>>> --- a/fs/f2fs/file.c > >>>>>> +++ b/fs/f2fs/file.c > >>>>>> @@ -217,6 +217,9 @@ static int f2fs_do_sync_file(struct file *file, > >>>>>> loff_t start, loff_t end, > >>>>>> > >>>>>> trace_f2fs_sync_file_enter(inode); > >>>>>> > >>>>>> + if (atomic) > >>>>>> + goto write_done; > >>>>>> + > >>>>>> /* if fdatasync is triggered, let's do in-place-update */ > >>>>>> if (datasync || get_dirty_pages(inode) <= > >>>>>> SM_I(sbi)->min_fsync_blocks) > >>>>>> set_inode_flag(inode, FI_NEED_IPU); > >>>>>> @@ -228,6 +231,7 @@ static int f2fs_do_sync_file(struct file *file, > >>>>>> loff_t start, loff_t end, > >>>>>> return ret; > >>>>>> } > >>>>>> > >>>>>> +write_done: > >>>>>> /* if the inode is dirty, let's recover all the time */ > >>>>>> if (!f2fs_skip_inode_update(inode, datasync)) { > >>>>>> f2fs_write_inode(inode, NULL); > >>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > >>>>>> index 584483426584..9ca3d0a43d93 100644 > >>>>>> --- a/fs/f2fs/segment.c > >>>>>> +++ b/fs/f2fs/segment.c > >>>>>> @@ -230,6 +230,8 @@ static int __revoke_inmem_pages(struct inode > >>>>>> *inode, > >>>>>> > >>>>>> lock_page(page); > >>>>>> > >>>>>> + f2fs_wait_on_page_writeback(page, DATA, true); > >>>>>> + > >>>>>> if (recover) { > >>>>>> struct dnode_of_data dn; > >>>>>> struct node_info ni; > >>>>>> @@ -415,6 +417,7 @@ static int __commit_inmem_pages(struct inode > >>>>>> *inode) > >>>>>> /* drop all uncommitted pages */ > >>>>>> __revoke_inmem_pages(inode, &fi->inmem_pages, true, > >>>>>> false); > >>>>>> } else { > >>>>>> + /* wait all committed IOs writeback and release them > >>>>>> from list */ > >>>>>> __revoke_inmem_pages(inode, &revoke_list, false, false); > >>>>>> } > >>>>>> > >>>>>> -- > >>>>>> 2.15.0.55.gc2ece9dc4de6 > >>> > >>> . > >>> > > > > . > >