> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaeg...@kernel.org]
> Sent: Tuesday, September 23, 2014 12:53 PM
> To: Chao Yu
> Cc: 'Huang Ying'; linux-kernel@vger.kernel.org; linux-fsde...@vger.kernel.org;
> linux-f2fs-de...@lists.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH 2/3] f2fs: fix conditions to remain recovery 
> information in
> f2fs_sync_file
> 
> On Mon, Sep 22, 2014 at 05:20:19PM +0800, Chao Yu wrote:
> > > -----Original Message-----
> > > From: Huang Ying [mailto:ying.hu...@intel.com]
> > > Sent: Monday, September 22, 2014 3:39 PM
> > > To: Chao Yu
> > > Cc: 'Jaegeuk Kim'; linux-kernel@vger.kernel.org; 
> > > linux-fsde...@vger.kernel.org;
> > > linux-f2fs-de...@lists.sourceforge.net
> > > Subject: Re: [f2fs-dev] [PATCH 2/3] f2fs: fix conditions to remain 
> > > recovery information
> in
> > > f2fs_sync_file
> > >
> > > On Mon, 2014-09-22 at 15:24 +0800, Chao Yu wrote:
> > > > Hi Jaegeuk, Huang,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jaegeuk Kim [mailto:jaeg...@kernel.org]
> > > > > Sent: Thursday, September 18, 2014 1:51 PM
> > > > > To: linux-kernel@vger.kernel.org; linux-fsde...@vger.kernel.org;
> > > > > linux-f2fs-de...@lists.sourceforge.net
> > > > > Cc: Jaegeuk Kim; Huang Ying
> > > > > Subject: [f2fs-dev] [PATCH 2/3] f2fs: fix conditions to remain 
> > > > > recovery information
> in
> > > > > f2fs_sync_file
> > > > >
> > > > > This patch revisited whole the recovery information during the 
> > > > > f2fs_sync_file.
> > > > >
> > > > > In this patch, there are three information to make a decision.
> > > > >
> > > > > a) IS_CHECKPOINTED,   /* is it checkpointed before? */
> > > > > b) HAS_FSYNCED_INODE, /* is the inode fsynced before? */
> > > > > c) HAS_LAST_FSYNC,    /* has the latest node fsync mark? */
> > > > >
> > > > > And, the scenarios for our rule are based on:
> > > > >
> > > > > [Term] F: fsync_mark, D: dentry_mark
> > > > >
> > > > > 1. inode(x) | CP | inode(x) | dnode(F)
> > > > > 2. inode(x) | CP | inode(F) | dnode(F)
> > > > > 3. inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > > > 4. inode(x) | CP | dnode(F) | inode(F)
> > > > > 5. CP | inode(x) | dnode(F) | inode(DF)
> > > > > 6. CP | inode(DF) | dnode(F)
> > > > > 7. CP | dnode(F) | inode(DF)
> > > > > 8. CP | dnode(F) | inode(x) | inode(DF)
> > > >
> > > > No sure, do we missed these cases:
> > > > inode(x) | CP | inode(F) | dnode(x) -> write inode(F)
> > > > CP | inode(DF) | dnode(x) -> write inode(F)
> > > >
> > > > In these cases we will write another inode with fsync flag because our 
> > > > last
> > > > dnode is written out to disk by bdi-flusher (HAS_LAST_FSYNC is not 
> > > > marked). But
> > > > this appended inode is not useful.
> > > >
> > > > AFAIK, HAS_LAST_FSYNC(AKA fsync_done) is introduced in commit 
> > > > 479f40c44ae3
> > > > ("f2fs: skip unnecessary node writes during fsync") to avoid writting 
> > > > multiple
> > > > unneeded inode page by multiple redundant fsync calls. But for now, its 
> > > > role can
> > > > be taken by HAS_FSYNCED_INODE.
> > > > So, can we remove this flag to simplify our logic of fsync flow?
> > > >
> > > > Then in fsync flow, the rule can be:
> > > > If CPed before, there must be a inode(F) written in warm node chain;
> > >
> > > How about
> > >
> > > inode(x) | CP | dnode(F)
> >
> > Oh, I missed this one, thanks for remindering that.
> >
> > There is another case:
> > inode(x) | CP | dnode(F) | dnode(x) -> write inode(F)
> > It seems we will append another unneeded inode(F) in this patch also due to
> > no HAS_LAST_FSYNC in nat entry cache of inode.
> 
> As the current rule for roll-forward recovery, we need inode(F) to find the
> latest mark. This can also be used to distinguish fsynced inode from 
> writebacked
> inodes.

Got it now, thanks for your explanation!

> 
> >
> > >
> > > > If not CPed before, there must be a inode(DF) written in warm node 
> > > > chain.
> > >
> > > For example below:
> > >
> > > 1) checkpoint
> > > 2) create "a", change "a"
> > > 3) fsync "a"
> > > 4) open "a", change "a"
> > >
> > > Do we want recovery to stop at dnode(F) in step 3) or stop at dnode(x)
> > > produced by step 4)?
> >
> > To my understanding, we will recover all info related to fsynced nodes in 
> > warm
> > node chain. So will produce to step 4 if changed nodes in step 4 are 
> > flushed to
> > device.

To Huang,
My mistaken for the wrong explanation due to my misunderstanding about the 
recover
code, sorry about that!

> 
> Current rule should stop at 3) fsync "a". It won't recover 4)'s inode, since 
> it
> was just writebacked.
> 
> If we'd like to recover whole the inode and its data, we should traverse all 
> the
> recovery paths from the sketch.

I am missing some codes when traverse the recover flow.
You're right, and thanks for correcting me.

Regards,
Yu

> 
> Thanks,
> 
> >
> > Thanks,
> > Yu
> > >
> > > Best Regards,
> > > Huang, Ying
> > >
> > > > >
> > > > > For example, #3, the three conditions should be changed as follows.
> > > > >
> > > > >    inode(x) | CP | dnode(F) | inode(x) | inode(F)
> > > > > a)    x       o      o          o          o
> > > > > b)    x       x      x          x          o
> > > > > c)    x       o      o          x          o
> > > > >
> > > > > If f2fs_sync_file stops   ------^,
> > > > >  it should write inode(F)    --------------^
> > > > >
> > > > > So, the need_inode_block_update should return true, since
> > > > >  c) get_nat_flag(e, HAS_LAST_FSYNC), is false.
> > > > >
> > > > > For example, #8,
> > > > >       CP | alloc | dnode(F) | inode(x) | inode(DF)
> > > > > a)    o      x        x          x          x
> > > > > b)    x               x          x          o
> > > > > c)    o               o          x          o
> > > > >
> > > > > If f2fs_sync_file stops   -------^,
> > > > >  it should write inode(DF)    --------------^
> > > > >
> > > > > Note that, the roll-forward policy should follow this rule, which 
> > > > > means,
> > > > > if there are any missing blocks, we doesn't need to recover that 
> > > > > inode.
> > > > >
> > > > > Signed-off-by: Huang Ying <ying.hu...@intel.com>
> > > > > Signed-off-by: Jaegeuk Kim <jaeg...@kernel.org>
> > > > > ---
> > > > >  fs/f2fs/data.c |  3 ---
> > > > >  fs/f2fs/f2fs.h |  6 +++---
> > > > >  fs/f2fs/file.c | 10 ++++++----
> > > > >  fs/f2fs/node.c | 56 
> > > > > +++++++++++++++++++++++++++++++++++---------------------
> > > > >  fs/f2fs/node.h | 21 ++++++++++++---------
> > > > >  5 files changed, 56 insertions(+), 40 deletions(-)
> > > > >
> > > > > diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> > > > > index 0e37658..fdc3dbe 100644
> > > > > --- a/fs/f2fs/data.c
> > > > > +++ b/fs/f2fs/data.c
> > > > > @@ -1089,9 +1089,6 @@ static ssize_t f2fs_direct_IO(int rw, struct 
> > > > > kiocb *iocb,
> > > > >       if (check_direct_IO(inode, rw, iter, offset))
> > > > >               return 0;
> > > > >
> > > > > -     /* clear fsync mark to recover these blocks */
> > > > > -     fsync_mark_clear(F2FS_I_SB(inode), inode->i_ino);
> > > > > -
> > > > >       trace_f2fs_direct_IO_enter(inode, offset, count, rw);
> > > > >
> > > > >       err = blockdev_direct_IO(rw, iocb, inode, iter, offset, 
> > > > > get_data_block);
> > > > > diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> > > > > index 9fc1bcd..fd44083 100644
> > > > > --- a/fs/f2fs/f2fs.h
> > > > > +++ b/fs/f2fs/f2fs.h
> > > > > @@ -1224,9 +1224,9 @@ struct dnode_of_data;
> > > > >  struct node_info;
> > > > >
> > > > >  bool available_free_memory(struct f2fs_sb_info *, int);
> > > > > -int is_checkpointed_node(struct f2fs_sb_info *, nid_t);
> > > > > -bool fsync_mark_done(struct f2fs_sb_info *, nid_t);
> > > > > -void fsync_mark_clear(struct f2fs_sb_info *, nid_t);
> > > > > +bool is_checkpointed_node(struct f2fs_sb_info *, nid_t);
> > > > > +bool has_fsynced_inode(struct f2fs_sb_info *, nid_t);
> > > > > +bool need_inode_block_update(struct f2fs_sb_info *, nid_t);
> > > > >  void get_node_info(struct f2fs_sb_info *, nid_t, struct node_info *);
> > > > >  int get_dnode_of_data(struct dnode_of_data *, pgoff_t, int);
> > > > >  int truncate_inode_blocks(struct inode *, pgoff_t);
> > > > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> > > > > index af06e22..3035c79 100644
> > > > > --- a/fs/f2fs/file.c
> > > > > +++ b/fs/f2fs/file.c
> > > > > @@ -207,15 +207,17 @@ int f2fs_sync_file(struct file *file, loff_t 
> > > > > start, loff_t end,
> int
> > > > > datasync)
> > > > >                       up_write(&fi->i_sem);
> > > > >               }
> > > > >       } else {
> > > > > -             /* if there is no written node page, write its inode 
> > > > > page */
> > > > > -             while (!sync_node_pages(sbi, ino, &wbc)) {
> > > > > -                     if (fsync_mark_done(sbi, ino))
> > > > > -                             goto out;
> > > > > +sync_nodes:
> > > > > +             sync_node_pages(sbi, ino, &wbc);
> > > > > +
> > > > > +             if (need_inode_block_update(sbi, ino)) {
> > > > >                       mark_inode_dirty_sync(inode);
> > > > >                       ret = f2fs_write_inode(inode, NULL);
> > > > >                       if (ret)
> > > > >                               goto out;
> > > > > +                     goto sync_nodes;
> > > > >               }
> > > > > +
> > > > >               ret = wait_on_node_pages_writeback(sbi, ino);
> > > > >               if (ret)
> > > > >                       goto out;
> > > > > diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> > > > > index d19d6b1..7a2d9c9 100644
> > > > > --- a/fs/f2fs/node.c
> > > > > +++ b/fs/f2fs/node.c
> > > > > @@ -123,44 +123,48 @@ static void __del_from_nat_cache(struct 
> > > > > f2fs_nm_info *nm_i, struct
> > > > > nat_entry *e)
> > > > >       kmem_cache_free(nat_entry_slab, e);
> > > > >  }
> > > > >
> > > > > -int is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid)
> > > > > +bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid)
> > > > >  {
> > > > >       struct f2fs_nm_info *nm_i = NM_I(sbi);
> > > > >       struct nat_entry *e;
> > > > > -     int is_cp = 1;
> > > > > +     bool is_cp = true;
> > > > >
> > > > >       read_lock(&nm_i->nat_tree_lock);
> > > > >       e = __lookup_nat_cache(nm_i, nid);
> > > > >       if (e && !get_nat_flag(e, IS_CHECKPOINTED))
> > > > > -             is_cp = 0;
> > > > > +             is_cp = false;
> > > > >       read_unlock(&nm_i->nat_tree_lock);
> > > > >       return is_cp;
> > > > >  }
> > > > >
> > > > > -bool fsync_mark_done(struct f2fs_sb_info *sbi, nid_t nid)
> > > > > +bool has_fsynced_inode(struct f2fs_sb_info *sbi, nid_t ino)
> > > > >  {
> > > > >       struct f2fs_nm_info *nm_i = NM_I(sbi);
> > > > >       struct nat_entry *e;
> > > > > -     bool fsync_done = false;
> > > > > +     bool fsynced = false;
> > > > >
> > > > >       read_lock(&nm_i->nat_tree_lock);
> > > > > -     e = __lookup_nat_cache(nm_i, nid);
> > > > > -     if (e)
> > > > > -             fsync_done = get_nat_flag(e, HAS_FSYNC_MARK);
> > > > > +     e = __lookup_nat_cache(nm_i, ino);
> > > > > +     if (e && get_nat_flag(e, HAS_FSYNCED_INODE))
> > > > > +             fsynced = true;
> > > > >       read_unlock(&nm_i->nat_tree_lock);
> > > > > -     return fsync_done;
> > > > > +     return fsynced;
> > > > >  }
> > > > >
> > > > > -void fsync_mark_clear(struct f2fs_sb_info *sbi, nid_t nid)
> > > > > +bool need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino)
> > > > >  {
> > > > >       struct f2fs_nm_info *nm_i = NM_I(sbi);
> > > > >       struct nat_entry *e;
> > > > > +     bool need_update = true;
> > > > >
> > > > > -     write_lock(&nm_i->nat_tree_lock);
> > > > > -     e = __lookup_nat_cache(nm_i, nid);
> > > > > -     if (e)
> > > > > -             set_nat_flag(e, HAS_FSYNC_MARK, false);
> > > > > -     write_unlock(&nm_i->nat_tree_lock);
> > > > > +     read_lock(&nm_i->nat_tree_lock);
> > > > > +     e = __lookup_nat_cache(nm_i, ino);
> > > > > +     if (e && get_nat_flag(e, HAS_LAST_FSYNC) &&
> > > > > +                     (get_nat_flag(e, IS_CHECKPOINTED) ||
> > > > > +                      get_nat_flag(e, HAS_FSYNCED_INODE)))
> > > > > +             need_update = false;
> > > > > +     read_unlock(&nm_i->nat_tree_lock);
> > > > > +     return need_update;
> > > > >  }
> > > > >
> > > > >  static struct nat_entry *grab_nat_entry(struct f2fs_nm_info *nm_i, 
> > > > > nid_t nid)
> > > > > @@ -176,7 +180,7 @@ static struct nat_entry *grab_nat_entry(struct 
> > > > > f2fs_nm_info *nm_i,
> nid_t
> > > > > nid)
> > > > >       }
> > > > >       memset(new, 0, sizeof(struct nat_entry));
> > > > >       nat_set_nid(new, nid);
> > > > > -     set_nat_flag(new, IS_CHECKPOINTED, true);
> > > > > +     nat_reset_flag(new);
> > > > >       list_add_tail(&new->list, &nm_i->nat_entries);
> > > > >       nm_i->nat_cnt++;
> > > > >       return new;
> > > > > @@ -244,12 +248,17 @@ retry:
> > > > >
> > > > >       /* change address */
> > > > >       nat_set_blkaddr(e, new_blkaddr);
> > > > > +     if (new_blkaddr == NEW_ADDR || new_blkaddr == NULL_ADDR)
> > > > > +             set_nat_flag(e, IS_CHECKPOINTED, false);
> > > > >       __set_nat_cache_dirty(nm_i, e);
> > > > >
> > > > >       /* update fsync_mark if its inode nat entry is still alive */
> > > > >       e = __lookup_nat_cache(nm_i, ni->ino);
> > > > > -     if (e)
> > > > > -             set_nat_flag(e, HAS_FSYNC_MARK, fsync_done);
> > > > > +     if (e) {
> > > > > +             if (fsync_done && ni->nid == ni->ino)
> > > > > +                     set_nat_flag(e, HAS_FSYNCED_INODE, true);
> > > > > +             set_nat_flag(e, HAS_LAST_FSYNC, fsync_done);
> > > > > +     }
> > > > >       write_unlock(&nm_i->nat_tree_lock);
> > > > >  }
> > > > >
> > > > > @@ -1121,10 +1130,14 @@ continue_unlock:
> > > > >
> > > > >                       /* called by fsync() */
> > > > >                       if (ino && IS_DNODE(page)) {
> > > > > -                             int mark = !is_checkpointed_node(sbi, 
> > > > > ino);
> > > > >                               set_fsync_mark(page, 1);
> > > > > -                             if (IS_INODE(page))
> > > > > -                                     set_dentry_mark(page, mark);
> > > > > +                             if (IS_INODE(page)) {
> > > > > +                                     if (!is_checkpointed_node(sbi, 
> > > > > ino) &&
> > > > > +                                             !has_fsynced_inode(sbi, 
> > > > > ino))
> > > > > +                                             set_dentry_mark(page, 
> > > > > 1);
> > > > > +                                     else
> > > > > +                                             set_dentry_mark(page, 
> > > > > 0);
> > > > > +                             }
> > > > >                               nwritten++;
> > > > >                       } else {
> > > > >                               set_fsync_mark(page, 0);
> > > > > @@ -1912,6 +1925,7 @@ void flush_nat_entries(struct f2fs_sb_info *sbi)
> > > > >                               write_unlock(&nm_i->nat_tree_lock);
> > > > >                       } else {
> > > > >                               write_lock(&nm_i->nat_tree_lock);
> > > > > +                             nat_reset_flag(ne);
> > > > >                               __clear_nat_cache_dirty(nm_i, ne);
> > > > >                               write_unlock(&nm_i->nat_tree_lock);
> > > > >                       }
> > > > > diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
> > > > > index 3043778..b8ba63c 100644
> > > > > --- a/fs/f2fs/node.h
> > > > > +++ b/fs/f2fs/node.h
> > > > > @@ -41,7 +41,8 @@ struct node_info {
> > > > >
> > > > >  enum {
> > > > >       IS_CHECKPOINTED,        /* is it checkpointed before? */
> > > > > -     HAS_FSYNC_MARK,         /* has the latest node fsync mark? */
> > > > > +     HAS_FSYNCED_INODE,      /* is the inode fsynced before? */
> > > > > +     HAS_LAST_FSYNC,         /* has the latest node fsync mark? */
> > > > >  };
> > > > >
> > > > >  struct nat_entry {
> > > > > @@ -60,15 +61,9 @@ struct nat_entry {
> > > > >  #define nat_set_version(nat, v)              (nat->ni.version = v)
> > > > >
> > > > >  #define __set_nat_cache_dirty(nm_i, ne)                              
> > > > >         \
> > > > > -     do {                                                            
> > > > > \
> > > > > -             set_nat_flag(ne, IS_CHECKPOINTED, false);               
> > > > > \
> > > > > -             list_move_tail(&ne->list, &nm_i->dirty_nat_entries);    
> > > > > \
> > > > > -     } while (0)
> > > > > +             list_move_tail(&ne->list, &nm_i->dirty_nat_entries);
> > > > >  #define __clear_nat_cache_dirty(nm_i, ne)                            
> > > > > \
> > > > > -     do {                                                            
> > > > > \
> > > > > -             set_nat_flag(ne, IS_CHECKPOINTED, true);                
> > > > > \
> > > > > -             list_move_tail(&ne->list, &nm_i->nat_entries);          
> > > > > \
> > > > > -     } while (0)
> > > > > +             list_move_tail(&ne->list, &nm_i->nat_entries);
> > > > >  #define inc_node_version(version)    (++version)
> > > > >
> > > > >  static inline void set_nat_flag(struct nat_entry *ne,
> > > > > @@ -87,6 +82,14 @@ static inline bool get_nat_flag(struct nat_entry 
> > > > > *ne, unsigned int
> type)
> > > > >       return ne->flag & mask;
> > > > >  }
> > > > >
> > > > > +static inline void nat_reset_flag(struct nat_entry *ne)
> > > > > +{
> > > > > +     /* these states can be set only after checkpoint was done */
> > > > > +     set_nat_flag(ne, IS_CHECKPOINTED, true);
> > > > > +     set_nat_flag(ne, HAS_FSYNCED_INODE, false);
> > > > > +     set_nat_flag(ne, HAS_LAST_FSYNC, true);
> > > > > +}
> > > > > +
> > > > >  static inline void node_info_from_raw_nat(struct node_info *ni,
> > > > >                                               struct f2fs_nat_entry 
> > > > > *raw_ne)
> > > > >  {
> > > > > --
> > > > > 1.8.5.2 (Apple Git-48)
> > > > >
> > > > >
> > > > > ------------------------------------------------------------------------------
> > > > > Want excitement?
> > > > > Manually upgrade your production database.
> > > > > When you want reliability, choose Perforce
> > > > > Perforce version control. Predictably reliable.
> > > > > http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
> > > > > _______________________________________________
> > > > > Linux-f2fs-devel mailing list
> > > > > linux-f2fs-de...@lists.sourceforge.net
> > > > > https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
> > > >

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to