from:"Chao Yu"

Re: [f2fs-dev][PATCH RESEND] f2fs: avoid allocating failure in bio_alloc

2013-09-15 Thread Chao Yu

Hi Gu

> -Original Message-
> From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
> Sent: Monday, September 16, 2013 10:09 AM
> To: Chao Yu
> Cc: Kim Jaegeuk; linux-f2fs-de...@lists.sourceforge.net;
> linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 谭姝
> Subject: Re: [f2fs-dev][PATCH RESEND] f2fs: avoid allocating failure in bio_alloc
> 
> Hi Chao,
> 
> On 09/13/2013 09:27 PM, Chao Yu wrote:
> 
> > This patch add macro MAX_BIO_BLOCKS to limit value of npages in
> > f2fs_bio_alloc, it can avoid allocating failure in bio_alloc caused by
> > npages is larger than UIO_MAXIOV.
> 
> As far as I know, bio_alloc is based on the *fs_bio_set* pool, without the
> limitation of UIO_MAXIOV. Am I missing something?

Here is the code in bio.c. fs_bio_set is passed as the actual parameter for bs
without being initialized.
So this function may return NULL.
---
Bio.c 
struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set
*bs)
{
..
if (!bs) {
if (nr_iovecs > UIO_MAXIOV)
return NULL;
---
I ran an abnormal test: I set max_sectors_kb in /sys/block/sdx/queue
to 32767 for a disk formatted with f2fs,
and I got a segfault in f2fs_bio_alloc after the image was mounted.
Is there anything I missed?
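
For readers following the numbers: the sketch below works out how many blocks a max_sectors_kb of 32767 corresponds to. It assumes 512-byte sectors, 4 KiB f2fs blocks, and that max_hw_blocks() is derived from the queue's maximum sector count; kb_to_blocks and the *_SKETCH constants are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <limits.h>

/* Assumed values for kernels of that era; they are not taken from this thread. */
#define SECTOR_SHIFT_SKETCH   9     /* 512-byte sectors (F2FS_LOG_SECTOR_SIZE) */
#define BLOCK_SHIFT_SKETCH    12    /* 4 KiB blocks (sbi->log_blocksize) */
#define UIO_MAXIOV_SKETCH     1024
#define BIO_MAX_PAGES_SKETCH  256

/* Hypothetical helper: convert a max_sectors_kb setting into 4 KiB blocks. */
static unsigned int kb_to_blocks(unsigned int max_sectors_kb)
{
	unsigned int sectors;

	assert(max_sectors_kb <= UINT_MAX / 2);  /* guard the multiplication */
	sectors = max_sectors_kb * 2;            /* 1 KiB = two 512-byte sectors */

	/* Same shift as SECTOR_TO_BLOCK(sbi, sectors) in the patch context. */
	return sectors >> (BLOCK_SHIFT_SKETCH - SECTOR_SHIFT_SKETCH);
}
```

Under these assumptions, kb_to_blocks(32767) gives 8191 blocks, which is above both assumed limits, so an unclamped request of that size would be rejected by either check.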

> 
> Thanks,
> Gu
> 
> >
> > Signed-off-by: Yu Chao 
> >  ---
> >  fs/f2fs/segment.c |4 +++-
> >  fs/f2fs/segment.h |3 +++
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
> > 09af9c7..bd79bbe 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -657,6 +657,7 @@ static void submit_write_page(struct f2fs_sb_info
> > *sbi, struct page *page,
> > block_t blk_addr, enum page_type
> type)
> > {
> > struct block_device *bdev = sbi->sb->s_bdev;
> > +   int bio_blocks;
> >
> > verify_block_addr(sbi, blk_addr);
> >
> > @@ -676,7 +677,8 @@ retry:
> > goto retry;
> > }
> >
> > -   sbi->bio[type] = f2fs_bio_alloc(bdev,
max_hw_blocks(sbi));
> > +   bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
> > +   sbi->bio[type] = f2fs_bio_alloc(bdev, bio_blocks);
> > sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi,
> > blk_addr);
> > sbi->bio[type]->bi_private = priv;
> > /*
> > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
> > bdd10ea..6352af1 100644
> > --- a/fs/f2fs/segment.h
> > +++ b/fs/f2fs/segment.h
> > @@ -9,6 +9,7 @@
> >   * published by the Free Software Foundation.
> >   */
> >  #include 
> > +#include 
> >
> >  /* constant macro */
> >  #define NULL_SEGNO ((unsigned int)(~0))
> > @@ -90,6 +91,8 @@
> > (blk_addr << ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
> >  #define SECTOR_TO_BLOCK(sbi, sectors)
> \
> > (sectors >> ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
> > +#define MAX_BIO_BLOCKS(max_hw_blocks)
> \
> > +   (min((int)max_hw_blocks, UIO_MAXIOV))
> >
> >  /* during checkpoint, bio_private is used to synchronize the last bio
> > */  struct bio_private {
> > ---
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-kernel" in the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >



RE: [f2fs-dev][PATCH RESEND] f2fs: avoid allocating failure in bio_alloc

2013-09-16 Thread Chao Yu

Hi Gu

> -Original Message-
> From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
> Sent: Monday, September 16, 2013 12:40 PM
> To: Chao Yu
> Cc: 'Kim Jaegeuk'; linux-f2fs-de...@lists.sourceforge.net;
> linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; '谭姝'
> Subject: Re: [f2fs-dev][PATCH RESEND] f2fs: avoid allocating failure in
bio_alloc
> 
> Hi Chao,
> 
> On 09/16/2013 11:26 AM, Chao Yu wrote:
> 
> > Hi Gu
> >
> >> -Original Message-
> >> From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
> >> Sent: Monday, September 16, 2013 10:09 AM
> >> To: Chao Yu
> >> Cc: Kim Jaegeuk; linux-f2fs-de...@lists.sourceforge.net;
> >> linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 谭姝
> >> Subject: Re: [f2fs-dev][PATCH RESEND] f2fs: avoid allocating failure
> >> in
> > bio_alloc
> >>
> >> Hi Chao,
> >>
> >> On 09/13/2013 09:27 PM, Chao Yu wrote:
> >>
> >>> This patch add macro MAX_BIO_BLOCKS to limit value of npages in
> >>> f2fs_bio_alloc, it can avoid allocating failure in bio_alloc caused
> >>> by npages is larger than UIO_MAXIOV.
> >>
> >> As I know bio_alloc is based of *fs_bio_set* pool, without the
> >> limitation
> > of
> >> UIO_MAXIOV, am I missing something?
> >
> > Here is the code in bio.c, fs_bio_set is as the actual parameter pass
> > to bs without being inited.
> 
> fs_bio_set was initialized early in the bio subsystem init.
> 
> > So it may have opportunity to return NULL in this function.
> 
> It may be, but may not be the thread you mentioned below.
> 
> > ---
> > Bio.c
> > struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct
> > bio_set
> > *bs)
> > {
> > ..
> > if (!bs) {
> > if (nr_iovecs > UIO_MAXIOV)
> > return NULL;
> > ---
> > I did the abnormal test: modify the max_sectors_kb in
> > /sys/block/sdx/queue to 32767 for a disk with f2fs format, and I got a
> > segfualt in f2fs_bio_alloc after the img mounted.
> > Is there anyting I missed?
> 
> Hmm, this change will also trigger a bvec_alloc failure. Did you add some
> traces to debug this?

I reviewed the code in bio_alloc and then traced the bio_alloc path to
verify it.
The trace indicates that bvec_alloc() fails when nr_iovecs is greater than
BIO_MAX_PAGES.
The patch should be updated for that.

I am sorry about the mistake in this patch, and thank you for reviewing it
and pointing this out.
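
A minimal sketch of why the clamp limit matters, assuming UIO_MAXIOV is 1024 and BIO_MAX_PAGES is 256 (the values in kernels of that period). clamp_bio_blocks and bvec_size_ok are hypothetical stand-ins, not kernel functions; bvec_size_ok only models the size check that bvec_alloc() is described as performing above.

```c
#include <assert.h>

#define UIO_MAXIOV_SKETCH     1024  /* assumed value */
#define BIO_MAX_PAGES_SKETCH  256   /* assumed value */

/* Clamp a requested vector count to a limit, like min((int)x, LIMIT). */
static int clamp_bio_blocks(int max_hw_blocks, int limit)
{
	assert(limit > 0);
	return max_hw_blocks < limit ? max_hw_blocks : limit;
}

/* Stand-in for the size check: returns 1 when the request is acceptable. */
static int bvec_size_ok(int nr_iovecs)
{
	return nr_iovecs >= 0 && nr_iovecs <= BIO_MAX_PAGES_SKETCH;
}
```

With a request of 8191 blocks, clamping to the larger limit still leaves 1024 vectors, which the modelled check rejects; clamping to BIO_MAX_PAGES_SKETCH leaves 256, which passes.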

> 
> Regards,
> Gu
> 
> >
> >>
> >> Thanks,
> >> Gu
> >>
> >>>
> >>> Signed-off-by: Yu Chao 
> >>>  ---
> >>>  fs/f2fs/segment.c |4 +++-
> >>>  fs/f2fs/segment.h |3 +++
> >>>  2 files changed, 6 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
> >>> 09af9c7..bd79bbe 100644
> >>> --- a/fs/f2fs/segment.c
> >>> +++ b/fs/f2fs/segment.c
> >>> @@ -657,6 +657,7 @@ static void submit_write_page(struct
> >>> f2fs_sb_info *sbi, struct page *page,
> >>> block_t blk_addr, enum page_type
> >> type)
> >>> {
> >>> struct block_device *bdev = sbi->sb->s_bdev;
> >>> +   int bio_blocks;
> >>>
> >>> verify_block_addr(sbi, blk_addr);
> >>>
> >>> @@ -676,7 +677,8 @@ retry:
> >>> goto retry;
> >>> }
> >>>
> >>> -   sbi->bio[type] = f2fs_bio_alloc(bdev,
> > max_hw_blocks(sbi));
> >>> +   bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
> >>> +   sbi->bio[type] = f2fs_bio_alloc(bdev, bio_blocks);
> >>> sbi->bio[type]->bi_sector =
> SECTOR_FROM_BLOCK(sbi,
> >>> blk_addr);
> >>> sbi->bio[type]->bi_private = priv;
> >>> /*
> >>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
> >>> bdd10ea..6352af1 100644
> >>> --- a/fs/f2fs/segment.h
> >>> +++ b/fs/f2fs/segment.h
> >>> @@ -9,6 +9,7 @@
> >>>   * published by the Free Software Foundation.
> >>>   */
> >>>  #include 
> >>> +#include 
> >>>
> >>>  /* constant macro */
> >>>  #define NULL_SEGNO

Re: Re: [f2fs-dev][PATCH] f2fs: optimize fs_lock for better performance

2013-09-10 Thread Chao Yu

Hi Kim,

I ran some tests, as you suggested, using a random number instead of the
spin_lock.
The test model is as follows:
eight threads race to grab one of eight locks one thousand times,
and I used four methods to generate the lock number:

1. atomic_add_return(1, &sbi->next_lock_num) % NR_GLOBAL_LOCKS;
2. spin_lock(); next_lock_num++ % NR_GLOBAL_LOCKS; spin_unlock();
3. ktime_get().tv64 % NR_GLOBAL_LOCKS;
4. get_random_bytes(&next_lock, sizeof(unsigned int));

The results indicate that:
max count of consecutive collisions: 4 > 3 > 2 = 1
max-min count of times each lock is grabbed: 4 > 3 > 2 = 1
elapsed time of generating the number: 3 > 2 > 4 > 1

So I think it is better to use atomic_add_return in a round-robin fashion,
since it costs less time and reduces collisions.
What's your opinion?

thanks
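
A user-space sketch of method 1, using C11 atomics in place of the kernel's atomic_add_return(). The counter next_lock_num here is a file-scope stand-in for sbi->next_lock_num, and pick_lock_round_robin is a hypothetical name; NR_GLOBAL_LOCKS is set to 8 to match the eight locks in the test described above.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_GLOBAL_LOCKS 8

/* Stand-in for sbi->next_lock_num; zero-initialized at program start. */
static atomic_uint next_lock_num;

/* Pick the next lock index in round-robin order without taking a lock. */
static unsigned int pick_lock_round_robin(void)
{
	/*
	 * atomic_fetch_add returns the old value, so add 1 to get the
	 * "add and return new value" behaviour of atomic_add_return(1, ...).
	 */
	unsigned int ticket = atomic_fetch_add(&next_lock_num, 1) + 1;
	unsigned int idx = ticket % NR_GLOBAL_LOCKS;

	assert(idx < NR_GLOBAL_LOCKS);
	return idx;
}
```

Each caller gets a distinct ticket, so consecutive callers are spread evenly over the locks. Because the counter wraps at a power of two that is a multiple of 8, the distribution stays even across wraparound.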

--- Original Message ---
Sender : ??? S5(??)/??/?(???)/
Date : September 10, 2013 09:52 (GMT+09:00)
Title : Re: [f2fs-dev][PATCH] f2fs: optimize fs_lock for better performance

Hi,

At first, thank you for the report and please follow the email writing
rules. :)

Anyway, I agree to the below issue.
One thing that I can think of is that we don't need to use the
spin_lock, since we don't care about the exact lock number, but just
need to get any not-collided number.

So, how about removing the spin_lock?
And how about using a random number?
Thanks,

2013-09-06 (?), 09:48 +, Chao Yu:
> Hi Kim:
> 
> I think there is a performance problem: when all sbi->fs_lock mutexes are
> held, all other threads may get the same next_lock value from
> sbi->next_lock_num in mutex_lock_op and wait for the same lock at
> fs_lock[next_lock], which unbalances fs_lock usage.
> 
> Performance may drop in multithreaded tests.
> 
>  
> 
> Here is the patch to fix this problem:
> 
>  
> 
> Signed-off-by: Yu Chao 
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> 
> old mode 100644
> 
> new mode 100755
> 
> index 467d42d..983bb45
> 
> --- a/fs/f2fs/f2fs.h
> 
> +++ b/fs/f2fs/f2fs.h
> 
> @@ -371,6 +371,7 @@ struct f2fs_sb_info {
> 
> struct mutex fs_lock[NR_GLOBAL_LOCKS];  /* blocking FS
> operations */
> 
> struct mutex node_write;/* locking node writes
> */
> 
> struct mutex writepages;/* mutex for
> writepages() */
> 
> +   spinlock_t spin_lock;   /* lock for
> next_lock_num */
> 
> unsigned char next_lock_num;/* round-robin global
> locks */
> 
> int por_doing;  /* recovery is doing
> or not */
> 
> int on_build_free_nids; /* build_free_nids is
> doing */
> 
> @@ -533,15 +534,19 @@ static inline void mutex_unlock_all(struct
> f2fs_sb_info *sbi)
> 
>  
> 
>  static inline int mutex_lock_op(struct f2fs_sb_info *sbi)
> 
>  {
> 
> -   unsigned char next_lock = sbi->next_lock_num %
> NR_GLOBAL_LOCKS;
> 
> +   unsigned char next_lock;
> 
> int i = 0;
> 
>  
> 
> for (; i < NR_GLOBAL_LOCKS; i++)
> 
> if (mutex_trylock(&sbi->fs_lock[i]))
> 
> return i;
> 
>  
> 
> -   mutex_lock(&sbi->fs_lock[next_lock]);
> 
> +   spin_lock(&sbi->spin_lock);
> 
> +   next_lock = sbi->next_lock_num % NR_GLOBAL_LOCKS;
> 
> sbi->next_lock_num++;
> 
> +   spin_unlock(&sbi->spin_lock);
> 
> +
> 
> +   mutex_lock(&sbi->fs_lock[next_lock]);
> 
> return next_lock;
> 
>  }
> 
>  
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> 
> old mode 100644
> 
> new mode 100755
> 
> index 75c7dc3..4f27596
> 
> --- a/fs/f2fs/super.c
> 
> +++ b/fs/f2fs/super.c
> 
> @@ -657,6 +657,7 @@ static int f2fs_fill_super(struct super_block *sb,
> void *data, int silent)
> 
> mutex_init(&sbi->cp_mutex);
> 
> for (i = 0; i < NR_GLOBAL_LOCKS; i++)
> 
> mutex_init(&sbi->fs_lock[i]);
> 
> +   spin_lock_init(&sbi->spin_lock);
> 
> mutex_init(&sbi->node_write);
> 
> sbi->por_doing = 0;
> 
> spin_lock_init(&sbi->stat_lock);
> 
> (END)
> 
>  
> 
> 
> 
> 

-- 
Jaegeuk Kim
Samsung

[f2fs-dev][PATCH] f2fs: limit nr_iovecs in bio_alloc

2013-09-13 Thread Chao Yu

This patch adds the macro MAX_BIO_BLOCKS to limit the value of npages in
f2fs_bio_alloc.
It avoids bio_alloc returning NULL when npages is larger than
UIO_MAXIOV.

Signed-off-by: Yu Chao 
 ---
 fs/f2fs/segment.c |4 +++-
 fs/f2fs/segment.h |3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 09af9c7..bd79bbe 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -657,6 +657,7 @@ static void submit_write_page(struct f2fs_sb_info *sbi,
struct page *page,
block_t blk_addr, enum page_type type)
 {
struct block_device *bdev = sbi->sb->s_bdev;
+   int bio_blocks;
 
verify_block_addr(sbi, blk_addr);
 
@@ -676,7 +677,8 @@ retry:
goto retry;
}
 
-   sbi->bio[type] = f2fs_bio_alloc(bdev, max_hw_blocks(sbi));
+   bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+   sbi->bio[type] = f2fs_bio_alloc(bdev, bio_blocks);
sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi,
blk_addr);
sbi->bio[type]->bi_private = priv;
/*
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index bdd10ea..9cc95eb 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -9,6 +9,7 @@
  * published by the Free Software Foundation.
  */
 #include 
+#include 
 
 /* constant macro */
 #define NULL_SEGNO ((unsigned int)(~0))
@@ -90,6 +91,8 @@
(blk_addr << ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
 #define SECTOR_TO_BLOCK(sbi, sectors)  \
(sectors >> ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
+#define MAX_BIO_BLOCK(max_hw_blocks)   \
+   (min((int)max_hw_blocks, UIO_MAXIOV))
 
 /* during checkpoint, bio_private is used to synchronize the last bio */
 struct bio_private {
---


RE: [f2fs-dev][PATCH] f2fs: limit nr_iovecs in bio_alloc

2013-09-13 Thread Chao Yu

> -Original Message-
> From: Jin Xu [mailto:linuxclim...@gmail.com]
> Sent: Friday, September 13, 2013 7:49 PM
> To: Chao Yu
> Cc: ???; linux-f2fs-de...@lists.sourceforge.net;
linux-fsde...@vger.kernel.org;
> linux-kernel@vger.kernel.org; 谭姝
> Subject: Re: [f2fs-dev][PATCH] f2fs: limit nr_iovecs in bio_alloc
> 
> Did this patch pass the basic build? There seems have a typo regarding
> MAX_BIO_BLOCK.
> 

I am so sorry about that. I missed the 'S' when copying the code by hand
from the build path to the git branch path.
I will check the patch carefully and resubmit it.

Thanks for the reminder!

> --
> Jin
> 
> On 13/09/2013 18:07, Chao Yu wrote:
> > This patch add macro MAX_BIO_BLOCKS to limit value of npages in
> > f2fs_bio_alloc, it can avoid to return NULL in bio_alloc caused by
> > npages is larger than UIO_MAXIOV.
> >
> > Signed-off-by: Yu Chao 
> >  ---
> >  fs/f2fs/segment.c |4 +++-
> >  fs/f2fs/segment.h |3 +++
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index
> > 09af9c7..bd79bbe 100644
> > --- a/fs/f2fs/segment.c
> > +++ b/fs/f2fs/segment.c
> > @@ -657,6 +657,7 @@ static void submit_write_page(struct f2fs_sb_info
> > *sbi, struct page *page,
> > block_t blk_addr, enum page_type
> type)
> > {
> > struct block_device *bdev = sbi->sb->s_bdev;
> > +   int bio_blocks;
> >
> > verify_block_addr(sbi, blk_addr);
> >
> > @@ -676,7 +677,8 @@ retry:
> > goto retry;
> > }
> >
> > -   sbi->bio[type] = f2fs_bio_alloc(bdev,
max_hw_blocks(sbi));
> > +   bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
> > +   sbi->bio[type] = f2fs_bio_alloc(bdev, bio_blocks);
> > sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi,
> > blk_addr);
> > sbi->bio[type]->bi_private = priv;
> > /*
> > diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index
> > bdd10ea..9cc95eb 100644
> > --- a/fs/f2fs/segment.h
> > +++ b/fs/f2fs/segment.h
> > @@ -9,6 +9,7 @@
> >   * published by the Free Software Foundation.
> >   */
> >  #include 
> > +#include 
> >
> >  /* constant macro */
> >  #define NULL_SEGNO ((unsigned int)(~0))
> > @@ -90,6 +91,8 @@
> > (blk_addr << ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
> >  #define SECTOR_TO_BLOCK(sbi, sectors)
> \
> > (sectors >> ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
> > +#define MAX_BIO_BLOCK(max_hw_blocks)
> \
> > +   (min((int)max_hw_blocks, UIO_MAXIOV))
> >
> >  /* during checkpoint, bio_private is used to synchronize the last bio
> > */  struct bio_private {
> > ---
> >


[f2fs-dev][PATCH RESEND] f2fs: avoid allocating failure in bio_alloc

2013-09-13 Thread Chao Yu

This patch adds the macro MAX_BIO_BLOCKS to limit the value of npages in
f2fs_bio_alloc.
It avoids an allocation failure in bio_alloc when npages is larger than
UIO_MAXIOV.

Signed-off-by: Yu Chao 
 ---
 fs/f2fs/segment.c |4 +++-
 fs/f2fs/segment.h |3 +++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 09af9c7..bd79bbe 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -657,6 +657,7 @@ static void submit_write_page(struct f2fs_sb_info *sbi,
struct page *page,
block_t blk_addr, enum page_type type)
 {
struct block_device *bdev = sbi->sb->s_bdev;
+   int bio_blocks;
 
verify_block_addr(sbi, blk_addr);
 
@@ -676,7 +677,8 @@ retry:
goto retry;
}
 
-   sbi->bio[type] = f2fs_bio_alloc(bdev, max_hw_blocks(sbi));
+   bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+   sbi->bio[type] = f2fs_bio_alloc(bdev, bio_blocks);
sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi,
blk_addr);
sbi->bio[type]->bi_private = priv;
/*
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index bdd10ea..6352af1 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -9,6 +9,7 @@
  * published by the Free Software Foundation.
  */
 #include 
+#include 
 
 /* constant macro */
 #define NULL_SEGNO ((unsigned int)(~0))
@@ -90,6 +91,8 @@
(blk_addr << ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
 #define SECTOR_TO_BLOCK(sbi, sectors)  \
(sectors >> ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
+#define MAX_BIO_BLOCKS(max_hw_blocks)  \
+   (min((int)max_hw_blocks, UIO_MAXIOV))
 
 /* during checkpoint, bio_private is used to synchronize the last bio */
 struct bio_private {
---


[f2fs-dev][PATCH DISCUSS] f2fs: readahead continuous sit entry pages for better mount performance

2013-09-30 Thread Chao Yu

Because the f2fs mount process must scan all valid SIT entries and keep the
information in memory for later operations, mount performance is worse than
ext4 on embedded devices. We found a way to improve mount performance within
the current f2fs design. In tests on a Galaxy SIII, mount performance
improved by 20% ~ 30%.

Consider the following points:
1.  The maximum number of SIT journal entries reserved in the current
CURSEG_COLD_DATA segment information is 6 (SIT_JOURNAL_ENTRIES), which means
there are never more than 6 actual journal entries.
2.  Each block in the SIT area can contain 55 entries (SIT_ENTRY_PER_BLOCK).
Because there are no more than 6 journal entries in the checkpoint area, most
SIT entries are obtained from SIT#0 or SIT#1, and all the valid SIT pages are
read to build the SIT entries in memory.
3.  The valid SIT blocks mostly sit contiguously in SIT#0 or SIT#1.
4.  Reading multiple contiguous pages within one bio is faster than reading
pages one by one in multiple bios.

With the points above in mind, we tried reading multiple contiguous pages
within one bio to build the SIT entries in memory.

The following is the current design of the mount function build_sit_entries:
1.  Loop from the first segment to the final segment;
2.  Scan all checkpoint journal entries; if the segment number matches the
current loop segment number, read the SIT entry, keep it in memory, and go
to step 1; otherwise, continue with step 3;
3.  Read one meta page from SIT#0 or SIT#1 according to the current valid
meta page bitmap, keep the SIT information in memory, and go to step 1;

We change the design of build_sit_entries as follows:
1.  Create a page_array whose maximum size is max_hw_blocks(sbi) (one
page array can hold that many pages).
2.  Loop from the first SIT entry block to the final SIT entry block.
3.  ra_sit_pages: read multiple contiguous SIT pages. If a) the maximum
size of page_array is reached, or b) the SIT blocks switch from SIT#0 to
SIT#1 or from SIT#1 to SIT#0, return to build_sit_entries (that is, try
to read contiguous pages in SIT#0 or SIT#1 within one bio).
4.  Get the previously read pages one by one and keep the SIT entry
information in memory; go to step 2.
5.  After all valid SIT entries in SIT#0 or SIT#1 are kept in memory,
free page_array, scan all journal SIT entries in the checkpoint area, and
overwrite the in-memory SIT entries (sit_i->sentries) with that information.
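
The batching rule in step 3 can be sketched in user space as follows. This is only an illustration under assumptions: which_copy[] is a hypothetical array recording whether each SIT block is currently valid in SIT#0 (0) or SIT#1 (1), and sit_batch_len / sit_count_batches are made-up helper names, not functions from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Starting at SIT block index `start`, count how many consecutive blocks
 * live in the same SIT copy, capped at `max_batch` (the page_array size).
 */
static size_t sit_batch_len(const unsigned char *which_copy, size_t nr_blocks,
			    size_t start, size_t max_batch)
{
	size_t len = 0;

	assert(max_batch > 0);
	while (start + len < nr_blocks && len < max_batch &&
	       which_copy[start + len] == which_copy[start])
		len++;
	return len;
}

/* Count how many read requests cover all SIT blocks under this rule. */
static size_t sit_count_batches(const unsigned char *which_copy,
				size_t nr_blocks, size_t max_batch)
{
	size_t start = 0, batches = 0;

	while (start < nr_blocks) {
		/* Always advances by at least one block when start is in range. */
		start += sit_batch_len(which_copy, nr_blocks, start, max_batch);
		batches++;
	}
	return batches;
}
```

For ten SIT blocks laid out as 0,0,0,1,1,0,0,0,0,0 and a batch cap of 4, this rule issues four reads instead of ten single-page reads.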

One more optimization: because of the f2fs allocation and garbage collection
strategy, most SIT entries in one page contain either entirely valid blocks
or entirely invalid blocks, so we changed the check function
check_block_count for SIT entries.

Here is our temporary patch, based on f2fs in linux-next:

Signed-off-by: Tan Shu 
Reviewed-by: Li Fan < fanofcode...@samsung.com>
Reviewed-by: Yu Chao 
---
 fs/f2fs/data.c|2 +-
 fs/f2fs/f2fs.h|1 +
 fs/f2fs/segment.c |  211
+
 fs/f2fs/segment.h |   18 +
 4 files changed, 202 insertions(+), 30 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
old mode 100644
new mode 100755
index 2c02ec8..7d8e9f6
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -355,7 +355,7 @@ repeat:
return page;
 }
 
-static void read_end_io(struct bio *bio, int err)
+void read_end_io(struct bio *bio, int err)
 {
const int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
old mode 100644
new mode 100755
index 7fd99d8..9f3a784
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1117,6 +1117,7 @@ struct page *get_lock_data_page(struct inode *,
pgoff_t);
 struct page *get_new_data_page(struct inode *, struct page *, pgoff_t,
bool);
 int f2fs_readpage(struct f2fs_sb_info *, struct page *, block_t, int);
 int do_write_data_page(struct page *);
+void read_end_io(struct bio *bio, int err);
 
 /*
  * gc.c
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
old mode 100644
new mode 100755
index bd79bbe..971838d
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -14,6 +14,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #include "f2fs.h"
 #include "segment.h"
@@ -1210,21 +1213,108 @@ int lookup_journal_in_cursum(struct
f2fs_summary_block *sum, int type,
}
return -1;
 }
-
-static struct page *get_current_sit_page(struct f2fs_sb_info *sbi,
-   unsigned int segno)
+static void ra_sit_pages(struct f2fs_sb_info *sbi, 
+   struct page**
page_array, 
+   int array_size, 
+   unsigned int start, 
+   unsigned int* next, 
+   unsigned int* base)
 {
+   struct addr

[f2fs-dev] [PATCH] f2fs: avoid allocating failure in bio_alloc

2013-09-22 Thread Chao Yu

This patch adds the macro MAX_BIO_BLOCKS to limit the value of npages in
f2fs_bio_alloc. It avoids an allocation failure in bio_alloc when
npages is larger than BIO_MAX_PAGES.

Signed-off-by: Yu Chao 
---
 fs/f2fs/segment.c |4 +++-
 fs/f2fs/segment.h |2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 09af9c7..bd79bbe 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -657,6 +657,7 @@ static void submit_write_page(struct f2fs_sb_info *sbi,
struct page *page,
block_t blk_addr, enum page_type type)
 {
struct block_device *bdev = sbi->sb->s_bdev;
+   int bio_blocks;
 
verify_block_addr(sbi, blk_addr);
 
@@ -676,7 +677,8 @@ retry:
goto retry;
}
 
-   sbi->bio[type] = f2fs_bio_alloc(bdev, max_hw_blocks(sbi));
+   bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
+   sbi->bio[type] = f2fs_bio_alloc(bdev, bio_blocks);
sbi->bio[type]->bi_sector = SECTOR_FROM_BLOCK(sbi,
blk_addr);
sbi->bio[type]->bi_private = priv;
/*
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index bdd10ea..7f94d78 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -90,6 +90,8 @@
(blk_addr << ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
 #define SECTOR_TO_BLOCK(sbi, sectors)  \
(sectors >> ((sbi)->log_blocksize - F2FS_LOG_SECTOR_SIZE))
+#define MAX_BIO_BLOCKS(max_hw_blocks)  \
+   (min((int)max_hw_blocks, BIO_MAX_PAGES))
 
 /* during checkpoint, bio_private is used to synchronize the last bio */
 struct bio_private {
---


[f2fs-dev] [PATCH] f2fs: remove unneeded write checkpoint in recover_fsync_data

2013-09-22 Thread Chao Yu

Previously, recover_fsync_data still wrote a checkpoint even when there was
nothing to recover on a normally unmounted image.
That may reduce mount performance and flash memory lifetime, so let's remove
it.

Signed-off-by: Tan Shu 
Signed-off-by: Yu Chao 
---
 fs/f2fs/recovery.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 51ef5ee..6988e1b 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -419,6 +419,7 @@ int recover_fsync_data(struct f2fs_sb_info *sbi)
 {
struct list_head inode_list;
int err;
+   int is_writecp = 0;
 
fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
sizeof(struct fsync_inode_entry), NULL);
@@ -436,6 +437,8 @@ int recover_fsync_data(struct f2fs_sb_info *sbi)
if (list_empty(&inode_list))
goto out;
 
+   is_writecp = 1;
+
/* step #2: recover data */
err = recover_data(sbi, &inode_list, CURSEG_WARM_NODE);
BUG_ON(!list_empty(&inode_list));
@@ -443,7 +446,7 @@ out:
destroy_fsync_dnodes(&inode_list);
kmem_cache_destroy(fsync_entry_slab);
sbi->por_doing = 0;
-   if (!err)
+   if (!err && is_writecp)
write_checkpoint(sbi, false);
return err;
 }
---


RE: [f2fs-dev] [PATCH] f2fs: remove unneeded write checkpoint in recover_fsync_data

2013-09-23 Thread Chao Yu

Hi Gu

> -Original Message-
> From: Gu Zheng [mailto:guz.f...@cn.fujitsu.com]
> Sent: Monday, September 23, 2013 9:54 AM
> To: Chao Yu
> Cc: Kim Jaegeuk; linux-f2fs-de...@lists.sourceforge.net;
> linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org; 谭姝
> Subject: Re: [f2fs-dev] [PATCH] f2fs: remove unneeded write checkpoint in
> recover_fsync_data
> 
> On 09/22/2013 03:51 PM, Chao Yu wrote:
> 
> > Previously, recover_fsync_data still to write checkpoint when there is
> > nothing to recover with normal umount image.
> > It may reduce mount performance and flash memory lifetime, so let's
> > remove it.
> >
> > Signed-off-by: Tan Shu 
> > Signed-off-by: Yu Chao 
> > ---
> >  fs/f2fs/recovery.c |5 -
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c index
> > 51ef5ee..6988e1b 100644
> > --- a/fs/f2fs/recovery.c
> > +++ b/fs/f2fs/recovery.c
> > @@ -419,6 +419,7 @@ int recover_fsync_data(struct f2fs_sb_info *sbi)
> > {
> > struct list_head inode_list;
> > int err;
> > +   int is_writecp = 0;
> 
> "need_writecp" may be more suitable.

Okay, that improves readability. I will change it.

Thanks.
> 
> Thanks,
> Gu
> 
> >
> > fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
> > sizeof(struct fsync_inode_entry), NULL); @@ -436,6
+437,8
> @@ int
> > recover_fsync_data(struct f2fs_sb_info *sbi)
> > if (list_empty(&inode_list))
> > goto out;
> >
> > +   is_writecp = 1;
> > +
> > /* step #2: recover data */
> > err = recover_data(sbi, &inode_list, CURSEG_WARM_NODE);
> > BUG_ON(!list_empty(&inode_list));
> > @@ -443,7 +446,7 @@ out:
> > destroy_fsync_dnodes(&inode_list);
> > kmem_cache_destroy(fsync_entry_slab);
> > sbi->por_doing = 0;
> > -   if (!err)
> > +   if (!err && is_writecp)
> > write_checkpoint(sbi, false);
> > return err;
> >  }
> > ---
> >
> >



[f2fs-dev] [PATCH RESEND] f2fs: remove unneeded write checkpoint in recover_fsync_data

2013-09-23 Thread Chao Yu

Previously, recover_fsync_data still wrote a checkpoint even when there was
nothing to recover on a normally unmounted image.
That may reduce mount performance and flash memory lifetime, so let's remove
it.

Signed-off-by: Tan Shu 
Signed-off-by: Yu Chao 
---
fs/f2fs/recovery.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 51ef5ee..d43e4cd 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -419,6 +419,7 @@ int recover_fsync_data(struct f2fs_sb_info *sbi)
 {
struct list_head inode_list;
int err;
+   int need_writecp = 0;
 
fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
sizeof(struct fsync_inode_entry), NULL);
@@ -436,6 +437,8 @@ int recover_fsync_data(struct f2fs_sb_info *sbi)
if (list_empty(&inode_list))
goto out;
 
+   need_writecp = 1;
+
/* step #2: recover data */
err = recover_data(sbi, &inode_list, CURSEG_WARM_NODE);
BUG_ON(!list_empty(&inode_list));
@@ -443,7 +446,7 @@ out:
destroy_fsync_dnodes(&inode_list);
kmem_cache_destroy(fsync_entry_slab);
sbi->por_doing = 0;
-   if (!err)
+   if (!err && need_writecp)
write_checkpoint(sbi, false);
return err;
 }
---
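
The control flow of the patch above can be modelled in user space as a rough sketch. Everything here is a stand-in: mock_write_checkpoint only counts calls, nr_fsynced models the length of inode_list, and recover_err models the return value of recover_data(); none of these names exist in f2fs.

```c
#include <assert.h>
#include <stdbool.h>

static int checkpoints_written;   /* number of calls to the mock below */

/* Mock standing in for write_checkpoint(); it only counts invocations. */
static void mock_write_checkpoint(void)
{
	checkpoints_written++;
}

/* Simplified shape of the patched recover_fsync_data(). */
static int recover_sketch(int nr_fsynced, int recover_err)
{
	int err = 0;
	bool need_writecp = false;

	assert(nr_fsynced >= 0);
	if (nr_fsynced == 0)
		goto out;          /* nothing to recover: skip the checkpoint */

	need_writecp = true;
	err = recover_err;         /* result of the data recovery step */
out:
	if (!err && need_writecp)
		mock_write_checkpoint();
	return err;
}
```

In this model a clean image (no fsynced inodes) writes no checkpoint, a successful recovery writes exactly one, and a failed recovery writes none.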


[PATCH] nfs: add missing CONFIG_MIGRATION for nfs_migrate_page

2016-09-19 Thread Chao Yu

We had better guard nfs_migrate_page with CONFIG_MIGRATION; otherwise,
when CONFIG_MIGRATION is not defined, the unused nfs_migrate_page will still
be compiled into the kernel.

Signed-off-by: Chao Yu 
---
 fs/nfs/file.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 7d62097..6cfb83e 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -543,7 +543,9 @@ const struct address_space_operations nfs_file_aops = {
.invalidatepage = nfs_invalidate_page,
.releasepage = nfs_release_page,
.direct_IO = nfs_direct_IO,
+#ifdef CONFIG_MIGRATION
.migratepage = nfs_migrate_page,
+#endif
.launder_page = nfs_launder_page,
.is_dirty_writeback = nfs_check_dirty_writeback,
.error_remove_page = generic_error_remove_page,
-- 
2.8.2.311.gee88674

[PATCH] gfs2: fix to detect failure of register_shrinker

2016-09-19 Thread Chao Yu

From: Chao Yu 

register_shrinker can fail since commit 1d3d4437eae1 ("vmscan: per-node
deferred work"), so we should detect its failure; otherwise the shrinker
may fail to register even though the gfs2 module was initialized successfully.

Signed-off-by: Chao Yu 
---
 fs/gfs2/glock.c | 8 +++-
 fs/gfs2/main.c  | 4 +++-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index c8e2e7f..14cbf60 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1781,7 +1781,13 @@ int __init gfs2_glock_init(void)
return -ENOMEM;
}
 
-   register_shrinker(&glock_shrinker);
+   ret = register_shrinker(&glock_shrinker);
+   if (ret) {
+   destroy_workqueue(gfs2_delete_workqueue);
+   destroy_workqueue(glock_workqueue);
+   rhashtable_destroy(&gl_hash_table);
+   return ret;
+   }
 
return 0;
 }
diff --git a/fs/gfs2/main.c b/fs/gfs2/main.c
index 74fd013..67d1fc4 100644
--- a/fs/gfs2/main.c
+++ b/fs/gfs2/main.c
@@ -145,7 +145,9 @@ static int __init init_gfs2_fs(void)
if (!gfs2_qadata_cachep)
goto fail;
 
-   register_shrinker(&gfs2_qd_shrinker);
+   error = register_shrinker(&gfs2_qd_shrinker);
+   if (error)
+   goto fail;
 
error = register_filesystem(&gfs2_fs_type);
if (error)
-- 
2.7.2
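
The error path in the patch above follows the usual "undo what was already set up" shape. Below is a minimal user-space sketch of that shape only. The struct, the function names, and the simulated failure switch are all hypothetical stand-ins, not gfs2 or kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the resources an init function sets up. */
struct subsys_state {
	bool workqueue_up;
	bool hashtable_up;
	bool shrinker_up;
};

/* Simulated registration step; should_fail plays the role of an -ENOMEM. */
static int fake_register_shrinker(struct subsys_state *s, bool should_fail)
{
	if (should_fail)
		return -12;	/* numeric value of -ENOMEM on Linux */
	s->shrinker_up = true;
	return 0;
}

static int subsys_init(struct subsys_state *s, bool shrinker_fails)
{
	int ret;

	s->workqueue_up = true;
	s->hashtable_up = true;

	ret = fake_register_shrinker(s, shrinker_fails);
	if (ret) {
		/* Tear down the earlier steps before reporting the error. */
		s->hashtable_up = false;
		s->workqueue_up = false;
		return ret;
	}
	return 0;
}
```

Calling subsys_init() with shrinker_fails set returns the error and leaves nothing half-initialized, which is the property the patch adds for gfs2_glock_init().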

Re: [PATCH] nfs: add missing CONFIG_MIGRATION for nfs_migrate_page

2016-09-19 Thread Chao Yu

Hi Anna,

On 2016/9/20 1:38, Anna Schumaker wrote:
> Hi Chao,
> 
> On 09/19/2016 08:09 AM, Chao Yu wrote:
>> We had better use CONFIG_MIGRATION to cover nfs_migrate_page; otherwise,
>> when CONFIG_MIGRATION is not defined, the unused nfs_migrate_page will
>> still be compiled into the kernel.
> 
> I don't think that nfs_migrate_page is still compiled into the kernel when 
> CONFIG_MIGRATION=n.  The file fs/nfs/internal.h has:
> 
>   #ifdef CONFIG_MIGRATION
>   extern int nfs_migrate_page(struct address_space *,
>   struct page *, struct page *, enum migrate_mode);
>   #else
>   #define nfs_migrate_page NULL
>   #endif
> 
> So it looks like we're just setting the variable to a NULL pointer in this 
> case.  

Oh, thank you for correcting me; I missed that part.

> I'm not opposed to your change, since it better matches how we've done things 
> in other parts of the client, but can you please clean up internal.h while 
> you're at it?

OK, let me change the commit log and do more cleanup as you suggested above.

Thanks,

> 
> Thanks,
> Anna
> 
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/nfs/file.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>> index 7d62097..6cfb83e 100644
>> --- a/fs/nfs/file.c
>> +++ b/fs/nfs/file.c
>> @@ -543,7 +543,9 @@ const struct address_space_operations nfs_file_aops = {
>>  .invalidatepage = nfs_invalidate_page,
>>  .releasepage = nfs_release_page,
>>  .direct_IO = nfs_direct_IO,
>> +#ifdef CONFIG_MIGRATION
>>  .migratepage = nfs_migrate_page,
>> +#endif
>>  .launder_page = nfs_launder_page,
>>  .is_dirty_writeback = nfs_check_dirty_writeback,
>>  .error_remove_page = generic_error_remove_page,
>>
> 
> 
> .
>
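
The two styles discussed in this reply can be compared in a small user-space sketch. Everything here is hypothetical (demo_ops, DEMO_CONFIG_MIGRATION, and the helper names); it only illustrates that a fallback macro expanding to NULL and an #ifdef around the initializer both leave the member NULL when the option is off, because members omitted from a designated initializer are zero-initialized.

```c
#include <stddef.h>

/* Hypothetical ops table, loosely shaped like an address_space_operations. */
struct demo_ops {
	int (*launder)(int);
	int (*migrate)(int);
};

static int demo_launder(int x) { return x; }

#ifdef DEMO_CONFIG_MIGRATION
static int demo_migrate(int x) { return x + 1; }
#endif

/* Style A: a header-side fallback macro; the initializer is always present. */
#ifdef DEMO_CONFIG_MIGRATION
#define demo_migrate_or_null demo_migrate
#else
#define demo_migrate_or_null NULL
#endif

static const struct demo_ops ops_macro_style = {
	.launder = demo_launder,
	.migrate = demo_migrate_or_null,
};

/* Style B: guard the initializer itself; the omitted member stays NULL. */
static const struct demo_ops ops_ifdef_style = {
	.launder = demo_launder,
#ifdef DEMO_CONFIG_MIGRATION
	.migrate = demo_migrate,
#endif
};
```

Note that with style B the function still needs a visible declaration whenever the option is enabled, so the guarded initializer does not remove the need for a prototype.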

Re: [PATCH 3/6] f2fs: fix to avoid race condition when updating sbi flag

2016-09-19 Thread Chao Yu

Hi Jaegeuk,

On 2016/9/20 5:40, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Sun, Sep 18, 2016 at 11:30:05PM +0800, Chao Yu wrote:
>> From: Chao Yu 
>>
>> Make updates of the sbi flag atomic by using {test,set,clear}_bit;
>> otherwise, in a concurrent scenario, the flag could be updated incorrectly.
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/f2fs.h | 10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 9b4bbf2..c30f744b 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -794,7 +794,7 @@ struct f2fs_sb_info {
>>  struct proc_dir_entry *s_proc;  /* proc entry */
>>  struct f2fs_super_block *raw_super; /* raw super block pointer */
>>  int valid_super_block;  /* valid super block no */
>> -int s_flag; /* flags for sbi */
>> +unsigned long s_flag;   /* flags for sbi */
>>  
>>  #ifdef CONFIG_F2FS_FS_ENCRYPTION
>>  u8 key_prefix[F2FS_KEY_DESC_PREFIX_SIZE];
>> @@ -1063,17 +1063,19 @@ static inline struct address_space 
>> *NODE_MAPPING(struct f2fs_sb_info *sbi)
>>  
>>  static inline bool is_sbi_flag_set(struct f2fs_sb_info *sbi, unsigned int 
>> type)
>>  {
>> -return sbi->s_flag & (0x01 << type);
>> +return test_bit(type, &sbi->s_flag);
>>  }
>>  
>>  static inline void set_sbi_flag(struct f2fs_sb_info *sbi, unsigned int type)
>>  {
>> -sbi->s_flag |= (0x01 << type);
>> +if (!test_bit(type, &sbi->s_flag))
>> +set_bit(type, &sbi->s_flag);
> 
> The set_bit() is enough, no?

It seems OK to me, let me send v2.

Thanks,

> 
>>  }
>>  
>>  static inline void clear_sbi_flag(struct f2fs_sb_info *sbi, unsigned int 
>> type)
>>  {
>> -sbi->s_flag &= ~(0x01 << type);
>> +if (test_bit(type, &sbi->s_flag))
>> +clear_bit(type, &sbi->s_flag);
> 
> ditto.
> 
> Thanks,
> 
>>  }
>>  
>>  static inline unsigned long long cur_cp_version(struct f2fs_checkpoint *cp)
>> -- 
>> 2.7.2
> 
> .
>

Re: [PATCH 4/6] f2fs: introduce cp_lock to protect updating of ckpt_flags

2016-09-19 Thread Chao Yu

Hi Jaegeuk,

On 2016/9/20 5:49, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Sun, Sep 18, 2016 at 11:30:06PM +0800, Chao Yu wrote:
>> From: Chao Yu 
>>
>> This patch introduces a spinlock to protect updates of the ckpt_flags
>> field in struct f2fs_checkpoint, which avoids incorrect updates under a
>> race condition.
> 
> So, I'm seeing a race condition between f2fs_stop_checkpoint(),
> write_checkpoint(), and f2fs_fill_super().
> 
> How about just adding f2fs_lock_op() in f2fs_stop_checkpoint()?

I'm afraid there will be a potential deadlock here:

Thread A                          Thread B
- write_checkpoint
 - block_operations
  - f2fs_lock_all
 - do_checkpoint
 - wait_on_all_pages_writeback
                                  - f2fs_write_end_io
                                   - f2fs_stop_checkpoint
                                    - f2fs_lock_op
                                   - end_page_writeback
                                   - atomic_dec_and_test
                                    - wake_up

Right?

Thanks,

> 
> Thanks,
> 
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/checkpoint.c | 24 
>>  fs/f2fs/f2fs.h   | 29 +
>>  fs/f2fs/recovery.c   |  2 +-
>>  fs/f2fs/segment.c|  4 ++--
>>  fs/f2fs/super.c  |  5 +++--
>>  5 files changed, 39 insertions(+), 25 deletions(-)
>>
>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>> index df56a43..0338f8c 100644
>> --- a/fs/f2fs/checkpoint.c
>> +++ b/fs/f2fs/checkpoint.c
>> @@ -28,7 +28,7 @@ struct kmem_cache *inode_entry_slab;
>>  
>>  void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io)
>>  {
>> -set_ckpt_flags(sbi->ckpt, CP_ERROR_FLAG);
>> +set_ckpt_flags(sbi, CP_ERROR_FLAG);
>>  sbi->sb->s_flags |= MS_RDONLY;
>>  if (!end_io)
>>  f2fs_flush_merged_bios(sbi);
>> @@ -571,7 +571,7 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
>>  block_t start_blk, orphan_blocks, i, j;
>>  int err;
>>  
>> -if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG))
>> +if (!is_set_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG))
>>  return 0;
>>  
>>  start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
>> @@ -595,7 +595,7 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
>>  f2fs_put_page(page, 1);
>>  }
>>  /* clear Orphan Flag */
>> -clear_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG);
>> +clear_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG);
>>  return 0;
>>  }
>>  
>> @@ -1054,9 +1054,9 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
>> struct cp_control *cpc)
>>  /* 2 cp  + n data seg summary + orphan inode blocks */
>>  data_sum_blocks = npages_for_summary_flush(sbi, false);
>>  if (data_sum_blocks < NR_CURSEG_DATA_TYPE)
>> -set_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);
>> +set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG);
>>  else
>> -clear_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);
>> +clear_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG);
>>  
>>  orphan_blocks = GET_ORPHAN_BLOCKS(orphan_num);
>>  ckpt->cp_pack_start_sum = cpu_to_le32(1 + cp_payload_blks +
>> @@ -1072,22 +1072,22 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
>> struct cp_control *cpc)
>>  orphan_blocks);
>>  
>>  if (cpc->reason == CP_UMOUNT)
>> -set_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
>> +set_ckpt_flags(sbi, CP_UMOUNT_FLAG);
>>  else
>> -clear_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
>> +clear_ckpt_flags(sbi, CP_UMOUNT_FLAG);
>>  
>>  if (cpc->reason == CP_FASTBOOT)
>> -set_ckpt_flags(ckpt, CP_FASTBOOT_FLAG);
>> +set_ckpt_flags(sbi, CP_FASTBOOT_FLAG);
>>  else
>> -clear_ckpt_flags(ckpt, CP_FASTBOOT_FLAG);
>> +clear_ckpt_flags(sbi, CP_FASTBOOT_FLAG);
>>  
>>  if (orphan_num)
>> -set_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG);
>> +set_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG);
>>  else
>> -clear_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG);
>> +clear_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG);
>>  
>>  if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
>> -set_ckpt_flags(ckpt, CP_FSCK_FLAG);
>> +set_ckpt_flags(sbi, CP_FSCK_FLAG);
>>  
>>

Re: [PATCH] f2fs: fix to avoid slowing down background gc

2016-09-19 Thread Chao Yu

Hi Jaegeuk,

On 2016/9/20 6:12, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Sun, Sep 18, 2016 at 07:52:27PM +0800, Chao Yu wrote:
>> Previously, we chose to speed up background gc when both of the
>> conditions below were satisfied:
>> a. There are a number of invalid blocks
>> b. There is not enough free space
>>
>> But when space utilization is high (utilization > 60%), there will not
>> be enough invalid blocks, which slows down background gc; after that,
>> foreground gc is more likely to be triggered because free space in the
>> fs is highly fragmented.
>>
>> Remove condition a) in order to avoid slowing down background gc in
>> a high-utilization fs.
> 
> There exists a trade-off here: wear-out vs. eager gc for future speed-up.
> How about using a kind of f2fs's dirty level (e.g., BDF)?

Yep, I think f2fs can implement a mechanism that provides a more dynamically
adjustable GC speed for the user's specific scenario, so that users can choose
the strategy that better suits the aspect (wear-out or performance) they care
about. Let me think about it for a while; anyway, I agree that BDF is a good
reference value here.

Until we can provide that ability, how about treating this patch as a fix,
since it stops the GC speed from being adjusted according to the utilization
watermark?

Thanks,

> 
> Thanks,
> 
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/gc.h | 18 +++---
>>  1 file changed, 3 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
>> index a993967..5d0a19c 100644
>> --- a/fs/f2fs/gc.h
>> +++ b/fs/f2fs/gc.h
>> @@ -16,7 +16,6 @@
>>  #define DEF_GC_THREAD_MIN_SLEEP_TIME3   /* milliseconds */
>>  #define DEF_GC_THREAD_MAX_SLEEP_TIME6
>>  #define DEF_GC_THREAD_NOGC_SLEEP_TIME   30  /* wait 5 min */
>> -#define LIMIT_INVALID_BLOCK 40 /* percentage over total user space */
>>  #define LIMIT_FREE_BLOCK40 /* percentage over invalid + free space */
>>  
>>  /* Search max. number of dirty segments to select a victim segment */
>> @@ -52,11 +51,6 @@ static inline block_t free_user_blocks(struct 
>> f2fs_sb_info *sbi)
>>  << sbi->log_blocks_per_seg;
>>  }
>>  
>> -static inline block_t limit_invalid_user_blocks(struct f2fs_sb_info *sbi)
>> -{
>> -return (long)(sbi->user_block_count * LIMIT_INVALID_BLOCK) / 100;
>> -}
>> -
>>  static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi)
>>  {
>>  block_t reclaimable_user_blocks = sbi->user_block_count -
>> @@ -88,15 +82,9 @@ static inline void decrease_sleep_time(struct 
>> f2fs_gc_kthread *gc_th,
>>  
>>  static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi)
>>  {
>> -block_t invalid_user_blocks = sbi->user_block_count -
>> -written_block_count(sbi);
>>  /*
>> - * Background GC is triggered with the following conditions.
>> - * 1. There are a number of invalid blocks.
>> - * 2. There is not enough free space.
>> + * Background GC should speed up when there is not enough free blocks
>> + * in total unused (free + invalid) blocks.
>>   */
>> -if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
>> -free_user_blocks(sbi) < limit_free_user_blocks(sbi))
>> -return true;
>> -return false;
>> +return free_user_blocks(sbi) < limit_free_user_blocks(sbi);
>>  }
>> -- 
>> 2.8.2.311.gee88674
> 
> .
>
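
The effect of dropping condition a) can be checked with plain arithmetic. The sketch below is a simplified user-space model, not the f2fs code: demo_fs and both function names are hypothetical, and the free-space limit is assumed to be LIMIT_FREE_BLOCK percent of the unused (free + invalid) blocks, as the macro comment in the diff suggests.

```c
#include <stdbool.h>

#define LIMIT_INVALID_BLOCK 40	/* percentage over total user space */
#define LIMIT_FREE_BLOCK    40	/* percentage over invalid + free space */

/* Hypothetical, simplified counters; the real ones live in f2fs_sb_info. */
struct demo_fs {
	unsigned long user_blocks;	/* total user blocks */
	unsigned long written_blocks;	/* valid (in-use) blocks */
	unsigned long free_blocks;	/* blocks in free segments */
};

/* Old policy: both conditions a) and b) must hold. */
static bool speed_up_old(const struct demo_fs *fs)
{
	unsigned long unused = fs->user_blocks - fs->written_blocks;
	unsigned long limit_invalid = fs->user_blocks * LIMIT_INVALID_BLOCK / 100;
	unsigned long limit_free = unused * LIMIT_FREE_BLOCK / 100;

	return unused > limit_invalid && fs->free_blocks < limit_free;
}

/* New policy from the patch: only condition b) remains. */
static bool speed_up_new(const struct demo_fs *fs)
{
	unsigned long unused = fs->user_blocks - fs->written_blocks;
	unsigned long limit_free = unused * LIMIT_FREE_BLOCK / 100;

	return fs->free_blocks < limit_free;
}
```

With 1000 user blocks, 700 written, and 50 free, the unused count is 300, which can never exceed the 400-block invalid limit, so the old check refuses to speed up even though free space (50) is well under its limit (120). The new check speeds up in that case.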

[PATCH v2 3/6] f2fs: fix to avoid race condition when updating sbi flag

2016-09-19 Thread Chao Yu

Make updates of the sbi flag atomic by using {test,set,clear}_bit;
otherwise, in a concurrent scenario, the flag could be updated incorrectly.

Signed-off-by: Chao Yu 
---
 fs/f2fs/f2fs.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 9b4bbf2..fc57794 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -794,7 +794,7 @@ struct f2fs_sb_info {
struct proc_dir_entry *s_proc;  /* proc entry */
struct f2fs_super_block *raw_super; /* raw super block pointer */
int valid_super_block;  /* valid super block no */
-   int s_flag; /* flags for sbi */
+   unsigned long s_flag;   /* flags for sbi */
 
 #ifdef CONFIG_F2FS_FS_ENCRYPTION
u8 key_prefix[F2FS_KEY_DESC_PREFIX_SIZE];
@@ -1063,17 +1063,17 @@ static inline struct address_space *NODE_MAPPING(struct 
f2fs_sb_info *sbi)
 
 static inline bool is_sbi_flag_set(struct f2fs_sb_info *sbi, unsigned int type)
 {
-   return sbi->s_flag & (0x01 << type);
+   return test_bit(type, &sbi->s_flag);
 }
 
 static inline void set_sbi_flag(struct f2fs_sb_info *sbi, unsigned int type)
 {
-   sbi->s_flag |= (0x01 << type);
+   set_bit(type, &sbi->s_flag);
 }
 
 static inline void clear_sbi_flag(struct f2fs_sb_info *sbi, unsigned int type)
 {
-   sbi->s_flag &= ~(0x01 << type);
+   clear_bit(type, &sbi->s_flag);
 }
 
 static inline unsigned long long cur_cp_version(struct f2fs_checkpoint *cp)
-- 
2.8.2.311.gee88674
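
For readers outside the kernel, the idea behind this v2 can be sketched in user space with C11 atomics. The demo_* helpers below are hypothetical stand-ins for test_bit/set_bit/clear_bit, not the kernel implementations. They show why a separate test before set or clear is unnecessary: the read-modify-write is a single atomic step and is idempotent.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* User-space stand-in for test_bit(): read one bit of the flags word. */
static bool demo_test_bit(unsigned int nr, atomic_ulong *word)
{
	return (atomic_load(word) >> nr) & 1UL;
}

/* Stand-in for set_bit(): atomic OR, safe to repeat on an already-set bit. */
static void demo_set_bit(unsigned int nr, atomic_ulong *word)
{
	atomic_fetch_or(word, 1UL << nr);
}

/* Stand-in for clear_bit(): atomic AND with the inverted mask. */
static void demo_clear_bit(unsigned int nr, atomic_ulong *word)
{
	atomic_fetch_and(word, ~(1UL << nr));
}
```

A plain `flags |= mask` compiles to a separate load and store, so two concurrent updaters can overwrite each other's bit. The atomic fetch-or and fetch-and avoid that lost update.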

[PATCH] raid5: fix to detect failure of register_shrinker

2016-09-19 Thread Chao Yu

register_shrinker can fail since commit 1d3d4437eae1 ("vmscan: per-node
deferred work"), so we should detect its failure; otherwise the shrinker
may fail to register even though the raid5 configuration was set up
successfully.

Signed-off-by: Chao Yu 
---
 drivers/md/raid5.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 766c3b7..b819a9a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6632,7 +6632,12 @@ static struct r5conf *setup_conf(struct mddev *mddev)
conf->shrinker.count_objects = raid5_cache_count;
conf->shrinker.batch = 128;
conf->shrinker.flags = 0;
-   register_shrinker(&conf->shrinker);
+   if (register_shrinker(&conf->shrinker)) {
+   printk(KERN_ERR
+  "md/raid:%s: couldn't register shrinker.\n",
+  mdname(mddev));
+   goto abort;
+   }
 
sprintf(pers_name, "raid%d", mddev->new_level);
conf->thread = md_register_thread(raid5d, mddev, pers_name);
-- 
2.7.2

Re: [PATCH 4/6] f2fs: introduce cp_lock to protect updating of ckpt_flags

2016-09-19 Thread Chao Yu

On 2016/9/20 10:41, Jaegeuk Kim wrote:
> On Tue, Sep 20, 2016 at 09:47:20AM +0800, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2016/9/20 5:49, Jaegeuk Kim wrote:
>>> Hi Chao,
>>>
>>> On Sun, Sep 18, 2016 at 11:30:06PM +0800, Chao Yu wrote:
>>>> From: Chao Yu 
>>>>
>>>> This patch introduces spinlock to protect updating process of ckpt_flags
>>>> field in struct f2fs_checkpoint, it avoids incorrectly updating in race
>>>> condition.
>>>
>>> So, I'm seeing a race condition between f2fs_stop_checkpoint(),
>>> write_checkpoint(), and f2fs_fill_super().
>>>
>>> How about just adding f2fs_lock_op() in f2fs_stop_checkpoint()?
>>
>> I'm afraid there will be a potential deadlock here:
>>
>> Thread A                          Thread B
>> - write_checkpoint
>>  - block_operations
>>   - f2fs_lock_all
>>  - do_checkpoint
>>  - wait_on_all_pages_writeback
>>                                   - f2fs_write_end_io
>>                                    - f2fs_stop_checkpoint
>>                                     - f2fs_lock_op
>>                                    - end_page_writeback
>>                                    - atomic_dec_and_test
>>                                     - wake_up
>>
>> Right?
> 
> Okay, I see. Let me try to understand in more details.
> Basically, there'd be no problem if there is no f2fs_stop_checkpoint(), since
> every {set|clear}_ckpt_flags() are called under f2fs_lock_op() or in
> fill_super(). And, you're probably concerned about any breakage of ckpt->flags
> due to accidental f2fs_stop_checkpoint() trigger. So, we're able to lose
> ERROR_FLAG because of data race.
> 
> Oh, I found one potential corruption case in f2fs_write_checkpoint().
> Before writing the last checkpoint block, we used to check its IO error.
> But, if set_ckpt_flags() and f2fs_stop_checkpoint() were called concurrently,
> ckpt_flags was able to lose ERROR_FLAG, resulting in finalizing its checkpoint
> pack. So, we can get valid checkpoint pack with EIO'ed metadata block.

That's right.

> 
> BTW, we can do multiple set/clear flags in do_checkpoint() with single 
> spin_lock
> call, tho.

Agree, let me refactor the code. :)

Thanks,

> 
> Thanks,
> 
>>>
>>>> Signed-off-by: Chao Yu 
>>>> ---
>>>>  fs/f2fs/checkpoint.c | 24 
>>>>  fs/f2fs/f2fs.h   | 29 +
>>>>  fs/f2fs/recovery.c   |  2 +-
>>>>  fs/f2fs/segment.c|  4 ++--
>>>>  fs/f2fs/super.c  |  5 +++--
>>>>  5 files changed, 39 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
>>>> index df56a43..0338f8c 100644
>>>> --- a/fs/f2fs/checkpoint.c
>>>> +++ b/fs/f2fs/checkpoint.c
>>>> @@ -28,7 +28,7 @@ struct kmem_cache *inode_entry_slab;
>>>>  
>>>>  void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io)
>>>>  {
>>>> -  set_ckpt_flags(sbi->ckpt, CP_ERROR_FLAG);
>>>> +  set_ckpt_flags(sbi, CP_ERROR_FLAG);
>>>>sbi->sb->s_flags |= MS_RDONLY;
>>>>if (!end_io)
>>>>f2fs_flush_merged_bios(sbi);
>>>> @@ -571,7 +571,7 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
>>>>block_t start_blk, orphan_blocks, i, j;
>>>>int err;
>>>>  
>>>> -  if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG))
>>>> +  if (!is_set_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG))
>>>>return 0;
>>>>  
>>>>start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
>>>> @@ -595,7 +595,7 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
>>>>f2fs_put_page(page, 1);
>>>>}
>>>>/* clear Orphan Flag */
>>>> -  clear_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG);
>>>> +  clear_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG);
>>>>return 0;
>>>>  }
>>>>  
>>>> @@ -1054,9 +1054,9 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
>>>> struct cp_control *cpc)
>>>>/* 2 cp  + n data seg summary + orphan inode blocks */
>>>>data_sum_blocks = npages_for_summary_flush(sbi, false);
>>>>if (data_sum_blocks < NR_CURSEG_DATA_TYPE)
>>>> -  set_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);

[PATCH v2 4/6] f2fs: introduce cp_lock to protect updating of ckpt_flags

2016-09-19 Thread Chao Yu

This patch introduces a spinlock to protect updates of the ckpt_flags
field in struct f2fs_checkpoint, which avoids incorrect updates under a
race condition.

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 28 
 fs/f2fs/f2fs.h   | 37 +
 fs/f2fs/recovery.c   |  2 +-
 fs/f2fs/segment.c|  4 ++--
 fs/f2fs/super.c  |  5 +++--
 5 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index a366521..bc93afd 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -28,7 +28,7 @@ struct kmem_cache *inode_entry_slab;
 
 void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io)
 {
-   set_ckpt_flags(sbi->ckpt, CP_ERROR_FLAG);
+   set_ckpt_flags(sbi, CP_ERROR_FLAG);
sbi->sb->s_flags |= MS_RDONLY;
if (!end_io)
f2fs_flush_merged_bios(sbi);
@@ -574,7 +574,7 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
block_t start_blk, orphan_blocks, i, j;
int err;
 
-   if (!is_set_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG))
+   if (!is_set_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG))
return 0;
 
start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
@@ -598,7 +598,7 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
f2fs_put_page(page, 1);
}
/* clear Orphan Flag */
-   clear_ckpt_flags(F2FS_CKPT(sbi), CP_ORPHAN_PRESENT_FLAG);
+   clear_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG);
return 0;
 }
 
@@ -1056,10 +1056,12 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
 
/* 2 cp  + n data seg summary + orphan inode blocks */
data_sum_blocks = npages_for_summary_flush(sbi, false);
+   spin_lock(&sbi->cp_lock);
if (data_sum_blocks < NR_CURSEG_DATA_TYPE)
-   set_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);
+   __set_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);
else
-   clear_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);
+   __clear_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);
+   spin_unlock(&sbi->cp_lock);
 
orphan_blocks = GET_ORPHAN_BLOCKS(orphan_num);
ckpt->cp_pack_start_sum = cpu_to_le32(1 + cp_payload_blks +
@@ -1074,23 +1076,25 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
cp_payload_blks + data_sum_blocks +
orphan_blocks);
 
+   spin_lock(&sbi->cp_lock);
if (cpc->reason == CP_UMOUNT)
-   set_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
+   __set_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
else
-   clear_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
+   __clear_ckpt_flags(ckpt, CP_UMOUNT_FLAG);
 
if (cpc->reason == CP_FASTBOOT)
-   set_ckpt_flags(ckpt, CP_FASTBOOT_FLAG);
+   __set_ckpt_flags(ckpt, CP_FASTBOOT_FLAG);
else
-   clear_ckpt_flags(ckpt, CP_FASTBOOT_FLAG);
+   __clear_ckpt_flags(ckpt, CP_FASTBOOT_FLAG);
 
if (orphan_num)
-   set_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG);
+   __set_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG);
else
-   clear_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG);
+   __clear_ckpt_flags(ckpt, CP_ORPHAN_PRESENT_FLAG);
 
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
-   set_ckpt_flags(ckpt, CP_FSCK_FLAG);
+   __set_ckpt_flags(ckpt, CP_FSCK_FLAG);
+   spin_unlock(&sbi->cp_lock);
 
/* update SIT/NAT bitmap */
get_sit_bitmap(sbi, __bitmap_ptr(sbi, SIT_BITMAP));
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 53da455..7803808 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -817,6 +817,7 @@ struct f2fs_sb_info {
 
/* for checkpoint */
struct f2fs_checkpoint *ckpt;   /* raw checkpoint pointer */
+   spinlock_t cp_lock; /* for flag in ckpt */
struct inode *meta_inode;   /* cache meta blocks */
struct mutex cp_mutex;  /* checkpoint procedure lock */
struct rw_semaphore cp_rwsem;   /* blocking FS operations */
@@ -1084,26 +1085,46 @@ static inline unsigned long long cur_cp_version(struct 
f2fs_checkpoint *cp)
return le64_to_cpu(cp->checkpoint_ver);
 }
 
-static inline bool is_set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int 
f)
+static inline bool is_set_ckpt_flags(struct f2fs_sb_info *sbi, unsigned int f)
 {
+   struct f2fs_checkpoint *cp = F2FS_CKPT(sbi);
unsigned int ckpt_flags = le32_to_cpu(cp->ckpt_flags);
+
return ckpt_flags & f;
 }
 
-static inline void set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f)
+static inline void __set_ckpt_flags(struct f2fs_checkpoint *cp, unsigned int f)
 {
-
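
The split between locked wrappers and lock-free helpers used in this v2 can be sketched in user space as follows. This is a toy model under stated assumptions: demo_sbi, the flag values, and the atomic_flag spin loop are all hypothetical stand-ins for f2fs_sb_info, the CP_* flags, and spinlock_t. The helper suffix _locked replaces the kernel's double-underscore prefix, which is reserved in user-space C.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_CP_UMOUNT 0x1u
#define DEMO_CP_ORPHAN 0x2u
#define DEMO_CP_ERROR  0x8u

/* Hypothetical miniature of sbi: a flags word plus the lock guarding it. */
struct demo_sbi {
	atomic_flag cp_lock;	/* toy spinlock standing in for spinlock_t */
	uint32_t ckpt_flags;
};

static void demo_sbi_init(struct demo_sbi *sbi)
{
	atomic_flag_clear(&sbi->cp_lock);
	sbi->ckpt_flags = 0;
}

static void demo_lock(struct demo_sbi *sbi)
{
	while (atomic_flag_test_and_set_explicit(&sbi->cp_lock,
						 memory_order_acquire))
		;	/* spin until the current holder releases the lock */
}

static void demo_unlock(struct demo_sbi *sbi)
{
	atomic_flag_clear_explicit(&sbi->cp_lock, memory_order_release);
}

/* Lock-free helpers: the caller must already hold cp_lock. */
static void demo_set_flags_locked(struct demo_sbi *sbi, uint32_t f)
{
	sbi->ckpt_flags |= f;
}

static void demo_clear_flags_locked(struct demo_sbi *sbi, uint32_t f)
{
	sbi->ckpt_flags &= ~f;
}

/* Locked wrapper for one-off updates, e.g. from an error path. */
static void demo_set_flags(struct demo_sbi *sbi, uint32_t f)
{
	demo_lock(sbi);
	demo_set_flags_locked(sbi, f);
	demo_unlock(sbi);
}

/* Several updates batched under a single lock/unlock pair. */
static void demo_update_for_checkpoint(struct demo_sbi *sbi,
				       bool umount, bool orphans)
{
	demo_lock(sbi);
	if (umount)
		demo_set_flags_locked(sbi, DEMO_CP_UMOUNT);
	else
		demo_clear_flags_locked(sbi, DEMO_CP_UMOUNT);
	if (orphans)
		demo_set_flags_locked(sbi, DEMO_CP_ORPHAN);
	else
		demo_clear_flags_locked(sbi, DEMO_CP_ORPHAN);
	demo_unlock(sbi);
}
```

Because every read-modify-write of ckpt_flags happens under the same lock, an error flag set concurrently by another path cannot be lost to a stale store from the batched update.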

Re: [PATCH] f2fs: fix to avoid slowing down background gc

2016-09-19 Thread Chao Yu

On 2016/9/20 10:54, Jaegeuk Kim wrote:
> On Tue, Sep 20, 2016 at 10:22:22AM +0800, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2016/9/20 6:12, Jaegeuk Kim wrote:
>>> Hi Chao,
>>>
>>> On Sun, Sep 18, 2016 at 07:52:27PM +0800, Chao Yu wrote:
>>>> Previously, we will choose to speed up background gc when the below
>>>> conditions are both satisfied:
>>>> a. There are a number of invalid blocks
>>>> b. There is not enough free space
>>>>
>>>> But, when space utilization is high (utilization > 60%), there will be
>>>> not enough invalid blocks, result in slowing down background gc, after
>>>> then there are more opportunities that triggering foreground gc due to
>>>> high fragmented free space in fs.
>>>>
>>>> Remove condition a) in order to avoid slow down background gc speed in
>>>> a high utilization fs.
>>>
>>> There exists a trade-off here: wear-out vs. eager gc for future speed-up.
>>> How about using a kind of f2fs's dirty level (e.g., BDF)?
>>
>> Yep, I think that f2fs can implement a mechanism which can provide more
>> dynamically adjustable GC speed in the specified scenario of user, by this, 
>> user
>> can choose the strategy which is more beneficial to aspect
>> (wear-out/performance) they care. Let me think a while, anyway I agree that 
>> BDF
>> is a good reference value here.
>>
>> And Before we can provide above ability, how about treat this patch as a 
>> fixing
>> patch, since it fixes to not adjust speed of GC according to utilization 
>> watermark?
> 
> Well, this is not a bug fix, but a very conservative policy. So, please let's
> make a better policy, if possible.

Alright, let's think about this.

Thanks,

> 
> Thanks,
> 
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>>
>>>>
>>>> Signed-off-by: Chao Yu 
>>>> ---
>>>>  fs/f2fs/gc.h | 18 +++---
>>>>  1 file changed, 3 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
>>>> index a993967..5d0a19c 100644
>>>> --- a/fs/f2fs/gc.h
>>>> +++ b/fs/f2fs/gc.h
>>>> @@ -16,7 +16,6 @@
>>>>  #define DEF_GC_THREAD_MIN_SLEEP_TIME  3   /* milliseconds */
>>>>  #define DEF_GC_THREAD_MAX_SLEEP_TIME  6
>>>>  #define DEF_GC_THREAD_NOGC_SLEEP_TIME 30  /* wait 5 min */
>>>> -#define LIMIT_INVALID_BLOCK   40 /* percentage over total user space 
>>>> */
>>>>  #define LIMIT_FREE_BLOCK  40 /* percentage over invalid + free space */
>>>>  
>>>>  /* Search max. number of dirty segments to select a victim segment */
>>>> @@ -52,11 +51,6 @@ static inline block_t free_user_blocks(struct 
>>>> f2fs_sb_info *sbi)
>>>><< sbi->log_blocks_per_seg;
>>>>  }
>>>>  
>>>> -static inline block_t limit_invalid_user_blocks(struct f2fs_sb_info *sbi)
>>>> -{
>>>> -  return (long)(sbi->user_block_count * LIMIT_INVALID_BLOCK) / 100;
>>>> -}
>>>> -
>>>>  static inline block_t limit_free_user_blocks(struct f2fs_sb_info *sbi)
>>>>  {
>>>>block_t reclaimable_user_blocks = sbi->user_block_count -
>>>> @@ -88,15 +82,9 @@ static inline void decrease_sleep_time(struct 
>>>> f2fs_gc_kthread *gc_th,
>>>>  
>>>>  static inline bool has_enough_invalid_blocks(struct f2fs_sb_info *sbi)
>>>>  {
>>>> -  block_t invalid_user_blocks = sbi->user_block_count -
>>>> -  written_block_count(sbi);
>>>>/*
>>>> -   * Background GC is triggered with the following conditions.
>>>> -   * 1. There are a number of invalid blocks.
>>>> -   * 2. There is not enough free space.
>>>> +   * Background GC should speed up when there is not enough free blocks
>>>> +   * in total unused (free + invalid) blocks.
>>>> */
>>>> -  if (invalid_user_blocks > limit_invalid_user_blocks(sbi) &&
>>>> -  free_user_blocks(sbi) < limit_free_user_blocks(sbi))
>>>> -  return true;
>>>> -  return false;
>>>> +  return free_user_blocks(sbi) < limit_free_user_blocks(sbi);
>>>>  }
>>>> -- 
>>>> 2.8.2.311.gee88674
>>>
>>> .
>>>
> 
> .
>

[PATCH] nfs: cover ->migratepage with CONFIG_MIGRATION

2016-09-19 Thread Chao Yu

It is cleaner to use CONFIG_MIGRATION to cover nfs's private
.migratepage in nfs_file_aops, as we do in other parts of the nfs
operations.

Signed-off-by: Chao Yu 
---
 fs/nfs/file.c | 2 ++
 fs/nfs/internal.h | 8 
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 7d62097..6cfb83e 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -543,7 +543,9 @@ const struct address_space_operations nfs_file_aops = {
.invalidatepage = nfs_invalidate_page,
.releasepage = nfs_release_page,
.direct_IO = nfs_direct_IO,
+#ifdef CONFIG_MIGRATION
.migratepage = nfs_migrate_page,
+#endif
.launder_page = nfs_launder_page,
.is_dirty_writeback = nfs_check_dirty_writeback,
.error_remove_page = generic_error_remove_page,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 7ce5e02..0d508f7 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -532,14 +532,6 @@ void nfs_clear_pnfs_ds_commit_verifiers(struct 
pnfs_ds_commit_info *cinfo)
 }
 #endif
 
-
-#ifdef CONFIG_MIGRATION
-extern int nfs_migrate_page(struct address_space *,
-   struct page *, struct page *, enum migrate_mode);
-#else
-#define nfs_migrate_page NULL
-#endif
-
 static inline int
 nfs_write_verifier_cmp(const struct nfs_write_verifier *v1,
const struct nfs_write_verifier *v2)
-- 
2.8.2.311.gee88674

Re: [PATCH] nfs: cover ->migratepage with CONFIG_MIGRATION

2016-09-19 Thread Chao Yu

On 2016/9/20 20:51, kbuild test robot wrote:
>>> fs/nfs/file.c:547:17: error: 'nfs_migrate_page' undeclared here (not in a 
>>> function)
>  .migratepage = nfs_migrate_page,

Oops :(, sorry for my mistake, let me fix this.

Thanks,

[PATCH v2] nfs: cover ->migratepage with CONFIG_MIGRATION

2016-09-19 Thread Chao Yu

It is cleaner to use CONFIG_MIGRATION to cover nfs's private
.migratepage in nfs_file_aops, as we do in other parts of the nfs
operations.

Signed-off-by: Chao Yu 
---
 fs/nfs/file.c | 2 ++
 fs/nfs/internal.h | 3 ---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 7d62097..6cfb83e 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -543,7 +543,9 @@ const struct address_space_operations nfs_file_aops = {
.invalidatepage = nfs_invalidate_page,
.releasepage = nfs_release_page,
.direct_IO = nfs_direct_IO,
+#ifdef CONFIG_MIGRATION
.migratepage = nfs_migrate_page,
+#endif
.launder_page = nfs_launder_page,
.is_dirty_writeback = nfs_check_dirty_writeback,
.error_remove_page = generic_error_remove_page,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 7ce5e02..2f1af3a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -532,12 +532,9 @@ void nfs_clear_pnfs_ds_commit_verifiers(struct 
pnfs_ds_commit_info *cinfo)
 }
 #endif
 
-
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
-#else
-#define nfs_migrate_page NULL
 #endif
 
 static inline int
-- 
2.8.2.311.gee88674

Re: [f2fs-dev] [PATCH 1/2] f2fs: use crc and cp version to determine roll-forward recovery

2016-09-20 Thread Chao Yu

Hi Jaegeuk,

On 2016/9/20 10:55, Jaegeuk Kim wrote:
> Previously, we used cp_version only to detect recoverable dnodes.
> In order to avoid same garbage cp_version, we needed to truncate the next
> dnode during checkpoint, resulting in additional discard or data write.
> If we can distinguish this by using crc in addition to cp_version, we can
> remove this overhead.
> 
> There is backward compatibility concern where it changes node_footer layout.
> But, it only affects the direct nodes written after the last checkpoint.
> We simply expect that user would change kernel versions back and forth after
> stable checkpoint.

With this patch, tests/generic/050 of fstests fails:

 setting device read-only
 mounting filesystem that needs recovery on a read-only device:
 mount: SCRATCH_DEV is write-protected, mounting read-only
-mount: cannot mount SCRATCH_DEV read-only
 unmounting read-only filesystem
-umount: SCRATCH_DEV: not mounted
 mounting filesystem with -o norecovery on a read-only device:

Could you have a look at it?

Thanks,

Re: [f2fs-dev] [PATCH 1/4] f2fs: assign return value in f2fs_gc

2016-09-22 Thread Chao Yu

On 2016/9/22 11:54, Jaegeuk Kim wrote:
> This patch adds a return value of write_checkpoint for f2fs_gc.
> 
> Signed-off-by: Jaegeuk Kim 

Please add this to all patches of this series.

Reviewed-by: Chao Yu

[PATCH 1/3] f2fs: adjust display format of segment bit

2016-09-22 Thread Chao Yu

Just adjust segment bit info printed in procfs.

Before:
1008  5|0  |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1009  3|183|0 0 61 20 20 0 0 21 80 c0 2 e4 e 54 0 21 21 17 a 44 d0 28 e4 50 
40 30 8 0 2d 32 0 5 b0 80 1 43 2 8e f8 7b 2 25 93 bf e0 73 8e 9a 19 44 60 ff e4 
cc e6 8e bf f9 ff 5 3d 31 3d 13
1010  3|1  |0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 40 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

After:
1008  5|0  | 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1009  4|434| ff 7d ff bf d9 3f ff e7 ff bf d7 bf ff bb be ff fb df f7 fb fa 
bf fb fe bb df dd ff fe ef ff fe ef e2 27 bf ab bf fb df fd bd bf fb db fc ff 
ff 3f ff ff bf ff 5f db 3f fb fb bf fb bf 4f ff ef
1010  4|422| ff bb fe ff ef d7 ee ff ff fc bf ef 7d eb ec fd fb 3f 97 7f ef 
ff af ff db ff ff 69 bf ff f6 e7 ff fb f7 7b fb df be ff ff ef f3 fe ff ff df 
fe f7 fa ff b7 77 be fe fb a9 7f 87 a2 ac c7 ff 75

Signed-off-by: Chao Yu 
---
 fs/f2fs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index e7bb153..6426855 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -954,7 +954,7 @@ static int segment_bits_seq_show(struct seq_file *seq, void 
*offset)
seq_printf(seq, "%d|%-3u|", se->type,
get_valid_blocks(sbi, i, 1));
for (j = 0; j < SIT_VBLOCK_MAP_SIZE; j++)
-   seq_printf(seq, "%x ", se->cur_valid_map[j]);
+   seq_printf(seq, " %.2x", se->cur_valid_map[j]);
seq_putc(seq, '\n');
}
return 0;
-- 
2.8.2.311.gee88674

[PATCH 3/3] f2fs: fix potential deadlock when hitting checkpoint error

2016-09-22 Thread Chao Yu

tests/generic/013 of the fstests suite produces the dmesg output below when
we trigger checkpoint error injection in f2fs.

F2FS-fs : inject checkpoint error in sync_node_pages+0x69f/0x6f0 [f2fs]
F2FS-fs (zram0): Cannot recover all fsync data errno=-5
INFO: task mount:97685 blocked for more than 120 seconds.
  Tainted: G   OE   4.8.0-rc4 #11
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount   D 8801c1bf7960 0 97685  97397 0x0008
 8801c1bf7960 8801c1bf7930 88017590 8801c1bf7980
 8801c1bf8000  7fff 88021f7be340
 817c8880 8801c1bf7978 817c80a5 880214f58fc0
Call Trace:
 [] ? bit_wait+0x50/0x50
 [] schedule+0x35/0x80
 [] schedule_timeout+0x292/0x3d0
 [] ? xen_clocksource_get_cycles+0x15/0x20
 [] ? ktime_get+0x3c/0xb0
 [] ? bit_wait+0x50/0x50
 [] io_schedule_timeout+0xa6/0x110
 [] bit_wait_io+0x1b/0x60
 [] __wait_on_bit+0x64/0x90
 [] wait_on_page_bit+0xc4/0xd0
 [] ? autoremove_wake_function+0x40/0x40
 [] truncate_inode_pages_range+0x409/0x840
 [] ? pcpu_free_area+0x13d/0x1a0
 [] ? wake_up_bit+0x25/0x30
 [] truncate_inode_pages_final+0x4c/0x60
 [] f2fs_evict_inode+0x48/0x390 [f2fs]
 [] evict+0xc7/0x1a0
 [] iput+0x197/0x200
 [] f2fs_fill_super+0xab2/0x1130 [f2fs]
 [] mount_bdev+0x184/0x1c0
 [] ? f2fs_commit_super+0x100/0x100 [f2fs]
 [] f2fs_mount+0x15/0x20 [f2fs]
 [] mount_fs+0x39/0x160
 [] vfs_kern_mount+0x67/0x110
 [] do_mount+0x1bb/0xc80
 [] SyS_mount+0x83/0xd0
 [] do_syscall_64+0x6e/0x170
 [] entry_SYSCALL64_slow_path+0x25/0x25

The reason is that once we have committed at least one page into the f2fs
private bio cache, a checkpoint error makes us lose the chance to submit
that private bio. This results in a deadlock in f2fs_evict_inode, which
waits for that page to be written back. So give sync_node_pages a chance
to submit the cached bio to fix this.

Signed-off-by: Chao Yu 
---
 fs/f2fs/node.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 55c22a9..c2d953e 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1416,6 +1416,7 @@ int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc)
struct pagevec pvec;
int step = 0;
int nwritten = 0;
+   int ret = 0;
 
pagevec_init(&pvec, 0);
 
@@ -1436,7 +1437,8 @@ next_step:
 
if (unlikely(f2fs_cp_error(sbi))) {
pagevec_release(&pvec);
-   return -EIO;
+   ret = -EIO;
+   goto out;
}
 
/*
@@ -1485,9 +1487,11 @@ continue_unlock:
set_fsync_mark(page, 0);
set_dentry_mark(page, 0);
 
-   if (NODE_MAPPING(sbi)->a_ops->writepage(page, wbc))
+   if (NODE_MAPPING(sbi)->a_ops->writepage(page, wbc)) {
unlock_page(page);
-
+   } else {
+   nwritten++;
+   }
if (--wbc->nr_to_write == 0)
break;
}
@@ -1504,7 +1508,10 @@ continue_unlock:
step++;
goto next_step;
}
-   return nwritten;
+out:
+   if (ret && nwritten)
+   f2fs_submit_merged_bio(sbi, NODE, WRITE);
+   return ret;
 }
 
 int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino)
-- 
2.8.2.311.gee88674

[PATCH 2/3] f2fs: support checkpoint error injection

2016-09-22 Thread Chao Yu

This patch adds support for checkpoint error injection in f2fs, for testing
fatal error tolerance.

Signed-off-by: Chao Yu 
---
 fs/f2fs/f2fs.h  | 7 +++
 fs/f2fs/super.c | 1 +
 2 files changed, 8 insertions(+)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e216bc0..3c513fe 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -47,6 +47,7 @@ enum {
FAULT_DIR_DEPTH,
FAULT_EVICT_INODE,
FAULT_IO,
+   FAULT_CHECKPOINT,
FAULT_MAX,
 };
 
@@ -80,6 +81,8 @@ static inline bool time_to_inject(int type)
return false;
else if (type == FAULT_IO && !IS_FAULT_SET(type))
return false;
+   else if (type == FAULT_CHECKPOINT && !IS_FAULT_SET(type))
+   return false;
 
atomic_inc(&f2fs_fault.inject_ops);
if (atomic_read(&f2fs_fault.inject_ops) >= f2fs_fault.inject_rate) {
@@ -1873,6 +1876,10 @@ static inline int f2fs_readonly(struct super_block *sb)
 
 static inline bool f2fs_cp_error(struct f2fs_sb_info *sbi)
 {
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+   if (time_to_inject(FAULT_CHECKPOINT))
+   return true;
+#endif
return is_set_ckpt_flags(sbi, CP_ERROR_FLAG);
 }
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 6426855..3c49419 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -51,6 +51,7 @@ char *fault_name[FAULT_MAX] = {
[FAULT_DIR_DEPTH]   = "too big dir depth",
[FAULT_EVICT_INODE] = "evict_inode fail",
[FAULT_IO]  = "IO error",
+   [FAULT_CHECKPOINT]  = "checkpoint error",
 };
 
 static void f2fs_build_fault_attr(unsigned int rate)
-- 
2.8.2.311.gee88674

[PATCH] f2fs: support configuring fault injection per superblock

2016-09-23 Thread Chao Yu

From: Chao Yu 

Previously, we only supported a global fault injection configuration, so
configuring the type/rate of fault injection through sysfs or a mount
option affected every f2fs partition in use.

This does not make sense: it is inconvenient for a developer who wants to
test separate partitions with different fault injection rates/types
simultaneously, and it is not possible to enable fault injection on one
partition while disabling it on another.

From now on, we move the global fault injection configuration of the module
into per-superblock info, so injection testing can be more flexible.

Signed-off-by: Chao Yu 
---
 fs/f2fs/acl.c| 12 
 fs/f2fs/checkpoint.c |  2 +-
 fs/f2fs/data.c   |  2 +-
 fs/f2fs/dir.c|  2 +-
 fs/f2fs/f2fs.h   | 78 
 fs/f2fs/gc.c |  2 +-
 fs/f2fs/inline.c |  4 +--
 fs/f2fs/inode.c  |  2 +-
 fs/f2fs/node.c   |  2 +-
 fs/f2fs/super.c  | 57 ++
 10 files changed, 66 insertions(+), 97 deletions(-)

diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
index 4dcc9e2..1e29630 100644
--- a/fs/f2fs/acl.c
+++ b/fs/f2fs/acl.c
@@ -109,14 +109,16 @@ fail:
return ERR_PTR(-EINVAL);
 }
 
-static void *f2fs_acl_to_disk(const struct posix_acl *acl, size_t *size)
+static void *f2fs_acl_to_disk(struct f2fs_sb_info *sbi,
+   const struct posix_acl *acl, size_t *size)
 {
struct f2fs_acl_header *f2fs_acl;
struct f2fs_acl_entry *entry;
int i;
 
-   f2fs_acl = f2fs_kmalloc(sizeof(struct f2fs_acl_header) + acl->a_count *
-   sizeof(struct f2fs_acl_entry), GFP_NOFS);
+   f2fs_acl = f2fs_kmalloc(sbi, sizeof(struct f2fs_acl_header) +
+   acl->a_count * sizeof(struct f2fs_acl_entry),
+   GFP_NOFS);
if (!f2fs_acl)
return ERR_PTR(-ENOMEM);
 
@@ -175,7 +177,7 @@ static struct posix_acl *__f2fs_get_acl(struct inode *inode, int type,
 
retval = f2fs_getxattr(inode, name_index, "", NULL, 0, dpage);
if (retval > 0) {
-   value = f2fs_kmalloc(retval, GFP_F2FS_ZERO);
+   value = f2fs_kmalloc(F2FS_I_SB(inode), retval, GFP_F2FS_ZERO);
if (!value)
return ERR_PTR(-ENOMEM);
retval = f2fs_getxattr(inode, name_index, "", value,
@@ -230,7 +232,7 @@ static int __f2fs_set_acl(struct inode *inode, int type,
}
 
if (acl) {
-   value = f2fs_acl_to_disk(acl, &size);
+   value = f2fs_acl_to_disk(F2FS_I_SB(inode), acl, &size);
if (IS_ERR(value)) {
clear_inode_flag(inode, FI_ACL_MODE);
return (int)PTR_ERR(value);
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d1560bb..a655d75 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -494,7 +494,7 @@ int acquire_orphan_inode(struct f2fs_sb_info *sbi)
spin_lock(&im->ino_lock);
 
 #ifdef CONFIG_F2FS_FAULT_INJECTION
-   if (time_to_inject(FAULT_ORPHAN)) {
+   if (time_to_inject(sbi, FAULT_ORPHAN)) {
spin_unlock(&im->ino_lock);
return -ENOSPC;
}
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a9f7436..0eb7bee 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -35,7 +35,7 @@ static void f2fs_read_end_io(struct bio *bio)
int i;
 
 #ifdef CONFIG_F2FS_FAULT_INJECTION
-   if (time_to_inject(FAULT_IO))
+   if (time_to_inject(F2FS_P_SB(bio->bi_io_vec->bv_page), FAULT_IO))
bio->bi_error = -EIO;
 #endif
 
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 39a850b..cbf85f6 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -545,7 +545,7 @@ int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name,
 
 start:
 #ifdef CONFIG_F2FS_FAULT_INJECTION
-   if (time_to_inject(FAULT_DIR_DEPTH))
+   if (time_to_inject(F2FS_I_SB(dir), FAULT_DIR_DEPTH))
return -ENOSPC;
 #endif
if (unlikely(current_depth == MAX_DIR_HASH_DEPTH))
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 3c513fe..0d7b649 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -57,44 +57,8 @@ struct f2fs_fault_info {
unsigned int inject_type;
 };
 
-extern struct f2fs_fault_info f2fs_fault;
 extern char *fault_name[FAULT_MAX];
-#define IS_FAULT_SET(type) (f2fs_fault.inject_type & (1 << (type)))
-
-static inline bool time_to_inject(int type)
-{
-   if (!f2fs_fault.inject_rate)
-   return false;
-   if (type == FAULT_KMALLOC && !IS_FAULT_SET(type))
-   return false;
-   else if (type == FAULT_PAGE_ALLOC && !IS_FAULT_SET(type))
-   return false;
-   else if (type == FAULT_ALLOC_NID && !IS_FAULT_SET(type))
-   re

Re: [PATCH 2/3] f2fs: support checkpoint error injection

2016-09-23 Thread Chao Yu

Hi Jaegeuk,

On 2016/9/24 7:53, Jaegeuk Kim wrote:
> Hi Chao,
> 
> The basic rule is to stop every operations once CP_ERROR_FLAG is set.
> But, this patch simply breaks the rule.
> For example, f2fs_write_data_page() currently exits with mapping_set_error().
> So this patch incurs missing dentry blocks in a valid checkpoint.

Yes, that's right.

How about triggering checkpoint error in f2fs_stop_checkpoint?

From 7bedfe9a0e97c4deead1c7cdbfc24187f5080268 Mon Sep 17 00:00:00 2001
From: Chao Yu 
Date: Fri, 23 Sep 2016 06:59:04 +0800
Subject: [PATCH] f2fs: support checkpoint error injection

This patch adds support for checkpoint error injection in f2fs, for testing
fatal error tolerance.

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 14 +++---
 fs/f2fs/data.c   |  7 ---
 fs/f2fs/f2fs.h   |  5 -
 fs/f2fs/file.c   |  8 
 fs/f2fs/inode.c  |  7 +--
 fs/f2fs/super.c  |  1 +
 6 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index d1560bb..834c8ec 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -26,8 +26,17 @@
 static struct kmem_cache *ino_entry_slab;
 struct kmem_cache *inode_entry_slab;

-void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io)
+void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi,
+   bool end_io, bool need_stop)
 {
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+   if (time_to_inject(FAULT_CHECKPOINT))
+   need_stop = true;
+#endif
+
+   if (!need_stop)
+   return;
+
set_ckpt_flags(sbi, CP_ERROR_FLAG);
sbi->sb->s_flags |= MS_RDONLY;
if (!end_io)
@@ -100,8 +109,7 @@ repeat:
 * readonly and make sure do not write checkpoint with non-uptodate
 * meta page.
 */
-   if (unlikely(!PageUptodate(page)))
-   f2fs_stop_checkpoint(sbi, false);
+   f2fs_stop_checkpoint(sbi, false, !PageUptodate(page));
 out:
return page;
 }
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a9f7436..1b00d3d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -74,10 +74,11 @@ static void f2fs_write_end_io(struct bio *bio)

fscrypt_pullback_bio_page(&page, true);

-   if (unlikely(bio->bi_error)) {
+   f2fs_stop_checkpoint(sbi, true, bio->bi_error);
+
+   if (unlikely(bio->bi_error))
set_bit(AS_EIO, &page->mapping->flags);
-   f2fs_stop_checkpoint(sbi, true);
-   }
+
end_page_writeback(page);
}
if (atomic_dec_and_test(&sbi->nr_wb_bios) &&
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e216bc0..7bc1802 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -47,6 +47,7 @@ enum {
FAULT_DIR_DEPTH,
FAULT_EVICT_INODE,
FAULT_IO,
+   FAULT_CHECKPOINT,
FAULT_MAX,
 };

@@ -80,6 +81,8 @@ static inline bool time_to_inject(int type)
return false;
else if (type == FAULT_IO && !IS_FAULT_SET(type))
return false;
+   else if (type == FAULT_CHECKPOINT && !IS_FAULT_SET(type))
+   return false;

atomic_inc(&f2fs_fault.inject_ops);
if (atomic_read(&f2fs_fault.inject_ops) >= f2fs_fault.inject_rate) {
@@ -2115,7 +2118,7 @@ void destroy_segment_manager_caches(void);
 /*
  * checkpoint.c
  */
-void f2fs_stop_checkpoint(struct f2fs_sb_info *, bool);
+void f2fs_stop_checkpoint(struct f2fs_sb_info *, bool, bool);
 struct page *grab_meta_page(struct f2fs_sb_info *, pgoff_t);
 struct page *get_meta_page(struct f2fs_sb_info *, pgoff_t);
 struct page *get_tmp_page(struct f2fs_sb_info *, pgoff_t);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index d341a0e..57c7a64 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1720,21 +1720,21 @@ static int f2fs_ioc_shutdown(struct file *filp, unsigned long arg)
case F2FS_GOING_DOWN_FULLSYNC:
sb = freeze_bdev(sb->s_bdev);
if (sb && !IS_ERR(sb)) {
-   f2fs_stop_checkpoint(sbi, false);
+   f2fs_stop_checkpoint(sbi, false, true);
thaw_bdev(sb->s_bdev, sb);
}
break;
case F2FS_GOING_DOWN_METASYNC:
/* do checkpoint only */
f2fs_sync_fs(sb, 1);
-   f2fs_stop_checkpoint(sbi, false);
+   f2fs_stop_checkpoint(sbi, false, true);
break;
case F2FS_GOING_DOWN_NOSYNC:
-   f2fs_stop_checkpoint(sbi, false);
+   f2fs_stop_checkpoint(sbi, false, true);
break;
case F2FS_GOING_DOWN_METAFLUSH:
sync_meta_pages(sbi, META, LONG_MAX);
-   f2fs_stop_checkpoint(sbi, false);
+   f2fs_stop_checkpoint(sbi, false, true)

Re: [PATCH] f2fs: remove dirty inode pages in error path

2016-09-23 Thread Chao Yu

On 2016/9/24 5:11, Jaegeuk Kim wrote:
> When getting EIO while handling orphan inodes, we can get some dirty node
> pages. Then, f2fs_write_node_pages() called by iput(node_inode) will try
> to flush node pages. But in this case, we should prevent to do that, since
> we will try again from the start.

We are protected because we set the SBI_POR_DOING flag in sb, so aren't we
already safe?

Thanks,

Re: [PATCH 2/3] f2fs: support checkpoint error injection

2016-09-23 Thread Chao Yu

On 2016/9/24 8:52, Jaegeuk Kim wrote:
> On Sat, Sep 24, 2016 at 08:46:54AM +0800, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2016/9/24 7:53, Jaegeuk Kim wrote:
>>> Hi Chao,
>>>
>>> The basic rule is to stop every operations once CP_ERROR_FLAG is set.
>>> But, this patch simply breaks the rule.
>>> For example, f2fs_write_data_page() currently exits with 
>>> mapping_set_error().
>>> So this patch incurs missing dentry blocks in a valid checkpoint.
>>
>> Yes, that's right.
>>
>> How about triggering checkpoint error in f2fs_stop_checkpoint?
> 
> Let's just use src/godown in xfstests, since we don't need to trigger this
> multiple times in runtime.

After we inject a checkpoint error into f2fs for the first time, all write
IOs are refused and never written back to storage, while read IOs can still
go through f2fs. So with checkpoint error injection supported, f2fs itself
can trigger something analogous to a random power-off, instead of relying on
tools. That means we do not need dedicated test cases that must use the
godown ioctl; with the normal test cases in xfstests/fsstress/lkp, on an f2fs
with CP error injection enabled, we can test power-off cases.

Thanks,

Re: [f2fs-dev] [PATCH 1/2] f2fs: use crc and cp version to determine roll-forward recovery

2016-09-23 Thread Chao Yu

On 2016/9/21 8:45, Jaegeuk Kim wrote:
> @@ -259,40 +290,26 @@ static inline void fill_node_footer_blkaddr(struct page 
> *page, block_t blkaddr)
>  {
>   struct f2fs_checkpoint *ckpt = F2FS_CKPT(F2FS_P_SB(page));
>   struct f2fs_node *rn = F2FS_NODE(page);
> + size_t crc_offset = le32_to_cpu(ckpt->checksum_offset);
> + __u64 cp_ver = le64_to_cpu(ckpt->checkpoint_ver);
> + __u64 crc;
>  
> - rn->footer.cp_ver = ckpt->checkpoint_ver;
> + crc = le32_to_cpu(*((__le32 *)((unsigned char *)ckpt + crc_offset)));
> + cp_ver |= (crc << 32);

How about using '^=' here?

> + rn->footer.cp_ver = cpu_to_le64(cp_ver);
>   rn->footer.next_blkaddr = cpu_to_le32(blkaddr);
>  }
>  
> -static inline nid_t ino_of_node(struct page *node_page)
> -{
> - struct f2fs_node *rn = F2FS_NODE(node_page);
> - return le32_to_cpu(rn->footer.ino);
> -}
> -
> -static inline nid_t nid_of_node(struct page *node_page)
> -{
> - struct f2fs_node *rn = F2FS_NODE(node_page);
> - return le32_to_cpu(rn->footer.nid);
> -}
> -
> -static inline unsigned int ofs_of_node(struct page *node_page)
> -{
> - struct f2fs_node *rn = F2FS_NODE(node_page);
> - unsigned flag = le32_to_cpu(rn->footer.flag);
> - return flag >> OFFSET_BIT_SHIFT;
> -}
> -
> -static inline unsigned long long cpver_of_node(struct page *node_page)
> +static inline bool is_recoverable_dnode(struct page *page)
>  {
> - struct f2fs_node *rn = F2FS_NODE(node_page);
> - return le64_to_cpu(rn->footer.cp_ver);
> -}
> + struct f2fs_checkpoint *ckpt = F2FS_CKPT(F2FS_P_SB(page));
> + size_t crc_offset = le32_to_cpu(ckpt->checksum_offset);
> + __u64 cp_ver = cur_cp_version(ckpt);
> + __u64 crc;
>  
> -static inline block_t next_blkaddr_of_node(struct page *node_page)
> -{
> - struct f2fs_node *rn = F2FS_NODE(node_page);
> - return le32_to_cpu(rn->footer.next_blkaddr);
> + crc = le32_to_cpu(*((__le32 *)((unsigned char *)ckpt + crc_offset)));
> + cp_ver |= (crc << 32);
> + return cpu_to_le64(cp_ver) == cpver_of_node(page);
>  }

cpu_to_le64(cp_ver) == cpver_of_node(page) ^ (crc << 32)

Thanks,

Re: [PATCH] f2fs: remove dirty inode pages in error path

2016-09-25 Thread Chao Yu

On 2016/9/25 1:47, Jaegeuk Kim wrote:
> On Sat, Sep 24, 2016 at 09:02:02AM +0800, Chao Yu wrote:
>> On 2016/9/24 5:11, Jaegeuk Kim wrote:
>>> When getting EIO while handling orphan inodes, we can get some dirty node
>>> pages. Then, f2fs_write_node_pages() called by iput(node_inode) will try
>>> to flush node pages. But in this case, we should prevent to do that, since
>>> we will try again from the start.
>>
>> We are protected since we set SBI_POR_DOING flag in sb, so we are safe now?
> 
> Safe, but we get an infinite loop to flush node pages.

Got it.

Thanks,

> 
>>
>> Thanks,
> 
> .
>

Re: [PATCH 2/3] f2fs: support checkpoint error injection

2016-09-25 Thread Chao Yu

On 2016/9/25 2:10, Jaegeuk Kim wrote:
> On Sat, Sep 24, 2016 at 11:32:08AM +0800, Chao Yu wrote:
>> On 2016/9/24 8:52, Jaegeuk Kim wrote:
>>> On Sat, Sep 24, 2016 at 08:46:54AM +0800, Chao Yu wrote:
>>>> Hi Jaegeuk,
>>>>
>>>> On 2016/9/24 7:53, Jaegeuk Kim wrote:
>>>>> Hi Chao,
>>>>>
>>>>> The basic rule is to stop every operations once CP_ERROR_FLAG is set.
>>>>> But, this patch simply breaks the rule.
>>>>> For example, f2fs_write_data_page() currently exits with 
>>>>> mapping_set_error().
>>>>> So this patch incurs missing dentry blocks in a valid checkpoint.
>>>>
>>>> Yes, that's right.
>>>>
>>>> How about triggering checkpoint error in f2fs_stop_checkpoint?
>>>
>>> Let's just use src/godown in xfstests, since we don't need to trigger this
>>> multiple times in runtime.
>>
>> After we inject checkpoint error into f2fs at first time, all write IOs will 
>> be
>> refused to be writebacked to storage, meanwhile read IOs can continuously go
>> through f2fs, so with checkpoint error injection being supported, we can 
>> support
>> to trigger random analogously power off by f2fs itself, instead of using 
>> tools.
>> It means it doesn't needs specified test cases where we must use godown 
>> ioctl,
>> but with normal testcases in xfstest/fsstress/lkp, in CP error injection 
>> enabled
>> f2fs, we can test power off cases.
> 
> But, in this approach, the test coverage would be quite limited.
> In my testcase, I'm randomly injecting godown while fsstress is running, which
> mimics really random power failures, as I believe. I'm running this infinitely
> with fscking at every run.
> 
> Anyway, in order to do this without godown, how about background_gc thread to
> trigger f2fs_stop_checkpoint?

Yeap, better.

What do you think of also adding a random f2fs_stop_checkpoint in
f2fs_balance_fs? Then a power-off can be triggered even if the gc thread is
not running.

Thanks,

> 
>>
>> Thanks,
> 
> .
>

[PATCH 2/2] f2fs: fix to recover old fault injection config in ->remount_fs

2016-09-26 Thread Chao Yu

In ->remount_fs, we didn't restore the original fault injection config when
we encountered an error; fix it.

Signed-off-by: Chao Yu 
---
 fs/f2fs/super.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index aa35e60..6132b4c 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1001,6 +1001,9 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
bool need_restart_gc = false;
bool need_stop_gc = false;
bool no_extent_cache = !test_opt(sbi, EXTENT_CACHE);
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+   struct f2fs_fault_info ffi = sbi->fault_info;
+#endif
 
/*
 * Save the old mount options in case we
@@ -1096,6 +1099,9 @@ restore_gc:
 restore_opts:
sbi->mount_opt = org_mount_opt;
sbi->active_logs = active_logs;
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+   sbi->fault_info = ffi;
+#endif
return err;
 }
 
-- 
2.8.2.311.gee88674

[PATCH v2] f2fs: support checkpoint error injection

2016-09-26 Thread Chao Yu

This patch adds support for checkpoint error injection in f2fs for testing
fatal error tolerance. It is useful because f2fs can simulate an abnormal
power-off by itself, instead of relying on an application calling the godown
ioctl.

Signed-off-by: Chao Yu 
---
 fs/f2fs/f2fs.h| 1 +
 fs/f2fs/gc.c  | 5 +
 fs/f2fs/segment.c | 5 +
 fs/f2fs/super.c   | 1 +
 4 files changed, 12 insertions(+)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2545b04..59c97ad 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -47,6 +47,7 @@ enum {
FAULT_DIR_DEPTH,
FAULT_EVICT_INODE,
FAULT_IO,
+   FAULT_CHECKPOINT,
FAULT_MAX,
 };
 
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index a5c4175..c9b8a67 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -47,6 +47,11 @@ static int gc_thread_func(void *data)
continue;
}
 
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+   if (time_to_inject(sbi, FAULT_CHECKPOINT))
+   f2fs_stop_checkpoint(sbi, false);
+#endif
+
/*
 * [GC triggering condition]
 * 0. GC is not conducted currently.
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index fbcc172..fc886f0 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -345,6 +345,11 @@ int commit_inmem_pages(struct inode *inode)
  */
 void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need)
 {
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+   if (time_to_inject(sbi, FAULT_CHECKPOINT))
+   f2fs_stop_checkpoint(sbi, false);
+#endif
+
if (!need)
return;
 
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index a06eee4..4d5911b 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -50,6 +50,7 @@ char *fault_name[FAULT_MAX] = {
[FAULT_DIR_DEPTH]   = "too big dir depth",
[FAULT_EVICT_INODE] = "evict_inode fail",
[FAULT_IO]  = "IO error",
+   [FAULT_CHECKPOINT]  = "checkpoint error",
 };
 
 static void f2fs_build_fault_attr(struct f2fs_sb_info *sbi,
-- 
2.8.2.311.gee88674

[PATCH 1/2] f2fs: do fault injection initialization in default_options

2016-09-26 Thread Chao Yu

Do fault injection initialization in default_options to keep it consistent
with the configuration of the other default options.

Signed-off-by: Chao Yu 
---
 fs/f2fs/super.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 4d5911b..aa35e60 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -364,10 +364,6 @@ static int parse_options(struct super_block *sb, char *options)
char *p, *name;
int arg = 0;
 
-#ifdef CONFIG_F2FS_FAULT_INJECTION
-   f2fs_build_fault_attr(sbi, 0);
-#endif
-
if (!options)
return 0;
 
@@ -991,6 +987,10 @@ static void default_options(struct f2fs_sb_info *sbi)
 #ifdef CONFIG_F2FS_FS_POSIX_ACL
set_opt(sbi, POSIX_ACL);
 #endif
+
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+   f2fs_build_fault_attr(sbi, 0);
+#endif
 }
 
 static int f2fs_remount(struct super_block *sb, int *flags, char *data)
-- 
2.8.2.311.gee88674

[PATCH 1/2] f2fs: fix to commit bio cache after flushing node pages

2016-09-26 Thread Chao Yu

From: Chao Yu 

In sync_node_pages, we don't check and commit the last merged pages in the
f2fs private bio cache. As these pages were tagged as writeback, anyone
waiting for writeback of such a page will be blocked until the cache is
committed by someone else.

We need to commit the node type bio cache to avoid a potential deadlock or a
long delay while waiting for writeback.

Signed-off-by: Chao Yu 
---
 fs/f2fs/node.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 9faddcd..f73f774 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1416,6 +1416,7 @@ int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc)
struct pagevec pvec;
int step = 0;
int nwritten = 0;
+   int ret = 0;
 
pagevec_init(&pvec, 0);
 
@@ -1436,7 +1437,8 @@ next_step:
 
if (unlikely(f2fs_cp_error(sbi))) {
pagevec_release(&pvec);
-   return -EIO;
+   ret = -EIO;
+   goto out;
}
 
/*
@@ -1487,6 +1489,8 @@ continue_unlock:
 
if (NODE_MAPPING(sbi)->a_ops->writepage(page, wbc))
unlock_page(page);
+   else
+   nwritten++;
 
if (--wbc->nr_to_write == 0)
break;
@@ -1504,7 +1508,10 @@ continue_unlock:
step++;
goto next_step;
}
-   return nwritten;
+out:
+   if (nwritten)
+   f2fs_submit_merged_bio(sbi, NODE, WRITE);
+   return ret;
 }
 
 int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino)
-- 
2.7.2

[PATCH 2/2] f2fs: remove redundant io plug

2016-09-26 Thread Chao Yu

From: Chao Yu 

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index a655d75..de8693c 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -267,7 +267,6 @@ static int f2fs_write_meta_pages(struct address_space *mapping,
struct writeback_control *wbc)
 {
struct f2fs_sb_info *sbi = F2FS_M_SB(mapping);
-   struct blk_plug plug;
long diff, written;
 
/* collect a number of dirty meta pages and write together */
@@ -280,9 +279,7 @@ static int f2fs_write_meta_pages(struct address_space *mapping,
/* if mounting is failed, skip writing node pages */
mutex_lock(&sbi->cp_mutex);
diff = nr_pages_to_write(sbi, META, wbc);
-   blk_start_plug(&plug);
written = sync_meta_pages(sbi, META, wbc->nr_to_write);
-   blk_finish_plug(&plug);
mutex_unlock(&sbi->cp_mutex);
wbc->nr_to_write = max((long)0, wbc->nr_to_write - written - diff);
return 0;
-- 
2.7.2

Re: [PATCH 1/2] f2fs: fix to commit bio cache after flushing node pages

2016-09-26 Thread Chao Yu

Hi Jaegeuk,

On 2016/9/27 2:33, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Tue, Sep 27, 2016 at 12:09:52AM +0800, Chao Yu wrote:
>> From: Chao Yu 
>>
>> In sync_node_pages, we won't check and commit last merged pages in private
>> bio cache of f2fs, as these pages were taged as writeback, someone who is
>> waiting for writebacking of the page will be blocked until the cache was
>> committed by someone else.
>>
>> We need to commit node type bio cache to avoid potential deadlock or long
>> delay of waiting writeback.
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/node.c | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>> index 9faddcd..f73f774 100644
>> --- a/fs/f2fs/node.c
>> +++ b/fs/f2fs/node.c
>> @@ -1416,6 +1416,7 @@ int sync_node_pages(struct f2fs_sb_info *sbi, struct 
>> writeback_control *wbc)
>>  struct pagevec pvec;
>>  int step = 0;
>>  int nwritten = 0;
>> +int ret = 0;
>>  
>>  pagevec_init(&pvec, 0);
>>  
>> @@ -1436,7 +1437,8 @@ next_step:
>>  
>>  if (unlikely(f2fs_cp_error(sbi))) {
>>  pagevec_release(&pvec);
>> -return -EIO;
>> +ret = -EIO;
>> +goto out;
>>  }
>>  
>>  /*
>> @@ -1487,6 +1489,8 @@ continue_unlock:
>>  
>>  if (NODE_MAPPING(sbi)->a_ops->writepage(page, wbc))
>>  unlock_page(page);
>> +else
>> +nwritten++;
>>  
>>  if (--wbc->nr_to_write == 0)
>>  break;
>> @@ -1504,7 +1508,10 @@ continue_unlock:
>>  step++;
>>  goto next_step;
>>  }
>> -return nwritten;
>> +out:
>> +if (nwritten)
>> +f2fs_submit_merged_bio(sbi, NODE, WRITE);
> 
> IIRC, we don't need to flush this, since f2fs_submit_merged_bio_cond() would
> handle this in f2fs_wait_on_page_writeback().

Yes, it covers all the cases in f2fs private code, but there is still some
code in the mm or fs directories that doesn't use f2fs_wait_on_page_writeback
when waiting for page writeback, such as do_writepages && filemap_fdatawait
in __writeback_single_inode...

Thanks,

> 
> Thanks,
> 
>> +return ret;
>>  }
>>  
>>  int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino)
>> -- 
>> 2.7.2
> 
> .
>

Re: [PATCH 1/2] f2fs: fix to commit bio cache after flushing node pages

2016-09-26 Thread Chao Yu

On 2016/9/27 9:39, Jaegeuk Kim wrote:
> On Tue, Sep 27, 2016 at 08:57:41AM +0800, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2016/9/27 2:33, Jaegeuk Kim wrote:
>>> Hi Chao,
>>>
>>> On Tue, Sep 27, 2016 at 12:09:52AM +0800, Chao Yu wrote:
>>>> From: Chao Yu 
>>>>
>>>> In sync_node_pages, we won't check and commit last merged pages in private
>>>> bio cache of f2fs, as these pages were taged as writeback, someone who is
>>>> waiting for writebacking of the page will be blocked until the cache was
>>>> committed by someone else.
>>>>
>>>> We need to commit node type bio cache to avoid potential deadlock or long
>>>> delay of waiting writeback.
>>>>
>>>> Signed-off-by: Chao Yu 
>>>> ---
>>>>  fs/f2fs/node.c | 11 +--
>>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>> index 9faddcd..f73f774 100644
>>>> --- a/fs/f2fs/node.c
>>>> +++ b/fs/f2fs/node.c
>>>> @@ -1416,6 +1416,7 @@ int sync_node_pages(struct f2fs_sb_info *sbi, struct 
>>>> writeback_control *wbc)
>>>>struct pagevec pvec;
>>>>int step = 0;
>>>>int nwritten = 0;
>>>> +  int ret = 0;
>>>>  
>>>>pagevec_init(&pvec, 0);
>>>>  
>>>> @@ -1436,7 +1437,8 @@ next_step:
>>>>  
>>>>if (unlikely(f2fs_cp_error(sbi))) {
>>>>pagevec_release(&pvec);
>>>> -  return -EIO;
>>>> +  ret = -EIO;
>>>> +  goto out;
>>>>}
>>>>  
>>>>/*
>>>> @@ -1487,6 +1489,8 @@ continue_unlock:
>>>>  
>>>>if (NODE_MAPPING(sbi)->a_ops->writepage(page, wbc))
>>>>unlock_page(page);
>>>> +  else
>>>> +  nwritten++;
>>>>  
>>>>if (--wbc->nr_to_write == 0)
>>>>break;
>>>> @@ -1504,7 +1508,10 @@ continue_unlock:
>>>>step++;
>>>>goto next_step;
>>>>}
>>>> -  return nwritten;
>>>> +out:
>>>> +  if (nwritten)
>>>> +  f2fs_submit_merged_bio(sbi, NODE, WRITE);
>>>
>>> IIRC, we don't need to flush this, since f2fs_submit_merged_bio_cond() would
>>> handle this in f2fs_wait_on_page_writeback().
>>
>> Yes, it covers all the cases in f2fs private codes, but there are still some
>> codes in mm or fs directory, and they didn't use f2fs_wait_on_page_writeback
>> when waiting page writeback. Such as do_writepages && filemap_fdatawait in
>> __writeback_single_inode...
> 
> The do_writepages() is okay, which will call f2fs_write_node_pages().
> The __writeback_single_inode() won't do filemap_fdatawait() with WB_SYNC_ALL.
> We don't need to take care of truncation as well.
> 
> Any missing one?

Another is: while testing with the first version of checkpoint error
injection, I encountered the stack dump below:

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount   D 8801c1bf7960 0 97685  97397 0x0008
 8801c1bf7960 8801c1bf7930 88017590 8801c1bf7980
 8801c1bf8000  7fff 88021f7be340
 817c8880 8801c1bf7978 817c80a5 880214f58fc0
Call Trace:
 [] ? bit_wait+0x50/0x50
 [] schedule+0x35/0x80
 [] schedule_timeout+0x292/0x3d0
 [] ? xen_clocksource_get_cycles+0x15/0x20
 [] ? ktime_get+0x3c/0xb0
 [] ? bit_wait+0x50/0x50
 [] io_schedule_timeout+0xa6/0x110
 [] bit_wait_io+0x1b/0x60
 [] __wait_on_bit+0x64/0x90
 [] wait_on_page_bit+0xc4/0xd0
 [] ? autoremove_wake_function+0x40/0x40
 [] truncate_inode_pages_range+0x409/0x840
 [] ? pcpu_free_area+0x13d/0x1a0
 [] ? wake_up_bit+0x25/0x30
 [] truncate_inode_pages_final+0x4c/0x60
 [] f2fs_evict_inode+0x48/0x390 [f2fs]
 [] evict+0xc7/0x1a0
 [] iput+0x197/0x200
 [] f2fs_fill_super+0xab2/0x1130 [f2fs]
 [] mount_bdev+0x184/0x1c0
 [] ? f2fs_commit_super+0x100/0x100 [f2fs]
 [] f2fs_mount+0x15/0x20 [f2fs]
 [] mount_fs+0x39/0x160
 [] vfs_kern_mount+0x67/0x110
 [] do_mount+0x1bb/0xc80
 [] SyS_mount+0x83/0xd0
 [] do_syscall_64+0x6e/0x170
 [] entry_SYSCALL64_slow_path+0x25/0x25

Any thoughts?

> 
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>>
>>>> +  return ret;
>>>>  }
>>>>  
>>>>  int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino)
>>>> -- 
>>>> 2.7.2
>>>
>>> .
>>>
> 
> .
>

Re: [f2fs-dev] [PATCH v2] f2fs: introduce get_checkpoint_version for cleanup

2016-09-27 Thread Chao Yu

On 2016/9/27 10:05, Tiezhu Yang wrote:
> There exists almost same codes when get the value of pre_version
> and cur_version in function validate_checkpoint, this patch adds
> get_checkpoint_version to clean up redundant codes.
> 
> Signed-off-by: Tiezhu Yang 
> ---
>  fs/f2fs/checkpoint.c | 63 
> ++--
>  1 file changed, 37 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index de8693c..2dbc834 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -663,45 +663,56 @@ static void write_orphan_inodes(struct f2fs_sb_info 
> *sbi, block_t start_blk)
>   }
>  }
>  
> -static struct page *validate_checkpoint(struct f2fs_sb_info *sbi,
> - block_t cp_addr, unsigned long long *version)
> +static int get_checkpoint_version(struct f2fs_sb_info *sbi, block_t cp_addr,
> + struct f2fs_checkpoint *cp_block, struct page *cp_page,

struct f2fs_checkpoint **cp_block, struct page **cp_page,

> + unsigned long long *version)
>  {
> - struct page *cp_page_1, *cp_page_2 = NULL;
>   unsigned long blk_size = sbi->blocksize;
> - struct f2fs_checkpoint *cp_block;
> - unsigned long long cur_version = 0, pre_version = 0;
> - size_t crc_offset;
> + size_t crc_offset = 0;
>   __u32 crc = 0;
>  
> - /* Read the 1st cp block in this CP pack */
> - cp_page_1 = get_meta_page(sbi, cp_addr);
> + cp_page = get_meta_page(sbi, cp_addr);
> + cp_block = (struct f2fs_checkpoint *)page_address(cp_page);

*cp_page = get_meta_page(sbi, cp_addr);
*cp_block = (struct f2fs_checkpoint *)page_address(*cp_page);

>  
> - /* get the version number */
> - cp_block = (struct f2fs_checkpoint *)page_address(cp_page_1);
>   crc_offset = le32_to_cpu(cp_block->checksum_offset);

ditto

> - if (crc_offset >= blk_size)
> - goto invalid_cp1;
> + if (crc_offset >= blk_size) {
> + f2fs_msg(sbi->sb, KERN_WARNING,
> + "%s: crc_offset is greater than or equal to blk_size.",
> + __func__);
> + return -EINVAL;
> + }
>  
>   crc = le32_to_cpu(*((__le32 *)((unsigned char *)cp_block + 
> crc_offset)));
> - if (!f2fs_crc_valid(sbi, crc, cp_block, crc_offset))
> - goto invalid_cp1;
> + if (!f2fs_crc_valid(sbi, crc, cp_block, crc_offset)) {

ditto

> + f2fs_msg(sbi->sb, KERN_WARNING,
> + "%s: f2fs_crc_valid returns false.", __func__);
> + return -EINVAL;
> + }
>  
> - pre_version = cur_cp_version(cp_block);
> + *version = cur_cp_version(cp_block);
> + return 0;
> +}
>  
> - /* Read the 2nd cp block in this CP pack */
> - cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1;
> - cp_page_2 = get_meta_page(sbi, cp_addr);
> +static struct page *validate_checkpoint(struct f2fs_sb_info *sbi,
> + block_t cp_addr, unsigned long long *version)
> +{
> + struct page *cp_page_1 = NULL, *cp_page_2 = NULL;
> + struct f2fs_checkpoint *cp_block = NULL;
> + unsigned long long cur_version = 0, pre_version = 0;
> + int err;
>  
> - cp_block = (struct f2fs_checkpoint *)page_address(cp_page_2);
> - crc_offset = le32_to_cpu(cp_block->checksum_offset);
> - if (crc_offset >= blk_size)
> - goto invalid_cp2;
> + err = get_checkpoint_version(sbi, cp_addr, cp_block,
> + cp_page_1, version);

err = get_checkpoint_version(sbi, cp_addr, &cp_block, &cp_page_1, version);

> + if (err)
> + goto invalid_cp1;
> + pre_version = *version;
>  
> - crc = le32_to_cpu(*((__le32 *)((unsigned char *)cp_block + 
> crc_offset)));
> - if (!f2fs_crc_valid(sbi, crc, cp_block, crc_offset))
> + cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1;
> + err = get_checkpoint_version(sbi, cp_addr, cp_block,
> + cp_page_2, version);

ditto

Thanks,

> + if (err)
>   goto invalid_cp2;
> -
> - cur_version = cur_cp_version(cp_block);
> + cur_version = *version;
>  
>   if (cur_version == pre_version) {
>   *version = cur_version;
>
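
A minimal userspace sketch of why the review comments above ask for
"struct page **" / "struct f2fs_checkpoint **": assigning to a pointer that
was passed by value only changes the callee's copy, so the caller's cp_page
and cp_block would stay NULL. The struct page below and fake_get_meta_page()
are hypothetical stand-ins for illustration, not f2fs code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page. */
struct page { int id; };

static struct page meta_page = { 42 };

/* Hypothetical stand-in for get_meta_page(). */
static struct page *fake_get_meta_page(void)
{
	return &meta_page;
}

/* Broken: cp_page is a local copy, so the caller never sees the page. */
static int get_version_by_value(struct page *cp_page)
{
	cp_page = fake_get_meta_page();
	(void)cp_page;
	return 0;
}

/* Fixed: the caller's pointer is updated through the extra indirection. */
static int get_version_by_ref(struct page **cp_page)
{
	*cp_page = fake_get_meta_page();
	return 0;
}

/* Small usage demo of both variants. */
void demo_out_param(void)
{
	struct page *page = NULL;

	get_version_by_value(page);
	assert(page == NULL);		/* caller's pointer is unchanged */

	get_version_by_ref(&page);
	assert(page == &meta_page);	/* caller now owns a valid pointer */
}
```

With the by-value form, validate_checkpoint() would go on to dereference a
NULL cp_block, which is why every use inside the helper also has to become
*cp_block / *cp_page.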

Re: [f2fs-dev] [PATCH] f2fs: check free_sections for defragmentation

2016-09-07 Thread Chao Yu

Hi Jaegeuk,

On 2016/9/2 4:46, Jaegeuk Kim wrote:
> Fix wrong condition check for defragmentation of a file.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/file.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 37c24be..a8aa6fd 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -2037,7 +2037,7 @@ static int f2fs_defragment_range(struct f2fs_sb_info 
> *sbi,
>* avoid defragment running in SSR mode when free section are allocated
>* intensively
>*/
> - if (has_not_enough_free_secs(sbi, sec_num)) {
> + if (free_sections(sbi) <= sec_num) {

Why don't we check dirty dentry/node/imeta blocks here? They can be generated
at any time after f2fs_balance_fs(). So isn't the original condition stricter than
the new one?

Thanks,

>   err = -EAGAIN;
>   goto out;
>   }
>

Re: [f2fs-dev] [PATCH] f2fs: merge WRITE bio into previous WRITE_SYNC

2016-09-07 Thread Chao Yu

On 2016/9/3 2:36, Jaegeuk Kim wrote:
> On Fri, Sep 02, 2016 at 03:33:33PM +0800, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2016/8/27 8:53, Jaegeuk Kim wrote:
>>> This can avoid bio splits due to different op_flags.
>>
>> I thought about this, but I think this is not a good idea to increase merging
>> ratio of pages in bio. It breaks the rule of SYNC/ASYNC IO defined by system
>> which indicate degree of IO emergency, finally, some/more non-emergent IO will
>> treated as emergent one by IO scheduler, it will interrupt SYNC IOs in block
>> layer, more seriously, it may make real SYNC IO starvation.
> 
> I understand your concern.
> Originally, I tried to avoid breaking a big WRITE_SYNC by a small number of

Hmm.. I'm worried about the opposite case: the user triggers small WRITE_SYNC IOs
periodically while there is a large number of WRITEs. With the new approach we
will noticeably increase the number of synchronous WRITE IOs, because
ASYNC/SYNC WRITEs get mixed in the bio cache more intensively than before now
that we have dropped the writepages mutex lock. So I'm afraid this will mislead
the scheduling of the block layer.

> WRITE. And, I thought new WRITE can be piggybacked into previous WRITE_SYNC.
> 
> IMO, this happens very occassionally since previous pending bio should be
> WRITE_SYNC while a new request is WRITE. Even if this happens, the piggybacked
> size would not exceed over bio's max pages.
> If lots of WRITE come, we won't change at all.

I think this is related to the writeback / block layer / cgroup subsystems, which use
this tag frequently; maybe we should Cc their mailing lists for more opinions...

What's your opinion? :)

thanks,

> 
> Thanks,
> 
>>
>> Thanks,
>>
>>>
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  fs/f2fs/data.c | 5 +
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>> index 7c8e219..c7c2022 100644
>>> --- a/fs/f2fs/data.c
>>> +++ b/fs/f2fs/data.c
>>> @@ -267,6 +267,11 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
>>>  
>>> down_write(&io->io_rwsem);
>>>  
>>> +   /* WRITE can be merged into previous WRITE_SYNC */
>>> +   if (io->bio && io->last_block_in_bio == fio->new_blkaddr - 1 &&
>>> +   io->fio.op == fio->op && io->fio.op_flags == WRITE_SYNC)
>>> +   fio->op_flags = WRITE_SYNC;
>>> +
>>> if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 ||
>>> (io->fio.op != fio->op || io->fio.op_flags != fio->op_flags)))
>>> __submit_merged_bio(io);
>>>
> 
> --
> ___
> Linux-f2fs-devel mailing list
> linux-f2fs-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>
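
To make the concern above concrete, here is a rough userspace model of the
merge rule in the quoted patch. It is only a sketch under simplifying
assumptions: struct fake_req and count_promoted() are hypothetical names, one
pending bio is tracked, and bio size limits are ignored. It counts how many
asynchronous WRITE requests end up carrying the sync flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: target block address plus sync/async flag. */
struct fake_req {
	unsigned long blkaddr;
	bool sync;
};

/*
 * Returns how many async requests are submitted with the sync flag.
 * piggyback == true models the rule from the patch; false models the
 * behaviour without it.
 */
static size_t count_promoted(const struct fake_req *reqs, size_t n,
			     bool piggyback)
{
	bool have_bio = false, bio_sync = false;
	unsigned long last_block = 0;
	size_t promoted = 0;

	for (size_t i = 0; i < n; i++) {
		bool sync = reqs[i].sync;
		bool contiguous = have_bio &&
				  last_block == reqs[i].blkaddr - 1;

		/* patch rule: inherit the sync flag from the pending bio */
		if (piggyback && contiguous && bio_sync && !sync) {
			sync = true;
			promoted++;
		}

		/* submit the pending bio if the request cannot be merged */
		if (have_bio && (!contiguous || bio_sync != sync))
			have_bio = false;

		/* start a new bio that takes over the request's flag */
		if (!have_bio) {
			have_bio = true;
			bio_sync = sync;
		}
		last_block = reqs[i].blkaddr;
	}
	return promoted;
}

/* One WRITE_SYNC followed by three contiguous WRITEs. */
void demo_piggyback(void)
{
	const struct fake_req reqs[] = {
		{ 100, true }, { 101, false }, { 102, false }, { 103, false },
	};

	assert(count_promoted(reqs, 4, true) == 3);
	assert(count_promoted(reqs, 4, false) == 0);
}
```

In this model every contiguous WRITE that follows a pending WRITE_SYNC is
promoted, while a WRITE at a non-adjacent block address is not. How often that
pattern shows up in practice is exactly the open question in this thread.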

Re: [PATCH] MAINTAINERS: update my maintainership status of f2fs

2016-09-07 Thread Chao Yu

Sorry, +Cc f2fs & kernel mailing list.

On 2016/9/8 10:27, Chao Yu wrote:
> Update my maintainership of f2fs to maintainer instead of reviewer.
> 
> Signed-off-by: Chao Yu 
> ---
>  MAINTAINERS | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0bbe4b1..97abd05 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5061,7 +5061,7 @@ F:  include/linux/fscrypto.h
>  F2FS FILE SYSTEM
>  M:   Jaegeuk Kim 
>  M:   Changman Lee 
> -R:   Chao Yu 
> +M:   Chao Yu 
>  L:   linux-f2fs-de...@lists.sourceforge.net
>  W:   http://en.wikipedia.org/wiki/F2FS
>  T:   git git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
>

Re: [f2fs-dev] [PATCH] f2fs: check free_sections for defragmentation

2016-09-08 Thread Chao Yu

On 2016/9/8 8:18, Jaegeuk Kim wrote:
> I just wanted to fix this without any multiple changes.
> We can do like this as well. :)
> 
> From 6526f0377fd6616ae65b854fbd614e8ed9598fdd Mon Sep 17 00:00:00 2001
> From: Jaegeuk Kim 
> Date: Thu, 1 Sep 2016 12:02:51 -0700
> Subject: [PATCH] f2fs: check free_sections for defragmentation
> 
> Fix wrong condition check for defragmentation of a file.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu

Re: [f2fs-dev] [PATCH] f2fs: merge WRITE bio into previous WRITE_SYNC

2016-09-08 Thread Chao Yu

On 2016/9/8 8:26, Jaegeuk Kim wrote:
> On Wed, Sep 07, 2016 at 10:12:17PM +0800, Chao Yu wrote:
>> On 2016/9/3 2:36, Jaegeuk Kim wrote:
>>> On Fri, Sep 02, 2016 at 03:33:33PM +0800, Chao Yu wrote:
>>>> Hi Jaegeuk,
>>>>
>>>> On 2016/8/27 8:53, Jaegeuk Kim wrote:
>>>>> This can avoid bio splits due to different op_flags.
>>>>
>>>> I thought about this, but I think this is not a good idea to increase merging
>>>> ratio of pages in bio. It breaks the rule of SYNC/ASYNC IO defined by system
>>>> which indicate degree of IO emergency, finally, some/more non-emergent IO will
>>>> treated as emergent one by IO scheduler, it will interrupt SYNC IOs in block
>>>> layer, more seriously, it may make real SYNC IO starvation.
>>>
>>> I understand your concern.
>>> Originally, I tried to avoid breaking a big WRITE_SYNC by a small number of
>>
>> Hmm.. I'm worry about the opposite case: user triggers small WRITE_SYNC IO
>> periodically, meanwhile there are big number of WRITE, with our new approach,
>> actually we will increase the number of synchronous WRITE IO obviously because
>> we will mix ASYNC/SYNC WRITE into bio cache intensively more than before since
>> we drop writepages mutexlock. So I'm afread the result is that it will mislead
>> scheduling of block layer.
>>
>>> WRITE. And, I thought new WRITE can be piggybacked into previous WRITE_SYNC.
>>>
>>> IMO, this happens very occassionally since previous pending bio should be
>>> WRITE_SYNC while a new request is WRITE. Even if this happens, the piggybacked
>>> size would not exceed over bio's max pages.
>>> If lots of WRITE come, we won't change at all.
>>
>> I thinks this is related to writeback / blocklayer / cgroup subsystem which use
>> this tag frequently, maybe we should Cc their's mailing list for more opinion...
> 
> Except cgroup, since we do not support it yet. :P

Yeap.

> 
> Anyway, I think we'd better verify the effect of this for a while.
> For example, I'm able to write a simple program to measure fsync latency while
> a bunch of buffered writes.
> Meanwhile, I'll put it back to the end of dev-test repo. :)

Sounds like a good plan. Hopefully we won't suffer a regression here. ;)

Thanks,

> 
> Thanks,
> 
>>
>> What's your opinion? :)
>>
>> thanks,
>>
>>>
>>> Thanks,
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>> Signed-off-by: Jaegeuk Kim 
>>>>> ---
>>>>>  fs/f2fs/data.c | 5 +
>>>>>  1 file changed, 5 insertions(+)
>>>>>
>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>> index 7c8e219..c7c2022 100644
>>>>> --- a/fs/f2fs/data.c
>>>>> +++ b/fs/f2fs/data.c
>>>>> @@ -267,6 +267,11 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
>>>>>  
>>>>>   down_write(&io->io_rwsem);
>>>>>  
>>>>> + /* WRITE can be merged into previous WRITE_SYNC */
>>>>> + if (io->bio && io->last_block_in_bio == fio->new_blkaddr - 1 &&
>>>>> + io->fio.op == fio->op && io->fio.op_flags == WRITE_SYNC)
>>>>> + fio->op_flags = WRITE_SYNC;
>>>>> +
>>>>>   if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 ||
>>>>>   (io->fio.op != fio->op || io->fio.op_flags != fio->op_flags)))
>>>>>   __submit_merged_bio(io);
>>>>>
>>>
>>> --
>>> ___
>>> Linux-f2fs-devel mailing list
>>> linux-f2fs-de...@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>>>

[PATCH v2] MAINTAINERS: update f2fs entry

2016-09-08 Thread Chao Yu

This patch includes the following modifications:
1. change my status from reviewer to maintainer.
2. remove Changman Lee as maintainer, since he has not been active for about
one and a half years.
3. change the f2fs website from the Wikipedia page to the kernel.org wiki.

Signed-off-by: Chao Yu 
---
v2: gather more modifications into this patch.
 MAINTAINERS | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0bbe4b1..bd28973 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5060,10 +5060,9 @@ F:   include/linux/fscrypto.h
 
 F2FS FILE SYSTEM
 M: Jaegeuk Kim 
-M: Changman Lee 
-R: Chao Yu 
+M: Chao Yu 
 L: linux-f2fs-de...@lists.sourceforge.net
-W: http://en.wikipedia.org/wiki/F2FS
+W: https://f2fs.wiki.kernel.org/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
 S: Maintained
 F: Documentation/filesystems/f2fs.txt
-- 
2.8.2.311.gee88674

[PATCH v3] f2fs: fix to set superblock dirty correctly

2016-08-30 Thread Chao Yu

tests/generic/251 of the fstests suite complains with the message below:

[ cut here ]
invalid opcode:  [#1] PREEMPT SMP
CPU: 2 PID: 7698 Comm: fstrim Tainted: G   O4.7.0+ #21
task: e9f4e000 task.stack: e7262000
EIP: 0060:[] EFLAGS: 00010202 CPU: 2
EIP is at write_checkpoint+0xfde/0x1020 [f2fs]
EAX: f33eb300 EBX: eecac310 ECX: 0001 EDX: 0001
ESI: eecac000 EDI: eecac5f0 EBP: e7263dec ESP: e7263d18
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b76ab01c CR3: 2eb89de0 CR4: 000406f0
Stack:
 0001 a220fb7b e9f4e000 0002 419ff2d3 b3a05151 0002 e9f4e5d8
 e9f4e000 419ff2d3 b3a05151 eecac310 c10b8154 b3a05151 419ff2d3 c10b78bd
 e9f4e000 e9f4e000 e9f4e5d8 0001 e9f4e000 ec409000 eecac2cc eecac288
Call Trace:
 [] ? __lock_acquire+0x3c4/0x760
 [] ? mark_held_locks+0x5d/0x80
 [] f2fs_trim_fs+0x1c2/0x2e0 [f2fs]
 [] f2fs_ioctl+0x6b6/0x10b0 [f2fs]
 [] ? __this_cpu_preempt_check+0xf/0x20
 [] ? trace_hardirqs_off_caller+0x91/0x120
 [] ? __exchange_data_block+0xd30/0xd30 [f2fs]
 [] do_vfs_ioctl+0x81/0x7f0
 [] ? kmem_cache_free+0x245/0x2e0
 [] ? get_unused_fd_flags+0x40/0x40
 [] ? putname+0x4c/0x50
 [] ? do_sys_open+0x16e/0x1d0
 [] ? do_fast_syscall_32+0x30/0x1c0
 [] ? __this_cpu_preempt_check+0xf/0x20
 [] SyS_ioctl+0x58/0x80
 [] do_fast_syscall_32+0xa1/0x1c0
 [] sysenter_past_esp+0x45/0x74
EIP: [] write_checkpoint+0xfde/0x1020 [f2fs] SS:ESP 0068:e7263d18
---[ end trace 4de95d7e6b3aa7c6 ]---

The reason: with the call sequences below, we hit the BUG_ON while running
fstrim.

Thread A                                Thread B
- write_checkpoint
 - do_checkpoint
                                        - f2fs_write_inode
                                         - update_inode_page
                                          - update_inode
                                           - set_page_dirty
                                            - f2fs_set_node_page_dirty
                                             - inc_page_count
                                              - percpu_counter_inc
                                             - set_sbi_flag(SBI_IS_DIRTY)
  - clear_sbi_flag(SBI_IS_DIRTY)

Thread C                                Thread D
- f2fs_write_node_page
 - set_node_addr
  - __set_nat_cache_dirty
   - nm_i->dirty_nat_cnt++
                                        - do_vfs_ioctl
                                         - f2fs_ioctl
                                          - f2fs_trim_fs
                                           - write_checkpoint
                                            - f2fs_bug_on(nm_i->dirty_nat_cnt)

Fix it by setting superblock dirty correctly in do_checkpoint and
f2fs_write_node_page.

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 727e97e..b80dd37 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1154,6 +1154,16 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
clear_sbi_flag(sbi, SBI_IS_DIRTY);
clear_sbi_flag(sbi, SBI_NEED_CP);
 
+   /*
+* redirty superblock if metadata like node page or inode cache is
+* updated during writing checkpoint.
+*/
+   if (get_pages(sbi, F2FS_DIRTY_NODES) ||
+   get_pages(sbi, F2FS_DIRTY_IMETA))
+   set_sbi_flag(sbi, SBI_IS_DIRTY);
+
+   f2fs_bug_on(sbi, get_pages(sbi, F2FS_DIRTY_DENTS));
+
return 0;
 }
 
-- 
2.8.2.311.gee88674

Re: [PATCH] f2fs: merge WRITE bio into previous WRITE_SYNC

2016-09-02 Thread Chao Yu

Hi Jaegeuk,

On 2016/8/27 8:53, Jaegeuk Kim wrote:
> This can avoid bio splits due to different op_flags.

I thought about this, but I don't think it is a good idea to increase the merge
ratio of pages in a bio this way. It breaks the SYNC/ASYNC IO distinction defined
by the system, which indicates how urgent an IO is. As a result, some non-urgent
IO will be treated as urgent by the IO scheduler and will interfere with SYNC IOs
in the block layer; more seriously, it may starve real SYNC IO.

Thanks,

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/data.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 7c8e219..c7c2022 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -267,6 +267,11 @@ void f2fs_submit_page_mbio(struct f2fs_io_info *fio)
>  
>   down_write(&io->io_rwsem);
>  
> + /* WRITE can be merged into previous WRITE_SYNC */
> + if (io->bio && io->last_block_in_bio == fio->new_blkaddr - 1 &&
> + io->fio.op == fio->op && io->fio.op_flags == WRITE_SYNC)
> + fio->op_flags = WRITE_SYNC;
> +
>   if (io->bio && (io->last_block_in_bio != fio->new_blkaddr - 1 ||
>   (io->fio.op != fio->op || io->fio.op_flags != fio->op_flags)))
>   __submit_merged_bio(io);
>

Re: [PATCH] f2fs: do not use discard_map for non-discard case

2016-08-05 Thread Chao Yu

Hi Jaegeuk,

On 2016/8/5 3:04, Jaegeuk Kim wrote:
> We don't need to keep discard_map, if f2fs has no discard mount option.

In trim_fs path, we will still use discard_map though, right?

Thanks,

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/f2fs.h|  2 ++
>  fs/f2fs/segment.c | 87 
> ++-
>  fs/f2fs/super.c   | 10 +++
>  3 files changed, 86 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 61cb83d..a400715 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -2036,6 +2036,8 @@ int lookup_journal_in_cursum(struct f2fs_journal *, 
> int, unsigned int, int);
>  void flush_sit_entries(struct f2fs_sb_info *, struct cp_control *);
>  int build_segment_manager(struct f2fs_sb_info *);
>  void destroy_segment_manager(struct f2fs_sb_info *);
> +int build_discard_map(struct f2fs_sb_info *);
> +void destroy_discard_map(struct f2fs_sb_info *);
>  int __init create_segment_manager_caches(void);
>  void destroy_segment_manager_caches(void);
>  
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index a46296f..66822d3 100644
> --- a/fs/f2fs/segment.c
> +++ b/fs/f2fs/segment.c
> @@ -660,11 +660,11 @@ static void add_discard_addrs(struct f2fs_sb_info *sbi, 
> struct cp_control *cpc)
>   bool force = (cpc->reason == CP_DISCARD);
>   int i;
>  
> - if (se->valid_blocks == max_blocks)
> + if (se->valid_blocks == max_blocks || !test_opt(sbi, DISCARD))
>   return;
>  
>   if (!force) {
> - if (!test_opt(sbi, DISCARD) || !se->valid_blocks ||
> + if (!se->valid_blocks ||
>   SM_I(sbi)->nr_discards >= SM_I(sbi)->max_discards)
>   return;
>   }
> @@ -818,12 +818,14 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, 
> block_t blkaddr, int del)
>   if (del > 0) {
>   if (f2fs_test_and_set_bit(offset, se->cur_valid_map))
>   f2fs_bug_on(sbi, 1);
> - if (!f2fs_test_and_set_bit(offset, se->discard_map))
> + if (test_opt(sbi, DISCARD) &&
> + !f2fs_test_and_set_bit(offset, se->discard_map))
>   sbi->discard_blks--;
>   } else {
>   if (!f2fs_test_and_clear_bit(offset, se->cur_valid_map))
>   f2fs_bug_on(sbi, 1);
> - if (f2fs_test_and_clear_bit(offset, se->discard_map))
> + if (test_opt(sbi, DISCARD) &&
> + f2fs_test_and_clear_bit(offset, se->discard_map))
>   sbi->discard_blks++;
>   }
>   if (!f2fs_test_bit(offset, se->ckpt_valid_map))
> @@ -2096,6 +2098,55 @@ out:
>   set_prefree_as_free_segments(sbi);
>  }
>  
> +static int __init_discard_map(struct f2fs_sb_info *sbi)
> +{
> + struct sit_info *sit_i;
> + unsigned int start;
> +
> + if (!test_opt(sbi, DISCARD))
> + return 0;
> +
> + sit_i = SIT_I(sbi);
> +
> + for (start = 0; start < MAIN_SEGS(sbi); start++) {
> + sit_i->sentries[start].discard_map
> + = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
> + if (!sit_i->sentries[start].discard_map)
> + return -ENOMEM;
> + }
> + return 0;
> +}
> +
> +static void __build_discard_map(struct f2fs_sb_info *sbi)
> +{
> + struct sit_info *sit_i;
> + unsigned int start;
> +
> + if (!test_opt(sbi, DISCARD))
> + return;
> +
> + sit_i = SIT_I(sbi);
> +
> + for (start = 0; start < MAIN_SEGS(sbi); start++) {
> + struct seg_entry *se = &sit_i->sentries[start];
> +
> + memcpy(se->discard_map, se->cur_valid_map, SIT_VBLOCK_MAP_SIZE);
> + sbi->discard_blks += sbi->blocks_per_seg - se->valid_blocks;
> + }
> +}
> +
> +int build_discard_map(struct f2fs_sb_info *sbi)
> +{
> + int err;
> +
> + err = __init_discard_map(sbi);
> + if (err)
> + return err;
> +
> + __build_discard_map(sbi);
> + return 0;
> +}
> +
>  static int build_sit_info(struct f2fs_sb_info *sbi)
>  {
>   struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
> @@ -2127,14 +2178,14 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
>   = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
>   sit_i->sentries[start].ckpt_valid_map
>   = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
> - sit_i->sentries[start].discard_map
> - = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
>   if (!sit_i->sentries[start].cur_valid_map ||
> - !sit_i->sentries[start].ckpt_valid_map ||
> - !sit_i->sentries[start].discard_map)
> + !sit_i->sentries[start].ckpt_valid_map)
>   return -ENOMEM;
>   }
>  
> + if (__init_discard_map(sbi))
> + return -ENOMEM;
> +
>   sit_i->tmp_map = kzall

[PATCH 1/2] f2fs: clean up bio cache trace

2016-08-06 Thread Chao Yu

From: Chao Yu 

Trace output for bio cache operations is badly formatted; clean it up.

Before:
   <...>-28308 [002]   4781.052703: f2fs_submit_write_bio: dev = 
(251,1), WRITEWRITE_SYNC ^H, DATA, sector = 271424, size = 126976
   <...>-28308 [002]   4781.052820: f2fs_submit_page_mbio: dev = 
(251,1), ino = 103, page_index = 0x1f, oldaddr = 0x, newaddr = 0x84a7 
rw = WRITEWRITE_SYNCi ^H, type = DATA
kworker/u8:2-29988 [001]   5549.293877: f2fs_submit_page_mbio: dev = 
(251,1), ino = 91, page_index = 0xd, oldaddr = 0x, newaddr = 0x782f rw 
= WRITE0x0i ^H type = DATA

After:
kworker/u8:2-8678  [000]   7945.124459: f2fs_submit_write_bio: dev = 
(251,1), rw = WRITE_SYNC, DATA, sector = 74080, size = 53248
kworker/u8:2-8678  [000]   7945.124551: f2fs_submit_page_mbio: dev = 
(251,1), ino = 11, page_index = 0xec, oldaddr = 0x, newaddr = 0x243a, 
rw = WRITE, type = DATA

Signed-off-by: Chao Yu 
---
 include/trace/events/f2fs.h | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index ff95fd0..903a091 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -58,16 +58,12 @@ TRACE_DEFINE_ENUM(CP_DISCARD);
 #define F2FS_BIO_FLAG_MASK(t)  (t & (REQ_RAHEAD | WRITE_FLUSH_FUA))
 #define F2FS_BIO_EXTRA_MASK(t) (t & (REQ_META | REQ_PRIO))
 
-#define show_bio_type(op, op_flags) show_bio_op(op),   \
-   show_bio_op_flags(op_flags), show_bio_extra(op_flags)
-
-#define show_bio_op(op)                                \
-   __print_symbolic(op,\
-   { READ, "READ" },   \
-   { WRITE,"WRITE" })
+#define show_bio_type(op_flags)    show_bio_op_flags(op_flags),    \
+   show_bio_extra(op_flags)
 
 #define show_bio_op_flags(flags)   \
__print_symbolic(F2FS_BIO_FLAG_MASK(flags), \
+   { 0,"WRITE" },  \
{ REQ_RAHEAD,   "READAHEAD" },  \
{ READ_SYNC,"READ_SYNC" },  \
{ WRITE_SYNC,   "WRITE_SYNC" }, \
@@ -754,12 +750,12 @@ DECLARE_EVENT_CLASS(f2fs__submit_page_bio,
),
 
TP_printk("dev = (%d,%d), ino = %lu, page_index = 0x%lx, "
-   "oldaddr = 0x%llx, newaddr = 0x%llx rw = %s%si%s, type = %s",
+   "oldaddr = 0x%llx, newaddr = 0x%llx, rw = %s%s, type = %s",
show_dev_ino(__entry),
(unsigned long)__entry->index,
(unsigned long long)__entry->old_blkaddr,
(unsigned long long)__entry->new_blkaddr,
-   show_bio_type(__entry->op, __entry->op_flags),
+   show_bio_type(__entry->op_flags),
show_block_type(__entry->type))
 );
 
@@ -806,9 +802,9 @@ DECLARE_EVENT_CLASS(f2fs__submit_bio,
__entry->size   = bio->bi_iter.bi_size;
),
 
-   TP_printk("dev = (%d,%d), %s%s%s, %s, sector = %lld, size = %u",
+   TP_printk("dev = (%d,%d), rw = %s%s, %s, sector = %lld, size = %u",
show_dev(__entry),
-   show_bio_type(__entry->op, __entry->op_flags),
+   show_bio_type(__entry->op_flags),
show_block_type(__entry->type),
(unsigned long long)__entry->sector,
__entry->size)
-- 
2.7.2

[PATCH 2/2] Revert "f2fs: move i_size_write in f2fs_write_end"

2016-08-06 Thread Chao Yu

From: Chao Yu 

This reverts commit a2ee0a300344a6da76186129b078113354fe13d2.

When testing with generic/032 of the xfstests suite, the following failure is
reported:

generic/032 8s ... [failed, exit status 1] - output mismatch (see 
results/generic/032.out.bad)
--- tests/generic/032.out   2015-01-11 16:52:27.643681072 +0800
+++ results/generic/032.out.bad 2016-08-06 13:44:43.861330500 +0800
@@ -1,5 +1,5 @@
 QA output created by 032
-100 iterations
-000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
-*
-010
+1: [768..775]: unwritten
+Unwritten extents found!
...
(Run 'diff -u tests/generic/032.out results/generic/032.out.bad'  to see 
the entire diff)
Ran: generic/032
Failures: generic/032
Failed 1 of 1 tests

In write_end(), we should update the inode's i_size before unlocking the page;
otherwise we will lose newly written data in the following race condition.

Thread A                                Thread B
- write_end
 - unlock page
                                        - writepages
                                         - lock_page
                                          - writepage
                                          if page is out-of-range of file size,
                                          we will skip writing the page.
 - update i_size

Signed-off-by: Chao Yu 
---
 fs/f2fs/data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index e262427..5bb0bd2 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1700,11 +1700,11 @@ static int f2fs_write_end(struct file *file,
trace_f2fs_write_end(inode, pos, len, copied);
 
set_page_dirty(page);
-   f2fs_put_page(page, 1);
 
if (pos + copied > i_size_read(inode))
f2fs_i_size_write(inode, pos + copied);
 
+   f2fs_put_page(page, 1);
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
return copied;
 }
-- 
2.7.2
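
For illustration only, the ordering problem described in the commit message
can be modelled single-threaded in userspace. This is a sketch under
assumptions: struct fake_inode, writepage_would_write() and write_end_model()
are hypothetical names, the page size is fixed at 4096, and "writeback runs
the moment the page is unlocked" stands in for the race with Thread B.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SZ 4096UL	/* illustrative page size */

/* Hypothetical, simplified inode: only the file size matters here. */
struct fake_inode {
	unsigned long i_size;
};

/* Simplified writepage check: pages starting at or beyond i_size are skipped. */
static bool writepage_would_write(const struct fake_inode *inode,
				  unsigned long index)
{
	return index * PAGE_SZ < inode->i_size;
}

/*
 * Models write_end() with writeback grabbing the page as soon as it is
 * unlocked. Returns true if that writeback pass writes the page.
 */
static bool write_end_model(struct fake_inode *inode, unsigned long pos,
			    unsigned long copied, bool size_before_unlock)
{
	unsigned long index = pos / PAGE_SZ;
	bool written;

	/* fixed order: i_size is updated while the page is still locked */
	if (size_before_unlock && pos + copied > inode->i_size)
		inode->i_size = pos + copied;

	/* page is unlocked here; writeback sees the current i_size */
	written = writepage_would_write(inode, index);

	/* reverted order: i_size is updated only after the unlock */
	if (!size_before_unlock && pos + copied > inode->i_size)
		inode->i_size = pos + copied;

	return written;
}

/* Appending 100 bytes to an empty file under both orderings. */
void demo_write_end_order(void)
{
	struct fake_inode racy = { 0 }, fixed = { 0 };

	assert(!write_end_model(&racy, 0, 100, false));	/* page skipped */
	assert(write_end_model(&fixed, 0, 100, true));	/* page written */
}
```

In both orderings i_size ends up at 100; the difference is only whether the
writeback pass that runs in between considers the page to be inside the file.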

Re: [PATCH 2/3] f2fs: schedule in between two continous batch discards

2016-08-25 Thread Chao Yu

Hi Jaegeuk,

On 2016/8/24 0:53, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Sun, Aug 21, 2016 at 11:21:30PM +0800, Chao Yu wrote:
>> From: Chao Yu 
>>
>> In batch discard approach of fstrim will grab/release gc_mutex lock
>> repeatly, it makes contention of the lock becoming more intensive.
>>
>> So after one batch discards were issued in checkpoint and the lock
>> was released, it's better to do schedule() to increase opportunity
>> of grabbing gc_mutex lock for other competitors.
>>
>> Signed-off-by: Chao Yu 
>> ---
>>  fs/f2fs/segment.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>> index 020767c..d0f74eb 100644
>> --- a/fs/f2fs/segment.c
>> +++ b/fs/f2fs/segment.c
>> @@ -1305,6 +1305,8 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct 
>> fstrim_range *range)
>>  mutex_unlock(&sbi->gc_mutex);
>>  if (err)
>>  break;
>> +
>> +schedule();
> 
> Hmm, if other thread is already waiting for gc_mutex, we don't need this here.
> In order to avoid long latency, wouldn't it be enough to reduce the batch 
> size?

Hmm, when fstrim calls mutex_unlock we pop one blocked waiter from the FIFO list
of the mutex and wake it up; then the fstrim thread tries to lock gc_mutex again
for the next batch, so the woken waiter and the fstrim thread compete for
gc_mutex. If the fstrim thread is running on a big core and the woken waiter on
a small core, we can't guarantee that the waiter wins the race, and most of the
time the fstrim thread will win. So in order to reduce starvation of other
gc_mutex waiters, it's better to do schedule() here.

Thanks,

> 
> Thanks,
> 
>>  }
>>  out:
>>  range->len = F2FS_BLK_TO_BYTES(cpc.trimmed);
>> -- 
>> 2.7.2
> 
> .
>

Re: [PATCH 5/6] f2fs: enable inline_dentry by default

2016-08-25 Thread Chao Yu

Hi Jaegeuk,

On 2016/8/24 0:57, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Mon, Aug 22, 2016 at 09:49:13AM +0800, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2016/5/10 7:04, Jaegeuk Kim wrote:
>>> On Mon, May 09, 2016 at 07:56:34PM +0800, Chao Yu wrote:
>>>> Make inline_dentry as default mount option to improve space usage and
>>>> IO performance in scenario of numerous small directory.
>>>
>>> Hmm, I've not much tested this so far.
>>> Let me take time to consider this for a while.
>>
>> IMO, this feature is almost stable since I fixed most of bugs which occurs
>> during inline conversion. And now I enable this feature by default when I do 
>> the
>> test with fstest suit and fsstress, I didn't find any more bugs reported by
>> those test tools.
>>
>> How do you think of enabling inline_dentry by default now?
> 
> Okay, let me start all my tests with this. :)

Cool, thanks for your support. ;)

Thanks,

> 
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>>
>>>>
>>>> Signed-off-by: Chao Yu 
>>>> ---
>>>>  fs/f2fs/super.c | 1 +
>>>>  1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>> index 28c8992..4a4f4bd 100644
>>>> --- a/fs/f2fs/super.c
>>>> +++ b/fs/f2fs/super.c
>>>> @@ -824,6 +824,7 @@ static void default_options(struct f2fs_sb_info *sbi)
>>>>  
>>>>set_opt(sbi, BG_GC);
>>>>set_opt(sbi, INLINE_DATA);
>>>> +  set_opt(sbi, INLINE_DENTRY);
>>>>set_opt(sbi, EXTENT_CACHE);
>>>>  
>>>>  #ifdef CONFIG_F2FS_FS_XATTR
>>>> -- 
>>>> 2.8.2.311.gee88674
>>> .
>>>
> 
> .
>

Re: [PATCH 1/2] f2fs: fix to preallocate block only aligned to 4K

2016-08-25 Thread Chao Yu

Hi Jaegeuk,

On 2016/8/24 7:19, Jaegeuk Kim wrote:
> Hi Chao,
> 
> There is a bug when ki_pos = 1024, and iov_iter_count(from) = 1024 in 
> xfstests.
> Could you check the below patch to fix your one?

Oh, you're right, thanks for fixing it. :)

Thanks,

> 
> ---
>  fs/f2fs/data.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 37a59f7..7c8e219 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -626,8 +626,12 @@ ssize_t f2fs_preallocate_blocks(struct kiocb *iocb, 
> struct iov_iter *from)
>   ssize_t ret = 0;
>  
>   map.m_lblk = F2FS_BLK_ALIGN(iocb->ki_pos);
> - map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from)) -
> - map.m_lblk;
> + map.m_len = F2FS_BYTES_TO_BLK(iocb->ki_pos + iov_iter_count(from));
> + if (map.m_len > map.m_lblk)
> + map.m_len -= map.m_lblk;
> + else
> + map.m_len = 0;
> +
>   map.m_next_pgofs = NULL;
>  
>   if (f2fs_encrypted_inode(inode))
> @@ -673,6 +677,9 @@ int f2fs_map_blocks(struct inode *inode, struct 
> f2fs_map_blocks *map,
>   bool allocated = false;
>   block_t blkaddr;
>  
> + if (!maxblocks)
> + return 0;
> +
>   map->m_len = 0;
>   map->m_flags = 0;
>  
>

Re: [f2fs-dev] [PATCH] f2fs: fix a bug when using namehash to locate dentry bucket

2016-08-25 Thread Chao Yu

On 2016/8/25 20:42, Shuoran Liu wrote:
> In the following scenario,
> 
> 1) we don't have the key and doing a lookup for encrypted file,
> 2) and the encrypted filename is big name
> 
> we should use fname->hash as name hash value instead of what is
> calculated by fname->disk_name. Because in such case,
> fname->disk_name is empty.

Your signature is missing here.

Anyway that's a good catch!

Acked-by: Chao Yu 

> ---
>  fs/f2fs/dir.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
> index 9054aea..b3e6f7f 100644
> --- a/fs/f2fs/dir.c
> +++ b/fs/f2fs/dir.c
> @@ -172,7 +172,10 @@ static struct f2fs_dir_entry *find_in_level(struct inode 
> *dir,
>   int max_slots;
>   f2fs_hash_t namehash;
>  
> - namehash = f2fs_dentry_hash(&name);
> + if(fname->hash)
> + namehash = cpu_to_le32(fname->hash);
> + else
> + namehash = f2fs_dentry_hash(&name);
>  
>   nbucket = dir_buckets(level, F2FS_I(dir)->i_dir_level);
>   nblock = bucket_blocks(level);
>

Re: [f2fs-dev] [PATCH -next] f2fs: fix non static symbol warning

2016-08-25 Thread Chao Yu

On 2016/8/23 23:23, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Fixes the following sparse warning:
> 
> fs/f2fs/data.c:969:12: warning:
>  symbol 'f2fs_grab_bio' was not declared. Should it be static?
> 
> Signed-off-by: Wei Yongjun 

Acked-by: Chao Yu

Re: [PATCH 2/3] f2fs: schedule in between two continous batch discards

2016-08-25 Thread Chao Yu

Hi Jaegeuk,

On 2016/8/26 0:57, Jaegeuk Kim wrote:
> Hi Chao,
> 
> On Thu, Aug 25, 2016 at 05:22:29PM +0800, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2016/8/24 0:53, Jaegeuk Kim wrote:
>>> Hi Chao,
>>>
>>> On Sun, Aug 21, 2016 at 11:21:30PM +0800, Chao Yu wrote:
>>>> From: Chao Yu 
>>>>
>>>> The batch discard approach of fstrim grabs/releases the gc_mutex lock
>>>> repeatedly, which makes contention on the lock more intensive.
>>>>
>>>> So after one batch of discards has been issued in checkpoint and the
>>>> lock has been released, it's better to call schedule() to give other
>>>> competitors more opportunity to grab the gc_mutex lock.
>>>>
>>>> Signed-off-by: Chao Yu 
>>>> ---
>>>>  fs/f2fs/segment.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>> index 020767c..d0f74eb 100644
>>>> --- a/fs/f2fs/segment.c
>>>> +++ b/fs/f2fs/segment.c
>>>> @@ -1305,6 +1305,8 @@ int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct 
>>>> fstrim_range *range)
>>>>mutex_unlock(&sbi->gc_mutex);
>>>>if (err)
>>>>break;
>>>> +
>>>> +  schedule();
>>>
>>> Hmm, if other thread is already waiting for gc_mutex, we don't need this
>>> here. In order to avoid long latency, wouldn't it be enough to reduce the
>>> batch size?
>>
>> Hmm, when fstrim calls mutex_unlock we will pop one blocked locker from the
>> FIFO list of the mutex lock and wake it up, then fstrimer will try to lock
>> gc_mutex for the next batch trim, so the popped locker and fstrimer will
>> compete again for gc_mutex.
> 
> Before trying to grab gc_mutex by fstrim again, there are already blocked
> tasks waiting for gc_mutex. Hence the next one should be selected by FIFO, no?

The next one to be woken up is selected by FIFO, but the woken task still
has to race with the other mutex lock grabbers.

So there is no guarantee that the woken task actually gets the lock.

Thanks,

> 
> Thanks,
> 
>> If fstrimer is running on a big core and the popped locker is running on
>> a small core, we can't guarantee the popped locker wins the race, and most
>> of the time fstrimer will win. So in order to reduce starvation of other
>> gc_mutex lockers, it's better to do schedule() here.
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>>
>>>>}
>>>>  out:
>>>>range->len = F2FS_BLK_TO_BYTES(cpc.trimmed);
>>>> -- 
>>>> 2.7.2
>>>
>>> .
>>>
> 
> .
>

[PATCH] f2fs: add noinline_dentry mount option

2016-08-25 Thread Chao Yu

This patch adds noinline_dentry mount option.

Signed-off-by: Chao Yu 
---
 Documentation/filesystems/f2fs.txt | 1 +
 fs/f2fs/super.c| 7 +++
 2 files changed, 8 insertions(+)

diff --git a/Documentation/filesystems/f2fs.txt 
b/Documentation/filesystems/f2fs.txt
index ecd8080..753dd4f 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -131,6 +131,7 @@ inline_dentry  Enable the inline dir feature: data 
in new created
directory entries can be written into inode block. The
space of inode block which is used to store inline
dentries is limited to ~3.4k.
+noinline_dentry        Disable the inline dentry feature.
 flush_merge   Merge concurrent cache_flush commands as much as possible
to eliminate redundant command issues. If the underlying
   device handles the cache_flush command relatively slowly,
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 7974833..b776414 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -87,6 +87,7 @@ enum {
Opt_inline_xattr,
Opt_inline_data,
Opt_inline_dentry,
+   Opt_noinline_dentry,
Opt_flush_merge,
Opt_noflush_merge,
Opt_nobarrier,
@@ -118,6 +119,7 @@ static match_table_t f2fs_tokens = {
{Opt_inline_xattr, "inline_xattr"},
{Opt_inline_data, "inline_data"},
{Opt_inline_dentry, "inline_dentry"},
+   {Opt_noinline_dentry, "noinline_dentry"},
{Opt_flush_merge, "flush_merge"},
{Opt_noflush_merge, "noflush_merge"},
{Opt_nobarrier, "nobarrier"},
@@ -488,6 +490,9 @@ static int parse_options(struct super_block *sb, char 
*options)
case Opt_inline_dentry:
set_opt(sbi, INLINE_DENTRY);
break;
+   case Opt_noinline_dentry:
+   clear_opt(sbi, INLINE_DENTRY);
+   break;
case Opt_flush_merge:
set_opt(sbi, FLUSH_MERGE);
break;
@@ -879,6 +884,8 @@ static int f2fs_show_options(struct seq_file *seq, struct 
dentry *root)
seq_puts(seq, ",noinline_data");
if (test_opt(sbi, INLINE_DENTRY))
seq_puts(seq, ",inline_dentry");
+   else
+   seq_puts(seq, ",noinline_dentry");
if (!f2fs_readonly(sbi->sb) && test_opt(sbi, FLUSH_MERGE))
seq_puts(seq, ",flush_merge");
if (test_opt(sbi, NOBARRIER))
-- 
2.8.2.311.gee88674
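
As a rough illustration of what the patch above wires up, the sketch below models the option pair as a single bit that one token sets and the other clears, and shows that exactly one of the two names is reported back. The flag value and function names are hypothetical; the real code uses set_opt/clear_opt/test_opt on the f2fs superblock info.

```c
#include <assert.h>   /* for quick checks when experimenting */
#include <string.h>

/* Hypothetical flag bit; the real F2FS_MOUNT_* values are defined in f2fs.h. */
#define MOUNT_INLINE_DENTRY 0x1u

/* Apply one option token to the flags; returns 0 on success, -1 if unknown. */
static inline int apply_option(unsigned int *flags, const char *token)
{
	if (strcmp(token, "inline_dentry") == 0)
		*flags |= MOUNT_INLINE_DENTRY;
	else if (strcmp(token, "noinline_dentry") == 0)
		*flags &= ~MOUNT_INLINE_DENTRY;
	else
		return -1;
	return 0;
}

/* Report the state the way a show_options hook would: one name of the pair. */
static inline const char *show_option(unsigned int flags)
{
	return (flags & MOUNT_INLINE_DENTRY) ? ",inline_dentry"
					     : ",noinline_dentry";
}
```

Because later tokens override earlier ones, "inline_dentry,noinline_dentry" ends up with the bit cleared, which matches the order-dependent behaviour of the switch statement in parse_options.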

[PATCH] f2fs: fix to set superblock dirty correctly

2016-08-26 Thread Chao Yu

From: Chao Yu 

tests/generic/251 of the fstests suite complains with the message below:

[ cut here ]
invalid opcode:  [#1] PREEMPT SMP
CPU: 2 PID: 7698 Comm: fstrim Tainted: G   O4.7.0+ #21
task: e9f4e000 task.stack: e7262000
EIP: 0060:[] EFLAGS: 00010202 CPU: 2
EIP is at write_checkpoint+0xfde/0x1020 [f2fs]
EAX: f33eb300 EBX: eecac310 ECX: 0001 EDX: 0001
ESI: eecac000 EDI: eecac5f0 EBP: e7263dec ESP: e7263d18
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: b76ab01c CR3: 2eb89de0 CR4: 000406f0
Stack:
 0001 a220fb7b e9f4e000 0002 419ff2d3 b3a05151 0002 e9f4e5d8
 e9f4e000 419ff2d3 b3a05151 eecac310 c10b8154 b3a05151 419ff2d3 c10b78bd
 e9f4e000 e9f4e000 e9f4e5d8 0001 e9f4e000 ec409000 eecac2cc eecac288
Call Trace:
 [] ? __lock_acquire+0x3c4/0x760
 [] ? mark_held_locks+0x5d/0x80
 [] f2fs_trim_fs+0x1c2/0x2e0 [f2fs]
 [] f2fs_ioctl+0x6b6/0x10b0 [f2fs]
 [] ? __this_cpu_preempt_check+0xf/0x20
 [] ? trace_hardirqs_off_caller+0x91/0x120
 [] ? __exchange_data_block+0xd30/0xd30 [f2fs]
 [] do_vfs_ioctl+0x81/0x7f0
 [] ? kmem_cache_free+0x245/0x2e0
 [] ? get_unused_fd_flags+0x40/0x40
 [] ? putname+0x4c/0x50
 [] ? do_sys_open+0x16e/0x1d0
 [] ? do_fast_syscall_32+0x30/0x1c0
 [] ? __this_cpu_preempt_check+0xf/0x20
 [] SyS_ioctl+0x58/0x80
 [] do_fast_syscall_32+0xa1/0x1c0
 [] sysenter_past_esp+0x45/0x74
EIP: [] write_checkpoint+0xfde/0x1020 [f2fs] SS:ESP 0068:e7263d18
---[ end trace 4de95d7e6b3aa7c6 ]---

The reason is: with the call stacks below, we hit the BUG_ON while doing
fstrim.

Thread A                                Thread B
- write_checkpoint
 - do_checkpoint
                                        - f2fs_write_inode
                                         - update_inode_page
                                          - update_inode
                                           - set_page_dirty
                                            - f2fs_set_node_page_dirty
                                             - inc_page_count
                                              - percpu_counter_inc
                                              - set_sbi_flag(SBI_IS_DIRTY)
  - clear_sbi_flag(SBI_IS_DIRTY)

Thread C                                Thread D
- f2fs_write_node_page
 - set_node_addr
  - __set_nat_cache_dirty
   - nm_i->dirty_nat_cnt++
                                        - do_vfs_ioctl
                                         - f2fs_ioctl
                                          - f2fs_trim_fs
                                           - write_checkpoint
                                            - f2fs_bug_on(nm_i->dirty_nat_cnt)

Fix it by setting superblock dirty correctly in do_checkpoint and
f2fs_write_node_page.

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 4 
 fs/f2fs/node.c   | 1 +
 2 files changed, 5 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index cd0443d..68c723c 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1153,6 +1153,10 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, 
struct cp_control *cpc)
clear_prefree_segments(sbi, cpc);
clear_sbi_flag(sbi, SBI_IS_DIRTY);
 
+   /* redirty superblock if node page is updated by ->write_inode */
+   if (get_pages(sbi, F2FS_DIRTY_NODES))
+   set_sbi_flag(sbi, SBI_IS_DIRTY);
+
return 0;
 }
 
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 8a28800..365c6ff 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1597,6 +1597,7 @@ static int f2fs_write_node_page(struct page *page,
fio.old_blkaddr = ni.blk_addr;
write_node_page(nid, &fio);
set_node_addr(sbi, &ni, fio.new_blkaddr, is_fsync_dnode(page));
+   set_sbi_flag(sbi, SBI_IS_DIRTY);
dec_page_count(sbi, F2FS_DIRTY_NODES);
up_read(&sbi->node_write);
 
-- 
2.7.2
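
The lost-flag interleaving from the first call stack can be modelled in a few lines. This is a single-threaded toy, not kernel code: the struct and function names are invented for the example, and the two calls simply replay the order in which the two threads touch the state.

```c
#include <assert.h>   /* for quick checks when experimenting */
#include <stdbool.h>

/* Toy stand-in for f2fs_sb_info; field names are illustrative only. */
struct toy_sbi {
	int dirty_nodes;  /* models get_pages(sbi, F2FS_DIRTY_NODES) */
	bool is_dirty;    /* models the SBI_IS_DIRTY flag */
};

/* Writer side: a node page is dirtied, which also marks the sb dirty. */
static inline void toy_dirty_node(struct toy_sbi *sbi)
{
	sbi->dirty_nodes++;
	sbi->is_dirty = true;
}

/* Tail of the checkpoint: clear the flag, optionally re-dirty as the patch does. */
static inline void toy_checkpoint_tail(struct toy_sbi *sbi, bool with_fix)
{
	sbi->is_dirty = false;
	if (with_fix && sbi->dirty_nodes > 0)
		sbi->is_dirty = true;
}
```

Without the re-check, a node page dirtied just before the flag is cleared leaves dirty_nodes at 1 with is_dirty false, so a later checkpoint may be skipped even though there is still work pending.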

[PATCH 2/2] f2fs: fix to update node page under cp_rwsem

2016-08-26 Thread Chao Yu

From: Chao Yu 

Update the node page under cp_rwsem in order to keep data consistent
while a checkpoint is being written.

Signed-off-by: Chao Yu 
---
 fs/f2fs/inode.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 9ac5efc..1057c73 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -377,8 +377,11 @@ retry:
goto retry;
}
 
-   if (err)
+   if (err) {
+   f2fs_lock_op(sbi);
update_inode_page(inode);
+   f2fs_unlock_op(sbi);
+   }
sb_end_intwrite(inode->i_sb);
 no_delete:
stat_dec_inline_xattr(inode);
-- 
2.7.2

[PATCH 1/2] f2fs: do in batch synchronously readahead during GC

2016-08-26 Thread Chao Yu

From: Chao Yu 

In order to enhance performance, we try to readahead node pages during
GC, but before loading a node page we have to get its block address,
which is stored in the NAT table, so the synchronous read of a single
NAT page blocks our readahead flow.

f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xa1e, oldaddr = 
0xa1e, newaddr = 0xa1e, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x35e9, oldaddr = 
0x72d7a, newaddr = 0x72d7a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0xc1f, oldaddr = 
0xc1f, newaddr = 0xc1f, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x389d, oldaddr = 
0x72d7d, newaddr = 0x72d7d, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3a82, oldaddr = 
0x72d7f, newaddr = 0x72d7f, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x3bfa, oldaddr = 
0x72d86, newaddr = 0x72d86, rw = READAHEAD ^H, type = NODE

This patch adds one phase that reads ahead NAT pages in batch before
reading ahead node pages, which is more efficient.

f2fs_submit_page_bio: dev = (251,0), ino = 2, page_index = 0x1952, oldaddr = 
0x1952, newaddr = 0x1952, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc34, oldaddr = 
0xc34, newaddr = 0xc34, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa33, oldaddr = 
0xa33, newaddr = 0xa33, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc30, oldaddr = 
0xc30, newaddr = 0xc30, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc32, oldaddr = 
0xc32, newaddr = 0xc32, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc26, oldaddr = 
0xc26, newaddr = 0xc26, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa2b, oldaddr = 
0xa2b, newaddr = 0xa2b, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc23, oldaddr = 
0xc23, newaddr = 0xc23, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc24, oldaddr = 
0xc24, newaddr = 0xc24, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xa10, oldaddr = 
0xa10, newaddr = 0xa10, rw = READ_SYNC(MP), type = META
f2fs_submit_page_mbio: dev = (251,0), ino = 2, page_index = 0xc2c, oldaddr = 
0xc2c, newaddr = 0xc2c, rw = READ_SYNC(MP), type = META
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db7, oldaddr = 
0x6be00, newaddr = 0x6be00, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5db9, oldaddr = 
0x6be17, newaddr = 0x6be17, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dbc, oldaddr = 
0x6be1a, newaddr = 0x6be1a, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc3, oldaddr = 
0x6be20, newaddr = 0x6be20, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc7, oldaddr = 
0x6be24, newaddr = 0x6be24, rw = READAHEAD ^H, type = NODE
f2fs_submit_page_bio: dev = (251,0), ino = 1, page_index = 0x5dc9, oldaddr = 
0x6be25, newaddr = 0x6be25, rw = READAHEAD ^H, type = NODE

Signed-off-by: Chao Yu 
---
 fs/f2fs/gc.c | 33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index c1599b4..cdc44a6 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -423,10 +423,10 @@ static int check_valid_map(struct f2fs_sb_info *sbi,
 static void gc_node_segment(struct f2fs_sb_info *sbi,
struct f2fs_summary *sum, unsigned int segno, int gc_type)
 {
-   bool initial = true;
struct f2fs_summary *entry;
block_t start_addr;
int off;
+   int phase = 0;
 
start_addr = START_BLOCK(sbi, segno);
 
@@ -445,10 +445,18 @@ next_step:
if (check_valid_map(sbi, segno, off) == 0)
continue;
 
-   if (initial) {
+   if (phase == 0) {
+   ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1,
+   META_NAT, true);
+   continue;
+   }
+
+   if (phase == 1) {
ra_node_page(sbi, nid);
continue;
}
+
+   /* phase == 2 */
node_page = get_node_page(sbi, nid);
if (IS_ERR(node_page))
continue;
@@ -469,10 +477,8 @@ next_step:
stat_inc_node_blk_count(sbi, 1, gc_type);
}
 
-   if (initial) {
-   initial = false;
+   if (++phase <
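
The diff above is cut off, but the three passes it introduces can be sketched as follows. This is a simplified model under stated assumptions: the trace buffer and function names are invented, and each pass only records which kind of work it would do (NAT block readahead, node page readahead, then the actual migration) instead of issuing I/O.

```c
#include <assert.h>

#define MAX_TRACE 64

/* Records the order of operations so the passes can be inspected. */
struct gc_trace {
	char op[MAX_TRACE];  /* 'N' = NAT readahead, 'R' = node readahead, 'M' = migrate */
	int n;
};

static inline void trace_op(struct gc_trace *t, char op)
{
	assert(t->n < MAX_TRACE);
	t->op[t->n++] = op;
}

/* Walk the valid entries three times, doing one kind of work per pass. */
static inline void gc_node_segment_sketch(const int *valid, int nr,
					  struct gc_trace *t)
{
	for (int phase = 0; phase < 3; phase++) {
		for (int off = 0; off < nr; off++) {
			if (!valid[off])
				continue;
			if (phase == 0)
				trace_op(t, 'N');  /* batch NAT block readahead */
			else if (phase == 1)
				trace_op(t, 'R');  /* node page readahead */
			else
				trace_op(t, 'M');  /* read the node page and move it */
		}
	}
}
```

The point of the ordering is that all NAT reads are queued before any node readahead starts, so the node readahead pass no longer stalls on one synchronous NAT read per entry.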

[PATCH] f2fs: trigger normal fsync for non-atomic_write file

2017-08-18 Thread Chao Yu

If a file was not opened in atomic write mode but the user calls the
atomic write ioctl to fsync its data, we should not fsync that file in
atomic write mode in this flow.

Fixes: 608514deba38 ("f2fs: set fsync mark only for the last dnode")
Signed-off-by: Chao Yu 
---
 fs/f2fs/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 3fe8f2ab0222..c93cf2a00395 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1698,7 +1698,7 @@ static int f2fs_ioc_commit_atomic_write(struct file *filp)
stat_dec_atomic_write(inode);
}
} else {
-   ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
+   ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, false);
}
 err_out:
inode_unlock(inode);
-- 
2.13.1.388.g69e6b9b4f4a9

Re: [PATCH v2] f2fs: introduce cur_reserved_blocks in sysfs

2017-08-18 Thread Chao Yu

Hi Yunlong,

IMO, we don't need additional sysfs entry, how about changing a bit as below?

From 3fc8206871fe457859f1537c9dc8918b45f14601 Mon Sep 17 00:00:00 2001
From: Yunlong Song 
Date: Wed, 16 Aug 2017 23:01:56 +0800
Subject: [PATCH] f2fs: support soft block reservation

This extends the reserved_blocks sysfs interface to be a soft
threshold, which allows the user to configure a value exceeding the
currently available user space.

Signed-off-by: Yunlong Song 
Signed-off-by: Chao Yu 
---
 Documentation/ABI/testing/sysfs-fs-f2fs |  3 ++-
 fs/f2fs/f2fs.h  | 12 ++--
 fs/f2fs/super.c |  3 ++-
 fs/f2fs/sysfs.c | 16 ++--
 4 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs 
b/Documentation/ABI/testing/sysfs-fs-f2fs
index 11b7f4ebea7c..45c3d92f77c8 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -138,7 +138,8 @@ What:   /sys/fs/f2fs//reserved_blocks
 Date:  June 2017
 Contact:   "Chao Yu" 
 Description:
-Controls current reserved blocks in system.
+Controls current reserved blocks in system, the threshold
+is soft, it could exceed current available user space.

 What:  /sys/fs/f2fs//gc_urgent
 Date:  August 2017
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 336021b9b93e..a7b2d257e8ee 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1046,6 +1046,7 @@ struct f2fs_sb_info {
block_t discard_blks;   /* discard command candidats */
block_t last_valid_block_count; /* for recovery */
block_t reserved_blocks;/* configurable reserved blocks 
*/
+   block_t current_reserved_blocks;/* current reserved blocks */

u32 s_next_generation;  /* for NFS support */

@@ -1520,7 +1521,8 @@ static inline int inc_valid_block_count(struct 
f2fs_sb_info *sbi,

spin_lock(&sbi->stat_lock);
sbi->total_valid_block_count += (block_t)(*count);
-   avail_user_block_count = sbi->user_block_count - sbi->reserved_blocks;
+   avail_user_block_count = sbi->user_block_count -
+   sbi->current_reserved_blocks;
if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) {
diff = sbi->total_valid_block_count - avail_user_block_count;
*count -= diff;
@@ -1554,6 +1556,9 @@ static inline void dec_valid_block_count(struct 
f2fs_sb_info *sbi,
f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count);
f2fs_bug_on(sbi, inode->i_blocks < sectors);
sbi->total_valid_block_count -= (block_t)count;
+   if (sbi->reserved_blocks)
+   sbi->current_reserved_blocks = min(sbi->reserved_blocks,
+   sbi->current_reserved_blocks + count);
spin_unlock(&sbi->stat_lock);
f2fs_i_blocks_write(inode, count, false, true);
 }
@@ -1700,7 +1705,7 @@ static inline int inc_valid_node_count(struct 
f2fs_sb_info *sbi,
spin_lock(&sbi->stat_lock);

valid_block_count = sbi->total_valid_block_count + 1;
-   if (unlikely(valid_block_count + sbi->reserved_blocks >
+   if (unlikely(valid_block_count + sbi->current_reserved_blocks >
sbi->user_block_count)) {
spin_unlock(&sbi->stat_lock);
goto enospc;
@@ -1743,6 +1748,9 @@ static inline void dec_valid_node_count(struct 
f2fs_sb_info *sbi,

sbi->total_valid_node_count--;
sbi->total_valid_block_count--;
+   if (sbi->reserved_blocks)
+   sbi->current_reserved_blocks = min(sbi->reserved_blocks,
+   sbi->current_reserved_blocks + 1);

spin_unlock(&sbi->stat_lock);

diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index 4c1bdcb94133..036f083bbf56 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -957,7 +957,7 @@ static int f2fs_statfs(struct dentry *dentry, struct 
kstatfs *buf)
buf->f_blocks = total_count - start_count;
buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
-   sbi->reserved_blocks;
+   sbi->current_reserved_blocks;

avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;

@@ -2411,6 +2411,7 @@ static int f2fs_fill_super(struct super_block *sb, void 
*data, int silent)
le64_to_cpu(sbi->ckpt->valid_block_count);
sbi->last_valid_block_count = sbi->total_valid_b
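
The accounting visible in the hunks above can be modelled in isolation. The sketch below covers only what the quoted diff shows (the sysfs store path is cut off and is not reproduced here); the struct and helper names are made up for the example. The effective reservation grows toward the configured target as blocks are freed and never exceeds it.

```c
#include <assert.h>   /* for quick checks when experimenting */
#include <stdint.h>

typedef uint32_t block_t;

/* Illustrative subset of the superblock info used by the patch. */
struct resv_state {
	block_t reserved_blocks;          /* target configured by the user */
	block_t current_reserved_blocks;  /* what is actually withheld right now */
};

static inline block_t min_blk(block_t a, block_t b)
{
	return a < b ? a : b;
}

/* On freeing `count` blocks, move the effective reservation toward the target. */
static inline void on_blocks_freed(struct resv_state *st, block_t count)
{
	if (st->reserved_blocks)
		st->current_reserved_blocks =
			min_blk(st->reserved_blocks,
				st->current_reserved_blocks + count);
}

/* Blocks a user may still consume, given the effective reservation. */
static inline block_t avail_user_blocks(const struct resv_state *st,
					block_t user_block_count)
{
	return user_block_count - st->current_reserved_blocks;
}
```

For example, with a target of 100 and an effective reservation of 10, freeing 50 blocks raises the effective value to 60, and freeing another 50 caps it at 100.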

Re: [PATCH v4 1/1] f2fs: dax: implement direct access

2017-06-21 Thread Chao Yu

Hi Qiuyang

When I tested with pmem, this patch corrupted the f2fs image with generic/051
of the fstests suite.

Could you please take a look at this issue?

Thanks,

On 2017/6/15 16:56, sunqiuyang wrote:
> From: Qiuyang Sun 
> 
> This patch implements Direct Access (DAX) in F2FS.
> 
> Signed-off-by: Qiuyang Sun 
> ---
> 
> Changelog v3 -> v4:
> 
> 
>   In f2fs_iomap_begin():
> - For the write branch, if f2fs_map_blocks() returns error (probably due to
>   ENOSPC), the allocated blocks beyond original_i_size are truncated.
> - For the read branch, use F2FS_GET_BLOCK_FIEMAP instead of READ for 
>   f2fs_map_blocks(), so that contiguous unwritten blocks can be treated in
>   a batch. Accordingly, judge F2FS_MAP_UNWRITTEN before F2FS_MAP_MAPPED for
>   iomap->type.
> 
> - Add a call of f2fs_update_time() in f2fs_iomap_end().
> 
> 
> - In f2fs_move_file_range() and f2fs_ioc_defragment(), return -EINVAL for
>   DAX files, as the current implementation uses page cache.
> - Call f2fs_bug_on() in f2fs_ioc_commit_atomic_write() and 
>   f2fs_ioc_(release|abort)_volatile_write() when the inode is DAX, which 
>   should not happen.
> 
> 
> - Optimize the logic in dax_move_data_page().
> 
> 
> - Enable setting the S_DAX flag for an inode in f2fs_set_inode_flags().
> 
> The v4 patch is at f2fs-dev-test.
> 
> ---
>  fs/f2fs/data.c   | 100 +
>  fs/f2fs/f2fs.h   |   8 +++
>  fs/f2fs/file.c   | 192 
> ++-
>  fs/f2fs/gc.c | 104 --
>  fs/f2fs/inline.c |   4 ++
>  fs/f2fs/inode.c  |   8 ++-
>  fs/f2fs/namei.c  |   5 ++
>  fs/f2fs/super.c  |  15 +
>  8 files changed, 429 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 7d3af48..58efce0 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -2257,3 +2257,103 @@ int f2fs_migrate_page(struct address_space *mapping,
>   .migratepage= f2fs_migrate_page,
>  #endif
>  };
> +
> +#ifdef CONFIG_FS_DAX
> +#include 
> +#include 
> +
> +static int f2fs_iomap_begin(struct inode *inode, loff_t offset,
> + loff_t length, unsigned int flags, struct iomap *iomap)
> +{
> + struct block_device *bdev;
> + unsigned long first_block = F2FS_BYTES_TO_BLK(offset);
> + unsigned long last_block = F2FS_BYTES_TO_BLK(offset + length - 1);
> + struct f2fs_map_blocks map;
> + int ret;
> +
> + if (WARN_ON_ONCE(f2fs_has_inline_data(inode)))
> + return -ERANGE;
> +
> + map.m_lblk = first_block;
> + map.m_len = last_block - first_block + 1;
> + map.m_next_pgofs = NULL;
> +
> + if (!(flags & IOMAP_WRITE))
> + ret = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_FIEMAP);
> + else {
> + /* i_size should be kept here and changed later in f2fs_iomap_end */
> + loff_t original_i_size = i_size_read(inode);
> +
> + ret = f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_DIO);
> + if (i_size_read(inode) > original_i_size) {
> + f2fs_i_size_write(inode, original_i_size);
> + if (ret) {
> + truncate_pagecache(inode, original_i_size);
> + truncate_blocks(inode, original_i_size, true);
> + }
> + }
> + }
> +
> + if (ret)
> + return ret;
> +
> + iomap->flags = 0;
> + bdev = inode->i_sb->s_bdev;
> + iomap->bdev = bdev;
> + if (blk_queue_dax(bdev->bd_queue))
> + iomap->dax_dev = dax_get_by_host(bdev->bd_disk->disk_name);
> + else
> + iomap->dax_dev = NULL;
> + iomap->offset = F2FS_BLK_TO_BYTES((u64)first_block);
> +
> + if (map.m_len == 0) {
> + iomap->type = IOMAP_HOLE;
> + iomap->blkno = IOMAP_NULL_BLOCK;
> + iomap->length = F2FS_BLKSIZE;
> + } else {
> + if (map.m_flags & F2FS_MAP_UNWRITTEN) {
> + iomap->type = IOMAP_UNWRITTEN;
> + } else if (map.m_flags & F2FS_MAP_MAPPED) {
> + iomap->type = IOMAP_MAPPED;
> + } else {
> + WARN_ON_ONCE(1);
> + return -EIO;
> + }
> + iomap->blkno =
> + (sector_t)map.m_pblk << F2FS_LOG_SECTORS_PER_BLOCK;
> + iomap->length = F2FS_BLK_TO_BYTES((u64)map.m_len);
> + }
> +
> + if (map.m_flags & F2FS_MAP_NEW)
> + iomap->flags |= IOMAP_F_NEW;
> + return 0;
> +}
> +
> +static int f2fs_iomap_end(struct inode *inode, loff_t offset, loff_t length,
> + ssize_t written, unsigned int flags, struct iomap *iomap)
> +{
> + put_dax(iomap->dax_dev);
> + if (!(flags & IOMAP_WRITE) || (flags & IOMAP_FAULT))
> + return 0;
> +
> + if (offset + written > i_size_read(inode))
> + f2fs_i_size_write(inode, offset + written);
> +
> + if (iomap->offset + iomap->length >
> +

Re: [PATCH] f2fs: return error number for quota_write

2017-10-15 Thread Chao Yu

Hi Jaegeuk,

On 2017/10/13 7:15, Jaegeuk Kim wrote:
> This patch returns an error number to quota_write in order for quota to handle
> it correctly.

We should return the error number the way __generic_file_write_iter does,
right? It needs to return the number of written bytes if we have written one
page or more, otherwise return the error number fed back from write_begin.

So how about reverting 4f31d26b0c17 ("f2fs: return wrong error number on
f2fs_quota_write")?

Thanks,

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/super.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 2feecf5e7f4c..840a0876005b 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -1397,8 +1397,11 @@ static ssize_t f2fs_quota_write(struct super_block 
> *sb, int type,
>  
>   err = a_ops->write_begin(NULL, mapping, off, tocopy, 0,
>   &page, NULL);
> - if (unlikely(err))
> + if (unlikely(err)) {
> + if (len == towrite)
> + return err;
>   break;
> + }
>  
>   kaddr = kmap_atomic(page);
>   memcpy(kaddr + offset, data, tocopy);
>
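
The return convention discussed above (report the error only if nothing was written, otherwise report the bytes written so far) can be sketched as a plain loop. Everything here is hypothetical scaffolding: chunk_writer stands in for the write_begin/copy/write_end sequence, and fail_after is a test helper that fails once its budget of successful chunks is used up.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

/* Hypothetical per-chunk writer: 0 on success, negative errno-style value on failure. */
typedef int (*chunk_writer)(size_t off, size_t len, void *ctx);

static inline ssize_t write_all_sketch(size_t len, size_t chunk,
				       chunk_writer w, void *ctx)
{
	size_t towrite = len;
	size_t off = 0;

	assert(chunk > 0);  /* a zero chunk size would never make progress */

	while (towrite > 0) {
		size_t tocopy = towrite < chunk ? towrite : chunk;
		int err = w(off, tocopy, ctx);

		if (err) {
			if (towrite == len)  /* nothing written yet */
				return err;
			break;               /* partial progress: report bytes */
		}
		off += tocopy;
		towrite -= tocopy;
	}
	return (ssize_t)(len - towrite);
}

/* Example writer: succeeds *budget times, then fails with -5 (stands in for -EIO). */
static inline int fail_after(size_t off, size_t len, void *ctx)
{
	int *budget = ctx;

	(void)off;
	(void)len;
	if (*budget == 0)
		return -5;
	(*budget)--;
	return 0;
}
```

With a budget of 0 the call returns the error; with a budget of 1 and two 4096-byte chunks it returns 4096, so the caller can tell how far the write got.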

Re: [f2fs-dev] [PATCH] f2fs: avoid stale fi->gdirty_list pointer

2017-10-15 Thread Chao Yu

On 2017/10/13 10:14, Jaegeuk Kim wrote:
> When doing fault injection test, f2fs_evict_inode() didn't remove gdirty_list
> which incurs a kernel panic due to wrong pointer access.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu 

Minor thing, how about inverting the condition for readability?

if (is_set_ckpt_flags(sbi, CP_ERROR_FLAG))
	f2fs_inode_synced()
else
	f2fs_bug_on()

Thanks,

> ---
>  fs/f2fs/inode.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
> index f6db9d533ca4..1ae5396c97d6 100644
> --- a/fs/f2fs/inode.c
> +++ b/fs/f2fs/inode.c
> @@ -535,6 +535,8 @@ void f2fs_evict_inode(struct inode *inode)
>  
>   if (!is_set_ckpt_flags(sbi, CP_ERROR_FLAG))
>   f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
> + else
> + f2fs_inode_synced(inode);
>  
>   /* ino == 0, if f2fs_new_inode() was failed t*/
>   if (inode->i_ino)
>

Re: [PATCH] f2fs: expose some sectors to user in inline data or dentry case

2017-10-15 Thread Chao Yu

On 2017/10/14 1:31, Jaegeuk Kim wrote:
> If there's some data written through inline data or dentry, we need to show
> st_blocks. This fixes reporting zero blocks even though there is small written
> data.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu 

Thanks,

> ---
>  fs/f2fs/file.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 2eb3efe92018..f7be6c394fa8 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -698,6 +698,11 @@ int f2fs_getattr(const struct path *path, struct kstat 
> *stat,
> STATX_ATTR_NODUMP);
>  
>   generic_fillattr(inode, stat);
> +
> + /* we need to show initial sectors used for inline_data/dentries */
> + if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode))
> + stat->blocks += (stat->size + 511) >> 9;
> +
>   return 0;
>  }
>  
>
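
The expression added to f2fs_getattr above is a round-up division by the 512-byte sector size. A minimal standalone version, with a helper name chosen just for this example:

```c
#include <assert.h>   /* for quick checks when experimenting */
#include <stdint.h>

/* Number of 512-byte sectors needed to hold `size` bytes, rounded up. */
static inline uint64_t inline_sectors(uint64_t size)
{
	return (size + 511) >> 9;
}
```

An empty file still reports zero sectors, a 1-byte inline file reports one, and a 513-byte inline file reports two.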

Re: [f2fs-dev] [PATCH v2] f2fs: add bug_on when f2fs_gc even fails to get one victim

2017-10-15 Thread Chao Yu

On 2017/10/14 20:34, Yunlong Song wrote:
> Do you mean check out-of-space test? I have tried that but no bugon.

Yes, I tested recent f2fs code with kernel 4.13.0-rc1+ in a VM, FYI:

kernel BUG at gc.c:1034!
invalid opcode:  [#1] SMP
Hardware name: Xen HVM domU, BIOS 4.1.2_115-900.260_ 11/06/2015
RIP: 0010:f2fs_gc+0x6e5/0x6f0 [f2fs]
RSP: 0018:c90004af7b40 EFLAGS: 00010202
RAX: 8801b0a15940 RBX:  RCX: 
RDX: 8801b0a15940 RSI: 8801978d5f00 RDI: 880128148048
RBP: c90004af7c38 R08: 8801978d5f00 R09: 0003
R10: 0003 R11: 8800060703a0 R12: 
R13:  R14: 0001 R15: 8801b4279800
FS:  7f23493cb740() GS:880216f0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7ffd05402ff8 CR3: 0001bffb3000 CR4: 001406e0
Call Trace:
 f2fs_balance_fs+0x123/0x140 [f2fs]
 f2fs_create+0x130/0x240 [f2fs]
 path_openat+0xee7/0x1360
 do_filp_open+0x7e/0xd0
 do_sys_open+0x115/0x1f0
 SyS_open+0x1e/0x20
 do_syscall_64+0x6e/0x160
 entry_SYSCALL64_slow_path+0x25/0x25

Thanks,

> 
> On 2017/10/14 8:17, Chao Yu wrote:
>> On 2017/10/13 21:31, Yunlong Song wrote:
>>> This can help us to debug on some corner case.
>> I can hit this bug_on easily with generic/015 of fstests, could you have a
>> look at this?
>>
>> Thanks,
>>
>>> Signed-off-by: Yunlong Song 
>>> Signed-off-by: Chao Yu 
>>> ---
>>>   fs/f2fs/gc.c | 6 +-
>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
>>> index 197ebf4..2b03202 100644
>>> --- a/fs/f2fs/gc.c
>>> +++ b/fs/f2fs/gc.c
>>> @@ -986,6 +986,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>> .ilist = LIST_HEAD_INIT(gc_list.ilist),
>>> .iroot = RADIX_TREE_INIT(GFP_NOFS),
>>> };
>>> +   bool need_fggc = false;
>>>   
>>> trace_f2fs_gc_begin(sbi->sb, sync, background,
>>> get_pages(sbi, F2FS_DIRTY_NODES),
>>> @@ -1018,8 +1019,10 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>> if (ret)
>>> goto stop;
>>> }
>>> -   if (has_not_enough_free_secs(sbi, 0, 0))
>>> +   if (has_not_enough_free_secs(sbi, 0, 0)) {
>>> gc_type = FG_GC;
>>> +   need_fggc = true;
>>> +   }
>>> }
>>>   
>>> /* f2fs_balance_fs doesn't need to do BG_GC in critical path. */
>>> @@ -1028,6 +1031,7 @@ int f2fs_gc(struct f2fs_sb_info *sbi, bool sync,
>>> goto stop;
>>> }
>>> if (!__get_victim(sbi, &segno, gc_type)) {
>>> +   f2fs_bug_on(sbi, !total_freed && need_fggc);
>>> ret = -ENODATA;
>>> goto stop;
>>> }
>>>
>> .
>>
>

Re: [f2fs-dev] [PATCH v2] f2fs: update dirty status for CURSEG as well

2017-10-15 Thread Chao Yu

On 2017/10/14 20:53, Yunlong Song wrote:
> Oh, yes it is. I found that problem in a kernel tree which does not have 
> commit
> c6f82fe90d7458e5fa190a6820bfc24f96b0de4e (Revert "f2fs: put allocate_segment
> after refresh_sit_entry"). In that kernel, the allocate_segment is still 
> behind
> refresh_sit_entry. Now I understand the commit message:
> "This makes a leak to register dirty segments. I reproduced the issue by
>  modified postmark which injects a lot of file create/delete/update and
>  finally triggers huge number of SSR allocations."
> 
> The reason is that if refresh_sit_entry is before allocate_segment, then the
> dirty status of CURSEG is not updated, as a result, the count of dirty 
> segments
> is wrong, which is much smaller than its real value. Then the f2fs_gc 
> can not
> do its work since it can not even get one victim, then the free segments are
> used up and then triggers much SSR. So Jay reverts the patch.
> 
> It seems there are two options:
> (1) keep this patch ([PATCH v2] f2fs: update dirty status for CURSEG as 
> well)
> and we can recover commit 3436c4bdb30de421d46f58c9174669fbcfd40ce0
> (f2fs: put allocate_segment after refresh_sit_entry)
> (2) remove this patch at all
> 
> It seems (1) is robust, but (2) avoids unnecessary check.

What about reverting 5e443818fa0b ("f2fs: handle dirty segments inside
refresh_sit_entry") to keep the original order:

1. update sit info
2. allocate new segment
3. update dirty status of segment

Thanks,

> 
> On 2017/10/14 8:14, Chao Yu wrote:
>> On 2017/10/13 21:21, Yunlong Song wrote:
>>> Without this patch, it will cause all the free segments using up in some
>>> corner case. For example, there are 100 segments, and 20 of them are
>>> reserved for ovp. If 79 segments are full of data, segment 80 becomes
>>> CURSEG segment, write 512 blocks and then delete 511 blocks. Since it is
>>> CURSEG segment, the __locate_dirty_segment will not update its dirty
>>> status. Then the dirty_segments(sbi) is 0, f2fs_gc will fail to
>>> get_victim, and f2fs_balance_fs will fail to trigger gc action. After
>>> f2fs_balance_fs returns, f2fs can continue to write data to segment 81.
>>> Again, segment 81 becomes CURSEG segment, write 512 blocks and delete
>>> 511 blocks, the dirty_segments(sbi) is 0 and f2fs_gc fail again. This
>>> can finally use up all the free segments and cause panic.
>> Look into this patch again, I found refresh_sit_entry is called after
>> ->allocate_segment, so if all 512 blocks were allocated, log header should
>> have been moved to another segment, so locate_dirty_segment in
>> refresh_sit_entry should update dirty status of previous segment correctly,
>> anything I'm missing?
>>
>> Thanks,
>>
>>> Signed-off-by: Yunlong Song 
>>> ---
>>>   fs/f2fs/segment.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index bfbcff8..0fce076 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -687,7 +687,7 @@ static void __locate_dirty_segment(struct f2fs_sb_info 
>>> *sbi, unsigned int segno,
>>> struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>>>   
>>> /* need not be added */
>>> -   if (IS_CURSEG(sbi, segno))
>>> +   if (IS_CURSEG(sbi, segno) && dirty_type == PRE)
>>> return;
>>>   
>>> if (!test_and_set_bit(segno, dirty_i->dirty_segmap[dirty_type]))
>>> @@ -737,7 +737,7 @@ static void locate_dirty_segment(struct f2fs_sb_info 
>>> *sbi, unsigned int segno)
>>> struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
>>> unsigned short valid_blocks;
>>>   
>>> -   if (segno == NULL_SEGNO || IS_CURSEG(sbi, segno))
>>> +   if (segno == NULL_SEGNO)
>>> return;
>>>   
>>> mutex_lock(&dirty_i->seglist_lock);
>>>
>> .
>>
>

[PATCH v2 1/5] f2fs: trace f2fs_lookup

2017-10-17 Thread Chao Yu

This patch adds trace for f2fs_lookup.

Signed-off-by: Chao Yu 
---
v2:
- fix warning reported by 0-day project.
- report error of d_splice_alias in trace_f2fs_lookup_end.
 fs/f2fs/namei.c | 49 +--
 include/trace/events/f2fs.h | 56 +
 2 files changed, 88 insertions(+), 17 deletions(-)

diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index b6455b7ca00f..e6f86d5d97b9 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -337,12 +337,15 @@ static struct dentry *f2fs_lookup(struct inode *dir, 
struct dentry *dentry,
struct inode *inode = NULL;
struct f2fs_dir_entry *de;
struct page *page;
-   nid_t ino;
+   struct dentry *new;
+   nid_t ino = -1;
int err = 0;
unsigned int root_ino = F2FS_ROOT_INO(F2FS_I_SB(dir));
 
+   trace_f2fs_lookup_start(dir, dentry, flags);
+
if (f2fs_encrypted_inode(dir)) {
-   int res = fscrypt_get_encryption_info(dir);
+   err = fscrypt_get_encryption_info(dir);
 
/*
 * DCACHE_ENCRYPTED_WITH_KEY is set if the dentry is
@@ -352,18 +355,22 @@ static struct dentry *f2fs_lookup(struct inode *dir, 
struct dentry *dentry,
if (fscrypt_has_encryption_key(dir))
fscrypt_set_encrypted_dentry(dentry);
fscrypt_set_d_op(dentry);
-   if (res && res != -ENOKEY)
-   return ERR_PTR(res);
+   if (err && err != -ENOKEY)
+   goto out;
}
 
-   if (dentry->d_name.len > F2FS_NAME_LEN)
-   return ERR_PTR(-ENAMETOOLONG);
+   if (dentry->d_name.len > F2FS_NAME_LEN) {
+   err = -ENAMETOOLONG;
+   goto out;
+   }
 
de = f2fs_find_entry(dir, &dentry->d_name, &page);
if (!de) {
-   if (IS_ERR(page))
-   return (struct dentry *)page;
-   return d_splice_alias(inode, dentry);
+   if (IS_ERR(page)) {
+   err = PTR_ERR(page);
+   goto out;
+   }
+   goto out_splice;
}
 
ino = le32_to_cpu(de->ino);
@@ -371,19 +378,21 @@ static struct dentry *f2fs_lookup(struct inode *dir, 
struct dentry *dentry,
f2fs_put_page(page, 0);
 
inode = f2fs_iget(dir->i_sb, ino);
-   if (IS_ERR(inode))
-   return ERR_CAST(inode);
+   if (IS_ERR(inode)) {
+   err = PTR_ERR(inode);
+   goto out;
+   }
 
if ((dir->i_ino == root_ino) && f2fs_has_inline_dots(dir)) {
err = __recover_dot_dentries(dir, root_ino);
if (err)
-   goto err_out;
+   goto out_iput;
}
 
if (f2fs_has_inline_dots(inode)) {
err = __recover_dot_dentries(inode, dir->i_ino);
if (err)
-   goto err_out;
+   goto out_iput;
}
if (f2fs_encrypted_inode(dir) &&
(S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) &&
@@ -392,12 +401,18 @@ static struct dentry *f2fs_lookup(struct inode *dir, 
struct dentry *dentry,
 "Inconsistent encryption contexts: %lu/%lu",
 dir->i_ino, inode->i_ino);
err = -EPERM;
-   goto err_out;
+   goto out_iput;
}
-   return d_splice_alias(inode, dentry);
-
-err_out:
+out_splice:
+   new = d_splice_alias(inode, dentry);
+   if (IS_ERR(new))
+   err = PTR_ERR(new);
+   trace_f2fs_lookup_end(dir, dentry, ino, err);
+   return new;
+out_iput:
iput(inode);
+out:
+   trace_f2fs_lookup_end(dir, dentry, ino, err);
return ERR_PTR(err);
 }
 
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 0e7a31694ff5..dcbbe6dcca9c 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -728,6 +728,62 @@ TRACE_EVENT(f2fs_get_victim,
__entry->free)
 );
 
+TRACE_EVENT(f2fs_lookup_start,
+
+   TP_PROTO(struct inode *dir, struct dentry *dentry, unsigned int flags),
+
+   TP_ARGS(dir, dentry, flags),
+
+   TP_STRUCT__entry(
+   __field(dev_t,  dev)
+   __field(ino_t,  ino)
+   __field(const char *,   name)
+   __field(unsigned int, flags)
+   ),
+
+   TP_fast_assign(
+   __entry->dev= dir->i_sb->s_dev;
+   __entry->ino= dir->i_ino;
+   __entry->name   = dentry->d_name.name;
+   __entry->flags  = flags;
+   ),
+
+   TP_printk("dev = (%d,%d), pino = %lu, name:%s, flags:%u",
+   show_dev_ino(__entry),
+   __entry->nam
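
The control-flow change in the patch above (every return path funnelled through one label so that the end tracepoint always fires) can be sketched in user space roughly as follows. This is only an illustration, not the f2fs code: lookup_sketch, trace_lookup_start, trace_lookup_end and NAME_MAX_SKETCH are hypothetical names, the "tracepoints" are plain counters, and a missing name is reported as -ENOENT here for simplicity, whereas the real f2fs_lookup goes on to d_splice_alias.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define NAME_MAX_SKETCH 8   /* hypothetical limit, stands in for F2FS_NAME_LEN */

/* Counters standing in for the two tracepoints. */
static int trace_starts, trace_ends, last_traced_err;

static void trace_lookup_start(void) { trace_starts++; }
static void trace_lookup_end(int err) { trace_ends++; last_traced_err = err; }

/* Returns 0 if 'name' is found in 'table', or a negative errno. */
static int lookup_sketch(const char *const *table, size_t n, const char *name)
{
	int err = 0;
	size_t i;

	trace_lookup_start();

	if (strlen(name) > NAME_MAX_SKETCH) {
		err = -ENAMETOOLONG;
		goto out;
	}

	for (i = 0; i < n; i++)
		if (strcmp(table[i], name) == 0)
			goto out;       /* found, err stays 0 */

	err = -ENOENT;
out:
	/* Single exit: the end event fires exactly once per call. */
	trace_lookup_end(err);
	assert(trace_starts == trace_ends);
	return err;
}
```

With early returns scattered through the function, each of them would need its own trace call; with the single label, the error value only has to be stored in err before jumping.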

Re: [f2fs-dev] [PATCH] f2fs: return error number for quota_write

2017-10-17 Thread Chao Yu



On 2017/10/17 7:04, Jaegeuk Kim wrote:
> On 10/16, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2017/10/13 7:15, Jaegeuk Kim wrote:
>>> This patch returns an error number to quota_write in order for quota to
>>> handle it correctly.
>>
>> We should return the error number like __generic_file_write_iter does,
>> right? It needs to return the number of written bytes if we have written
>> one page or more, and otherwise return the error number fed back from
>> write_begin.
>>
>> So how about reverting 4f31d26b0c17 ("f2fs: return wrong error number on
>> f2fs_quota_write")?
> 
> I thought like that, but realized the code change is somewhat different
> between them.

Hmm... the main structure of the code here is copied from other file systems;
is there the same problem in *_quota_write of other file systems?

BTW, it looks like this makes the condition below useless.

if (len == towrite)
return 0;
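
To make the return convention under discussion concrete, here is a minimal user-space sketch. It is not the f2fs code: chunked_write, fake_write_begin and CHUNK are hypothetical names, and the failure is simulated. It returns the number of bytes written if any chunk succeeded, and the error from the failing step only when nothing was written, which is how I read __generic_file_write_iter.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

#define CHUNK 4096       /* hypothetical page-sized unit */

/* Hypothetical stand-in for ->write_begin: fails with -5 (EIO) once
 * 'fail_at' chunks have been prepared. */
static int fake_write_begin(size_t chunk_index, size_t fail_at)
{
	return chunk_index >= fail_at ? -5 : 0;
}

static ssize_t chunked_write(size_t len, size_t fail_at)
{
	size_t towrite = len;
	size_t idx = 0;
	int err = 0;

	while (towrite > 0) {
		size_t tocopy = towrite < CHUNK ? towrite : CHUNK;

		assert(tocopy <= towrite);
		err = fake_write_begin(idx, fail_at);
		if (err)
			break;
		/* the real code would copy data and call ->write_end here */
		towrite -= tocopy;
		idx++;
	}

	if (len == towrite)      /* no progress at all */
		return err;      /* 0 when len == 0, otherwise the error */
	return (ssize_t)(len - towrite);
}
```

In this shape the "len == towrite" check after the loop covers both the empty write and the first-chunk failure, so no separate early return inside the loop is needed.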

Thanks,

> 
> Thanks,
> 
>>
>> Thanks,
>>
>>>
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  fs/f2fs/super.c | 5 -
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index 2feecf5e7f4c..840a0876005b 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -1397,8 +1397,11 @@ static ssize_t f2fs_quota_write(struct super_block 
>>> *sb, int type,
>>>  
>>> err = a_ops->write_begin(NULL, mapping, off, tocopy, 0,
>>> &page, NULL);
>>> -   if (unlikely(err))
>>> +   if (unlikely(err)) {
>>> +   if (len == towrite)
>>> +   return err;
>>> break;
>>> +   }
>>>  
>>> kaddr = kmap_atomic(page);
>>> memcpy(kaddr + offset, data, tocopy);
>>>
> 
> --
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> ___
> Linux-f2fs-devel mailing list
> linux-f2fs-de...@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
>

Re: [f2fs-dev] [PATCH] f2fs: avoid stale fi->gdirty_list pointer

2017-10-17 Thread Chao Yu

On 2017/10/17 7:06, Jaegeuk Kim wrote:
> On 10/16, Chao Yu wrote:
>> On 2017/10/13 10:14, Jaegeuk Kim wrote:
>>> When doing fault injection test, f2fs_evict_inode() didn't remove
>>> gdirty_list which incurs a kernel panic due to wrong pointer access.
>>>
>>> Signed-off-by: Jaegeuk Kim 
>>
>> Reviewed-by: Chao Yu 
>>
>> Minor thing, how about inverting the condition for readability?
>>
>> if (is_set_ckpt_flags(sbi, CP_ERROR_FLAG))
>>  f2fs_inode_synced()
> 
> We don't need to expect such the errored corner case first. ;)

Alright, how about making the compiler aware of this by using {un,}likely? :)
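
For reference, a small user-space sketch of what such hints look like. The two macros have the same shape as the kernel's (they rely on the GCC/Clang builtin __builtin_expect); pick_evict_action and the enum are hypothetical and only model the branch discussed above.

```c
#include <assert.h>   /* for the checks mentioned in the note below */

/* Same shape as the kernel's branch-prediction hints (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

enum evict_action { CHECK_CLEAN, FORCE_SYNCED };

/* 'cp_error' stands in for is_set_ckpt_flags(sbi, CP_ERROR_FLAG). */
static enum evict_action pick_evict_action(int cp_error)
{
	if (unlikely(cp_error))
		return FORCE_SYNCED;   /* rare path: would call f2fs_inode_synced() */
	return CHECK_CLEAN;            /* common path: would run the f2fs_bug_on() check */
}
```

The hints only affect code layout, not results: assert(pick_evict_action(0) == CHECK_CLEAN) and assert(pick_evict_action(1) == FORCE_SYNCED) both hold with or without them.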

Thanks,

> 
>> else
>>  f2fs_bug_on()
>>
>> Thanks,
>>
>>> ---
>>>  fs/f2fs/inode.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
>>> index f6db9d533ca4..1ae5396c97d6 100644
>>> --- a/fs/f2fs/inode.c
>>> +++ b/fs/f2fs/inode.c
>>> @@ -535,6 +535,8 @@ void f2fs_evict_inode(struct inode *inode)
>>>  
>>> if (!is_set_ckpt_flags(sbi, CP_ERROR_FLAG))
>>> f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
>>> +   else
>>> +   f2fs_inode_synced(inode);
>>>  
>>> /* ino == 0, if f2fs_new_inode() was failed t*/
>>> if (inode->i_ino)
>>>
> 

Re: [f2fs-dev] [PATCH] f2fs: handle error case when adding xattr entry

2017-10-17 Thread Chao Yu

On 2017/10/17 7:06, Jaegeuk Kim wrote:
> This patch fixes recovering incomplete xattr entries remaining in inline xattr
> and xattr block, caused by any kind of errors.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/xattr.c | 48 
>  1 file changed, 28 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
> index e74a4d7f744a..5a9c5e6ad714 100644
> --- a/fs/f2fs/xattr.c
> +++ b/fs/f2fs/xattr.c
> @@ -389,10 +389,11 @@ static inline int write_all_xattrs(struct inode *inode, 
> __u32 hsize,
>  {
>   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>   size_t inline_size = inline_xattr_size(inode);
> - void *xattr_addr;
> + struct page *in_page = NULL;
> + void *xattr_addr, *inline_addr;
>   struct page *xpage;
>   nid_t new_nid = 0;
> - int err;
> + int err = 0;
>  
>   if (hsize > inline_size && !F2FS_I(inode)->i_xattr_nid)
>   if (!alloc_nid(sbi, &new_nid))
> @@ -400,30 +401,30 @@ static inline int write_all_xattrs(struct inode *inode, 
> __u32 hsize,
>  
>   /* write to inline xattr */
>   if (inline_size) {
> - struct page *page = NULL;
> - void *inline_addr;
> -
>   if (ipage) {
>   inline_addr = inline_xattr_addr(inode, ipage);
> - f2fs_wait_on_page_writeback(ipage, NODE, true);
> - set_page_dirty(ipage);
>   } else {
> - page = get_node_page(sbi, inode->i_ino);
> - if (IS_ERR(page)) {
> + in_page = get_node_page(sbi, inode->i_ino);
> + if (IS_ERR(in_page)) {
>   alloc_nid_failed(sbi, new_nid);
> - return PTR_ERR(page);
> + return PTR_ERR(in_page);
>   }
> - inline_addr = inline_xattr_addr(inode, page);
> - f2fs_wait_on_page_writeback(page, NODE, true);
> + inline_addr = inline_xattr_addr(inode, in_page);
>   }
> - memcpy(inline_addr, txattr_addr, inline_size);
> - f2fs_put_page(page, 1);
>  
> + f2fs_wait_on_page_writeback(ipage ? ipage : in_page,
> + NODE, true);
>   /* no need to use xattr node block */
>   if (hsize <= inline_size) {
>   err = truncate_xattr_node(inode, ipage);

truncate_xattr_node(inode, ipage ? ipage : in_page);

Please add:

Reviewed-by: Chao Yu 

Thanks,

>   alloc_nid_failed(sbi, new_nid);
> - return err;
> + if (err) {
> + f2fs_put_page(in_page, 1);
> + return err;
> + }
> + memcpy(inline_addr, txattr_addr, inline_size);
> + set_page_dirty(ipage ? ipage : in_page);
> + goto in_page_out;
>   }
>   }
>  
> @@ -432,7 +433,7 @@ static inline int write_all_xattrs(struct inode *inode, 
> __u32 hsize,
>   xpage = get_node_page(sbi, F2FS_I(inode)->i_xattr_nid);
>   if (IS_ERR(xpage)) {
>   alloc_nid_failed(sbi, new_nid);
> - return PTR_ERR(xpage);
> + goto in_page_out;
>   }
>   f2fs_bug_on(sbi, new_nid);
>   f2fs_wait_on_page_writeback(xpage, NODE, true);
> @@ -442,17 +443,24 @@ static inline int write_all_xattrs(struct inode *inode, 
> __u32 hsize,
>   xpage = new_node_page(&dn, XATTR_NODE_OFFSET);
>   if (IS_ERR(xpage)) {
>   alloc_nid_failed(sbi, new_nid);
> - return PTR_ERR(xpage);
> + goto in_page_out;
>   }
>   alloc_nid_done(sbi, new_nid);
>   }
> -
>   xattr_addr = page_address(xpage);
> +
> + if (inline_size)
> + memcpy(inline_addr, txattr_addr, inline_size);
>   memcpy(xattr_addr, txattr_addr + inline_size, VALID_XATTR_BLOCK_SIZE);
> +
> + if (inline_size)
> + set_page_dirty(ipage ? ipage : in_page);
>   set_page_dirty(xpage);
> - f2fs_put_page(xpage, 1);
>  
> - return 0;
> + f2fs_put_page(xpage, 1);
> +in_page_out:
> + f2fs_put_page(in_page, 1);
> + return err;
>  }
>  
>  int f2fs_getxattr(struct inode *inode, int index, const char *name,
>

Re: [f2fs-dev] [PATCH] f2fs: use extra parenthesis around assignment/condition

2017-10-17 Thread Chao Yu

On 2017/10/17 17:07, Arnd Bergmann wrote:
> gcc warns that writing a while() loop with an assignment as the condition
> looks suspiciously like a comparison, and suggests a workaround:
> 
> fs/f2fs/checkpoint.c: In function 'sync_meta_pages':
> fs/f2fs/checkpoint.c:321:9: error: suggest parentheses around assignment used 
> as truth value [-Werror=parentheses]
>   while (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
> 
> This seems reasonable, so let's do that.
> 
> Fixes: 4aba7297f4a5 ("f2fs: simplify page iteration loops")
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Chao Yu 

Thanks,
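
For anyone reading along, the pattern in question looks like this in a stand-alone form. It is a sketch only: next_batch is a hypothetical stand-in for pagevec_lookup_tag(), and the batch size of 4 is arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for pagevec_lookup_tag(): copies up to 'max'
 * items starting at *index, advances *index, returns how many it copied. */
static unsigned int next_batch(const int *src, size_t total, size_t *index,
			       int *out, unsigned int max)
{
	unsigned int n = 0;

	while (n < max && *index < total)
		out[n++] = src[(*index)++];
	return n;
}

static long sum_in_batches(const int *src, size_t total)
{
	int batch[4];
	size_t index = 0;
	unsigned int nr, i;
	long sum = 0;

	/* The extra parentheses mark the assignment as intentional. */
	while ((nr = next_batch(src, total, &index, batch, 4))) {
		assert(nr <= 4);
		for (i = 0; i < nr; i++)
			sum += batch[i];
	}
	return sum;
}
```

Without the inner parentheses, gcc with -Wparentheses warns that the assignment may have been meant as a comparison; the behaviour of the loop is the same either way.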

> ---
> The warning is from mmotm. Andrew, please fold this fix into the
> patch that caused the warning, unless there are objections.
> ---
>  fs/f2fs/checkpoint.c |  4 ++--
>  fs/f2fs/node.c   | 16 
>  2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 3ed9dcbf70ae..6124f8710dc3 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -318,8 +318,8 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum 
> page_type type,
>  
>   blk_start_plug(&plug);
>  
> - while (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
> - PAGECACHE_TAG_DIRTY)) {
> + while ((nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
> + PAGECACHE_TAG_DIRTY))) {
>   int i;
>  
>   for (i = 0; i < nr_pages; i++) {
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index d4ceb9ebfe92..d6e4df0bb622 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1285,8 +1285,8 @@ static struct page *last_fsync_dnode(struct 
> f2fs_sb_info *sbi, nid_t ino)
>   pagevec_init(&pvec, 0);
>   index = 0;
>  
> - while (nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
> - PAGECACHE_TAG_DIRTY)) {
> + while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
> + PAGECACHE_TAG_DIRTY))) {
>   int i;
>  
>   for (i = 0; i < nr_pages; i++) {
> @@ -1439,8 +1439,8 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct 
> inode *inode,
>   pagevec_init(&pvec, 0);
>   index = 0;
>  
> - while (nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
> - PAGECACHE_TAG_DIRTY)) {
> + while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
> + PAGECACHE_TAG_DIRTY))) {
>   int i;
>  
>   for (i = 0; i < nr_pages; i++) {
> @@ -1552,8 +1552,8 @@ int sync_node_pages(struct f2fs_sb_info *sbi, struct 
> writeback_control *wbc,
>  next_step:
>   index = 0;
>  
> - while (nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
> - PAGECACHE_TAG_DIRTY)) {
> + while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
> + PAGECACHE_TAG_DIRTY))) {
>   int i;
>  
>   for (i = 0; i < nr_pages; i++) {
> @@ -1650,8 +1650,8 @@ int wait_on_node_pages_writeback(struct f2fs_sb_info 
> *sbi, nid_t ino)
>  
>   pagevec_init(&pvec, 0);
>  
> - while (nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
> - PAGECACHE_TAG_WRITEBACK)) {
> + while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
> + PAGECACHE_TAG_WRITEBACK))) {
>   int i;
>  
>   for (i = 0; i < nr_pages; i++) {
>

[PATCH] f2fs: fix to correct no_fggc_candidate

2017-10-17 Thread Chao Yu

From: Chao Yu 

There may be an extreme case as below:

Suppose one section contains one segment, and there are 100 segments in
total with a 10% over-provision ratio in the f2fs partition; fggc_threshold
will be rounded down to 460 instead of 460.8 by the calculation below:

sbi->fggc_threshold = div_u64((u64)(main_count - ovp_count) *
BLKS_PER_SEC(sbi), (main_count - resv_count));

If section usage is as:

As the valid block number in all sections is larger than fggc_threshold,
none of them will be chosen as a candidate due to the incorrect fggc_threshold.

Let's just soften the condition for choosing foreground GC candidates.

Signed-off-by: Chao Yu 
---
 fs/f2fs/segment.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 5a1f7b9c8a72..8d93652d5b6a 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -731,7 +731,7 @@ static inline block_t sum_blk_addr(struct f2fs_sb_info 
*sbi, int base, int type)
 static inline bool no_fggc_candidate(struct f2fs_sb_info *sbi,
unsigned int secno)
 {
-   if (get_valid_blocks(sbi, GET_SEG_FROM_SEC(sbi, secno), true) >=
+   if (get_valid_blocks(sbi, GET_SEG_FROM_SEC(sbi, secno), true) >
sbi->fggc_threshold)
return true;
return false;
-- 
2.14.1.145.gb3622a4ee
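
The rounding described in the commit message above can be reproduced with a small user-space sketch. The function names are hypothetical, div_u64_sketch only imitates the truncating behaviour of the kernel's div_u64(), and the inputs used in the note below (main_count = 100, ovp_count = 10, resv_count = 0, 512 blocks per section) are assumptions chosen so that the exact quotient is 460.8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* User-space stand-in for the kernel's div_u64(): truncating division. */
static uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Same formula as in the commit message, with plain parameters. */
static uint32_t fggc_threshold(uint32_t main_count, uint32_t ovp_count,
			       uint32_t resv_count, uint32_t blks_per_sec)
{
	return (uint32_t)div_u64_sketch(
		(uint64_t)(main_count - ovp_count) * blks_per_sec,
		main_count - resv_count);
}

/* Old check: a section sitting exactly on the threshold is rejected. */
static bool no_candidate_old(uint32_t valid, uint32_t threshold)
{
	return valid >= threshold;
}

/* Patched check: only sections strictly above the threshold are rejected. */
static bool no_candidate_new(uint32_t valid, uint32_t threshold)
{
	return valid > threshold;
}
```

With the assumed inputs, fggc_threshold(100, 10, 0, 512) is 46080 / 100 truncated to 460. A section with exactly 460 valid blocks is rejected by the old ">=" check and accepted by the patched ">" check.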

Re: [f2fs-dev] [PATCH] f2fs: handle error case when adding xattr entry

2017-10-17 Thread Chao Yu

On 2017/10/18 0:41, Jaegeuk Kim wrote:
> On 10/17, Chao Yu wrote:
>> On 2017/10/17 7:06, Jaegeuk Kim wrote:
>>> This patch fixes recovering incomplete xattr entries remaining in inline
>>> xattr and xattr block, caused by any kind of errors.
>>>
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  fs/f2fs/xattr.c | 48 
>>>  1 file changed, 28 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
>>> index e74a4d7f744a..5a9c5e6ad714 100644
>>> --- a/fs/f2fs/xattr.c
>>> +++ b/fs/f2fs/xattr.c
>>> @@ -389,10 +389,11 @@ static inline int write_all_xattrs(struct inode 
>>> *inode, __u32 hsize,
>>>  {
>>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
>>> size_t inline_size = inline_xattr_size(inode);
>>> -   void *xattr_addr;
>>> +   struct page *in_page = NULL;
>>> +   void *xattr_addr, *inline_addr;
>>> struct page *xpage;
>>> nid_t new_nid = 0;
>>> -   int err;
>>> +   int err = 0;
>>>  
>>> if (hsize > inline_size && !F2FS_I(inode)->i_xattr_nid)
>>> if (!alloc_nid(sbi, &new_nid))
>>> @@ -400,30 +401,30 @@ static inline int write_all_xattrs(struct inode 
>>> *inode, __u32 hsize,
>>>  
>>> /* write to inline xattr */
>>> if (inline_size) {
>>> -   struct page *page = NULL;
>>> -   void *inline_addr;
>>> -
>>> if (ipage) {
>>> inline_addr = inline_xattr_addr(inode, ipage);
>>> -   f2fs_wait_on_page_writeback(ipage, NODE, true);
>>> -   set_page_dirty(ipage);
>>> } else {
>>> -   page = get_node_page(sbi, inode->i_ino);
>>> -   if (IS_ERR(page)) {
>>> +   in_page = get_node_page(sbi, inode->i_ino);
>>> +   if (IS_ERR(in_page)) {
>>> alloc_nid_failed(sbi, new_nid);
>>> -   return PTR_ERR(page);
>>> +   return PTR_ERR(in_page);
>>> }
>>> -   inline_addr = inline_xattr_addr(inode, page);
>>> -   f2fs_wait_on_page_writeback(page, NODE, true);
>>> +   inline_addr = inline_xattr_addr(inode, in_page);
>>> }
>>> -   memcpy(inline_addr, txattr_addr, inline_size);
>>> -   f2fs_put_page(page, 1);
>>>  
>>> +   f2fs_wait_on_page_writeback(ipage ? ipage : in_page,
>>> +   NODE, true);
>>> /* no need to use xattr node block */
>>> if (hsize <= inline_size) {
>>> err = truncate_xattr_node(inode, ipage);
>>
>> truncate_xattr_node(inode, ipage ? ipage : in_page);
> 
> No, that should be ipage.

I just noted that dn.inode_page_locked in truncate_xattr_node will be set
wrongly, but, anyway, it looks like that won't be a problem because we don't
use inode_page_locked later.

There are no other users of ipage in truncate_xattr_node, so no matter which
page we pass, it will be safe for us, right?

Thanks,

> 
> Thanks,
> 
>> Please add:
>>
>> Reviewed-by: Chao Yu 
>>
>> Thanks,
>>
>>> alloc_nid_failed(sbi, new_nid);
>>> -   return err;
>>> +   if (err) {
>>> +   f2fs_put_page(in_page, 1);
>>> +   return err;
>>> +   }
>>> +   memcpy(inline_addr, txattr_addr, inline_size);
>>> +   set_page_dirty(ipage ? ipage : in_page);
>>> +   goto in_page_out;
>>> }
>>> }
>>>  
>>> @@ -432,7 +433,7 @@ static inline int write_all_xattrs(struct inode *inode, 
>>> __u32 hsize,
>>> xpage = get_node_page(sbi, F2FS_I(inode)->i_xattr_nid);
>>> if (IS_ERR(xpage)) {
>>> alloc_nid_failed(sbi, new_nid);
>>> -   return PTR_ERR(xpage);
>>> +   goto in_page_out;
>>> }
>>> f2fs_bug_on(sbi, new_nid);
>>> f2fs_wait_on_page_writeback(xpage, NODE, true);
>>> @@ -442,17 +443,24 @@ static inline int write_al

Re: [f2fs-dev] [PATCH] f2fs: return error number for quota_write

2017-10-17 Thread Chao Yu

On 2017/10/18 2:17, Jaegeuk Kim wrote:
> On 10/17, Chao Yu wrote:
>>
>>
>> On 2017/10/17 7:04, Jaegeuk Kim wrote:
>>> On 10/16, Chao Yu wrote:
>>>> Hi Jaegeuk,
>>>>
>>>> On 2017/10/13 7:15, Jaegeuk Kim wrote:
>>>>> This patch returns an error number to quota_write in order for quota to
>>>>> handle it correctly.
>>>>
>>>> We should return the error number like __generic_file_write_iter does,
>>>> right? It needs to return the number of written bytes if we have written
>>>> one page or more, and otherwise return the error number fed back from
>>>> write_begin.
>>>>
>>>> So how about reverting 4f31d26b0c17 ("f2fs: return wrong error number on
>>>> f2fs_quota_write")?
>>>
>>> I thought like that, but realized the code change is somewhat different
>>> between them.
>>
>> Hmm... the main structure of the code here is copied from other file
>> systems; is there the same problem in *_quota_write of other file systems?
>>
>> BTW, it looks like this makes the condition below useless.
>>
>>  if (len == towrite)
>>  return 0;
> 
> We need this to avoid needless inode updates. :P

For the err = 0 and len == towrite case, it looks more like a bug in quota
that it passes 0 in @len.

:(, Oh, I still don't get why there is a difference between reverting and
this fix. Can you please explain more about this?

Thanks,

> 
> Thanks,
> 
>>
>> Thanks,
>>
>>>
>>> Thanks,
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>> Signed-off-by: Jaegeuk Kim 
>>>>> ---
>>>>>  fs/f2fs/super.c | 5 -
>>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>> index 2feecf5e7f4c..840a0876005b 100644
>>>>> --- a/fs/f2fs/super.c
>>>>> +++ b/fs/f2fs/super.c
>>>>> @@ -1397,8 +1397,11 @@ static ssize_t f2fs_quota_write(struct super_block 
>>>>> *sb, int type,
>>>>>  
>>>>>   err = a_ops->write_begin(NULL, mapping, off, tocopy, 0,
>>>>>   &page, NULL);
>>>>> - if (unlikely(err))
>>>>> + if (unlikely(err)) {
>>>>> + if (len == towrite)
>>>>> + return err;
>>>>>   break;
>>>>> + }
>>>>>  
>>>>>   kaddr = kmap_atomic(page);
>>>>>   memcpy(kaddr + offset, data, tocopy);
>>>>>
>>>
> 
> .
>

Re: [f2fs-dev] [PATCH] f2fs: modify the procedure of scan free nid

2017-10-31 Thread Chao Yu

On 2017/10/31 21:37, Fan Li wrote:
> In the current version, we preserve 8 pages of nat blocks as free nids,
> build bitmaps for them and use them to allocate nids until their number
> drops below NAT_ENTRY_PER_BLOCK.
> 
> After that, we have a problem: scan_free_nid_bits will scan the same
> 8 pages trying to find more free nids, but in most cases the free nids
> in these bitmaps are already in the free list, so scanning them won't get
> us any new nids.
> Furthermore, after scan_free_nid_bits, the search is over if
> nid_cnt[FREE_NID] != 0.
> As a result, we scan the same pages over and over again, yet no new
> free nids are found until nid_cnt[FREE_NID]==0.
> 
> This patch marks the range where new free nids could exist and keeps
> scanning for free nids until nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK.
> The new variable first_scan_block marks the start of the range; it's
> initialized with NEW_ADDR, which means all free nids before next_scan_nid
> are already in the free list;
> and next_scan_nid is used as the end of the range since all free nids which
> are scanned must be smaller than next_scan_nid.
> 
> 
> Signed-off-by: Fan li 
> ---
>  fs/f2fs/f2fs.h |  1 +
>  fs/f2fs/node.c | 30 ++
>  2 files changed, 27 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index e0ef31c..ae1cf91 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -705,6 +705,7 @@ struct f2fs_nm_info {
>   nid_t max_nid;  /* maximum possible node ids */
>   nid_t available_nids;   /* # of available node ids */
>   nid_t next_scan_nid;/* the next nid to be scanned */
> + block_t first_scan_block;   /* the first NAT block to be scanned */

As we are traversing the bitmap, how about using a smaller granularity for
tracking the last scanned position, like:

unsigned next_bitmap_pos; ?

>   unsigned int ram_thresh;/* control the memory footprint */
>   unsigned int ra_nid_pages;  /* # of nid pages to be readaheaded */
>   unsigned int dirty_nats_ratio;  /* control dirty nats ratio threshold */
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 3d0d1be..7834097 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -1950,10 +1950,23 @@ static void scan_free_nid_bits(struct f2fs_sb_info 
> *sbi)
>   struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
>   struct f2fs_journal *journal = curseg->journal;
>   unsigned int i, idx;
> + unsigned int max_blocks = NAT_BLOCK_OFFSET(nm_i->next_scan_nid);
>  
> - down_read(&nm_i->nat_tree_lock);
> + /* every free nid in blocks scanned previously is in the free list */
> + if (nm_i->first_scan_block == NEW_ADDR)

How about using nm_i->max_nid to indicate that there are no more free nids
in the bitmap?

> + return;
>  
> - for (i = 0; i < nm_i->nat_blocks; i++) {
> + /*
> +  * TODO: "next_scan_nid == 0" means after searching every nat block,
> +  *   we still can't find enough free nids, there may not be any
> +  *   more nid left to be found, we should stop at somewhere
> +  *   instead of going through these all over again.
> +  */
> + if (max_blocks == 0)
> + max_blocks = nm_i->nat_blocks;
> +
> + down_read(&nm_i->nat_tree_lock);
> + for (i = nm_i->first_scan_block; i < max_blocks; i++) {

Nids can become free after nodes are truncated and a checkpoint is written;
if we start from first_scan_block, we will miss some of those free nids.

Thanks,
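
A toy model of that concern, in case it helps. Everything here is hypothetical (scan_from, NBLOCKS, one flag per block instead of the real per-block bitmaps); it only shows that a scan resumed from a saved start position cannot see entries that became free behind it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8   /* hypothetical number of NAT blocks */

/* free_in_block[i] is true when block i holds a free nid that is not
 * yet on the free list (a stand-in for the per-block bitmaps).
 * Returns how many blocks contributed a nid during this scan. */
static size_t scan_from(bool *free_in_block, size_t first)
{
	size_t i, found = 0;

	assert(first <= NBLOCKS);
	for (i = first; i < NBLOCKS; i++) {
		if (free_in_block[i]) {
			free_in_block[i] = false;   /* moved to the free list */
			found++;
		}
	}
	return found;
}
```

If block 1 gains a free nid after it was scanned and the next scan resumes at block 4, that nid stays invisible until something resets the start position to 0.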

>   if (!test_bit_le(i, nm_i->nat_block_bitmap))
>   continue;
>   if (!nm_i->free_nid_count[i])
> @@ -1967,10 +1980,13 @@ static void scan_free_nid_bits(struct f2fs_sb_info 
> *sbi)
>   nid = i * NAT_ENTRY_PER_BLOCK + idx;
>   add_free_nid(sbi, nid, true);
>  
> - if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
> + if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS) {
> + nm_i->first_scan_block = i;
>   goto out;
> + }
>   }
>   }
> + nm_i->first_scan_block = NEW_ADDR;
>  out:
>   down_read(&curseg->journal_rwsem);
>   for (i = 0; i < nats_in_cursum(journal); i++) {
> @@ -2010,7 +2026,7 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, 
> bool sync, bool mount)
>   /* try to find free nids in free_nid_bitmap */
>   scan_free_nid_bits(sbi);
>  
> - if (nm_i->nid_cnt[FREE_NID])
> + if (nm_i->nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK)
>   return;
>   }
>  
> @@ -2163,6 +2179,7 @@ int try_to_free_nids(struct f2fs_sb_info *sbi, int 
> nr_shrink)
>   struct f2fs_nm_info *nm_i = NM_I(sbi);
>   struct free_nid *i, *next;
>   int nr = nr_shrink;
> + nid_t min_nid = nm_i->max_nid;
>  
>   if (nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
>   return 0;
> @@ -2176,11 +2193,

Re: [PATCH] f2fs: don't bother with inode->i_version

2017-10-31 Thread Chao Yu

On 2017/10/30 23:11, Jeff Layton wrote:
> From: Jeff Layton 
> 
> f2fs does not set the SB_I_VERSION flag, so the i_version will never
> be incremented on write. It was recently changed to increment the
> i_version on a quota write, which isn't necessary here.
> 
> Signed-off-by: Jeff Layton 

Reviewed-by: Chao Yu 

Thanks,

> ---
>  fs/f2fs/super.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 933c3d529e65..b3359158e7be 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -618,7 +618,6 @@ static struct inode *f2fs_alloc_inode(struct super_block 
> *sb)
>   init_once((void *) fi);
>  
>   /* Initialize f2fs-specific inode info */
> - fi->vfs_inode.i_version = 1;
>   atomic_set(&fi->dirty_pages, 0);
>   fi->i_current_depth = 1;
>   fi->i_advise = 0;
> @@ -1386,7 +1385,6 @@ static ssize_t f2fs_quota_write(struct super_block *sb, 
> int type,
>  
>   if (len == towrite)
>   return 0;
> - inode->i_version++;
>   inode->i_mtime = inode->i_ctime = current_time(inode);
>   f2fs_mark_inode_dirty_sync(inode, false);
>   return len - towrite;
>

Re: [f2fs-dev] [PATCH] f2fs: modify the procedure of scan free nid

2017-11-01 Thread Chao Yu

On 2017/11/1 18:03, Fan Li wrote:
> 
> 
>> -Original Message-----
>> From: Chao Yu [mailto:c...@kernel.org]
>> Sent: Tuesday, October 31, 2017 10:32 PM
>> To: Fan Li; 'Jaegeuk Kim'
>> Cc: linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net
>> Subject: Re: [f2fs-dev] [PATCH] f2fs: modify the procedure of scan free nid
>>
>> On 2017/10/31 21:37, Fan Li wrote:
>>> In the current version, we preserve 8 pages of nat blocks as free nids,
>>> build bitmaps for them and use them to allocate nids until their number
>>> drops below NAT_ENTRY_PER_BLOCK.
>>>
>>> After that, we have a problem: scan_free_nid_bits will scan the same
>>> 8 pages trying to find more free nids, but in most cases the free nids
>>> in these bitmaps are already in the free list, so scanning them won't get
>>> us any new nids.
>>> Furthermore, after scan_free_nid_bits, the search is over if
>>> nid_cnt[FREE_NID] != 0.
>>> As a result, we scan the same pages over and over again, yet no new
>>> free nids are found until nid_cnt[FREE_NID]==0.
>>>
>>> This patch marks the range where new free nids could exist and keeps
>>> scanning for free nids until nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK.
>>> The new variable first_scan_block marks the start of the range; it's
>>> initialized with NEW_ADDR, which means all free nids before
>>> next_scan_nid are already in the free list; and next_scan_nid is used as
>>> the end of the range since all free nids which are scanned must be smaller
>>> than next_scan_nid.
>>>
>>>
>>> Signed-off-by: Fan li 
>>> ---
>>>  fs/f2fs/f2fs.h |  1 +
>>>  fs/f2fs/node.c | 30 ++
>>>  2 files changed, 27 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index e0ef31c..ae1cf91
>>> 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -705,6 +705,7 @@ struct f2fs_nm_info {
>>> nid_t max_nid;  /* maximum possible node ids */
>>> nid_t available_nids;   /* # of available node ids */
>>> nid_t next_scan_nid;/* the next nid to be scanned */
>>> +   block_t first_scan_block;   /* the first NAT block to be scanned */
>>
>> As we are traversing the bitmap, how about using a smaller granularity for
>> tracking the last scanned position, like:
>>
>> unsigned next_bitmap_pos; ?
>>
> Yes, I think it's a good idea, but the original code scans nids by blocks;
> if I change that, I need to change some other details too, and before that,
> I want to make sure the idea of this patch is right.
> I also have some ideas about it; if that's OK, I tend to submit other
> patches to implement them.
> 
>>> unsigned int ram_thresh;/* control the memory footprint */
>>> unsigned int ra_nid_pages;  /* # of nid pages to be readaheaded */
>>> unsigned int dirty_nats_ratio;  /* control dirty nats ratio threshold */
>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3d0d1be..7834097
>>> 100644
>>> --- a/fs/f2fs/node.c
>>> +++ b/fs/f2fs/node.c
>>> @@ -1950,10 +1950,23 @@ static void scan_free_nid_bits(struct f2fs_sb_info 
>>> *sbi)
>>> struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
>>> struct f2fs_journal *journal = curseg->journal;
>>> unsigned int i, idx;
>>> +   unsigned int max_blocks = NAT_BLOCK_OFFSET(nm_i->next_scan_nid);
>>>
>>> -   down_read(&nm_i->nat_tree_lock);
>>> +   /* every free nid in blocks scanned previously is in the free list */
>>> +   if (nm_i->first_scan_block == NEW_ADDR)
>>
>> How about using nm_i->max_nid to indicate that there are no more free nids
>> in the bitmap?
>>
> For now, I use the block as the unit of variable first_scan_block; for the
> same reason as above, I tend to change it in another patch.
> 
>>> +   return;
>>>
>>> -   for (i = 0; i < nm_i->nat_blocks; i++) {
>>> +   /*
>>> +* TODO: "next_scan_nid == 0" means after searching every nat block,
>>> +*   we still can't find enough free nids, there may not be any
>>> +*   more nid left to be found, we should stop at somewhere
>>> +*   instead of going through these all over again.
>>> +*/

How about trying to avoid leaving a TODO in our patch, if the new feature is
not so complicated or big?

>>> +   if (max_blocks == 0)
>>> +

[PATCH 2/4] f2fs: remove dead code in update_meta_page

2017-11-02 Thread Chao Yu

After commit a468f0ef516f ("f2fs: use crc and cp version to determine
roll-forward recovery"), last caller of update_meta_page passing @src
with NULL is gone, so remove related dead code there.

Signed-off-by: Chao Yu 
---
 fs/f2fs/segment.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 83125f92accc..eece3804c049 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2028,12 +2028,8 @@ struct page *get_sum_page(struct f2fs_sb_info *sbi, 
unsigned int segno)
 void update_meta_page(struct f2fs_sb_info *sbi, void *src, block_t blk_addr)
 {
struct page *page = grab_meta_page(sbi, blk_addr);
-   void *dst = page_address(page);
 
-   if (src)
-   memcpy(dst, src, PAGE_SIZE);
-   else
-   memset(dst, 0, PAGE_SIZE);
+   memcpy(page_address(page), src, PAGE_SIZE);
set_page_dirty(page);
f2fs_put_page(page, 1);
 }
-- 
2.13.1.388.g69e6b9b4f4a9

[PATCH 3/4] f2fs: fix summary info corruption

2017-11-02 Thread Chao Yu

Sometimes, after running generic/270 of fstests, fsck reports that the summary
info and the actual position of a block address in the direct node have become
inconsistent.

The root cause is a race between __f2fs_replace_block and change_curseg,
as shown below:

Thread A                                Thread B
- __clone_blkaddrs
 - f2fs_replace_block
  - __f2fs_replace_block
   - segnoA = GET_SEGNO(sbi, blkaddrA);
   - type = se->type:=CURSEG_HOT_DATA
   - if (!IS_CURSEG(sbi, segnoA))
      type = CURSEG_WARM_DATA
                                        - allocate_data_block
                                         - allocate_segment
                                          - get_ssr_segment
                                          - change_curseg(segnoA, CURSEG_HOT_DATA)
   - change_curseg(segnoA, CURSEG_WARM_DATA)
    - reset_curseg
     - __set_sit_entry_type
      - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA

As a result, the hot curseg is located in segnoA, but the type of segnoA
becomes CURSEG_WARM_DATA.

Then, if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
blkaddrA is located in segnoA, so we move the warm curseg to segnoA,
change its summary cache, and write it back to the summary block.

But segnoA is used by the hot curseg too; once that curseg moves or is
persisted, it overwrites the summary block content with its own stale summary
cache, resulting in an inconsistent state.

This patch tries to fix the issue by introducing a global curseg lock to avoid
the race between __f2fs_replace_block and change_curseg.

Signed-off-by: Chao Yu 
---
 fs/f2fs/f2fs.h|  2 ++
 fs/f2fs/segment.c | 28 +++-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index e1d3a940d9f8..4109489afa14 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -820,6 +820,8 @@ struct f2fs_sm_info {
struct dirty_seglist_info *dirty_info;  /* dirty segment information */
struct curseg_info *curseg_array;   /* active segment information */
 
+   struct rw_semaphore curseg_lock; /* for preventing curseg change */
+
block_t seg0_blkaddr;   /* block address of 0'th segment */
block_t main_blkaddr;   /* start block address of main area */
block_t ssa_blkaddr;/* start block address of SSA area */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index eece3804c049..9a3a386155e8 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -2549,6 +2549,8 @@ void allocate_data_block(struct f2fs_sb_info *sbi, struct 
page *page,
struct sit_info *sit_i = SIT_I(sbi);
struct curseg_info *curseg = CURSEG_I(sbi, type);
 
+   down_read(&SM_I(sbi)->curseg_lock);
+
mutex_lock(&curseg->curseg_mutex);
down_write(&sit_i->sentry_lock);
 
@@ -2606,6 +2608,8 @@ void allocate_data_block(struct f2fs_sb_info *sbi, struct 
page *page,
}
 
mutex_unlock(&curseg->curseg_mutex);
+
+   up_read(&SM_I(sbi)->curseg_lock);
 }
 
 static void update_device_state(struct f2fs_io_info *fio)
@@ -2713,6 +2717,18 @@ int rewrite_data_page(struct f2fs_io_info *fio)
return err;
 }
 
+static inline int __f2fs_get_curseg(struct f2fs_sb_info *sbi,
+   unsigned int segno)
+{
+   int i;
+
+   for (i = CURSEG_HOT_DATA; i < NO_CHECK_TYPE; i++) {
+   if (CURSEG_I(sbi, i)->segno == segno)
+   break;
+   }
+   return i;
+}
+
 void __f2fs_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
block_t old_blkaddr, block_t new_blkaddr,
bool recover_curseg, bool recover_newaddr)
@@ -2728,6 +2744,8 @@ void __f2fs_replace_block(struct f2fs_sb_info *sbi, 
struct f2fs_summary *sum,
se = get_seg_entry(sbi, segno);
type = se->type;
 
+   down_write(&SM_I(sbi)->curseg_lock);
+
if (!recover_curseg) {
/* for recovery flow */
if (se->valid_blocks == 0 && !IS_CURSEG(sbi, segno)) {
@@ -2737,8 +2755,13 @@ void __f2fs_replace_block(struct f2fs_sb_info *sbi, 
struct f2fs_summary *sum,
type = CURSEG_WARM_DATA;
}
} else {
-   if (!IS_CURSEG(sbi, segno))
+   if (IS_CURSEG(sbi, segno)) {
+   /* se->type is volatile as SSR allocation */
+   type = __f2fs_get_curseg(sbi, segno);
+   f2fs_bug_on(sbi, type == NO_CHECK_TYPE);
+   } else {
type = CURSEG_WARM_DATA;
+   }
}
 
curseg = CURSEG_I(sbi, type);
@@ -2778,6 +2801,7 @@ void __f2fs_replace_block(struct f2fs_sb_info *sbi, 
struct f2fs_summary *sum,
 
up_write(&sit_i->sentry_lock);
mutex_unlock(&curseg->curseg_mutex);
+   up_write(&SM_I(sbi)-&
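
For readers following the fix, the type-selection half of the change can be sketched in user space. This is a minimal sketch under stated assumptions, not the kernel code: the enum, struct seg_state, find_active_log() and pick_replace_type() are hypothetical names. Only the logic follows the diff above: look up which active log currently owns the segment instead of trusting the cached se->type, and fall back to the warm data log when no log owns it. Locking is deliberately left out.

```c
/* Hypothetical, reduced set of log types; the kernel has more. */
enum log_type { HOT_DATA, WARM_DATA, COLD_DATA, NR_LOGS };

struct seg_state {
	unsigned int active_segno[NR_LOGS];	/* segment currently owned by each log */
};

/* Return the log that currently owns segno, or NR_LOGS if none does. */
int find_active_log(const struct seg_state *st, unsigned int segno)
{
	int i;

	for (i = HOT_DATA; i < NR_LOGS; i++) {
		if (st->active_segno[i] == segno)
			break;
	}
	return i;
}

/*
 * Pick the log type to use when replacing a block in segno:
 * an active segment keeps the type of the log that owns it,
 * any other segment is handled through the warm data log.
 */
int pick_replace_type(const struct seg_state *st, unsigned int segno)
{
	int type = find_active_log(st, segno);

	return type == NR_LOGS ? WARM_DATA : type;
}
```

In the patch itself this decision is made while curseg_lock is held for writing, so the owner found by the lookup cannot change before the curseg is switched.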

[PATCH 1/4] f2fs: remove unneeded semicolon

2017-11-02 Thread Chao Yu

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 78e1b2998bbd..98777c1ae70c 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -1016,7 +1016,7 @@ int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi)
update_inode_page(inode);
iput(inode);
}
-   };
+   }
return 0;
 }
 
-- 
2.13.1.388.g69e6b9b4f4a9

[PATCH 4/4] f2fs: avoid race in between GC and block exchange

2017-11-02 Thread Chao Yu

During block exchange in {insert,collapse,move}_range, the page-block mapping
is unstable because mappings are being moved or recovered, so no concurrent
cached read should rely on such a mapping, and no cached write should
interfere with the block exchange.

So this patch makes background GC aware of that.

Signed-off-by: Chao Yu 
---
 fs/f2fs/file.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 0e09b9f02dc5..21ae4faa7c58 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -1186,6 +1186,9 @@ static int f2fs_collapse_range(struct inode *inode, 
loff_t offset, loff_t len)
if (ret)
goto out;
 
+   /* avoid gc operation during block exchange */
+   down_write(&F2FS_I(inode)->dio_rwsem[WRITE]);
+
truncate_pagecache(inode, offset);
 
ret = f2fs_do_collapse(inode, pg_start, pg_end);
@@ -1204,6 +1207,7 @@ static int f2fs_collapse_range(struct inode *inode, 
loff_t offset, loff_t len)
f2fs_i_size_write(inode, new_size);
 
 out:
+   up_write(&F2FS_I(inode)->dio_rwsem[WRITE]);
up_write(&F2FS_I(inode)->i_mmap_sem);
return ret;
 }
@@ -1385,6 +1389,9 @@ static int f2fs_insert_range(struct inode *inode, loff_t 
offset, loff_t len)
if (ret)
goto out;
 
+   /* avoid gc operation during block exchange */
+   down_write(&F2FS_I(inode)->dio_rwsem[WRITE]);
+
truncate_pagecache(inode, offset);
 
pg_start = offset >> PAGE_SHIFT;
@@ -1412,6 +1419,8 @@ static int f2fs_insert_range(struct inode *inode, loff_t 
offset, loff_t len)
 
if (!ret)
f2fs_i_size_write(inode, new_size);
+
+   up_write(&F2FS_I(inode)->dio_rwsem[WRITE]);
 out:
up_write(&F2FS_I(inode)->i_mmap_sem);
return ret;
@@ -2274,9 +2283,13 @@ static int f2fs_move_file_range(struct file *file_in, 
loff_t pos_in,
}
 
inode_lock(src);
+   down_write(&F2FS_I(src)->dio_rwsem[WRITE]);
if (src != dst) {
-   if (!inode_trylock(dst)) {
-   ret = -EBUSY;
+   ret = -EBUSY;
+   if (!inode_trylock(dst))
+   goto out;
+   if (!down_write_trylock(&F2FS_I(dst)->dio_rwsem[WRITE])) {
+   inode_unlock(dst);
goto out;
}
}
@@ -2336,9 +2349,12 @@ static int f2fs_move_file_range(struct file *file_in, 
loff_t pos_in,
}
f2fs_unlock_op(sbi);
 out_unlock:
-   if (src != dst)
+   if (src != dst) {
+   up_write(&F2FS_I(dst)->dio_rwsem[WRITE]);
inode_unlock(dst);
+   }
 out:
+   up_write(&F2FS_I(src)->dio_rwsem[WRITE]);
inode_unlock(src);
return ret;
 }
-- 
2.13.1.388.g69e6b9b4f4a9
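
The locking order used for f2fs_move_file_range in the diff above (block on the source locks, only try the destination locks, and unwind in reverse order on failure) can be sketched in user space. This is a minimal sketch, assuming a toy spinlock built on C11 atomic_flag in place of inode_lock() and dio_rwsem[WRITE]; struct toy_lock, struct toy_inode, lock_pair() and unlock_pair() are hypothetical names and not f2fs APIs.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock; stands in for both the inode lock and the rw_semaphore. */
struct toy_lock {
	atomic_flag held;
};

void toy_lock_init(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free and is now held by the caller. */
bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

void toy_lock_acquire(struct toy_lock *l)
{
	while (!toy_trylock(l))
		;	/* spin until the holder releases the lock */
}

void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

struct toy_inode {
	struct toy_lock inode_lock;	/* models inode_lock() */
	struct toy_lock dio_lock;	/* models dio_rwsem[WRITE] */
};

void toy_inode_init(struct toy_inode *inode)
{
	toy_lock_init(&inode->inode_lock);
	toy_lock_init(&inode->dio_lock);
}

/* Returns 0 with every lock held, or -EBUSY with no lock held. */
int lock_pair(struct toy_inode *src, struct toy_inode *dst)
{
	toy_lock_acquire(&src->inode_lock);
	toy_lock_acquire(&src->dio_lock);
	if (src != dst) {
		/* Only try the second inode, so two callers cannot deadlock. */
		if (!toy_trylock(&dst->inode_lock))
			goto fail;
		if (!toy_trylock(&dst->dio_lock)) {
			toy_unlock(&dst->inode_lock);
			goto fail;
		}
	}
	return 0;
fail:
	toy_unlock(&src->dio_lock);
	toy_unlock(&src->inode_lock);
	return -EBUSY;
}

/* Release in the reverse order of acquisition. */
void unlock_pair(struct toy_inode *src, struct toy_inode *dst)
{
	if (src != dst) {
		toy_unlock(&dst->dio_lock);
		toy_unlock(&dst->inode_lock);
	}
	toy_unlock(&src->dio_lock);
	toy_unlock(&src->inode_lock);
}
```

Using trylock for the destination means a caller that loses the race backs off with -EBUSY instead of waiting while it already holds the source locks.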

Re: [f2fs-dev] [PATCH] f2fs: modify the procedure of scan free nid

2017-11-02 Thread Chao Yu

On 2017/11/2 10:38, Fan Li wrote:
> 
> 
>> -----Original Message-----
>> From: Chao Yu [mailto:c...@kernel.org]
>> Sent: Wednesday, November 01, 2017 8:47 PM
>> To: Fan Li; 'Jaegeuk Kim'
>> Cc: linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net
>> Subject: Re: [f2fs-dev] [PATCH] f2fs: modify the procedure of scan free nid
>>
>> On 2017/11/1 18:03, Fan Li wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Chao Yu [mailto:c...@kernel.org]
>>>> Sent: Tuesday, October 31, 2017 10:32 PM
>>>> To: Fan Li; 'Jaegeuk Kim'
>>>> Cc: linux-kernel@vger.kernel.org;
>>>> linux-f2fs-de...@lists.sourceforge.net
>>>> Subject: Re: [f2fs-dev] [PATCH] f2fs: modify the procedure of scan
>>>> free nid
>>>>
>>>> On 2017/10/31 21:37, Fan Li wrote:
>>>>> In the current version, we preserve 8 pages of NAT blocks as free nids,
>>>>> build bitmaps for them, and use them to allocate nids until their number
>>>>> drops below NAT_ENTRY_PER_BLOCK.
>>>>>
>>>>> After that, we have a problem: scan_free_nid_bits scans the same
>>>>> 8 pages trying to find more free nids, but in most cases the free
>>>>> nids in these bitmaps are already in the free list, so scanning them
>>>>> won't get us any new nids.
>>>>> Furthermore, after scan_free_nid_bits, the search is over if
>>>>> nid_cnt[FREE_NID] != 0.
>>>>> As a result, we scan the same pages over and over again, yet no
>>>>> new free nids are found until nid_cnt[FREE_NID] == 0.
>>>>>
>>>>> This patch marks the range where new free nids could exist and keeps
>>>>> scanning for free nids until nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK.
>>>>> The new variable first_scan_block marks the start of the range; it is
>>>>> initialized with NEW_ADDR, which means all free nids before
>>>>> next_scan_nid are already in the free list. next_scan_nid is used as the
>>>>> end of the range, since all free nids that have been scanned must be
>>>>> smaller than next_scan_nid.
>>>>>
>>>>>
>>>>> Signed-off-by: Fan li 
>>>>> ---
>>>>>  fs/f2fs/f2fs.h |  1 +
>>>>>  fs/f2fs/node.c | 30 ++
>>>>>  2 files changed, 27 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index e0ef31c..ae1cf91
>>>>> 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -705,6 +705,7 @@ struct f2fs_nm_info {
>>>>>   nid_t max_nid;  /* maximum possible node ids */
>>>>>   nid_t available_nids;   /* # of available node ids */
>>>>>   nid_t next_scan_nid;/* the next nid to be scanned */
>>>>> + block_t first_scan_block;   /* the first NAT block to be scanned */
>>>>
>>>> Since we are traversing a bitmap, how about using a smaller granularity for 
>>>> tracking the last scanned position? For example:
>>>>
>>>> unsigned next_bitmap_pos; ?
>>>>
>>> Yes, I think it's a good idea, but the original code scans nids by blocks;
>>> if I change that, I need to change some other details too, and before that 
>>> I want to make sure the idea of this patch is right.
>>> I also have some ideas about it; if that's OK, I intend to submit other 
>>> patches to implement them.
>>>
>>>>>   unsigned int ram_thresh;/* control the memory footprint */
>>>>>   unsigned int ra_nid_pages;  /* # of nid pages to be readaheaded */
>>>>>   unsigned int dirty_nats_ratio;  /* control dirty nats ratio threshold */
>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 3d0d1be..7834097
>>>>> 100644
>>>>> --- a/fs/f2fs/node.c
>>>>> +++ b/fs/f2fs/node.c
>>>>> @@ -1950,10 +1950,23 @@ static void scan_free_nid_bits(struct 
>>>>> f2fs_sb_info *sbi)
>>>>>   struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
>>>>>   struct f2fs_journal *journal = curseg->journal;
>>>>>   unsigned int i, idx;
>>>>> + unsigned int max_blocks = NAT_BLOCK_OFFSET(nm_i->next_scan_nid);
>>>>>
>>>>> - down_read(&nm_i->nat_tree_lock);
>>>>> + /* every free ni
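
The idea discussed in this thread, remembering where the last scan stopped so that already-harvested bits are not walked again, can be sketched as follows. This is a minimal sketch under stated assumptions and not the proposed patch: it uses a single 64-bit word as the bitmap, and struct nid_scanner and scan_free_nids() are hypothetical names. The next_pos field plays the role of the finer-grained cursor (next_bitmap_pos) suggested in the review.

```c
#include <stdint.h>

#define NBITS 64u

struct nid_scanner {
	uint64_t free_bitmap;	/* bit i set means nid i is free */
	unsigned int next_pos;	/* first bit that has not been scanned yet */
};

/*
 * Collect up to `want` free nids into `out`, resuming at next_pos.
 * Returns the number of nids found; 0 means the bitmap is exhausted.
 */
unsigned int scan_free_nids(struct nid_scanner *s, unsigned int *out,
			    unsigned int want)
{
	unsigned int found = 0;

	while (s->next_pos < NBITS && found < want) {
		if (s->free_bitmap & (UINT64_C(1) << s->next_pos))
			out[found++] = s->next_pos;
		s->next_pos++;
	}
	return found;
}
```

Because the cursor only moves forward, a second call never revisits bits whose nids were already handed out, which is the repeated work the patch description complains about.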

Re: [f2fs-dev] [PATCH] f2fs: save a multiplication for last_nid calculation

2017-11-02 Thread Chao Yu

On 2017/11/2 11:02, Fan Li wrote:
> Use a slightly easier way to calculate last_nid.
> 
> Signed-off-by: Fan li 

Reviewed-by: Chao Yu 

Thanks,

> ---
>  fs/f2fs/node.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
> index 7834097..55ab330 100644
> --- a/fs/f2fs/node.c
> +++ b/fs/f2fs/node.c
> @@ -2642,7 +2642,7 @@ static inline void load_free_nid_bitmap(struct 
> f2fs_sb_info *sbi)
> __set_bit_le(i, nm_i->nat_block_bitmap);
> 
> nid = i * NAT_ENTRY_PER_BLOCK;
> -   last_nid = (i + 1) * NAT_ENTRY_PER_BLOCK;
> +   last_nid = nid + NAT_ENTRY_PER_BLOCK;
> 
> spin_lock(&NM_I(sbi)->nid_list_lock);
> for (; nid < last_nid; nid++)
> --
> 2.7.4
>
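
The equivalence behind this change can be checked with a small user-space sketch. It assumes NAT_ENTRY_PER_BLOCK is 455 purely for illustration; struct nid_range, nat_block_range() and last_nid_old() are hypothetical names that do not appear in the patch.

```c
#define NAT_ENTRY_PER_BLOCK 455u	/* assumed value, for illustration only */

struct nid_range {
	unsigned int first;	/* first nid covered by the NAT block */
	unsigned int last;	/* one past the last nid covered */
};

/* New form: reuse the already computed start instead of multiplying again. */
struct nid_range nat_block_range(unsigned int block)
{
	struct nid_range r;

	r.first = block * NAT_ENTRY_PER_BLOCK;
	r.last = r.first + NAT_ENTRY_PER_BLOCK;
	return r;
}

/* Old form, kept only to compare the two results. */
unsigned int last_nid_old(unsigned int block)
{
	return (block + 1) * NAT_ENTRY_PER_BLOCK;
}
```

Both forms compute (i + 1) * N, since i * N + N == (i + 1) * N; the new one simply replaces the second multiplication with an addition.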

Re: [f2fs-dev] [PATCH 1/2] f2fs: add quota_ino feature infra

2017-11-02 Thread Chao Yu

On 2017/10/31 11:40, Jaegeuk Kim wrote:
> This patch adds quota_ino feature infra to be used for quota files.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu 

> ---
>  fs/f2fs/f2fs.h  | 6 ++
>  fs/f2fs/sysfs.c | 7 +++
>  include/linux/f2fs_fs.h | 6 +-
>  3 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4a75f07f1dc8..9a1c7ffa6845 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -122,6 +122,7 @@ struct f2fs_mount_info {
>  #define F2FS_FEATURE_PRJQUOTA0x0010
>  #define F2FS_FEATURE_INODE_CHKSUM0x0020
>  #define F2FS_FEATURE_FLEXIBLE_INLINE_XATTR   0x0040
> +#define F2FS_FEATURE_QUOTA_INO   0x0080
>  
>  #define F2FS_HAS_FEATURE(sb, mask)   \
>   ((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0)
> @@ -3070,6 +3071,11 @@ static inline int 
> f2fs_sb_has_flexible_inline_xattr(struct super_block *sb)
>   return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_FLEXIBLE_INLINE_XATTR);
>  }
>  
> +static inline int f2fs_sb_has_quota_ino(struct super_block *sb)
> +{
> + return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_QUOTA_INO);
> +}
> +
>  #ifdef CONFIG_BLK_DEV_ZONED
>  static inline int get_blkz_type(struct f2fs_sb_info *sbi,
>   struct block_device *bdev, block_t blkaddr)
> diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> index f0fdc89ce82f..9835348b6e5d 100644
> --- a/fs/f2fs/sysfs.c
> +++ b/fs/f2fs/sysfs.c
> @@ -110,6 +110,9 @@ static ssize_t features_show(struct f2fs_attr *a,
>   if (f2fs_sb_has_flexible_inline_xattr(sb))
>   len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
>   len ? ", " : "", "flexible_inline_xattr");
> + if (f2fs_sb_has_quota_ino(sb))
> + len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
> + len ? ", " : "", "quota_ino");
>   len += snprintf(buf + len, PAGE_SIZE - len, "\n");
>   return len;
>  }
> @@ -227,6 +230,7 @@ enum feat_id {
>   FEAT_PROJECT_QUOTA,
>   FEAT_INODE_CHECKSUM,
>   FEAT_FLEXIBLE_INLINE_XATTR,
> + FEAT_QUOTA_INO,
>  };
>  
>  static ssize_t f2fs_feature_show(struct f2fs_attr *a,
> @@ -240,6 +244,7 @@ static ssize_t f2fs_feature_show(struct f2fs_attr *a,
>   case FEAT_PROJECT_QUOTA:
>   case FEAT_INODE_CHECKSUM:
>   case FEAT_FLEXIBLE_INLINE_XATTR:
> + case FEAT_QUOTA_INO:
>   return snprintf(buf, PAGE_SIZE, "supported\n");
>   }
>   return 0;
> @@ -314,6 +319,7 @@ F2FS_FEATURE_RO_ATTR(extra_attr, FEAT_EXTRA_ATTR);
>  F2FS_FEATURE_RO_ATTR(project_quota, FEAT_PROJECT_QUOTA);
>  F2FS_FEATURE_RO_ATTR(inode_checksum, FEAT_INODE_CHECKSUM);
>  F2FS_FEATURE_RO_ATTR(flexible_inline_xattr, FEAT_FLEXIBLE_INLINE_XATTR);
> +F2FS_FEATURE_RO_ATTR(quota_ino, FEAT_QUOTA_INO);
>  
>  #define ATTR_LIST(name) (&f2fs_attr_##name.attr)
>  static struct attribute *f2fs_attrs[] = {
> @@ -364,6 +370,7 @@ static struct attribute *f2fs_feat_attrs[] = {
>   ATTR_LIST(project_quota),
>   ATTR_LIST(inode_checksum),
>   ATTR_LIST(flexible_inline_xattr),
> + ATTR_LIST(quota_ino),
>   NULL,
>  };
>  
> diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
> index 50a8ee501bf1..ce34007972c3 100644
> --- a/include/linux/f2fs_fs.h
> +++ b/include/linux/f2fs_fs.h
> @@ -36,6 +36,9 @@
>  #define F2FS_NODE_INO(sbi)   ((sbi)->node_ino_num)
>  #define F2FS_META_INO(sbi)   ((sbi)->meta_ino_num)
>  
> +#define F2FS_QUOTA_INO   3
> +#define F2FS_MAX_QUOTAS  3
> +
>  #define F2FS_IO_SIZE(sbi)(1 << (sbi)->write_io_size_bits) /* Blocks */
>  #define F2FS_IO_SIZE_KB(sbi) (1 << ((sbi)->write_io_size_bits + 2)) /* KB */
>  #define F2FS_IO_SIZE_BYTES(sbi)  (1 << ((sbi)->write_io_size_bits + 12)) 
> /* B */
> @@ -108,7 +111,8 @@ struct f2fs_super_block {
>   __u8 encryption_level;  /* versioning level for encryption */
>   __u8 encrypt_pw_salt[16];   /* Salt used for string2key algorithm */
>   struct f2fs_device devs[MAX_DEVICES];   /* device list */
> - __u8 reserved[327]; /* valid reserved region */
> + __le32 qf_ino[F2FS_MAX_QUOTAS]; /* quota inode numbers */
> + __u8 reserved[315]; /* valid reserved region */
>  } __packed;
>  
>  /*
>
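
Two details of this patch lend themselves to a quick user-space check: the feature test is a plain bit-mask test, and the reserved area shrinks by exactly the size of the new qf_ino array (327 - 3 * 4 = 315), so the superblock size is unchanged. The following is a minimal sketch, assuming GCC-style packed attributes and ignoring the little-endian conversion done in the kernel; has_feature(), struct sb_tail_before and struct sb_tail_after are hypothetical names.

```c
#include <stdint.h>

#define FEATURE_QUOTA_INO 0x0080u	/* value taken from the patch */
#define MAX_QUOTAS 3			/* value taken from the patch */

/* Bit-mask test, without the cpu_to_le32() handling of the real macro. */
int has_feature(uint32_t feature_word, uint32_t mask)
{
	return (feature_word & mask) != 0;
}

/* Tail of the on-disk superblock before the patch (only the changed part). */
struct sb_tail_before {
	uint8_t reserved[327];
} __attribute__((packed));

/* Tail after the patch: three 32-bit inode numbers carved out of reserved. */
struct sb_tail_after {
	uint32_t qf_ino[MAX_QUOTAS];
	uint8_t reserved[315];
} __attribute__((packed));

/* The on-disk layout must keep its size. */
_Static_assert(sizeof(struct sb_tail_before) == sizeof(struct sb_tail_after),
	       "adding qf_ino must not change the superblock size");
```

The static assertion fails at compile time if the reserved size and the array size ever drift apart, which is a cheap guard for on-disk format changes like this one.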

Re: [f2fs-dev] [PATCH 2/2] f2fs: support quota sys files

2017-11-02 Thread Chao Yu

On 2017/10/31 11:40, Jaegeuk Kim wrote:
> This patch supports hidden quota files in the system, which will be used for
> Android. It requires up-to-date f2fs-tools later than v1.9.0.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/checkpoint.c |   9 +++-
>  fs/f2fs/f2fs.h   |   9 +++-
>  fs/f2fs/recovery.c   |   8 ++-
>  fs/f2fs/super.c  | 145 
> ++-
>  4 files changed, 153 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 6b52d4b66c7b..78e1b2998bbd 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -615,6 +615,9 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
>   block_t start_blk, orphan_blocks, i, j;
>   unsigned int s_flags = sbi->sb->s_flags;
>   int err = 0;
> +#ifdef CONFIG_QUOTA
> + int quota_enabled;
> +#endif
>  
>   if (!is_set_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG))
>   return 0;
> @@ -627,8 +630,9 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
>  #ifdef CONFIG_QUOTA
>   /* Needed for iput() to work correctly and not trash data */
>   sbi->sb->s_flags |= MS_ACTIVE;
> +
>   /* Turn on quotas so that they are updated correctly */
> - f2fs_enable_quota_files(sbi);
> + quota_enabled = f2fs_enable_quota_files(sbi, s_flags & MS_RDONLY);
>  #endif
>  
>   start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
> @@ -656,7 +660,8 @@ int recover_orphan_inodes(struct f2fs_sb_info *sbi)
>  out:
>  #ifdef CONFIG_QUOTA
>   /* Turn quotas off */
> - f2fs_quota_off_umount(sbi->sb);
> + if (quota_enabled)
> + f2fs_quota_off_umount(sbi->sb);
>  #endif
>   sbi->sb->s_flags = s_flags; /* Restore MS_RDONLY status */
>  
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 9a1c7ffa6845..e1d3a940d9f8 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -1384,6 +1384,13 @@ static inline unsigned long long cur_cp_version(struct 
> f2fs_checkpoint *cp)
>   return le64_to_cpu(cp->checkpoint_ver);
>  }
>  
> +static inline unsigned long f2fs_qf_ino(struct super_block *sb, int type)
> +{
> + if (type < F2FS_QUOTA_INO)

Why not just use F2FS_MAX_QUOTAS instead of F2FS_QUOTA_INO here? In patch 1/2,
the qf_ino array is defined with F2FS_MAX_QUOTAS as its upper bound:

__le32 qf_ino[F2FS_MAX_QUOTAS]; /* quota inode numbers */

The other parts look good to me. ;)

Reviewed-by: Chao Yu 

Thanks,

> + return le32_to_cpu(F2FS_SB(sb)->raw_super->qf_ino[type]);
> + return 0;
> +}
> +
>  static inline __u64 cur_cp_crc(struct f2fs_checkpoint *cp)
>  {
>   size_t crc_offset = le32_to_cpu(cp->checksum_offset);
> @@ -2526,7 +2533,7 @@ static inline int f2fs_add_link(struct dentry *dentry, 
> struct inode *inode)
>   */
>  int f2fs_inode_dirtied(struct inode *inode, bool sync);
>  void f2fs_inode_synced(struct inode *inode);
> -void f2fs_enable_quota_files(struct f2fs_sb_info *sbi);
> +int f2fs_enable_quota_files(struct f2fs_sb_info *sbi, bool rdonly);
>  void f2fs_quota_off_umount(struct super_block *sb);
>  int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover);
>  int f2fs_sync_fs(struct super_block *sb, int sync);
> diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
> index 9626758bc762..92c57ace1939 100644
> --- a/fs/f2fs/recovery.c
> +++ b/fs/f2fs/recovery.c
> @@ -594,6 +594,9 @@ int recover_fsync_data(struct f2fs_sb_info *sbi, bool 
> check_only)
>   int ret = 0;
>   unsigned long s_flags = sbi->sb->s_flags;
>   bool need_writecp = false;
> +#ifdef CONFIG_QUOTA
> + int quota_enabled;
> +#endif
>  
>   if (s_flags & MS_RDONLY) {
>   f2fs_msg(sbi->sb, KERN_INFO, "orphan cleanup on readonly fs");
> @@ -604,7 +607,7 @@ int recover_fsync_data(struct f2fs_sb_info *sbi, bool 
> check_only)
>   /* Needed for iput() to work correctly and not trash data */
>   sbi->sb->s_flags |= MS_ACTIVE;
>   /* Turn on quotas so that they are updated correctly */
> - f2fs_enable_quota_files(sbi);
> + quota_enabled = f2fs_enable_quota_files(sbi, s_flags & MS_RDONLY);
>  #endif
>  
>   fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
> @@ -665,7 +668,8 @@ int recover_fsync_data(struct f2fs_sb_info *sbi, bool 
> check_only)
>  out:
>  #ifdef CONFIG_QUOTA
>   /* Turn quotas off */
> - f2fs_quota_off_umount(sbi->sb);
> + if (quota_enabled)
> + f2fs_quota_off_umount(sbi->sb);
>  #endif
>   sbi->sb->s_flags = s_flags; /* Restore MS_RDONLY status */
>  
> diff
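
The review comment on f2fs_qf_ino above is about bounding the index by the constant that sizes the array. A minimal sketch of that form, assuming a reduced struct raw_super and a hypothetical qf_ino_lookup(); the check for a negative type is an extra safeguard added here and is not part of the patch, and the endianness conversion is omitted.

```c
#include <stdint.h>

#define MAX_QUOTAS 3	/* same value as F2FS_MAX_QUOTAS in patch 1/2 */

/* Reduced stand-in for the raw superblock; only the quota inode array. */
struct raw_super {
	uint32_t qf_ino[MAX_QUOTAS];
};

/*
 * Return the quota inode number for `type`, or 0 if `type` is out of range.
 * The bound is the constant that sizes the array, so the two cannot diverge.
 */
unsigned long qf_ino_lookup(const struct raw_super *rs, int type)
{
	if (type >= 0 && type < MAX_QUOTAS)
		return rs->qf_ino[type];
	return 0;
}
```

Since F2FS_QUOTA_INO and F2FS_MAX_QUOTAS are both 3 in patch 1/2, the two checks behave the same today; tying the bound to the array size only protects against a later change to one of them.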

Re: [PATCH 1/2] f2fs: add missing quota_initialize in f2fs_set_acl

2017-10-24 Thread Chao Yu

On 2017/10/24 6:14, Jaegeuk Kim wrote:
> This patch adds a call to dquot_initialize in f2fs_set_acl.
> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/acl.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
> index 436b3a1464d9..f6471f9d707e 100644
> --- a/fs/f2fs/acl.c
> +++ b/fs/f2fs/acl.c
> @@ -209,6 +209,10 @@ static int __f2fs_set_acl(struct inode *inode, int type,
>   int error;
>   umode_t mode = inode->i_mode;
>  
> + error = dquot_initialize(inode);
> + if (error)
> + return error;

Could you move this to f2fs_setxattr, and also add missing dquot_initialize in
unlink and rename like ext4?

Thanks,

> +
>   switch (type) {
>   case ACL_TYPE_ACCESS:
>   name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS;
>

Re: [PATCH] f2fs: show # of dirty segments via sysfs

2017-10-24 Thread Chao Yu

On 2017/10/24 16:36, Jaegeuk Kim wrote:
> This patch adds one sysfs entry to show # of dirty segments which can be
> used for gc timing by user.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu 

Thanks,

> ---
>  fs/f2fs/sysfs.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> index ca74bfdfd4eb..e09e59cc678a 100644
> --- a/fs/f2fs/sysfs.c
> +++ b/fs/f2fs/sysfs.c
> @@ -63,6 +63,13 @@ static unsigned char *__struct_ptr(struct f2fs_sb_info 
> *sbi, int struct_type)
>   return NULL;
>  }
>  
> +static ssize_t dirty_segments_show(struct f2fs_attr *a,
> + struct f2fs_sb_info *sbi, char *buf)
> +{
> + return snprintf(buf, PAGE_SIZE, "%llu\n",
> + (unsigned long long)(dirty_segments(sbi)));
> +}
> +
>  static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
>   struct f2fs_sb_info *sbi, char *buf)
>  {
> @@ -283,6 +290,7 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_enable, 
> iostat_enable);
>  F2FS_RW_ATTR(FAULT_INFO_RATE, f2fs_fault_info, inject_rate, inject_rate);
>  F2FS_RW_ATTR(FAULT_INFO_TYPE, f2fs_fault_info, inject_type, inject_type);
>  #endif
> +F2FS_GENERAL_RO_ATTR(dirty_segments);
>  F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes);
>  F2FS_GENERAL_RO_ATTR(features);
>  
> @@ -326,6 +334,7 @@ static struct attribute *f2fs_attrs[] = {
>   ATTR_LIST(inject_rate),
>   ATTR_LIST(inject_type),
>  #endif
> + ATTR_LIST(dirty_segments),
>   ATTR_LIST(lifetime_write_kbytes),
>   ATTR_LIST(features),
>   ATTR_LIST(reserved_blocks),
>

Re: [f2fs-dev] [PATCH 2/2] f2fs: stop all the operations by cp_error flag

2017-10-24 Thread Chao Yu

On 2017/10/24 6:14, Jaegeuk Kim wrote:
> This patch switches to using the cp_error flag instead of RDONLY for quota off.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu 

Thanks,

> ---
>  fs/f2fs/acl.c|  3 +++
>  fs/f2fs/checkpoint.c |  1 -
>  fs/f2fs/file.c   | 23 +++
>  fs/f2fs/namei.c  | 30 ++
>  fs/f2fs/super.c  |  3 +++
>  5 files changed, 59 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
> index f6471f9d707e..a9bf5151e7c2 100644
> --- a/fs/f2fs/acl.c
> +++ b/fs/f2fs/acl.c
> @@ -254,6 +254,9 @@ static int __f2fs_set_acl(struct inode *inode, int type,
>  
>  int f2fs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  {
> + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
> + return -EIO;
> +
>   return __f2fs_set_acl(inode, type, acl, NULL);
>  }
>  
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index 201608281681..6b52d4b66c7b 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -29,7 +29,6 @@ struct kmem_cache *inode_entry_slab;
>  void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io)
>  {
>   set_ckpt_flags(sbi, CP_ERROR_FLAG);
> - sbi->sb->s_flags |= MS_RDONLY;
>   if (!end_io)
>   f2fs_flush_merged_writes(sbi);
>  }
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 56232a72d2a3..0e09b9f02dc5 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -53,6 +53,9 @@ static int f2fs_vm_page_mkwrite(struct vm_fault *vmf)
>   struct dnode_of_data dn;
>   int err;
>  
> + if (unlikely(f2fs_cp_error(sbi)))
> + return -EIO;
> +
>   sb_start_pagefault(inode->i_sb);
>  
>   f2fs_bug_on(sbi, f2fs_has_inline_data(inode));
> @@ -310,6 +313,8 @@ static int f2fs_do_sync_file(struct file *file, loff_t 
> start, loff_t end,
>  
>  int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
>  {
> + if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(file)))))
> + return -EIO;
>   return f2fs_do_sync_file(file, start, end, datasync, false);
>  }
>  
> @@ -446,6 +451,9 @@ static int f2fs_file_mmap(struct file *file, struct 
> vm_area_struct *vma)
>   struct inode *inode = file_inode(file);
>   int err;
>  
> + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
> + return -EIO;
> +
>   /* we don't need to use inline_data strictly */
>   err = f2fs_convert_inline_inode(inode);
>   if (err)
> @@ -632,6 +640,9 @@ int f2fs_truncate(struct inode *inode)
>  {
>   int err;
>  
> + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
> + return -EIO;
> +
>   if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
>   S_ISLNK(inode->i_mode)))
>   return 0;
> @@ -731,6 +742,9 @@ int f2fs_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   int err;
>   bool size_changed = false;
>  
> + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
> + return -EIO;
> +
>   err = setattr_prepare(dentry, attr);
>   if (err)
>   return err;
> @@ -1459,6 +1473,9 @@ static long f2fs_fallocate(struct file *file, int mode,
>   struct inode *inode = file_inode(file);
>   long ret = 0;
>  
> + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
> + return -EIO;
> +
>   /* f2fs only support ->fallocate for regular file */
>   if (!S_ISREG(inode->i_mode))
>   return -EINVAL;
> @@ -2637,6 +2654,9 @@ static int f2fs_ioc_fssetxattr(struct file *filp, 
> unsigned long arg)
>  
>  long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>  {
> + if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(filp)))))
> + return -EIO;
> +
>   switch (cmd) {
>   case F2FS_IOC_GETFLAGS:
>   return f2fs_ioc_getflags(filp, arg);
> @@ -2694,6 +2714,9 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, 
> struct iov_iter *from)
>   struct blk_plug plug;
>   ssize_t ret;
>  
> + if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
> + return -EIO;
> +
>   inode_lock(inode);
>   ret = generic_write_checks(iocb, from);
>   if (ret > 0) {
> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
> index e6f86d5d97b9..944f7a6940b6 100644
> --- a/fs/f2fs/namei.c
> +++ b/fs/f2fs/namei.c
> @@ -183,6 +183,9 @@ static int f2fs_create(struct inode *dir, struct dentry 
> *dentry, umode_t mode,
>   nid_t ino = 0;
>   int err;
>

Re: [PATCH v2 6/6] f2fs: give up CP_TRIMMED_FLAG if it drops discards

2017-10-24 Thread Chao Yu

On 2017/10/24 20:46, Jaegeuk Kim wrote:
> On 10/24, Chao Yu wrote:
>> Hi Jaegeuk,
>>
>> On 2017/10/4 9:08, Chao Yu wrote:
>>> From: Chao Yu 
>>>
>>> In ->umount, once we drop the remaining discard entries, we should not
>>> set CP_TRIMMED_FLAG with another checkpoint.
>>>
>>> Signed-off-by: Chao Yu 
>>> ---
>>> v2:
>>> - rebase on the latest code of Jaegeuk's dev-test branch.
>>>  fs/f2fs/f2fs.h|  2 +-
>>>  fs/f2fs/segment.c | 15 +++
>>>  fs/f2fs/super.c   |  5 +++--
>>>  3 files changed, 15 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index f274805e231d..c85f49c41003 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -2565,7 +2565,7 @@ void init_discard_policy(struct discard_policy 
>>> *dpolicy, int discard_type,
>>> unsigned int granularity);
>>>  void refresh_sit_entry(struct f2fs_sb_info *sbi, block_t old, block_t new);
>>>  void stop_discard_thread(struct f2fs_sb_info *sbi);
>>> -void f2fs_wait_discard_bios(struct f2fs_sb_info *sbi);
>>> +bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi);
>>>  void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control 
>>> *cpc);
>>>  void release_discard_addrs(struct f2fs_sb_info *sbi);
>>>  int npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index 4a108321233d..bfbcff8339c5 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -1196,12 +1196,13 @@ static int __issue_discard_cmd(struct f2fs_sb_info 
>>> *sbi,
>>> return issued;
>>>  }
>>>  
>>> -static void __drop_discard_cmd(struct f2fs_sb_info *sbi)
>>> +static bool __drop_discard_cmd(struct f2fs_sb_info *sbi)
>>>  {
>>> struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
>>> struct list_head *pend_list;
>>> struct discard_cmd *dc, *tmp;
>>> int i;
>>> +   bool dropped = false;
>>>  
>>> mutex_lock(&dcc->cmd_lock);
>>> for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
>>> @@ -1209,9 +1210,12 @@ static void __drop_discard_cmd(struct f2fs_sb_info 
>>> *sbi)
>>> list_for_each_entry_safe(dc, tmp, pend_list, list) {
>>> f2fs_bug_on(sbi, dc->state != D_PREP);
>>> __remove_discard_cmd(sbi, dc);
>>> +   dropped = true;
>>> }
>>> }
>>> mutex_unlock(&dcc->cmd_lock);
>>> +
>>> +   return dropped;
>>>  }
>>>  
>>>  static void __wait_one_discard_bio(struct f2fs_sb_info *sbi,
>>> @@ -1306,15 +1310,18 @@ void stop_discard_thread(struct f2fs_sb_info *sbi)
>>>  }
>>>  
>>>  /* This comes from f2fs_put_super */
>>> -void f2fs_wait_discard_bios(struct f2fs_sb_info *sbi)
>>> +bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi)
>>>  {
>>> struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
>>> struct discard_policy dpolicy;
>>> +   bool dropped;
>>>  
>>> init_discard_policy(&dpolicy, DPOLICY_UMOUNT, dcc->discard_granularity);
>>> __issue_discard_cmd(sbi, &dpolicy);
>>> -   __drop_discard_cmd(sbi);
>>> +   dropped = __drop_discard_cmd(sbi);
>>> __wait_all_discard_cmd(sbi, &dpolicy);
>>> +
>>> +   return dropped;
>>>  }
>>>  
>>>  static int issue_discard_thread(void *data)
>>> @@ -1659,7 +1666,7 @@ void init_discard_policy(struct discard_policy 
>>> *dpolicy,
>>> dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME;
>>> dpolicy->max_requests = DEF_MAX_DISCARD_REQUEST;
>>> dpolicy->io_aware_gran = MAX_PLIST_NUM;
>>> -   dpolicy->io_aware = false;
>>> +   dpolicy->io_aware = true;
>>
>> I notice this change does not belong to this patch; could you please help to
>> move it into "f2fs: split discard policy" in your branch?
> 
> Yup, done. Could you check it in dev-test?

I didn't find it, did you forget to push to that branch?

Thanks,

> 
> Thanks,
> 
>>
>> Thanks,
>>
>>> } else if (discard_type == DPOLICY_FSTRIM) {
>>> dpolicy->max_requests = DEF_MAX_DISCARD_REQUEST;
>>> dpolicy->io_aware_gran = MAX_PLIST_NUM;
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index a13269d1a1f0..1d68c18a487b 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -807,6 +807,7 @@ static void f2fs_put_super(struct super_block *sb)
>>>  {
>>> struct f2fs_sb_info *sbi = F2FS_SB(sb);
>>> int i;
>>> +   bool dropped;
>>>  
>>> f2fs_quota_off_umount(sb);
>>>  
>>> @@ -827,9 +828,9 @@ static void f2fs_put_super(struct super_block *sb)
>>> }
>>>  
>>> /* be sure to wait for any on-going discard commands */
>>> -   f2fs_wait_discard_bios(sbi, true);
>>> +   dropped = f2fs_wait_discard_bios(sbi);
>>>  
>>> -   if (f2fs_discard_en(sbi) && !sbi->discard_blks) {
>>> +   if (f2fs_discard_en(sbi) && !sbi->discard_blks && !dropped) {
>>> struct cp_control cpc = {
>>> .reason = CP_UMOUNT | CP_TRIMMED,
>>> };
>>>
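
For reference, the effect of the `dropped` flag in the patch above can be sketched in userspace C. This is only an illustration under stated assumptions: `pending_discards`, `drop_pending` and `may_set_trimmed` are hypothetical names rather than f2fs symbols, and a plain counter stands in for the pending discard lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pending discard lists; not an f2fs structure. */
struct pending_discards {
	int nr_pending;
};

/* Drop every pending command and report whether anything was dropped. */
static bool drop_pending(struct pending_discards *pd)
{
	bool dropped = false;

	assert(pd != NULL);
	while (pd->nr_pending > 0) {
		pd->nr_pending--;
		dropped = true;
	}
	return dropped;
}

/*
 * Mirrors the umount-time condition in the patch: the trimmed state may
 * only be recorded if discard is enabled, no discard blocks remain, and
 * nothing was dropped on the way out.
 */
static bool may_set_trimmed(bool discard_enabled, unsigned int discard_blks,
			    bool dropped)
{
	return discard_enabled && !discard_blks && !dropped;
}
```

With three commands still pending, `drop_pending()` returns true and `may_set_trimmed(true, 0, true)` is false, so the CP_TRIMMED checkpoint would be skipped; with nothing pending, the condition falls back to the pre-patch behaviour.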

Re: [PATCH v2 6/6] f2fs: give up CP_TRIMMED_FLAG if it drops discards

2017-10-24 Thread Chao Yu

On 2017/10/25 13:45, Jaegeuk Kim wrote:
> On 10/24, Chao Yu wrote:
>> On 2017/10/24 20:46, Jaegeuk Kim wrote:
>>> On 10/24, Chao Yu wrote:
>>>> Hi Jaegeuk,
>>>>
>>>> On 2017/10/4 9:08, Chao Yu wrote:
>>>>> From: Chao Yu 
>>>>>
>>>>> In ->umount, once we have dropped the remaining discard entries, we
>>>>> should not set CP_TRIMMED_FLAG in another checkpoint.
>>>>>
>>>>> Signed-off-by: Chao Yu 
>>>>> ---
>>>>> v2:
>>>>> - rebase onto the latest code of Jaegeuk's dev-test branch.
>>>>>  fs/f2fs/f2fs.h|  2 +-
>>>>>  fs/f2fs/segment.c | 15 +++
>>>>>  fs/f2fs/super.c   |  5 +++--
>>>>>  3 files changed, 15 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index f274805e231d..c85f49c41003 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -2565,7 +2565,7 @@ void init_discard_policy(struct discard_policy 
>>>>> *dpolicy, int discard_type,
>>>>>   unsigned int granularity);
>>>>>  void refresh_sit_entry(struct f2fs_sb_info *sbi, block_t old, block_t 
>>>>> new);
>>>>>  void stop_discard_thread(struct f2fs_sb_info *sbi);
>>>>> -void f2fs_wait_discard_bios(struct f2fs_sb_info *sbi);
>>>>> +bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi);
>>>>>  void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control 
>>>>> *cpc);
>>>>>  void release_discard_addrs(struct f2fs_sb_info *sbi);
>>>>>  int npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
>>>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>>>> index 4a108321233d..bfbcff8339c5 100644
>>>>> --- a/fs/f2fs/segment.c
>>>>> +++ b/fs/f2fs/segment.c
>>>>> @@ -1196,12 +1196,13 @@ static int __issue_discard_cmd(struct 
>>>>> f2fs_sb_info *sbi,
>>>>>   return issued;
>>>>>  }
>>>>>  
>>>>> -static void __drop_discard_cmd(struct f2fs_sb_info *sbi)
>>>>> +static bool __drop_discard_cmd(struct f2fs_sb_info *sbi)
>>>>>  {
>>>>>   struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
>>>>>   struct list_head *pend_list;
>>>>>   struct discard_cmd *dc, *tmp;
>>>>>   int i;
>>>>> + bool dropped = false;
>>>>>  
>>>>>   mutex_lock(&dcc->cmd_lock);
>>>>>   for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
>>>>> @@ -1209,9 +1210,12 @@ static void __drop_discard_cmd(struct f2fs_sb_info 
>>>>> *sbi)
>>>>>   list_for_each_entry_safe(dc, tmp, pend_list, list) {
>>>>>   f2fs_bug_on(sbi, dc->state != D_PREP);
>>>>>   __remove_discard_cmd(sbi, dc);
>>>>> + dropped = true;
>>>>>   }
>>>>>   }
>>>>>   mutex_unlock(&dcc->cmd_lock);
>>>>> +
>>>>> + return dropped;
>>>>>  }
>>>>>  
>>>>>  static void __wait_one_discard_bio(struct f2fs_sb_info *sbi,
>>>>> @@ -1306,15 +1310,18 @@ void stop_discard_thread(struct f2fs_sb_info *sbi)
>>>>>  }
>>>>>  
>>>>>  /* This comes from f2fs_put_super */
>>>>> -void f2fs_wait_discard_bios(struct f2fs_sb_info *sbi)
>>>>> +bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi)
>>>>>  {
>>>>>   struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
>>>>>   struct discard_policy dpolicy;
>>>>> + bool dropped;
>>>>>  
>>>>>   init_discard_policy(&dpolicy, DPOLICY_UMOUNT, dcc->discard_granularity);
>>>>>   __issue_discard_cmd(sbi, &dpolicy);
>>>>> - __drop_discard_cmd(sbi);
>>>>> + dropped = __drop_discard_cmd(sbi);
>>>>>   __wait_all_discard_cmd(sbi, &dpolicy);
>>>>> +
>>>>> + return dropped;
>>>>>  }
>>>>>  
>>>>>  static int issue_discard_thread(void *data)
>>>>> @@ -1659,7 +1666,7 @@ void init_discard_policy(struct discard_policy 
>>>>> *dpolicy,
>>>>>   dpolicy->m

Re: [PATCH 1/2] f2fs: add missing quota_initialize in f2fs_set_acl

2017-10-24 Thread Chao Yu

On 2017/10/25 13:44, Jaegeuk Kim wrote:
> On 10/24, Chao Yu wrote:
>> On 2017/10/24 6:14, Jaegeuk Kim wrote:
>>> This patch adds a call to quota_initialize in f2fs_set_acl.
>>>
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  fs/f2fs/acl.c | 4 
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
>>> index 436b3a1464d9..f6471f9d707e 100644
>>> --- a/fs/f2fs/acl.c
>>> +++ b/fs/f2fs/acl.c
>>> @@ -209,6 +209,10 @@ static int __f2fs_set_acl(struct inode *inode, int 
>>> type,
>>> int error;
>>> umode_t mode = inode->i_mode;
>>>  
>>> +   error = dquot_initialize(inode);
>>> +   if (error)
>>> +   return error;
>>
>> Could you move this to f2fs_setxattr, and also add the missing
>> dquot_initialize calls in unlink and rename, like ext4?
> 
> I've checked that f2fs_unlink and f2fs_rename are calling dquot_initialize().

ext4_unlink:

	retval = dquot_initialize(dir);
	if (retval)
		return retval;
	retval = dquot_initialize(d_inode(dentry));
	if (retval)
		return retval;

f2fs_unlink:

	err = dquot_initialize(dir);
	if (err)
		return err;

ext4_rename:

	retval = dquot_initialize(old.dir);
	if (retval)
		return retval;
	retval = dquot_initialize(new.dir);
	if (retval)
		return retval;

	/* Initialize quotas before so that eventual writes go
	 * in separate transaction */
	if (new.inode) {
		retval = dquot_initialize(new.inode);
		if (retval)
			return retval;
	}

f2fs_rename:

	err = dquot_initialize(old_dir);
	if (err)
		goto out;

	err = dquot_initialize(new_dir);
	if (err)
		goto out;

ext4 calls dquot_initialize one more time than f2fs in both paths (on the
victim inode in unlink and on new.inode in rename). I didn't look into this
in detail, but it would be better to check that. :)
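
To make the difference concrete, here is a minimal userspace sketch of the ext4-style prologue. It is not kernel code: `mock_inode`, `mock_dquot_initialize`, `unlink_prepare` and `rename_prepare` are hypothetical names, and the mock only assumes that the real helper returns 0 on success.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; it only tracks whether quota was set up. */
struct mock_inode {
	int quota_ready;
};

/* Stand-in for dquot_initialize(): returns 0 on success. */
static int mock_dquot_initialize(struct mock_inode *inode)
{
	assert(inode != NULL);
	inode->quota_ready = 1;
	return 0;
}

/* ext4-style unlink prologue: the directory and the victim inode. */
static int unlink_prepare(struct mock_inode *dir, struct mock_inode *victim)
{
	int err;

	err = mock_dquot_initialize(dir);
	if (err)
		return err;
	return mock_dquot_initialize(victim);
}

/* ext4-style rename prologue: both directories, plus the target if it exists. */
static int rename_prepare(struct mock_inode *old_dir,
			  struct mock_inode *new_dir,
			  struct mock_inode *new_inode)
{
	int err;

	err = mock_dquot_initialize(old_dir);
	if (err)
		return err;
	err = mock_dquot_initialize(new_dir);
	if (err)
		return err;
	if (new_inode)
		err = mock_dquot_initialize(new_inode);
	return err;
}
```

The f2fs snippets quoted above stop after the directory calls, which is the extra call in question.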

Thanks,
> 
> Thanks,
> 
>>
>> Thanks,
>>
>>> +
>>> switch (type) {
>>> case ACL_TYPE_ACCESS:
>>> name_index = F2FS_XATTR_INDEX_POSIX_ACL_ACCESS;
>>>
> 
> .
>

[PATCH] f2fs: fix to keep backward compatibility of flexible inline xattr feature

2017-10-25 Thread Chao Yu

Previously, the inode layout always reserved 200 bytes for inline xattr
space, whether or not the inode enabled the inline xattr feature, so the
max inline size of an inode was fixed. Now, if inline xattr is not
enabled, the max inline size of the inode is enlarged by 200 bytes. For
regular and symlink inodes it is safe to reuse the reserved space, as it
is all zero, but for directories we need to keep the reservation in
order to keep the directory structure stable.

Reported-by: Sheng Yong 
Signed-off-by: Chao Yu 
---
 fs/f2fs/f2fs.h  |  5 +
 fs/f2fs/inode.c | 15 ---
 fs/f2fs/namei.c |  2 ++
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2af1d31ae74b..7ddd0d085e3b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -2393,10 +2393,7 @@ static inline int get_extra_isize(struct inode *inode)
 static inline int f2fs_sb_has_flexible_inline_xattr(struct super_block *sb);
 static inline int get_inline_xattr_addrs(struct inode *inode)
 {
-   if (!f2fs_has_inline_xattr(inode))
-   return 0;
-   if (!f2fs_sb_has_flexible_inline_xattr(F2FS_I_SB(inode)->sb))
-   return DEFAULT_INLINE_XATTR_ADDRS;
+
return F2FS_I(inode)->i_inline_xattr_size;
 }
 
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index bb876737e653..7f31b22c9efa 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -232,10 +232,19 @@ static int do_read_inode(struct inode *inode)
fi->i_extra_isize = f2fs_has_extra_attr(inode) ?
le16_to_cpu(ri->i_extra_isize) : 0;
 
-   if (!f2fs_has_inline_xattr(inode))
-   fi->i_inline_xattr_size = 0;
-   else if (f2fs_sb_has_flexible_inline_xattr(sbi->sb))
+   /*
+* Previously, we always reserved DEFAULT_INLINE_XATTR_ADDRS size
+* space for inline xattr data. If inline xattr is not enabled, we
+* can expect all zero in the reserved area, so for regular or symlink
+* inodes it is safe to reuse the reserved area, but for directories we
+* should keep the reservation to keep the directory structure stable.
+*/
+   if (f2fs_has_extra_attr(inode) &&
+   f2fs_sb_has_flexible_inline_xattr(sbi->sb))
fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size);
+   else if (!f2fs_has_inline_xattr(inode) &&
+   (S_ISREG(inode->i_mode) || S_ISLNK(inode->i_mode)))
+   fi->i_inline_xattr_size = 0;
else
fi->i_inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
 
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index e6f86d5d97b9..a1c56a14c191 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -91,6 +91,8 @@ static struct inode *f2fs_new_inode(struct inode *dir, 
umode_t mode)
f2fs_sb_has_flexible_inline_xattr(sbi->sb) &&
f2fs_has_inline_xattr(inode))
F2FS_I(inode)->i_inline_xattr_size = sbi->inline_xattr_size;
+   else
+   F2FS_I(inode)->i_inline_xattr_size = 0;
 
if (test_opt(sbi, INLINE_DATA) && f2fs_may_inline_data(inode))
set_inode_flag(inode, FI_INLINE_DATA);
-- 
2.13.1.388.g69e6b9b4f4a9
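
The decision made in do_read_inode by this patch can be restated as a small pure function. This is a userspace sketch under stated assumptions, not the kernel code: `inline_xattr_size` and `enum file_kind` are hypothetical names, and DEFAULT_INLINE_XATTR_ADDRS is assumed to be 50 four-byte slots, matching the 200 bytes mentioned in the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value: 50 four-byte slots, i.e. the 200 reserved bytes. */
#define DEFAULT_INLINE_XATTR_ADDRS 50

/* Hypothetical replacement for the S_ISREG/S_ISLNK/S_ISDIR mode checks. */
enum file_kind { KIND_REG, KIND_LNK, KIND_DIR };

/* Returns the in-memory inline xattr size, in address slots. */
static int inline_xattr_size(bool has_extra_attr, bool sb_flexible,
			     bool has_inline_xattr, enum file_kind kind,
			     int on_disk_size)
{
	assert(on_disk_size >= 0);

	/* Flexible layout: trust the size stored in the on-disk inode. */
	if (has_extra_attr && sb_flexible)
		return on_disk_size;

	/* Regular files and symlinks may reuse the all-zero reserved area. */
	if (!has_inline_xattr && (kind == KIND_REG || kind == KIND_LNK))
		return 0;

	/* Directories, and inodes with inline xattr, keep the reservation. */
	return DEFAULT_INLINE_XATTR_ADDRS;
}
```

Under these assumptions, a directory without inline xattr on a non-flexible image still gets 50 slots, while a regular file in the same situation gets 0.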
