On 2017/11/6 15:09, Fan Li wrote:
> 
> 
>> -----Original Message-----
>> From: Chao Yu [mailto:c...@kernel.org]
>> Sent: Friday, November 03, 2017 9:16 PM
>> To: Fan Li; 'Chao Yu'; 'Jaegeuk Kim'
>> Cc: linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net
>> Subject: Re: [f2fs-dev] [PATCH RESEND] f2fs: modify the procedure of scan 
>> free nid
>>
>> On 2017/11/3 18:29, Fan Li wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Chao Yu [mailto:yuch...@huawei.com]
>>>> Sent: Friday, November 03, 2017 4:54 PM
>>>> To: Fan Li; 'Chao Yu'; 'Jaegeuk Kim'
>>>> Cc: linux-kernel@vger.kernel.org;
>>>> linux-f2fs-de...@lists.sourceforge.net
>>>> Subject: Re: [f2fs-dev] [PATCH RESEND] f2fs: modify the procedure of
>>>> scan free nid
>>>>
>>>> On 2017/11/3 15:31, Fan Li wrote:
>>>>> In the current version, we preserve 8 pages of nat blocks as free nids:
>>>>> we build bitmaps for them and use them to allocate nids until their
>>>>> number drops below NAT_ENTRY_PER_BLOCK.
>>>>>
>>>>> After that, we have a problem: scan_free_nid_bits will scan the same
>>>>> 8 pages trying to find more free nids, but in most cases the free
>>>>> nids in these bitmaps are already in the free list, so scanning them
>>>>> won't get us any new nids.
>>>>> Furthermore, after scan_free_nid_bits, the scan is over if
>>>>> nid_cnt[FREE_NID] != 0.
>>>>> That causes us to scan the same pages over and over again, and no
>>>>> new free nids are found until nid_cnt[FREE_NID] == 0. As the number
>>>>> of scanned pages grows, the problem gets worse.
>>>>>
>>>>> This patch marks the range where new free nids could exist and keeps
>>>>> scanning for free nids until nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK.
>>>>> The new variable first_scan_block marks the start of the range; it's
>>>>> initialized with NEW_ADDR, which means all free nids before
>>>>> next_scan_nid are already in the free list. next_scan_nid is used as
>>>>> the end of the range, since all free nids which are scanned in
>>>>> scan_free_nid_bits must be smaller than next_scan_nid.
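>>>>>
>>>>> In other words, the bitmap pass in scan_free_nid_bits becomes bounded to
>>>>> [first_scan_block, NAT_BLOCK_OFFSET(next_scan_nid)). A condensed sketch of
>>>>> that logic (the real hunks below also take nat_tree_lock and walk the
>>>>> journal):
>>>>>
>>>>> 	/* every free nid before next_scan_nid is already in the free list */
>>>>> 	if (nm_i->first_scan_block == NEW_ADDR)
>>>>> 		return;
>>>>>
>>>>> 	max_blocks = NAT_BLOCK_OFFSET(nm_i->next_scan_nid);
>>>>> 	if (max_blocks == 0)	/* next_scan_nid sits in block 0 */
>>>>> 		max_blocks = nm_i->nat_blocks;
>>>>>
>>>>> 	for (i = nm_i->first_scan_block; i < max_blocks; i++) {
>>>>> 		/* scan free_nid_bitmap[i], add_free_nid() for each set bit */
>>>>> 	}
>>>>>
>>>>> 	nm_i->first_scan_block = NEW_ADDR;	/* whole range consumed */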
>>>>
>>>> Thinking it over again, IMO, we can add a variable holding the total count
>>>> of free nids in the bitmaps; if there is no free nid, we just skip scanning
>>>> all the existing bitmaps.
>>>>
>>>> And if there are only a few free nids scattered in the bitmaps, the cost
>>>> will be limited, because we will skip scanning nm_i::free_nid_bitmap if
>>>> nm_i::free_nid_count is zero. Once we find one free nid, let's skip out.
>>>>
>>>> Since there shouldn't be very heavy CPU overhead while traversing
>>>> nm_i::nat_block_bitmap, I expect the change below could be simpler to
>>>> maintain while having the same effect.
>>>>
>>>> What do you think?
>>>>
>>>
>>> I think if you need this to work, checking total_bitmap_free_nid may not
>>> be sufficient.
>>> The problem this patch addresses is that even when all the free nids are
>>> already in the free list, we still scan all the pages.
>>> The scan proceeds once the free nid count is below NAT_ENTRY_PER_BLOCK.
>>> So in most cases, there are still free nids in the bitmap during the
>>> scan, and the current code will check every one of them to see if they
>>> are actually in the free list.
>>> Only checking total_bitmap_free_nid == 0 won't take this overhead away.
>>
>> Oh, as you can see, we have added free_nid_count for each NAT block's
>> free_nid_bitmap; before scanning a bitmap, we will make sure there is at
>> least one free nid in it.
>>
>> scan_free_nid_bits()
>>      for (i = 0; i < nm_i->nat_blocks; i++) {
>>              if (!test_bit_le(i, nm_i->nat_block_bitmap))
>>                      continue;
>>              if (!nm_i->free_nid_count[i])
>>                      continue;
> Do you mean free_nid_count here?
> I thought free_nid_count only represents how many nats are marked free in
> the bitmap of one block.

Right.

> 
> To my understanding, even if a nat is already in the free list, it will
> still have a bit marked as free in free_nid_bitmap and a count in
> free_nid_count.
> That means even if free_nid_count != 0 and there are marked bits in the
> bitmap, the free nats in this block could still all be in the free list.

Yes.

> The purpose of the scan is to find new nats and add them to the free list;
> going through the nats which are already in the free list isn't what we
> want.
> And in xfstests, in most cases where scan_free_nid_bits runs, all free nats
> are indeed in the free list.
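>
> Roughly, the situation during such a pass looks like this (a simplified
> sketch of the current inner loop, not the exact kernel code):
>
> 	for (idx = 0; idx < NAT_ENTRY_PER_BLOCK; idx++) {
> 		if (!test_bit_le(idx, nm_i->free_nid_bitmap[i]))
> 			continue;
> 		/*
> 		 * The nid is usually already tracked in the free list, so
> 		 * add_free_nid() returns without adding anything new, while
> 		 * the bit stays set and free_nid_count[i] stays nonzero.
> 		 */
> 		add_free_nid(sbi, i * NAT_ENTRY_PER_BLOCK + idx, true);
> 	}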

Could you test that diff to check whether we will scan a free_nid_bitmap
which indicates there are free nids, but cannot add any of them into the
free list because they are already there?

-               if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
+               if (nm_i->nid_cnt[FREE_NID])

With the above change, I expect that case will be eliminated...

Thanks,

> 
>>              for (idx = 0; idx < NAT_ENTRY_PER_BLOCK; idx++) {
>>                      nid_t nid;
>>
>>                      if (!test_bit_le(idx, nm_i->free_nid_bitmap[i]))
>>                              continue;
>>
>>                      nid = i * NAT_ENTRY_PER_BLOCK + idx;
>>                      add_free_nid(sbi, nid, true);
>>
>>                      if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
>>                              goto out;
>>              }
>>      }
>>
>> And in that diff, we have changed the exit condition: once we have grabbed
>> one free nid, we stop building.
>>
>>>> -                  if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
>>>> +                  if (nm_i->nid_cnt[FREE_NID])
>>>>                            goto out;
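>>
>> Combined with the loop above, the inner scan then reads roughly like this
>> (just a sketch of the combined effect, not another patch):
>>
>> 	nid = i * NAT_ENTRY_PER_BLOCK + idx;
>> 	add_free_nid(sbi, nid, true);
>>
>> 	/* bail out as soon as the free list has at least one entry */
>> 	if (nm_i->nid_cnt[FREE_NID])
>> 		goto out;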
>>
>>
>> So with that simple change, the only overhead here is that we need to
>> traverse nat_block_bitmap every time total_bitmap_free_nid is nonzero,
>> but I think that would not be a critical issue here.
>>
>>>
>>> I considered a lot of ways to fix this problem before I submitted this
>>> patch. One of my ideas is quite similar to yours, but I use "if
>>> (total_bitmap_free_nid == nm_i->nid_cnt[FREE_NID])" to decide whether to
>>> skip or not.
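>>>
>>> For reference, that check would sit at the top of scan_free_nid_bits(),
>>> roughly like this (a sketch on top of your total_bitmap_free_nid diff,
>>> not tested):
>>>
>>> 	/*
>>> 	 * Every nid counted in the bitmaps should already be in the free
>>> 	 * list, so another pass over the bitmaps cannot add anything new.
>>> 	 */
>>> 	if (nm_i->total_bitmap_free_nid == nm_i->nid_cnt[FREE_NID])
>>> 		return;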
>>
>> Hmm.. can we confirm that if there is no free nid in any bitmap, we can
>> skip the unneeded scanning? Anyway, I think you can write a patch to fix
>> that first? More like that diff.
>>
>>> If you insist, I can submit this simpler one instead, but some follow-up
>>> upgrades would be unavailable, for example, using the smaller granularity
>>> for tracking the last-scanned position that we talked about.
>>> I know sometimes I can be obsessed with performance, and I usually choose
>>> the faster way over the simpler one. If you think it's too much, please
>>> tell me; I'm sure we can find some middle ground.
>>
>> Yup, I think that's why you're the expert on algorithms, I have no doubt
>> about that. :)
>>
>> IMO, instead of reducing CPU overhead with a not-so-simple change, I prefer
>> one that can reduce IO. For example, if a NAT block contains the maximum
>> count of free nids, we can load those nids first; after they have been
>> allocated, at checkpoint we can write their nat entries back into one NAT
>> block. On the contrary, if we load the same number of free nids from
>> different NAT blocks, at checkpoint we may write them into more NAT blocks.
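>>
>> A very rough sketch of that idea, just for illustration (the block
>> selection below is made up here, and the checkpoint side is ignored):
>>
>> 	/* prefer the NAT block that currently holds the most free nids */
>> 	unsigned int i, best = 0;
>>
>> 	for (i = 1; i < nm_i->nat_blocks; i++)
>> 		if (nm_i->free_nid_count[i] > nm_i->free_nid_count[best])
>> 			best = i;
>>
>> 	/*
>> 	 * Load free nids only from block 'best'; once they are allocated,
>> 	 * their dirty nat entries should mostly land in that single NAT
>> 	 * block at checkpoint, instead of being spread over several blocks.
>> 	 */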
>>
>> Thanks,
>>
>>>
>>> Thank you
>>>
>>>
>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>> index cb3f10bc8723..238d95e89dec 100644
>>>> --- a/fs/f2fs/f2fs.h
>>>> +++ b/fs/f2fs/f2fs.h
>>>> @@ -729,6 +729,7 @@ struct f2fs_nm_info {
>>>>    unsigned char (*free_nid_bitmap)[NAT_ENTRY_BITMAP_SIZE];
>>>>    unsigned char *nat_block_bitmap;
>>>>    unsigned short *free_nid_count; /* free nid count of NAT block */
>>>> +  unsigned int total_bitmap_free_nid;     /* total free nid count in bitmap */
>>>>
>>>>    /* for checkpoint */
>>>>    char *nat_bitmap;               /* NAT bitmap pointer */
>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>> index fef5c68886b1..e4861908a396 100644
>>>> --- a/fs/f2fs/node.c
>>>> +++ b/fs/f2fs/node.c
>>>> @@ -1911,10 +1911,13 @@ static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
>>>>    else
>>>>            __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
>>>>
>>>> -  if (set)
>>>> +  if (set) {
>>>>            nm_i->free_nid_count[nat_ofs]++;
>>>> -  else if (!build)
>>>> +          nm_i->total_bitmap_free_nid++;
>>>> +  } else if (!build) {
>>>>            nm_i->free_nid_count[nat_ofs]--;
>>>> +          nm_i->total_bitmap_free_nid--;
>>>> +  }
>>>>  }
>>>>
>>>>  static void scan_nat_page(struct f2fs_sb_info *sbi,
>>>> @@ -1958,6 +1961,9 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
>>>>
>>>>    down_read(&nm_i->nat_tree_lock);
>>>>
>>>> +  if (!nm_i->total_bitmap_free_nid)
>>>> +          goto out;
>>>> +
>>>>    for (i = 0; i < nm_i->nat_blocks; i++) {
>>>>            if (!test_bit_le(i, nm_i->nat_block_bitmap))
>>>>                    continue;
>>>> @@ -1972,7 +1978,7 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
>>>>                    nid = i * NAT_ENTRY_PER_BLOCK + idx;
>>>>                    add_free_nid(sbi, nid, true);
>>>>
>>>> -                  if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
>>>> +                  if (nm_i->nid_cnt[FREE_NID])
>>>>                            goto out;
>>>>            }
>>>>    }
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>> Signed-off-by: Fan li <fanofcode...@samsung.com>
>>>>> ---
>>>>>  fs/f2fs/f2fs.h |  1 +
>>>>>  fs/f2fs/node.c | 42 +++++++++++++++++++++++++++++++++++-------
>>>>>  2 files changed, 36 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index e0ef31c..ae1cf91 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -705,6 +705,7 @@ struct f2fs_nm_info {
>>>>>   nid_t max_nid;                  /* maximum possible node ids */
>>>>>   nid_t available_nids;           /* # of available node ids */
>>>>>   nid_t next_scan_nid;            /* the next nid to be scanned */
>>>>> + block_t first_scan_block;       /* the first NAT block to be scanned */
>>>>>   unsigned int ram_thresh;        /* control the memory footprint */
>>>>>   unsigned int ra_nid_pages;      /* # of nid pages to be readaheaded */
>>>>>   unsigned int dirty_nats_ratio;  /* control dirty nats ratio threshold */
>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>> index 3d0d1be..f921e0c 100644
>>>>> --- a/fs/f2fs/node.c
>>>>> +++ b/fs/f2fs/node.c
>>>>> @@ -1812,7 +1812,7 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
>>>>>   struct f2fs_nm_info *nm_i = NM_I(sbi);
>>>>>   struct free_nid *i, *e;
>>>>>   struct nat_entry *ne;
>>>>> - int err = -EINVAL;
>>>>> + int need_free = 1;
>>>>>   bool ret = false;
>>>>>
>>>>>   /* 0 nid should not be used */
>>>>> @@ -1863,13 +1863,25 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
>>>>>           }
>>>>>   }
>>>>>   ret = true;
>>>>> - err = __insert_free_nid(sbi, i, FREE_NID);
>>>>> + need_free = __insert_free_nid(sbi, i, FREE_NID);
>>>>>  err_out:
>>>>>   spin_unlock(&nm_i->nid_list_lock);
>>>>>   radix_tree_preload_end();
>>>>>  err:
>>>>> - if (err)
>>>>> + if (need_free)
>>>>>           kmem_cache_free(free_nid_slab, i);
>>>>> + /*
>>>>> +  * For nid that should be free but not in the free
>>>>> +  * structure, update the scan range in hope of adding
>>>>> +  * it in the next scan.
>>>>> +  */
>>>>> + if (!ret || need_free < 0) {
>>>>> +         block_t tmp_block = NAT_BLOCK_OFFSET(nid);
>>>>> +
>>>>> +         if (tmp_block < nm_i->first_scan_block)
>>>>> +                 nm_i->first_scan_block = tmp_block;
>>>>> + }
>>>>> +
>>>>>   return ret;
>>>>>  }
>>>>>
>>>>> @@ -1950,10 +1962,17 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
>>>>>   struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
>>>>>   struct f2fs_journal *journal = curseg->journal;
>>>>>   unsigned int i, idx;
>>>>> + unsigned int max_blocks = NAT_BLOCK_OFFSET(nm_i->next_scan_nid);
>>>>>
>>>>> - down_read(&nm_i->nat_tree_lock);
>>>>> + /* every free nid in blocks scanned previously is in the free list */
>>>>> + if (nm_i->first_scan_block == NEW_ADDR)
>>>>> +         return;
>>>>>
>>>>> - for (i = 0; i < nm_i->nat_blocks; i++) {
>>>>> + if (max_blocks == 0)
>>>>> +         max_blocks = nm_i->nat_blocks;
>>>>> +
>>>>> + down_read(&nm_i->nat_tree_lock);
>>>>> + for (i = nm_i->first_scan_block; i < max_blocks; i++) {
>>>>>           if (!test_bit_le(i, nm_i->nat_block_bitmap))
>>>>>                   continue;
>>>>>           if (!nm_i->free_nid_count[i])
>>>>> @@ -1967,10 +1986,13 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
>>>>>                   nid = i * NAT_ENTRY_PER_BLOCK + idx;
>>>>>                   add_free_nid(sbi, nid, true);
>>>>>
>>>>> -                 if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
>>>>> +                 if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS) {
>>>>> +                         nm_i->first_scan_block = i;
>>>>>                           goto out;
>>>>> +                 }
>>>>>           }
>>>>>   }
>>>>> + nm_i->first_scan_block = NEW_ADDR;
>>>>>  out:
>>>>>   down_read(&curseg->journal_rwsem);
>>>>>   for (i = 0; i < nats_in_cursum(journal); i++) {
>>>>> @@ -2010,7 +2032,7 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
>>>>>           /* try to find free nids in free_nid_bitmap */
>>>>>           scan_free_nid_bits(sbi);
>>>>>
>>>>> -         if (nm_i->nid_cnt[FREE_NID])
>>>>> +         if (nm_i->nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK)
>>>>>                   return;
>>>>>   }
>>>>>
>>>>> @@ -2163,6 +2185,7 @@ int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink)
>>>>>   struct f2fs_nm_info *nm_i = NM_I(sbi);
>>>>>   struct free_nid *i, *next;
>>>>>   int nr = nr_shrink;
>>>>> + nid_t min_nid = nm_i->max_nid;
>>>>>
>>>>>   if (nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
>>>>>           return 0;
>>>>> @@ -2176,11 +2199,15 @@ int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink)
>>>>>                           nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
>>>>>                   break;
>>>>>
>>>>> +         if (i->nid < min_nid)
>>>>> +                 min_nid = i->nid;
>>>>>           __remove_free_nid(sbi, i, FREE_NID);
>>>>>           kmem_cache_free(free_nid_slab, i);
>>>>>           nr_shrink--;
>>>>>   }
>>>>>   spin_unlock(&nm_i->nid_list_lock);
>>>>> + if (min_nid != nm_i->max_nid)
>>>>> +         nm_i->first_scan_block = NAT_BLOCK_OFFSET(min_nid);
>>>>>   mutex_unlock(&nm_i->build_lock);
>>>>>
>>>>>   return nr - nr_shrink;
>>>>> @@ -2674,6 +2701,7 @@ static int init_node_manager(struct f2fs_sb_info *sbi)
>>>>>   init_rwsem(&nm_i->nat_tree_lock);
>>>>>
>>>>>   nm_i->next_scan_nid = le32_to_cpu(sbi->ckpt->next_free_nid);
>>>>> + nm_i->first_scan_block = NEW_ADDR;
>>>>>   nm_i->bitmap_size = __bitmap_size(sbi, NAT_BITMAP);
>>>>>   version_bitmap = __bitmap_ptr(sbi, NAT_BITMAP);
>>>>>   if (!version_bitmap)
>>>>>
>>>>
>>>
>>>
