On 2017/11/6 18:42, Chao Yu wrote:
> On 2017/11/6 15:09, Fan Li wrote:
>>
>>
>>> -----Original Message-----
>>> From: Chao Yu [mailto:c...@kernel.org]
>>> Sent: Friday, November 03, 2017 9:16 PM
>>> To: Fan Li; 'Chao Yu'; 'Jaegeuk Kim'
>>> Cc: linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net
>>> Subject: Re: [f2fs-dev] [PATCH RESEND] f2fs: modify the procedure of scan free nid
>>>
>>> On 2017/11/3 18:29, Fan Li wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Chao Yu [mailto:yuch...@huawei.com]
>>>>> Sent: Friday, November 03, 2017 4:54 PM
>>>>> To: Fan Li; 'Chao Yu'; 'Jaegeuk Kim'
>>>>> Cc: linux-kernel@vger.kernel.org; linux-f2fs-de...@lists.sourceforge.net
>>>>> Subject: Re: [f2fs-dev] [PATCH RESEND] f2fs: modify the procedure of scan free nid
>>>>>
>>>>> On 2017/11/3 15:31, Fan Li wrote:
>>>>>> In the current version, we preserve 8 pages of nat blocks as free
>>>>>> nids; we build bitmaps for them and use them to allocate nids until
>>>>>> their number drops below NAT_ENTRY_PER_BLOCK.
>>>>>>
>>>>>> After that, we have a problem: scan_free_nid_bits will scan the same
>>>>>> 8 pages trying to find more free nids, but in most cases the free
>>>>>> nids in these bitmaps are already in the free list, so scanning them
>>>>>> won't get us any new nids.
>>>>>> Furthermore, after scan_free_nid_bits, the scan is over if
>>>>>> nid_cnt[FREE_NID] != 0.
>>>>>> As a result, we scan the same pages over and over again, and no new
>>>>>> free nids are found until nid_cnt[FREE_NID] == 0. As the number of
>>>>>> scanned pages grows, the problem gets worse.
>>>>>>
>>>>>> This patch marks the range where new free nids could exist and keeps
>>>>>> scanning for free nids until nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK.
>>>>>> The new variable first_scan_block marks the start of the range; it is
>>>>>> initialized to NEW_ADDR, which means all free nids before
>>>>>> next_scan_nid are already in the free list. next_scan_nid is used as
>>>>>> the end of the range, since all free nids scanned in
>>>>>> scan_free_nid_bits must be smaller than next_scan_nid.
>>>>>
>>>>> Thinking it over again, IMO, we can add a variable holding the total
>>>>> count of free nids in the bitmap; if there is no free nid, we just
>>>>> skip scanning all the existing bitmaps.
>>>>>
>>>>> And if there are only a few free nids scattered across the bitmap,
>>>>> the cost will be limited, because we will skip scanning
>>>>> nm_i::free_nid_bitmap if nm_i::free_nid_count is zero. Once we find
>>>>> one free nid, let's skip out.
>>>>>
>>>>> Since traversing nm_i::nat_block_bitmap shouldn't incur very heavy
>>>>> CPU overhead, I expect the change below could be simpler to maintain
>>>>> while having the same effect.
>>>>>
>>>>> What do you think?
>>>>>
>>>>
>>>> I think if you need this to work, checking total_bitmap_free_nid may
>>>> not be sufficient.
>>>> The problem this patch addresses is that even when all the free nids
>>>> are already in the free list, we still scan all the pages.
>>>> The scan proceeds once the free nid count drops below
>>>> NAT_ENTRY_PER_BLOCK.
>>>> So in most cases there are still free nids in the bitmap during the
>>>> scan, and the current code will check every one of them to see whether
>>>> it is actually in the free list.
>>>> Only checking total_bitmap_free_nid == 0 won't take this overhead away.
>>>
>>> Oh, you could see that we have added free_nid_count for each NAT
>>> block's free_nid_bitmap; before scanning the bitmap, we make sure there
>>> is at least one free nid.
>>>
>>> scan_free_nid_bits()
>>> 	for (i = 0; i < nm_i->nat_blocks; i++) {
>>> 		if (!test_bit_le(i, nm_i->nat_block_bitmap))
>>> 			continue;
>>> 		if (!nm_i->free_nid_count[i])
>>> 			continue;
>>
>> Do you mean free_nid_count here?
>> I thought free_nid_count only represents how many nats are marked free
>> in the bitmap of one block.
>
> Right.
>
>>
>> To my understanding, even if a nat is already in the free list, it will
>> still have a bit marked as free in free_nid_bitmap and a count in
>> free_nid_count.
>> That means if free_nid_count != 0 and there are marked bits in the
>> bitmap, the free nats in this block could still all be in the free list.
>
> Yes.
>
>> The purpose of the scan is to find new nats and add them to the free
>> list; going through the nats which are already in the free list isn't
>> what we want.
>> And in xfstests, in most cases when scan_free_nid_bits runs, all free
>> nats are indeed already in the free list.
>
> Could you test that diff to check whether we still scan a
> free_nid_bitmap which indicates there are free nids, but cannot add any
> of them into the free list because they are already there?
>
> -			if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
> +			if (nm_i->nid_cnt[FREE_NID])
>
> Due to the above change, I expect that will be eliminated...
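To make the overhead concrete, here is a tiny userspace model of the
point above (a sketch only, not f2fs code: free_nid_bitmap,
free_nid_count and in_free_list below only loosely mirror the nm_i
fields for a single NAT block). Bits stay set in the per-block bitmap
even after their nids have entered the free list, so a nonzero
free_nid_count alone does not promise the scan any new nid:

#include <stdbool.h>
#include <stdio.h>

/* 4KB NAT block / 9-byte nat entry */
#define NAT_ENTRY_PER_BLOCK	455

static bool free_nid_bitmap[NAT_ENTRY_PER_BLOCK];	/* nid marked free */
static bool in_free_list[NAT_ENTRY_PER_BLOCK];		/* nid already listed */
static unsigned short free_nid_count;	/* # of set bits in the bitmap */

/* Scan one block; return how many genuinely new nids were found. */
static int scan_block(void)
{
	int added = 0;

	if (!free_nid_count)	/* the cheap skip discussed above */
		return 0;

	for (int idx = 0; idx < NAT_ENTRY_PER_BLOCK; idx++) {
		if (!free_nid_bitmap[idx])
			continue;
		if (in_free_list[idx])	/* add_free_nid() would reject it */
			continue;
		in_free_list[idx] = true;
		added++;
	}
	return added;
}

int main(void)
{
	/* Three nids are free, and all of them are already in the list. */
	for (int idx = 0; idx < 3; idx++) {
		free_nid_bitmap[idx] = true;
		in_free_list[idx] = true;
		free_nid_count++;
	}
	printf("free_nid_count=%hu, new nids=%d\n",
	       free_nid_count, scan_block());	/* prints 3, 0 */
	return 0;
}

The full pass over the block finds nothing, which is exactly the wasted
scan being discussed; the skip only helps once the count reaches zero.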
Please check and test this diff:

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 8df2ce9a1356..ff0ce3a4ceab 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -729,6 +729,7 @@ struct f2fs_nm_info {
 	unsigned char (*free_nid_bitmap)[NAT_ENTRY_BITMAP_SIZE];
 	unsigned char *nat_block_bitmap;
 	unsigned short *free_nid_count;	/* free nid count of NAT block */
+	unsigned int total_bitmap_free_nid;	/* total free nid count in bitmap */
 
 	/* for checkpoint */
 	char *nat_bitmap;		/* NAT bitmap pointer */
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index fef5c68886b1..33ee1302dbe1 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1911,10 +1911,13 @@ static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
 	else
 		__clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
 
-	if (set)
+	if (set) {
 		nm_i->free_nid_count[nat_ofs]++;
-	else if (!build)
+		nm_i->total_bitmap_free_nid++;
+	} else if (!build) {
 		nm_i->free_nid_count[nat_ofs]--;
+		nm_i->total_bitmap_free_nid--;
+	}
 }
 
 static void scan_nat_page(struct f2fs_sb_info *sbi,
@@ -1958,6 +1961,9 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
 
 	down_read(&nm_i->nat_tree_lock);
 
+	if (!nm_i->total_bitmap_free_nid)
+		goto out;
+
 	for (i = 0; i < nm_i->nat_blocks; i++) {
 		if (!test_bit_le(i, nm_i->nat_block_bitmap))
 			continue;
@@ -1972,7 +1978,7 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
 			nid = i * NAT_ENTRY_PER_BLOCK + idx;
 			add_free_nid(sbi, nid, true);
 
-			if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
+			if (nm_i->nid_cnt[FREE_NID])
 				goto out;
 		}
 	}
@@ -2012,6 +2018,10 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
 		return;
 
 	if (!mount) {
+		/* if there are free nids in the list, allocate them first */
+		if (sync && nm_i->nid_cnt[FREE_NID])
+			return;
+
 		/* try to find free nids in free_nid_bitmap */
 		scan_free_nid_bits(sbi);

Thanks,

>
> Thanks,
>
>>
>>> 	for (idx = 0; idx < NAT_ENTRY_PER_BLOCK; idx++) {
>>> 		nid_t nid;
>>>
>>> 		if (!test_bit_le(idx, nm_i->free_nid_bitmap[i]))
>>> 			continue;
>>>
>>> 		nid = i * NAT_ENTRY_PER_BLOCK + idx;
>>> 		add_free_nid(sbi, nid, true);
>>>
>>> 		if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
>>> 			goto out;
>>> 	}
>>> }
>>>
>>> And in that diff, we have changed the exit condition: once we have
>>> grabbed one free nid, stop building.
>>>
>>>>> -	if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
>>>>> +	if (nm_i->nid_cnt[FREE_NID])
>>>>> 		goto out;
>>>
>>> So with that simple change, the only overhead left is that we traverse
>>> nat_block_bitmap every time total_bitmap_free_nid is nonzero, but I
>>> think that would not be a critical issue here.
>>>
>>>> I considered a lot of ways to fix this problem before I submitted this
>>>> patch. One of my ideas is quite similar to yours, but I use "if
>>>> (total_bitmap_free_nid == nm_i->nid_cnt[FREE_NID])" to decide whether
>>>> to skip or not.
>>>
>>> Hmm.. can we confirm that if there is no free nid in any bitmap, we can
>>> skip the unneeded scanning? Anyway, I think you can write a patch to
>>> fix that first? More like that diff.
>>>
>>>> If you insist, I can submit this simpler one instead, but some
>>>> follow-up improvements would be unavailable, for example, using a
>>>> smaller granularity for tracking the last-scanned position that we
>>>> talked about. I know sometimes I can be obsessed with performance; I
>>>> usually choose the faster way over simpler ones. If you think it's too
>>>> much, please tell me, I'm sure we can find some middle ground.
>>> Yup, I think that's why you're the expert on algorithms, I have no
>>> doubt about that. :)
>>>
>>> IMO, instead of reducing CPU overhead with a non-trivial change, I
>>> prefer the one that can reduce IO. E.g., if one NAT block contains the
>>> maximum count of free nids, we can load those nids first; after they
>>> have been allocated, at checkpoint we can write their nat entries into
>>> a single NAT block. On the contrary, if we load the same count of free
>>> nids from different NAT blocks, at checkpoint we may write them into
>>> more NAT blocks. (A rough sketch of this idea appears at the end of
>>> this mail.)
>>>
>>> Thanks,
>>>
>>>>
>>>> Thank you
>>>>
>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index cb3f10bc8723..238d95e89dec 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -729,6 +729,7 @@ struct f2fs_nm_info {
>>>>>  	unsigned char (*free_nid_bitmap)[NAT_ENTRY_BITMAP_SIZE];
>>>>>  	unsigned char *nat_block_bitmap;
>>>>>  	unsigned short *free_nid_count;	/* free nid count of NAT block */
>>>>> +	unsigned int total_bitmap_free_nid;	/* total free nid count in bitmap */
>>>>>
>>>>>  	/* for checkpoint */
>>>>>  	char *nat_bitmap;		/* NAT bitmap pointer */
>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>> index fef5c68886b1..e4861908a396 100644
>>>>> --- a/fs/f2fs/node.c
>>>>> +++ b/fs/f2fs/node.c
>>>>> @@ -1911,10 +1911,13 @@ static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
>>>>>  	else
>>>>>  		__clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
>>>>>
>>>>> -	if (set)
>>>>> +	if (set) {
>>>>>  		nm_i->free_nid_count[nat_ofs]++;
>>>>> -	else if (!build)
>>>>> +		nm_i->total_bitmap_free_nid++;
>>>>> +	} else if (!build) {
>>>>>  		nm_i->free_nid_count[nat_ofs]--;
>>>>> +		nm_i->total_bitmap_free_nid--;
>>>>> +	}
>>>>>  }
>>>>>
>>>>>  static void scan_nat_page(struct f2fs_sb_info *sbi,
>>>>> @@ -1958,6 +1961,9 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
>>>>>
>>>>>  	down_read(&nm_i->nat_tree_lock);
>>>>>
>>>>> +	if (!nm_i->total_bitmap_free_nid)
>>>>> +		goto out;
>>>>> +
>>>>>  	for (i = 0; i < nm_i->nat_blocks; i++) {
>>>>>  		if (!test_bit_le(i, nm_i->nat_block_bitmap))
>>>>>  			continue;
>>>>> @@ -1972,7 +1978,7 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
>>>>>  			nid = i * NAT_ENTRY_PER_BLOCK + idx;
>>>>>  			add_free_nid(sbi, nid, true);
>>>>>
>>>>> -			if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
>>>>> +			if (nm_i->nid_cnt[FREE_NID])
>>>>>  				goto out;
>>>>>  		}
>>>>>  	}
>>>>>
>>>>> Thanks,
>>>>>
>>>>>>
>>>>>> Signed-off-by: Fan li <fanofcode...@samsung.com>
>>>>>> ---
>>>>>>  fs/f2fs/f2fs.h |  1 +
>>>>>>  fs/f2fs/node.c | 42 +++++++++++++++++++++++++++++++++++-------
>>>>>>  2 files changed, 36 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>> index e0ef31c..ae1cf91 100644
>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>> @@ -705,6 +705,7 @@ struct f2fs_nm_info {
>>>>>>  	nid_t max_nid;			/* maximum possible node ids */
>>>>>>  	nid_t available_nids;		/* # of available node ids */
>>>>>>  	nid_t next_scan_nid;		/* the next nid to be scanned */
>>>>>> +	block_t first_scan_block;	/* the first NAT block to be scanned */
>>>>>>  	unsigned int ram_thresh;	/* control the memory footprint */
>>>>>>  	unsigned int ra_nid_pages;	/* # of nid pages to be readaheaded */
>>>>>>  	unsigned int dirty_nats_ratio;	/* control dirty nats ratio threshold */
>>>>>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>>>>>> index 3d0d1be..f921e0c 100644
>>>>>> --- a/fs/f2fs/node.c
>>>>>> +++ b/fs/f2fs/node.c
>>>>>> @@ -1812,7 +1812,7 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
>>>>>>  	struct f2fs_nm_info *nm_i = NM_I(sbi);
>>>>>>  	struct free_nid *i, *e;
>>>>>>  	struct nat_entry *ne;
>>>>>> -	int err = -EINVAL;
>>>>>> +	int need_free = 1;
>>>>>>  	bool ret = false;
>>>>>>
>>>>>>  	/* 0 nid should not be used */
>>>>>> @@ -1863,13 +1863,25 @@ static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
>>>>>>  		}
>>>>>>  	}
>>>>>>  	ret = true;
>>>>>> -	err = __insert_free_nid(sbi, i, FREE_NID);
>>>>>> +	need_free = __insert_free_nid(sbi, i, FREE_NID);
>>>>>>  err_out:
>>>>>>  	spin_unlock(&nm_i->nid_list_lock);
>>>>>>  	radix_tree_preload_end();
>>>>>>  err:
>>>>>> -	if (err)
>>>>>> +	if (need_free)
>>>>>>  		kmem_cache_free(free_nid_slab, i);
>>>>>> +	/*
>>>>>> +	 * For nid that should be free but not in the free
>>>>>> +	 * structure, update the scan range in hope of adding
>>>>>> +	 * it in the next scan.
>>>>>> +	 */
>>>>>> +	if (!ret || need_free < 0) {
>>>>>> +		block_t tmp_block = NAT_BLOCK_OFFSET(nid);
>>>>>> +
>>>>>> +		if (tmp_block < nm_i->first_scan_block)
>>>>>> +			nm_i->first_scan_block = tmp_block;
>>>>>> +	}
>>>>>> +
>>>>>>  	return ret;
>>>>>>  }
>>>>>>
>>>>>> @@ -1950,10 +1962,17 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
>>>>>>  	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
>>>>>>  	struct f2fs_journal *journal = curseg->journal;
>>>>>>  	unsigned int i, idx;
>>>>>> +	unsigned int max_blocks = NAT_BLOCK_OFFSET(nm_i->next_scan_nid);
>>>>>>
>>>>>> -	down_read(&nm_i->nat_tree_lock);
>>>>>> +	/* every free nid in blocks scanned previously is in the free list */
>>>>>> +	if (nm_i->first_scan_block == NEW_ADDR)
>>>>>> +		return;
>>>>>>
>>>>>> -	for (i = 0; i < nm_i->nat_blocks; i++) {
>>>>>> +	if (max_blocks == 0)
>>>>>> +		max_blocks = nm_i->nat_blocks;
>>>>>> +
>>>>>> +	down_read(&nm_i->nat_tree_lock);
>>>>>> +	for (i = nm_i->first_scan_block; i < max_blocks; i++) {
>>>>>>  		if (!test_bit_le(i, nm_i->nat_block_bitmap))
>>>>>>  			continue;
>>>>>>  		if (!nm_i->free_nid_count[i])
>>>>>> @@ -1967,10 +1986,13 @@ static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
>>>>>>  			nid = i * NAT_ENTRY_PER_BLOCK + idx;
>>>>>>  			add_free_nid(sbi, nid, true);
>>>>>>
>>>>>> -			if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
>>>>>> +			if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS) {
>>>>>> +				nm_i->first_scan_block = i;
>>>>>>  				goto out;
>>>>>> +			}
>>>>>>  		}
>>>>>>  	}
>>>>>> +	nm_i->first_scan_block = NEW_ADDR;
>>>>>>  out:
>>>>>>  	down_read(&curseg->journal_rwsem);
>>>>>>  	for (i = 0; i < nats_in_cursum(journal); i++) {
>>>>>> @@ -2010,7 +2032,7 @@ static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
>>>>>>  		/* try to find free nids in free_nid_bitmap */
>>>>>>  		scan_free_nid_bits(sbi);
>>>>>>
>>>>>> -		if (nm_i->nid_cnt[FREE_NID])
>>>>>> +		if (nm_i->nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK)
>>>>>>  			return;
>>>>>>  	}
>>>>>>
>>>>>> @@ -2163,6 +2185,7 @@ int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink)
>>>>>>  	struct f2fs_nm_info *nm_i = NM_I(sbi);
>>>>>>  	struct free_nid *i, *next;
>>>>>>  	int nr = nr_shrink;
>>>>>> +	nid_t min_nid = nm_i->max_nid;
>>>>>>
>>>>>>  	if (nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
>>>>>>  		return 0;
>>>>>> @@ -2176,11 +2199,15 @@ int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink)
>>>>>>  		    nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
>>>>>>  			break;
>>>>>>
>>>>>> +		if (i->nid < min_nid)
>>>>>> +			min_nid = i->nid;
>>>>>>  		__remove_free_nid(sbi, i, FREE_NID);
>>>>>>  		kmem_cache_free(free_nid_slab, i);
>>>>>>  		nr_shrink--;
>>>>>>  	}
>>>>>>  	spin_unlock(&nm_i->nid_list_lock);
>>>>>> +	if (min_nid != nm_i->max_nid)
>>>>>> +		nm_i->first_scan_block = NAT_BLOCK_OFFSET(min_nid);
>>>>>>  	mutex_unlock(&nm_i->build_lock);
>>>>>>
>>>>>>  	return nr - nr_shrink;
>>>>>> @@ -2674,6 +2701,7 @@ static int init_node_manager(struct f2fs_sb_info *sbi)
>>>>>>  	init_rwsem(&nm_i->nat_tree_lock);
>>>>>>
>>>>>>  	nm_i->next_scan_nid = le32_to_cpu(sbi->ckpt->next_free_nid);
>>>>>> +	nm_i->first_scan_block = NEW_ADDR;
>>>>>>  	nm_i->bitmap_size = __bitmap_size(sbi, NAT_BITMAP);
>>>>>>  	version_bitmap = __bitmap_ptr(sbi, NAT_BITMAP);
>>>>>>  	if (!version_bitmap)
>>>>>>
>>>>>
>>>>
>>
>
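As a postscript to the IO idea quoted above: a rough userspace sketch of
"serve free nids from the densest NAT block first", a sketch under
stated assumptions rather than a real f2fs implementation.
free_nid_count[] and nat_blocks stand in for the corresponding nm_i
fields, and pick_fullest_nat_block() is a hypothetical helper, not an
existing kernel function:

#include <stddef.h>
#include <stdio.h>

/* Return the index of the NAT block holding the most free nids, or -1. */
static int pick_fullest_nat_block(const unsigned short *free_nid_count,
				  size_t nat_blocks)
{
	int best = -1;
	unsigned short best_cnt = 0;

	for (size_t i = 0; i < nat_blocks; i++) {
		if (free_nid_count[i] > best_cnt) {
			best_cnt = free_nid_count[i];
			best = (int)i;
		}
	}
	return best;
}

int main(void)
{
	unsigned short free_nid_count[4] = { 3, 455, 0, 12 };

	/* Block 1 is the densest, so its nids would be loaded first. */
	printf("scan block %d first\n",
	       pick_fullest_nat_block(free_nid_count, 4));
	return 0;
}

Allocating from one dense block keeps the soon-to-be-dirty nat entries
clustered, so a later checkpoint can flush them with a single NAT block
write instead of touching several blocks, which is the IO saving
described above.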