Hi Chao,

On Sun, Jan 04, 2015 at 11:19:28AM +0800, Chao Yu wrote:
> Hi Changman,
> 
> Sorry for replying late!
> 
> > -----Original Message-----
> > From: Changman Lee [mailto:cm224....@samsung.com]
> > Sent: Tuesday, December 30, 2014 8:32 AM
> > To: Jaegeuk Kim
> > Cc: Chao Yu; linux-f2fs-de...@lists.sourceforge.net; linux-kernel@vger.kernel.org
> > Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
> > 
> > Hi all,
> > 
> > On Mon, Dec 29, 2014 at 01:23:00PM -0800, Jaegeuk Kim wrote:
> > > Hi Chao,
> > > 
> > > On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:
> > > 
> > > [snip]
> > > 
> > > Nice draft. :)
> > > 
> > > > Please see the draft below.
> > > > 
> > > > 1) Extent management:
> > > > If we use global management, managing all extents from different inodes
> > > > in sbi, we will face serious lock contention when we access extents
> > > > belonging to different inodes concurrently; the loss may outweigh the
> > > > gain.
> > > 
> > > Agreed.
> > > 
> > > > So we choose local management for extents, which means all extents are
> > > > managed by the inode itself to avoid the above lock contention.
> > > > Additionally, we manage all extents globally by linking all inodes into
> > > > a global lru list for the extent cache shrinker.
> > > > Approach:
> > > > 	a) build extent tree/rwlock/lru list/extent count in each inode.
> > > > 		* extent tree: link all extents in an rb-tree;
> > > > 		* rwlock: protect fields when accessing the extent cache concurrently;
> > > > 		* lru list: sort all extents in access-time order;
> > > > 		* extent count: record the total count of extents in the cache.
> > > > 	b) use an lru shrink list in sbi to manage all inodes which cache extents.
> > > > 		* an inode will be added or repositioned in this global list whenever
> > > > 		an extent is accessed in this inode.
> > > > 		* use a spinlock to protect this shrink list.
> > > 
> > > 1. How about adding a data structure with the inode number instead of
> > > referring to the inode pointer?
> > > 
> > > 2. How about managing extent entries globally and setting an upper bound on
> > > the number of extent entries instead of limiting them per inode?
> > > (The rb-tree will handle many extents per inode.)
> > > 
> > > 3. It needs to set a minimum length for a candidate of the extent cache.
> > > (e.g., 64)
> > 
> > Agreed.
> > 
> > > So, for example,
> > > struct ino_entry_for_extents {
> > > 	inode number;
> > > 	rb_tree for extent_entry objects;
> > > 	rwlock;
> > > };
> > > 
> > > struct extent_entry {
> > > 	blkaddr, len;
> > > 	list_head *;
> > > };
> > > 
> > > Something like this.
> > > 
> > > [A, B, C, ... are extent entries]
> > > 
> > > The sbi has
> > > 1. an extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
> > > 2. a radix_tree:  ino_entry_for_extents (#10) has D, B in its rb-tree
> > >                 ` ino_entry_for_extents (#11) has A, C in its rb-tree
> > >                 ` ino_entry_for_extents (#12) has F in its rb-tree
> > >                 ` ino_entry_for_extents (#13) has G, E in its rb-tree
> > > 
> > > In f2fs_update_extent_cache and __get_data_block for #10,
> > > ino_entry_for_extents (#10) is found and D or B is updated.
> > > Then, the updated entries are moved to the MRU end.
> > > 
> > > In f2fs_evict_inode for #11, A and C are moved to the LRU end.
> > > But, if this inode is unlinked, all of A, C, and ino_entry_for_extents (#11)
> > > should be released.
> > > 
> > > In f2fs_balance_fs_bg, some LRU extents are released according to the
> > > amount of consumed memory. Then, it frees any ino_entry_for_extents
> > > having no extent.
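
To make the structures above concrete, here is a rough sketch from my side.
Everything beyond the names in the draft (the field types, the sbi fields,
and reusing f2fs's existing block_t/nid_t typedefs) is only my guess, nothing
is decided yet:

#include <linux/rbtree.h>
#include <linux/list.h>
#include <linux/radix-tree.h>
#include <linux/spinlock.h>

/* one cached extent; lives in the per-ino rb-tree and the global LRU list */
struct extent_entry {
	unsigned int fofs;		/* start file offset of the extent */
	block_t blkaddr;		/* start block address */
	unsigned int len;		/* number of blocks covered */
	struct rb_node rb_node;		/* link in ino_entry_for_extents->root */
	struct list_head list;		/* link in the sbi extent_list (LRU) */
};

/* per-inode-number container, indexed by a radix tree in sbi */
struct ino_entry_for_extents {
	nid_t ino;			/* inode number (radix tree key) */
	struct rb_root root;		/* rb-tree of extent_entry objects */
	rwlock_t lock;			/* protects the rb-tree */
	unsigned int count;		/* number of cached extents */
};

/* and in struct f2fs_sb_info, roughly:
 *	struct radix_tree_root extent_ino_root;	ino -> ino_entry_for_extents
 *	struct list_head extent_list;		global LRU of extent_entry
 *	spinlock_t extent_lock;			protects extent_list
 */

With something like this, f2fs_update_extent_cache() would look the entry up
by ino in the radix tree, take the rwlock, and then only touch that inode's
rb-tree.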
> > > 
> > > IMO, we don't need to consider readahead for this, since get_data_block
> > > will be called by VFS readahead.
> > > 
> > > Furthermore, we need to think about whether LRU is really best or not.
> > > IMO, the extent cache aims to improve second-access speed, rather than
> > > initial cold misses. So, maybe MRU or another algorithm would be better.
> > 
> > Right. It's very complicated to judge which is better.
> > In the read or write path, extents could be made every time. At that time,
> > we should decide which extent to evict instead of the new extent if we set
> > an upper bound.
> > On update, one extent could be separated into 3. That requires 3 insertions
> > and 1 deletion.
> > So if updates happen frequently, we could give up extent management for
> > some ranges.
> > And we need to bring in ideas from vm management. For example,
> > active/inactive lists and second chance for promotion, or batch work for
> > insertion/deletion.
> > 
> > I suddenly thought, 'Simple is best'.
> > Let's think about better ideas together.
> 
> Yeah, how about using the opposite way to the way of the page cache manager?
> 
> For example:
> node pages A,B,C,D are in the page cache;
> extents a,b,c,d are in the extent cache;
> extent a is built from page A, ..., d is built from page D.
> page cache:   LRU A -> B -> C -> D MRU
> extent cache: LRU a -> b -> c -> d MRU
> 
> If we use
> 1) the same way (LRU), the cache pairs A-a, B-b, ... may be reclaimed at the
> same time under OOM;
> 2) the opposite way, maybe A,B in the page cache and d,c in the extent cache
> will be reclaimed, but we can still hit the whole cache through the
> combination of page cache and extent cache.
> 
> So by way '2)' we can increase the total coverage area of page cache +
> extent cache.

Good idea. :)
Once we decide which one is better between a global lru and a per-inode lru,
f2fs read performance is expected to improve further.
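
By the way, about the update cost I mentioned (one extent separated into 3):
the split itself would look roughly like the sketch below. __remove_extent()
and __insert_extent() are only placeholders, not existing f2fs functions, and
the extent_entry fields follow the sketch earlier in this mail:

/* called when the block at file offset 'pos' inside extent 'e' is rewritten
 * to 'new_blkaddr'; drops 'e' and re-inserts up to three smaller extents
 */
static void split_extent(struct inode *inode, struct extent_entry *e,
			 unsigned int pos, block_t new_blkaddr)
{
	unsigned int front = pos - e->fofs;		/* blocks before pos */
	unsigned int back = e->fofs + e->len - pos - 1;	/* blocks after pos */
	unsigned int fofs = e->fofs;
	block_t blkaddr = e->blkaddr;

	__remove_extent(inode, e);			/* the 1 deletion */

	if (front)					/* up to 3 insertions */
		__insert_extent(inode, fofs, blkaddr, front);
	if (new_blkaddr != NULL_ADDR)
		__insert_extent(inode, pos, new_blkaddr, 1);
	if (back)
		__insert_extent(inode, pos + 1, blkaddr + front + 1, back);
}

So a hot, randomly overwritten range really does turn one rb-tree operation
into several; giving up extent management for such ranges makes sense to me.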
I'll wait for your patch.

Thanks,

> 
> > 
> > > 
> > > Thanks,
> > > 
> > > > 
> > > > 2) Limitation:
> > > > In one inode, as we split or add extents in the extent cache during
> > > > read/write, the extent count will grow, so memory and CPU overhead will
> > > > increase.
> > > > In order to control the overhead of memory and CPU, we try to set an
> > > > upper bound to limit the total extent count in each inode. This number
> > > > is a global configuration which is visible to all inodes, and it will
> > > > be exported to sysfs so the user can tune it as required. By default,
> > > > the designed number is 8.
> > 
> > Chao,
> > It's better that the # of extents is controlled globally rather than
> > limiting extents per inode, as Jaegeuk said, to reduce extent management
> > overhead.
> 
> It's OK.
> 
> > 
> > > > 
> > > > 3) Shrinker:
> > > > There are two shrink paths:
> > > > 	a) one is triggered when the extent count exceeds the upper bound of
> > > > 	the inode's extent cache. We will try to release extent(s) from the
> > > > 	head of the inode's inner extent lru list until the extent count is
> > > > 	equal to the upper bound.
> > > > 	This operation could be in f2fs_update_extent_cache().
> > > > 	b) the other one is triggered when memory utilization exceeds a
> > > > 	threshold; we try to get an inode from the head of the global lru
> > > > 	list(s), and release a fixed number of extents (by default: 64
> > > > 	extents) from each inode, one inode at a time.
> > > > 	This operation could be in f2fs_balance_fs_bg().
> > 
> > Let's consider using register_shrinker, which is called by the vm when
> > memory pressure happens.
> 
> Great, thanks for your reminder! :-)
> 
> Thanks,
> Yu
> 
> > 
> > > > 
> > > > 4) Revalidation:
> > > > In ->evict(), the extent cache will be released. In order to reuse the
> > > > inode's extent cache on reopen for a high hit ratio, a proper way is to
> > > > add the cached extent tree into a hash table (could be a radix tree),
> > > > then revalidate the extent tree and recover it to the inode on reopen.
> > > > Besides, a global lru list is needed here for the shrinker.
> > > > 
> > > > 5) Readahead:
> > > > Expand the extent cache by readahead when we call get_dnode_of_data in
> > > > the non-emergency path. Note, ra can affect lock contention for both
> > > > the ->rwlock in the extent cache and the ->lock of the global shrink
> > > > list.
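
Coming back to the register_shrinker suggestion quoted above, a minimal
sketch of the hookup could look like this. The shrinker API itself
(count_objects, scan_objects, DEFAULT_SEEKS, register_shrinker) is real; the
counter and f2fs_shrink_extent_lru() are placeholders for the extent-cache
side:

#include <linux/shrinker.h>
#include <linux/atomic.h>

/* global count of cached extent entries, updated by the cache code */
static atomic_t total_extent_count = ATOMIC_INIT(0);

static unsigned long f2fs_extent_count(struct shrinker *shrink,
				       struct shrink_control *sc)
{
	return atomic_read(&total_extent_count);
}

static unsigned long f2fs_extent_scan(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	/* walk the global LRU from the head and free entries;
	 * f2fs_shrink_extent_lru() is a placeholder for that walk,
	 * returning how many entries it actually freed
	 */
	return f2fs_shrink_extent_lru(sc->nr_to_scan);
}

static struct shrinker f2fs_extent_shrinker = {
	.count_objects	= f2fs_extent_count,
	.scan_objects	= f2fs_extent_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* at mount or module init time:
 *	register_shrinker(&f2fs_extent_shrinker);
 */

Then the vm drives the shrink under memory pressure, in addition to (or
instead of) the trigger in f2fs_balance_fs_bg().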