Hi Chao,

On Sun, Jan 04, 2015 at 11:19:28AM +0800, Chao Yu wrote:
> Hi Changman,
> 
> Sorry for replying late!
> 
> > -----Original Message-----
> > From: Changman Lee [mailto:cm224....@samsung.com]
> > Sent: Tuesday, December 30, 2014 8:32 AM
> > To: Jaegeuk Kim
> > Cc: Chao Yu; linux-f2fs-de...@lists.sourceforge.net; 
> > linux-kernel@vger.kernel.org
> > Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
> > 
> > Hi all,
> > 
> > On Mon, Dec 29, 2014 at 01:23:00PM -0800, Jaegeuk Kim wrote:
> > > Hi Chao,
> > >
> > > On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:
> > >
> > > [snip]
> > >
> > > Nice draft. :)
> > >
> > > >
> > > > Please see the draft below.
> > > >
> > > > 1) Extent management:
> > > > If we use global management, managing all extents from different
> > > > inodes in sbi, we will face serious lock contention when we access
> > > > extents belonging to different inodes concurrently; the loss may
> > > > outweigh the gain.
> > >
> > > Agreed.
> > >
> > > > So we choose local management for extents, which means all extents
> > > > are managed by the inode itself, to avoid the above lock contention.
> > > > Additionally, we manage all extents globally by linking all inodes
> > > > into a global lru list for the extent cache shrinker.
> > > > Approach:
> > > >         a) build extent tree/rwlock/lru list/extent count in each
> > > >         inode (see the sketch below this list).
> > > >                 *extent tree: link all extents in an rb-tree;
> > > >                 *rwlock: protect fields when accessing the extent
> > > >                 cache concurrently;
> > > >                 *lru list: sort all extents in access-time order;
> > > >                 *extent count: record the total count of extents in
> > > >                 the cache.
> > > >         b) use an lru shrink list in sbi to manage all inodes which
> > > >         have cached extents.
> > > >                 *an inode will be added or repositioned in this
> > > >                 global list whenever an extent is accessed in it.
> > > >                 *use a spinlock to protect this shrink list.
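> > > >
> > > > A rough sketch of these per-inode fields (names are illustrative,
> > > > not final; extent_info/extent_node/extent_tree and the embedding in
> > > > f2fs_inode_info are assumptions of this draft):
> > > >
> > > > struct extent_info {
> > > >         pgoff_t fofs;                   /* start file offset */
> > > >         block_t blk_addr;               /* start block address */
> > > >         unsigned int len;               /* length in blocks */
> > > > };
> > > >
> > > > struct extent_node {
> > > >         struct rb_node rb_node;         /* linked in ->root below */
> > > >         struct list_head lru;           /* linked in ->lru below */
> > > >         struct extent_info ei;
> > > > };
> > > >
> > > > /* to be embedded in struct f2fs_inode_info */
> > > > struct extent_tree {
> > > >         struct rb_root root;            /* rb-tree of extent_node */
> > > >         rwlock_t lock;                  /* protects root/lru/count */
> > > >         struct list_head lru;           /* access-time order */
> > > >         unsigned int count;             /* total cached extents */
> > > > };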
> > >
> > > 1. How about adding a data structure with the inode number instead of
> > > referring to an inode pointer?
> > >
> > > 2. How about managing extent entries globally and setting an upper
> > > bound on the number of extent entries instead of limiting them per
> > > inode? (The rb-tree will handle many extents per inode.)
> > >
> > > 3. We need to set a minimum length for a candidate of the extent cache
> > > (e.g., 64).
> > >
> > 
> > Agreed.
> > 
> > > So, for example,
> > > struct ino_entry_for_extents {
> > >         nid_t ino;                      /* inode number */
> > >         struct rb_root extent_root;     /* rb-tree of extent_entry */
> > >         rwlock_t lock;
> > > };
> > >
> > > struct extent_entry {
> > >         pgoff_t fofs;           /* start file offset, the rb-tree key */
> > >         block_t blkaddr;
> > >         unsigned int len;
> > >         struct list_head list;  /* link into sbi's global lru */
> > > };
> > >
> > > Something like this.
> > >
> > > [A, B, C, ... are extent entries]
> > >
> > > The sbi has
> > > 1. an extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
> > > 2. radix_tree:  ino_entry_for_extents (#10) has D, B in rb-tree
> > >               ` ino_entry_for_extents (#11) has A, C in rb-tree
> > >               ` ino_entry_for_extents (#12) has F    in rb-tree
> > >               ` ino_entry_for_extents (#13) has G, E in rb-tree
> > >
> > > In f2fs_update_extent_cache and __get_data_block for #10,
> > > ino_entry_for_extents (#10) is found and D or B is updated.
> > > Then, the updated entries are moved to the MRU end.
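> > >
> > > A minimal sketch of that path (assuming sbi carries the radix_tree
> > > and extent_list above; the sbi field names and __lookup_extent are
> > > hypothetical, and error handling is omitted):
> > >
> > > static void f2fs_touch_extent(struct f2fs_sb_info *sbi, nid_t ino,
> > >                                 pgoff_t fofs)
> > > {
> > >         struct ino_entry_for_extents *ie;
> > >         struct extent_entry *ee;
> > >
> > >         /* find the per-inode entry by inode number */
> > >         ie = radix_tree_lookup(&sbi->ino_root, ino);
> > >         if (!ie)
> > >                 return;
> > >
> > >         write_lock(&ie->lock);
> > >         /* hypothetical helper: rb-tree search for an extent at fofs */
> > >         ee = __lookup_extent(&ie->extent_root, fofs);
> > >         if (ee) {
> > >                 /* move the entry to the MRU end of the global list */
> > >                 spin_lock(&sbi->extent_lock);
> > >                 list_move_tail(&ee->list, &sbi->extent_list);
> > >                 spin_unlock(&sbi->extent_lock);
> > >         }
> > >         write_unlock(&ie->lock);
> > > }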
> > >
> > > In f2fs_evict_inode for #11, A and C are moved to the LRU end.
> > > But, if this inode is unlinked, all of A, C, and ino_entry_for_extents
> > > (#11) should be released.
> > >
> > > In f2fs_balance_fs_bg, some LRU extents are released according to the
> > > amount of consumed memory. Then, it frees any ino_entry_for_extents
> > > having no extents.
> > >
> > > IMO, we don't need to consider readahead for this, since get_data_block 
> > > will
> > > be called by VFS readahead.
> > >
> > > Furthermore, we need to think about whether LRU is really best or not.
> > > IMO, the extent cache aims to improve second-access speed, rather than
> > > initial cold misses. So, maybe MRU or another algorithm would be
> > > better.
> > >
> > 
> > Right. It's very complicated to judge which is better.
> > In the read or write path, extents could be created every time. At that
> > time, we should decide which extent to evict in favor of new extents if
> > we set an upper bound.
> > In an update, one extent could be separated into 3, which requires 3
> > insertions and 1 deletion (see the sketch below). So if updates happen
> > frequently, we could give up extent management for some ranges.
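> >
> > For instance, updating one block in the middle of a cached extent
> > could look like this (a sketch; __remove_extent/__insert_extent are
> > hypothetical helpers, and extent_entry is assumed to carry its start
> > file offset in fofs):
> >
> > /*
> >  * Update block 'fofs' inside extent [ee->fofs, ee->fofs + ee->len):
> >  * 1 deletion plus up to 3 insertions.
> >  */
> > static void split_extent(struct rb_root *root, struct extent_entry *ee,
> >                                 pgoff_t fofs, block_t new_blkaddr)
> > {
> >         unsigned int left = fofs - ee->fofs;
> >         unsigned int right = ee->fofs + ee->len - fofs - 1;
> >
> >         __remove_extent(root, ee);              /* 1 deletion */
> >         if (left)                               /* left piece */
> >                 __insert_extent(root, ee->fofs, ee->blkaddr, left);
> >         /* the single updated block */
> >         __insert_extent(root, fofs, new_blkaddr, 1);
> >         if (right)                              /* right piece */
> >                 __insert_extent(root, fofs + 1,
> >                                 ee->blkaddr + left + 1, right);
> > }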
> > And we need to bring in ideas from vm management. For example,
> > active/inactive lists with a second chance for promotion, or batched
> > work for insertion/deletion.
> > 
> > I suddenly thought: 'Simple is best'.
> > Let's think about better ideas together.
> 
> Yeah, how about using an approach opposite to the one the page cache
> manager uses?
> 
> for example:
> node pages A, B, C, D are in the page cache;
> extents a, b, c, d are in the extent cache;
> extent a is built from page A, ..., d is built from page D.
> page cache: LRU A -> B -> C -> D MRU
> extent cache: LRU a -> b -> c -> d MRU
> 
> If we use
> 1) the same LRU order, the cache pairs A-a, B-b, ... may be reclaimed at
> the same time under OOM.
> 2) the opposite order, maybe A, B in the page cache and d, c in the
> extent cache will be reclaimed, but we can still hit the whole range
> through the combination of page cache and extent cache.
> 
> So with way '2)' we can increase the total coverage of page cache +
> extent cache (a toy sketch follows).
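> 
> A toy sketch of way '2)' (victims are picked from the MRU end; the
> list/lock names are hypothetical, and detaching each victim from its
> owning inode's rb-tree is elided for brevity):
> 
> static void shrink_extents_opposite(struct f2fs_sb_info *sbi, int nr)
> {
>         spin_lock(&sbi->extent_lock);
>         while (nr-- > 0 && !list_empty(&sbi->extent_list)) {
>                 struct extent_entry *ee;
> 
>                 /* opposite to the page cache: evict from the MRU end */
>                 ee = list_last_entry(&sbi->extent_list,
>                                         struct extent_entry, list);
>                 list_del(&ee->list);
>                 kfree(ee);
>         }
>         spin_unlock(&sbi->extent_lock);
> }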

Good idea. :)
Once we decide which is better between a global lru and a per-inode lru,
f2fs read performance is expected to improve even more.
I'll wait for your patch.

Thanks,

> 
> > 
> > > Thanks,
> > >
> > > >
> > > > 2) Limitation:
> > > > Within one inode, as we split or add extents in the extent cache
> > > > during read/write, the number of extents grows, so memory and CPU
> > > > overhead will increase.
> > > > In order to control the overhead of memory and CPU, we try to set an
> > > > upper bound to limit the total extent count in each inode. This
> > > > number is a global configuration which is visible to all inodes, and
> > > > it will be exported to sysfs so users can configure it according to
> > > > their requirements. By default, the designed number is 8.
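> > > >
> > > > A generic sysfs knob could look roughly like this (a sketch using a
> > > > plain kobj_attribute; in practice f2fs's existing attribute
> > > > machinery in fs/f2fs/super.c would be the natural place):
> > > >
> > > > static unsigned int max_extents = 8;    /* default upper bound */
> > > >
> > > > static ssize_t max_extents_show(struct kobject *kobj,
> > > >                                 struct kobj_attribute *attr, char *buf)
> > > > {
> > > >         return snprintf(buf, PAGE_SIZE, "%u\n", max_extents);
> > > > }
> > > >
> > > > static ssize_t max_extents_store(struct kobject *kobj,
> > > >                                 struct kobj_attribute *attr,
> > > >                                 const char *buf, size_t count)
> > > > {
> > > >         int ret = kstrtouint(buf, 10, &max_extents);
> > > >
> > > >         return ret ? ret : count;
> > > > }
> > > >
> > > > static struct kobj_attribute max_extents_attr =
> > > >         __ATTR(max_extents, 0644, max_extents_show, max_extents_store);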
> > > >
> > 
> > Chao,
> > It's better if the # of extents is controlled globally rather than
> > limiting extents per inode, as Jaegeuk said, to reduce extent
> > management overhead.
> 
> It's OK.
> 
> > 
> > > > 3) Shrinker:
> > > > There are two shrink paths:
> > > >         a) one is triggered when the extent count exceeds the upper
> > > >         bound of the inode's extent cache. We will try to release
> > > >         extent(s) from the head of the inode's inner extent lru list
> > > >         until the extent count equals the upper bound (see the
> > > >         sketch below this list).
> > > >         This operation could be in f2fs_update_extent_cache().
> > > >         b) the other is triggered when memory utilization exceeds a
> > > >         threshold. We try to get inodes from the head of the global
> > > >         lru list(s), and release a fixed number of extents (by
> > > >         default: 64) from each inode, one by one.
> > > >         This operation could be in f2fs_balance_fs_bg().
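> > > >
> > > > Path a) could be a simple trim loop (a sketch reusing the
> > > > illustrative extent_tree fields from 1); the caller is assumed to
> > > > hold the tree's ->lock):
> > > >
> > > > static void __trim_extent_tree(struct extent_tree *et,
> > > >                                 unsigned int upper_bound)
> > > > {
> > > >         while (et->count > upper_bound) {
> > > >                 struct extent_node *en;
> > > >
> > > >                 /* victim: this inode's least recently used extent */
> > > >                 en = list_first_entry(&et->lru,
> > > >                                         struct extent_node, lru);
> > > >                 list_del(&en->lru);
> > > >                 rb_erase(&en->rb_node, &et->root);
> > > >                 kfree(en);
> > > >                 et->count--;
> > > >         }
> > > > }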
> > > >
> > 
> > Let's consider using register_shrinker(), which is called by the vm
> > when memory pressure happens.
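> >
> > For example (a sketch against the count_objects/scan_objects shrinker
> > interface; the counter and the walk helper are hypothetical):
> >
> > static atomic_t total_cached_extents = ATOMIC_INIT(0);
> >
> > static unsigned long f2fs_extent_count(struct shrinker *shrink,
> >                                         struct shrink_control *sc)
> > {
> >         return atomic_read(&total_cached_extents);
> > }
> >
> > static unsigned long f2fs_extent_scan(struct shrinker *shrink,
> >                                         struct shrink_control *sc)
> > {
> >         /* hypothetical helper: walk the global lru and free up to
> >          * sc->nr_to_scan extents, returning the number freed */
> >         return shrink_extent_caches(sc->nr_to_scan);
> > }
> >
> > static struct shrinker f2fs_extent_shrinker = {
> >         .count_objects  = f2fs_extent_count,
> >         .scan_objects   = f2fs_extent_scan,
> >         .seeks          = DEFAULT_SEEKS,
> > };
> >
> > /* in fill_super() or module init: */
> > register_shrinker(&f2fs_extent_shrinker);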
> 
> Great, thanks for the reminder! :-)
> 
> Thanks,
> Yu
> 
> > 
> > > > 4) Revalidation:
> > > > In ->evict(), the extent cache will be released. In order to reuse
> > > > an inode's extent cache on reopen for a high hit ratio, a proper way
> > > > is to add the cached extent tree into a hash table (could be a radix
> > > > tree), then revalidate the extent tree and recover it to the inode
> > > > on reopen.
> > > > Besides, a global lru list is needed here for the shrinker.
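> > > >
> > > > Roughly (a sketch; locking and error handling are elided, and the
> > > > ext_tree field plus __revalidate_extent_tree are hypothetical):
> > > >
> > > > /* on ->evict(): park the extent tree instead of freeing it */
> > > > static void f2fs_detach_extent_tree(struct inode *inode)
> > > > {
> > > >         struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > > >
> > > >         radix_tree_insert(&sbi->extent_tree_root, inode->i_ino,
> > > >                                 F2FS_I(inode)->ext_tree);
> > > >         F2FS_I(inode)->ext_tree = NULL;
> > > > }
> > > >
> > > > /* on reopen: look the tree up, revalidate, and reattach it */
> > > > static void f2fs_attach_extent_tree(struct inode *inode)
> > > > {
> > > >         struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
> > > >         struct extent_tree *et;
> > > >
> > > >         et = radix_tree_lookup(&sbi->extent_tree_root, inode->i_ino);
> > > >         if (et && __revalidate_extent_tree(et))
> > > >                 F2FS_I(inode)->ext_tree = et;
> > > > }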
> > > >
> > > > 5) Readahead:
> > > > Expand the extent cache via readahead when we call get_dnode_of_data
> > > > in the non-emergency path. Note that ra can increase lock contention
> > > > on both the ->rwlock in the extent cache and the ->lock of the
> > > > global shrink list.
> > > >