> From: Seth Jennings [mailto:sjenn...@linux.vnet.ibm.com] > Subject: Re: [PATCH 7/8] zswap: add to mm/ > > > I am eagerly studying one of the details of your zswap "flush" > > code in this patch to see how you solved a problem or two that > > I was struggling with for the similar mechanism RFC'ed for zcache > > (see https://lkml.org/lkml/2012/10/3/558). I like the way > > that you force the newly-uncompressed to-be-flushed page immediately > > into a swap bio in zswap_flush_entry via the call to swap_writepage, > > though I'm not entirely convinced that there aren't some race > > conditions there. However, won't swap_writepage simply call > > frontswap_store instead and re-compress the page back into zswap? > > I break part of swap_writepage() into a bottom half called > __swap_writepage() that doesn't include the call to frontswap_store(). > __swap_writepage() is what is called from zswap_flush_entry(). That > is how I avoid flushed pages recycling back into zswap and the > potential recursion mentioned.
OK, I missed that. Nice. I will see if I can use the same with zcache and, if so, would be happy to support the change to swap_writepage. In your next version, maybe you could break out that chunk into a separate distinct patch so it can be pulled separately into Andrew's tree? > > A second related issue that concerns me is that, although you > > are now, like zcache2, using an LRU queue for compressed pages > > (aka "zpages"), there is no relationship between that queue and > > physical pageframes. In other words, you may free up 100 zpages > > out of zswap via zswap_flush_entries, but not free up a single > > pageframe. This seems like a significant design issue. Or am > > I misunderstanding the code? > > You understand correctly. There is room for optimization here and it > is something I'm working on right now. > > What I'm looking to do is give zswap a little insight into zsmalloc > internals, Not to be at all snide, but had you been as eager to break the zsmalloc abstraction last spring, a lot of unpleasantness and extra work might have been avoided. :v( > namely the ability figure out what class size a particular > allocation is in and, in the event the store can't be satisfied, flush > an entry from that exact class size so that we can be assured the > store will succeed with minimal flushing work. In this solution, > there would be an LRU list per zsmalloc class size tracked in zswap. > The result is LRU-ish flushing overall with class size being the first > flush selection criteria and LRU as the second. Clever and definitely useful, though I think there are two related problems and IIUC this solves only one of them. The problem it _does_ solve is (A) where to put a new zpage: Move a zpage from the same class to real-swap-disk and then fill its slot with the new zpage. The problem it _doesn't_ solve is (B) how to shrink the total number of pageframes used by zswap, even by a single page. I believe (though cannot prove right now) that this latter problem will need to be solved to implement any suitable MM policy for balancing pages-used-for-compression vs pages-not-used-for-compression. I fear that problem (B) is the fundamental concern with using a high-density storage allocator such as zsmalloc, which is why I abandoned zsmalloc in favor of a more-predictable-but- less-dense allocator (zbud). However, if you have a solution for (B) as well, I would gladly abandon zbud in zcache (for _both_ cleancache and frontswap pages) and our respective in-kernel compression efforts would be more easy to merge into one solution in the future. > > A third concern is about scalability... the locking seems very > > coarse-grained. In zcache, you personally observed and fixed > > hashbucket contention (see https://lkml.org/lkml/2011/9/29/215). > > Doesn't zswap's tree_lock essentially use a single tree (per > > swaptype), i.e. no scalability? > > The reason the coarse lock isn't a problem for zswap like the hash > bucket locks where in zcache is that the lock is not held for long > periods time as it is in zcache. It is only held while operating on > the tree, not during compression/decompression and larger memory > operations. Hmmm... IIRC, to avoid races in zcache, it was necessary to update both the data (zpage) and meta-data ("tree" in zswap, and tmem-data-structure in zcache) atomically. I will need to study your code more to understand how zswap avoids this requirement. Or if it is obvious to you, I would be grateful if you would point it out to me. Thanks, Dan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/