On Mon, 23 Jul 2007 22:24:57 +0800 Fengguang Wu <[EMAIL PROTECTED]> wrote:
> On Mon, Jul 23, 2007 at 07:00:59PM +1000, Nick Piggin wrote: > > Rusty Russell wrote: > > >On Sun, 2007-07-22 at 16:10 +0800, Fengguang Wu wrote: > > > > >>So I opt for it being made tunable, safe, and turned off by default. > > > > I hate tunables :) Unless we have workload A that gets a reasonable > > benefit from something and workload B that gets a significant regression, > > and no clear way to reconcile them... > > Me too ;) > > But sometimes we really want to avoid flushing the cache. > Andrew's user space LD_PRELOAD+fadvise based tool fit nicely here. It's the only way to go in some situations. Sometimes the kernel just cannot predict the future sufficiently well, and the costs of making a mistake are terribly high. We need human help. And it should be administration-time help, not programming-time help. > > >I'd like to see it turned on by default in -mm, and try to come up with > > >some server-like workload to measure the effect. Should be easy to > > >simulate something (eg. apache server, where clients grab some files in > > >preference, and apache server where clients grab different files). > > > > I don't like this kind of conditional information going from something > > like readahead into page reclaim. Unless it is for readahead _specific_ > > data such as "I got these all wrong, so you can reclaim them" (which > > this isn't). > > > > Possibly it makes sense to realise that the given pages are cheaper > > to read back in as they are apparently being read-ahead very nicely. > > In fact I have talked to Jens about it in last year's kernel summit. > The patch below explains itself. > --- > Subject: cost based page reclaim > > Cost based page reclaim - a minimalist implementation. > > Suppose we cached 32 small files each with 1 page, and one 32-page chunk from > a > large file. Should we first drop the 32-pages which are read in one I/O, or > drop the 32 distinct pages, each costs one I/O? (Given that the files are of > equal hotness.) > > Page replacement algorithms should be designed to minimize the number of I/Os, > instead of the number of page faults. Dividing the cost of I/O by the number > of > pages it bring in, we get the cost of the page. The bigger page cost, the more > 'lives/bloods' the page should have. > > This patch adds life to costly pages by pretending that they are > referenced more times. Possible downsides: > - burdens the pressure of vmscan > - active pages are no longer that 'active' > This is all fun stuff, but how do we find out that changes like this are good ones, apart from shipping it and seeing who gets hurt 12 months later? > +#define log2(n) fls(n) <look at include/linux/log2.h> - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/