On Mon, Jan 13, 2014 at 6:44 PM, Dave Chinner <da...@fromorbit.com> wrote:
> On Tue, Jan 14, 2014 at 02:26:25AM +0100, Andres Freund wrote:
> > On 2014-01-13 17:13:51 -0800, James Bottomley wrote:
> > > a file into a user provided buffer, thus obtaining a page cache entry
> > > and a copy in their userspace buffer, then insert the page of the user
> > > buffer back into the page cache as the page cache page ... that's right,
> > > isn't it postgress people?
> >
> > Pretty much, yes. We'd probably hint (*advise(DONTNEED)) that the page
> > isn't needed anymore when reading. And we'd normally write if the page
> > is dirty.
>
> So why, exactly, do you even need the kernel page cache here?

We don't need it, but it would be nice.

> You've got direct access to the copy of data read into userspace, and
> you want direct control of when and how the data in that buffer is
> written and reclaimed. Why push that data buffer back into the
> kernel and then have to add all sorts of kernel interfaces to
> control the page you already have control of?

Say 25% of the RAM is dedicated to the database's shared buffers, and
75% is left to the kernel's judgement. It sure would be nice if the
kernel had the capability of using some of that 75% for database pages,
if it thought that that was the best use for it.

Which is what we do get now, at the expense of quite a lot of double
buffering (by which I mean, a lot of pages are both in the kernel cache
and the database cache--not just transiently during the copy process,
but for quite a while). If we had the ability to re-inject clean pages
into the kernel's cache, we would get that benefit without the double
buffering.

Cheers,

Jeff
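
For illustration only, the hint mentioned in the quoted text (read a page into a userspace buffer, then advise DONTNEED) might look roughly like the sketch below. It shows one way to shorten the double-buffering window with calls that exist today; it is not the re-injection interface discussed in this thread, for which no API is proposed here. The function name read_page_drop_cache and the PAGE_BYTES constant are assumptions made for this example, not PostgreSQL code.

```c
#define _XOPEN_SOURCE 700   /* expose pread() and posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical page size for this sketch (8 kB). */
#define PAGE_BYTES 8192

/*
 * Read one page into a caller-owned buffer, then hint that the kernel
 * may drop its own cached copy of that range.
 * Returns the byte count from pread(), or -1 on read error.
 * The advice is only a hint: the kernel is free to ignore it.
 */
ssize_t read_page_drop_cache(int fd, off_t page_no, char *buf)
{
    assert(buf != NULL);
    off_t offset = page_no * (off_t)PAGE_BYTES;

    /* Step 1: copy the page from the kernel page cache to userspace. */
    ssize_t n = pread(fd, buf, PAGE_BYTES, offset);
    if (n < 0)
        return -1;

    /* Step 2: advise that the cached range is no longer needed.
       posix_fadvise() returns an error number rather than setting
       errno; a failed hint is ignored here because it affects memory
       use only, not correctness. */
    (void)posix_fadvise(fd, offset, PAGE_BYTES, POSIX_FADV_DONTNEED);

    return n;
}
```

A caller would open the data file, call read_page_drop_cache(fd, page_no, buf) with a buffer of at least PAGE_BYTES, and keep the page in its own cache afterwards. Once the hint is honoured, the kernel no longer holds the page, so the machine loses exactly the benefit described above: the kernel cannot choose to keep that page in the remaining RAM.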