On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote:
>
> On 06/15/2018 08:01 PM, Andres Freund wrote:
> > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote:
> > >
> > > On 14.06.2018 09:52, Thomas Munro wrote:
> > > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik
> > > > <k.knizh...@postgrespro.ru> wrote:
> > > > > pg_wal_prefetch function will infinitely traverse WAL and prefetch
> > > > > block references in WAL records using the posix_fadvise(WILLNEED)
> > > > > system call.
> > > >
> > > > Hi Konstantin,
> > > >
> > > > Why stop at the page cache... what about shared buffers?
> > > >
> > >
> > > It is a good question. I thought a lot about prefetching directly into
> > > shared buffers.
> >
> > I think that's definitely how this should work. I'm pretty strongly
> > opposed to a prefetching implementation that doesn't read into s_b.
>
> Could you elaborate on why prefetching into s_b is so much better? (I'm
> sure it has advantages, but I suppose prefetching into the page cache
> would be much easier to implement.)
I think there's a number of issues with just issuing prefetch requests via
fadvise etc:

- It leads to guaranteed double buffering, in a way that's just about
  guaranteed to *never* be useful. Because we'd only prefetch whenever
  there's an upcoming write, there's simply no benefit in the page staying
  in the page cache - we'll write the whole page back out to the OS.

- Reading from the page cache is far from free - so you add costs to the
  replay process that it doesn't need to bear.

- You don't have any sort of completion notification, so you basically
  just have to guess how far ahead you want to read. If you read a bit too
  much you suddenly get into synchronous blocking land.

- The OS page cache is actually not particularly scalable to large
  amounts of data either. Nor are its decisions about what to keep cached
  likely to be particularly useful.

- We imo need to add support for direct IO before long, and adding more
  and more work to reach feature parity strikes me as a bad move.

Greetings,

Andres Freund