Re: WAL prefetch

2018-07-09 Thread Konstantin Knizhnik
On 09.07.2018 21:28, Andres Freund wrote: Hi, On 2018-07-09 11:59:06 +0200, Tomas Vondra wrote: * During the design phase, I looked into using bgworkers but given the number of in-flight pread(2) calls required to fully utilize the IO subsystem, I opted for something threaded (I was

Re: WAL prefetch

2018-07-09 Thread Konstantin Knizhnik
On 08.07.2018 00:47, Tomas Vondra wrote: Hi, I've done a bit of testing on the current patch, mostly to see how much the prefetching can help (if at all). While the patch is still in early WIP stages (at least that's my assessment, YMMV), the improvements are already quite significant. I've a

Re: WAL prefetch

2018-07-09 Thread Andres Freund
Hi, On 2018-07-09 11:59:06 +0200, Tomas Vondra wrote: > > * During the design phase, I looked into using bgworkers but given the > > number of > >in-flight pread(2) calls required to fully utilize the IO subsystem, I > > opted > >for something threaded (I was also confined to using Solar

Re: WAL prefetch

2018-07-09 Thread Tomas Vondra
On 07/09/2018 02:26 AM, Sean Chittenden wrote: > ... snip ... > The real importance of prefaulting becomes apparent in the following two situations: 1. Priming the OS's filesystem cache, notably after an OS restart. This is of value to all PostgreSQL scenarios, regardless of whether o

Re: WAL prefetch

2018-07-08 Thread Sean Chittenden
> Without prefetching, it's ~70GB of WAL. With prefetching, it's only about > 30GB. Considering the 1-hour test generates about 90GB of WAL, this means the > replay speed grew from 20GB/h to almost 60GB/h. That's rather measurable > improvement ;-) Thank you everyone for this reasonably in-depth t

Re: WAL prefetch

2018-06-27 Thread Tomas Vondra
On 06/27/2018 11:44 AM, Konstantin Knizhnik wrote: ... I have improved my WAL prefetch patch. The main reason of slowdown recovery speed with enabled prefetch was that it doesn't take in account initialized pages (XLOG_HEAP_INIT_PAGE) and doesn't remember (cache) full page writes. The main

Re: WAL prefetch

2018-06-27 Thread Konstantin Knizhnik
On 22.06.2018 11:35, Konstantin Knizhnik wrote: On 21.06.2018 19:57, Tomas Vondra wrote: On 06/21/2018 04:01 PM, Konstantin Knizhnik wrote: I continue my experiments with WAL prefetch. I have embedded prefetch in Postgres: now walprefetcher is started together with startup process and is

Re: WAL prefetch

2018-06-22 Thread Konstantin Knizhnik
On 21.06.2018 19:57, Tomas Vondra wrote: On 06/21/2018 04:01 PM, Konstantin Knizhnik wrote: I continue my experiments with WAL prefetch. I have embedded prefetch in Postgres: now walprefetcher is started together with startup process and is able to help it to speedup recovery. The patch i

Re: WAL prefetch

2018-06-21 Thread Tomas Vondra
On 06/21/2018 04:01 PM, Konstantin Knizhnik wrote: I continue my experiments with WAL prefetch. I have embedded prefetch in Postgres: now walprefetcher is started together with startup process and is able to help it to speedup recovery. The patch is attached. Unfortunately result is negativ

Re: WAL prefetch

2018-06-21 Thread Konstantin Knizhnik
I continue my experiments with WAL prefetch. I have embedded prefetch in Postgres: now walprefetcher is started together with startup process and is able to help it to speedup recovery. The patch is attached. Unfortunately result is negative (at least at my desktop: SSD, 16Gb RAM). Recovery wi

Re: WAL prefetch

2018-06-19 Thread Tomas Vondra
On 06/19/2018 06:34 PM, Konstantin Knizhnik wrote: On 19.06.2018 18:50, Andres Freund wrote: On 2018-06-19 12:08:27 +0300, Konstantin Knizhnik wrote: I do not think that prefetching in shared buffers requires much more efforts and make patch more invasive... It even somehow simplify it, beca

Re: WAL prefetch

2018-06-19 Thread Andres Freund
Hi, On 2018-06-19 18:41:24 +0200, Tomas Vondra wrote: > I'm confused. I thought you wanted to prefetch directly to shared buffers, > so that it also works with direct I/O in the future. But now you suggest to > use posix_fadvise() to work around the synchronous buffer read limitation. I > don't fo

Re: WAL prefetch

2018-06-19 Thread Andres Freund
On 2018-06-19 19:34:22 +0300, Konstantin Knizhnik wrote: > On 19.06.2018 18:50, Andres Freund wrote: > > On 2018-06-19 12:08:27 +0300, Konstantin Knizhnik wrote: > > > I do not think that prefetching in shared buffers requires much more > > > efforts > > > and make patch more invasive... > > > It

Re: WAL prefetch

2018-06-19 Thread Tomas Vondra
On 06/19/2018 05:50 PM, Andres Freund wrote: On 2018-06-19 12:08:27 +0300, Konstantin Knizhnik wrote: I do not think that prefetching in shared buffers requires much more efforts and make patch more invasive... It even somehow simplify it, because there is no to maintain own cache of prefetch

Re: WAL prefetch

2018-06-19 Thread Konstantin Knizhnik
On 19.06.2018 18:50, Andres Freund wrote: On 2018-06-19 12:08:27 +0300, Konstantin Knizhnik wrote: I do not think that prefetching in shared buffers requires much more efforts and make patch more invasive... It even somehow simplify it, because there is no to maintain own cache of prefetched

Re: WAL prefetch

2018-06-19 Thread Andres Freund
On 2018-06-19 12:08:27 +0300, Konstantin Knizhnik wrote: > I do not think that prefetching in shared buffers requires much more efforts > and make patch more invasive... > It even somehow simplify it, because there is no to maintain own cache of > prefetched pages... > But it will definitely have

Re: WAL prefetch

2018-06-19 Thread Tomas Vondra
On 06/19/2018 04:50 PM, Konstantin Knizhnik wrote: On 19.06.2018 16:57, Ants Aasma wrote: On Tue, Jun 19, 2018 at 4:04 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: Right. My point is that while spawning bgworkers probably helps, I don't expect it to be enough

Re: WAL prefetch

2018-06-19 Thread Konstantin Knizhnik
On 19.06.2018 16:57, Ants Aasma wrote: On Tue, Jun 19, 2018 at 4:04 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: Right. My point is that while spawning bgworkers probably helps, I don't expect it to be enough to fill the I/O queues on modern storage systems.

Re: WAL prefetch

2018-06-19 Thread Ants Aasma
On Tue, Jun 19, 2018 at 4:04 PM Tomas Vondra wrote: > Right. My point is that while spawning bgworkers probably helps, I don't > expect it to be enough to fill the I/O queues on modern storage systems. > Even if you start say 16 prefetch bgworkers, that's not going to be > enough for large arrays

Re: WAL prefetch

2018-06-19 Thread Tomas Vondra
On 06/19/2018 02:33 PM, Konstantin Knizhnik wrote: On 19.06.2018 14:03, Tomas Vondra wrote: On 06/19/2018 11:08 AM, Konstantin Knizhnik wrote: ... >>> Also there are two points which makes prefetching into shared buffers more complex: 1. Need to spawn multiple workers to make prefetch in p

Re: WAL prefetch

2018-06-19 Thread Konstantin Knizhnik
On 19.06.2018 14:03, Tomas Vondra wrote: On 06/19/2018 11:08 AM, Konstantin Knizhnik wrote: On 18.06.2018 23:47, Andres Freund wrote: On 2018-06-18 16:44:09 -0400, Robert Haas wrote: On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund wrote: The posix_fadvise approach is not perfect, no dou

Re: WAL prefetch

2018-06-19 Thread Tomas Vondra
On 06/19/2018 11:08 AM, Konstantin Knizhnik wrote: On 18.06.2018 23:47, Andres Freund wrote: On 2018-06-18 16:44:09 -0400, Robert Haas wrote: On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund wrote: The posix_fadvise approach is not perfect, no doubt about that. But it works pretty well for

Re: WAL prefetch

2018-06-19 Thread Konstantin Knizhnik
On 18.06.2018 23:47, Andres Freund wrote: On 2018-06-18 16:44:09 -0400, Robert Haas wrote: On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund wrote: The posix_fadvise approach is not perfect, no doubt about that. But it works pretty well for bitmap heap scans, and it's about 13249x better (roug

Re: WAL prefetch

2018-06-18 Thread Andres Freund
On 2018-06-18 16:44:09 -0400, Robert Haas wrote: > On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund wrote: > >> The posix_fadvise approach is not perfect, no doubt about that. But it > >> works pretty well for bitmap heap scans, and it's about 13249x better > >> (rough estimate) than the current sol

Re: WAL prefetch

2018-06-18 Thread Robert Haas
On Sat, Jun 16, 2018 at 3:41 PM, Andres Freund wrote: >> The posix_fadvise approach is not perfect, no doubt about that. But it >> works pretty well for bitmap heap scans, and it's about 13249x better >> (rough estimate) than the current solution (no prefetching). > > Sure, but investing in an arc

Re: WAL prefetch

2018-06-17 Thread Konstantin Knizhnik
On 17.06.2018 03:00, Andres Freund wrote: On 2018-06-16 23:25:34 +0300, Konstantin Knizhnik wrote: On 16.06.2018 22:02, Andres Freund wrote: On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote: On 06/15/2018 08:01 PM, Andres Freund wrote: On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wro

Re: WAL prefetch

2018-06-16 Thread Andres Freund
On 2018-06-16 23:31:49 +0300, Konstantin Knizhnik wrote: > > > On 16.06.2018 22:23, Andres Freund wrote: > > Hi, > > > > On 2018-06-13 16:09:45 +0300, Konstantin Knizhnik wrote: > > > Usage: > > > 1. At master: create extension wal_prefetch > > > 2. At replica: Call pg_wal_prefetch() function: i

Re: WAL prefetch

2018-06-16 Thread Andres Freund
On 2018-06-16 23:25:34 +0300, Konstantin Knizhnik wrote: > > > On 16.06.2018 22:02, Andres Freund wrote: > > On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote: > > > > > > On 06/15/2018 08:01 PM, Andres Freund wrote: > > > > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: > > > > > > >

Re: WAL prefetch

2018-06-16 Thread Konstantin Knizhnik
On 16.06.2018 22:23, Andres Freund wrote: Hi, On 2018-06-13 16:09:45 +0300, Konstantin Knizhnik wrote: Usage: 1. At master: create extension wal_prefetch 2. At replica: Call pg_wal_prefetch() function: it will not return until you interrupt it. FWIW, I think the proper design would rather b

Re: WAL prefetch

2018-06-16 Thread Konstantin Knizhnik
On 16.06.2018 22:02, Andres Freund wrote: On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote: On 06/15/2018 08:01 PM, Andres Freund wrote: On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: On 14.06.2018 09:52, Thomas Munro wrote: On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik

Re: WAL prefetch

2018-06-16 Thread Andres Freund
Hi, On 2018-06-16 21:34:30 +0200, Tomas Vondra wrote: > > - it leads to guaranteed double buffering, in a way that's just about > > guaranteed to *never* be useful. Because we'd only prefetch whenever > > there's an upcoming write, there's simply no benefit in the page > > staying in the pag

Re: WAL prefetch

2018-06-16 Thread Tomas Vondra
On 06/16/2018 09:02 PM, Andres Freund wrote: > On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote: >> >> >> On 06/15/2018 08:01 PM, Andres Freund wrote: >>> On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: On 14.06.2018 09:52, Thomas Munro wrote: > On Thu, Jun 14, 2018 at 1

Re: WAL prefetch

2018-06-16 Thread Andres Freund
Hi, On 2018-06-13 16:09:45 +0300, Konstantin Knizhnik wrote: > Usage: > 1. At master: create extension wal_prefetch > 2. At replica: Call pg_wal_prefetch() function: it will not return until you > interrupt it. FWIW, I think the proper design would rather be a background worker that does this wor

Re: WAL prefetch

2018-06-16 Thread Andres Freund
On 2018-06-16 11:38:59 +0200, Tomas Vondra wrote: > > > On 06/15/2018 08:01 PM, Andres Freund wrote: > > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: > > > > > > > > > On 14.06.2018 09:52, Thomas Munro wrote: > > > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik > > > > wrote: >

Re: WAL prefetch

2018-06-16 Thread Stephen Frost
Greetings, * Tomas Vondra (tomas.von...@2ndquadrant.com) wrote: > On 06/16/2018 12:06 PM, Thomas Munro wrote: > >On Sat, Jun 16, 2018 at 9:38 PM, Tomas Vondra > > wrote: > >>On 06/15/2018 08:01 PM, Andres Freund wrote: > >>>On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: > On 14.06.20

Re: WAL prefetch

2018-06-16 Thread Tomas Vondra
On 06/16/2018 12:06 PM, Thomas Munro wrote: On Sat, Jun 16, 2018 at 9:38 PM, Tomas Vondra wrote: On 06/15/2018 08:01 PM, Andres Freund wrote: On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: On 14.06.2018 09:52, Thomas Munro wrote: Why stop at the page cache... what about shared

Re: WAL prefetch

2018-06-16 Thread Thomas Munro
On Sat, Jun 16, 2018 at 9:38 PM, Tomas Vondra wrote: > On 06/15/2018 08:01 PM, Andres Freund wrote: >> On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: >>> On 14.06.2018 09:52, Thomas Munro wrote: Why stop at the page cache... what about shared buffers? >>> >>> It is good question. I

Re: WAL prefetch

2018-06-16 Thread Tomas Vondra
On 06/15/2018 08:01 PM, Andres Freund wrote: On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: On 14.06.2018 09:52, Thomas Munro wrote: On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik wrote: pg_wal_prefetch function will infinitely traverse WAL and prefetch block references i

Re: WAL prefetch

2018-06-15 Thread Amit Kapila
On Sat, Jun 16, 2018 at 10:47 AM, Konstantin Knizhnik wrote: > > > On 16.06.2018 06:33, Amit Kapila wrote: >> >> On Fri, Jun 15, 2018 at 11:31 PM, Andres Freund >> wrote: >>> >>> On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: On 14.06.2018 09:52, Thomas Munro wrote: >

Re: WAL prefetch

2018-06-15 Thread Konstantin Knizhnik
On 16.06.2018 06:33, Amit Kapila wrote: On Fri, Jun 15, 2018 at 11:31 PM, Andres Freund wrote: On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: On 14.06.2018 09:52, Thomas Munro wrote: On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik wrote: pg_wal_prefetch function will infi

Re: WAL prefetch

2018-06-15 Thread Konstantin Knizhnik
On 16.06.2018 06:30, Amit Kapila wrote: On Fri, Jun 15, 2018 at 8:45 PM, Konstantin Knizhnik wrote: On 15.06.2018 18:03, Amit Kapila wrote: wal_prefetch is prefetching blocks referenced by WAL records. But in case of "full page writes" such prefetch is not needed and even is harmful. Okay

Re: WAL prefetch

2018-06-15 Thread Amit Kapila
On Fri, Jun 15, 2018 at 11:31 PM, Andres Freund wrote: > On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: >> >> >> On 14.06.2018 09:52, Thomas Munro wrote: >> > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik >> > wrote: >> > > pg_wal_prefetch function will infinitely traverse WAL an

Re: WAL prefetch

2018-06-15 Thread Amit Kapila
On Fri, Jun 15, 2018 at 8:45 PM, Konstantin Knizhnik wrote: > > On 15.06.2018 18:03, Amit Kapila wrote: > > wal_prefetch is prefetching blocks referenced by WAL records. But in case of > "full page writes" such prefetch is not needed and even is harmful. > Okay, IIUC, the basic idea is to prefetc

Re: WAL prefetch

2018-06-15 Thread Andres Freund
On 2018-06-14 10:13:44 +0300, Konstantin Knizhnik wrote: > > > On 14.06.2018 09:52, Thomas Munro wrote: > > On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik > > wrote: > > > pg_wal_prefetch function will infinitely traverse WAL and prefetch block > > > references in WAL records > > > using p

Re: WAL prefetch

2018-06-15 Thread Konstantin Knizhnik
On 15.06.2018 18:03, Amit Kapila wrote: On Fri, Jun 15, 2018 at 1:08 PM, Konstantin Knizhnik wrote: On 15.06.2018 07:36, Amit Kapila wrote: On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost wrote: I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb NVME RAID 10 storage de

Re: WAL prefetch

2018-06-15 Thread Amit Kapila
On Fri, Jun 15, 2018 at 1:08 PM, Konstantin Knizhnik wrote: > > > On 15.06.2018 07:36, Amit Kapila wrote: >> >> On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost >> wrote: I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb NVME RAID 10 storage device and 256Gb

Re: WAL prefetch

2018-06-15 Thread Konstantin Knizhnik
On 15.06.2018 07:36, Amit Kapila wrote: On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost wrote: I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb NVME RAID 10 storage device and 256Gb of RAM connected using InfiniBand. The speed of synchronous replication between two nodes

Re: WAL prefetch

2018-06-14 Thread Amit Kapila
On Fri, Jun 15, 2018 at 12:16 AM, Stephen Frost wrote: > >> I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb NVME >> RAID 10 storage device and 256Gb of RAM connected using InfiniBand. >> The speed of synchronous replication between two nodes is increased from 56k >> TPS to 60

Re: WAL prefetch

2018-06-14 Thread Stephen Frost
Greetings, * Konstantin Knizhnik (k.knizh...@postgrespro.ru) wrote: > There was very interesting presentation at pgconf about pg_prefaulter: > > http://www.pgcon.org/2018/schedule/events/1204.en.html I agree and I've chatted a bit w/ Sean further about it. > But it is implemented in GO and usin

Re: WAL prefetch

2018-06-14 Thread Konstantin Knizhnik
On 14.06.2018 16:25, Robert Haas wrote: On Thu, Jun 14, 2018 at 9:23 AM, Konstantin Knizhnik wrote: Speed of random HDD access is limited by speed of disk head movement. By running several IO requests in parallel we just increase probability of head movement, so actually parallel access to H

Re: WAL prefetch

2018-06-14 Thread Robert Haas
On Thu, Jun 14, 2018 at 9:23 AM, Konstantin Knizhnik wrote: > Speed of random HDD access is limited by speed of disk head movement. > By running several IO requests in parallel we just increase probability of > head movement, so actually parallel access to HDD may even decrease IO speed > rather t

Re: WAL prefetch

2018-06-14 Thread Konstantin Knizhnik
On 14.06.2018 15:44, Robert Haas wrote: On Wed, Jun 13, 2018 at 11:45 PM, Amit Kapila wrote: I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb NVME RAID 10 storage device and 256Gb of RAM connected using InfiniBand. The speed of synchronous replication between two nodes i

Re: WAL prefetch

2018-06-14 Thread Amit Kapila
On Thu, Jun 14, 2018 at 6:14 PM, Robert Haas wrote: > On Wed, Jun 13, 2018 at 11:45 PM, Amit Kapila wrote: >>> I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb NVME >>> RAID 10 storage device and 256Gb of RAM connected using InfiniBand. >>> The speed of synchronous replicatio

Re: WAL prefetch

2018-06-14 Thread Robert Haas
On Wed, Jun 13, 2018 at 11:45 PM, Amit Kapila wrote: >> I have tested wal_prefetch at two powerful servers with 24 cores, 3Tb NVME >> RAID 10 storage device and 256Gb of RAM connected using InfiniBand. >> The speed of synchronous replication between two nodes is increased from 56k >> TPS to 60k TP

Re: WAL prefetch

2018-06-14 Thread Konstantin Knizhnik
On 14.06.2018 09:52, Thomas Munro wrote: On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik wrote: pg_wal_prefetch function will infinitely traverse WAL and prefetch block references in WAL records using posix_fadvise(WILLNEED) system call. Hi Konstantin, Why stop at the page cache... w

Re: WAL prefetch

2018-06-13 Thread Thomas Munro
On Thu, Jun 14, 2018 at 1:09 AM, Konstantin Knizhnik wrote: > pg_wal_prefetch function will infinitely traverse WAL and prefetch block > references in WAL records > using posix_fadvise(WILLNEED) system call. Hi Konstantin, Why stop at the page cache... what about shared buffers? -- Thomas Mun
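
[Editor's illustration, not part of the thread] The mechanism quoted above, hinting the kernel with posix_fadvise(WILLNEED) for blocks referenced in WAL records, can be sketched roughly as follows. This is only a minimal sketch and not the code from the patch: the helper name prefetch_blocks and the idea of passing in a ready-made list of block numbers are assumptions made for illustration, and decoding WAL records into block references is left out entirely. It assumes a POSIX system where Python exposes os.posix_fadvise, and uses 8192 bytes as the page size (PostgreSQL's default BLCKSZ).

```python
import os

# PostgreSQL's default page size (BLCKSZ); assumed here, it is configurable at build time.
BLOCK_SIZE = 8192


def prefetch_blocks(path, block_numbers, block_size=BLOCK_SIZE):
    """Ask the kernel to read the given blocks of `path` into the page cache.

    Hypothetical helper for illustration only. Returns the number of
    distinct blocks for which a WILLNEED hint was issued.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        hinted = 0
        # Deduplicate so that each block is hinted at most once per call.
        for blkno in sorted(set(block_numbers)):
            # The hint is advisory and asynchronous: the call returns without
            # waiting for the data, and the kernel may ignore it.
            os.posix_fadvise(fd, blkno * block_size, block_size,
                             os.POSIX_FADV_WILLNEED)
            hinted += 1
        return hinted
    finally:
        os.close(fd)
```

Note that a hint like this only warms the OS page cache; it does not place pages in shared buffers, which is the point the question above raises.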

Re: WAL prefetch

2018-06-13 Thread Amit Kapila
On Wed, Jun 13, 2018 at 6:39 PM, Konstantin Knizhnik wrote: > There was very interesting presentation at pgconf about pg_prefaulter: > > http://www.pgcon.org/2018/schedule/events/1204.en.html > > But it is implemented in GO and using pg_waldump. > I tried to do the same but using built-in Postgres