Re: Use fadvise in wal replay

2023-04-08 Thread Gregory Stark (as CFM)
On Thu, 19 Jan 2023 at 18:19, Andres Freund wrote: > > On 2023-01-19 22:19:10 +0100, Tomas Vondra wrote: > > > So I'm a bit unsure about this patch. It doesn't seem like it can perform > > better than read-ahead (although perhaps it does, on a different storage > > system). > > I really don't see t

Re: Use fadvise in wal replay

2023-01-19 Thread Andres Freund
Hi, On 2023-01-19 22:19:10 +0100, Tomas Vondra wrote: > So I'm a bit unsure about this patch. It doesn't seem like it can perform > better than read-ahead (although perhaps it does, on a different storage > system). I really don't see the point of the patch as-is. It's not going to help OSs withou

Re: Use fadvise in wal replay

2023-01-19 Thread Tomas Vondra
Hi, I looked at this patch today. The change is fairly simple, so I decided to do a benchmark. To prepare, I created a cluster with a 1GB database, created a backup, and ran 1h UPDATE workload with WAL archiving. Then, the actual benchmark does this: 1. restore the datadir backup 2. copy the WAL

Re: Use fadvise in wal replay

2022-11-27 Thread Andrey Borodin
On Fri, Nov 25, 2022 at 1:12 PM Pavel Borisov wrote: > > As I've written up in the thread we cannot gain much from this > optimization. Jakub's results show around a 2% difference: > > >baseline, master, default Linux readahead (128kb): > >33.979, 0.478 > >35.137, 0.504 > >34.649, 0.518> > >

Re: Use fadvise in wal replay

2022-11-25 Thread Pavel Borisov
On Sat, 26 Nov 2022 at 01:10, Pavel Borisov wrote: > > Hi, hackers! > > On Sun, 13 Nov 2022 at 02:02, Andrey Borodin wrote: > > > > On Sun, Aug 7, 2022 at 9:41 AM Andrey Borodin wrote: > > > > > > > Hi everyone. The patch is 16 lines, looks harmless and with proven > > benefits. I'm moving this

Re: Use fadvise in wal replay

2022-11-25 Thread Pavel Borisov
Hi, hackers! On Sun, 13 Nov 2022 at 02:02, Andrey Borodin wrote: > > On Sun, Aug 7, 2022 at 9:41 AM Andrey Borodin wrote: > > > > Hi everyone. The patch is 16 lines, looks harmless and with proven > benefits. I'm moving this into RfC. As I've written up in the thread we cannot gain much from t

Re: Use fadvise in wal replay

2022-11-12 Thread Andrey Borodin
On Sun, Aug 7, 2022 at 9:41 AM Andrey Borodin wrote: > Hi everyone. The patch is 16 lines, looks harmless and with proven benefits. I'm moving this into RfC. Thanks! Best regards, Andrey Borodin.

Re: Use fadvise in wal replay

2022-08-07 Thread Andrey Borodin
> On 7 Aug 2022, at 06:39, Bharath Rupireddy > wrote: > > Agree. Why can't we just prefetch the entire WAL file once whenever it > is opened for the first time? Does the OS have any limitations on max > size to prefetch at once? It may sound aggressive, but it avoids > fadvise() system calls,

Re: Use fadvise in wal replay

2022-08-06 Thread Bharath Rupireddy
On Sat, Aug 6, 2022 at 10:53 AM Andrey Borodin wrote: > > Hi Bharath, > > thank you for the suggestion. > > > On 5 Aug 2022, at 16:02, Bharath Rupireddy > > wrote: > > > > On Thu, Aug 4, 2022 at 9:48 PM Andrey Borodin wrote: > >> > >>> On 18 Jul 2022, at 22:55, Robert Haas wrote: > >>> > >>> O

Re: Use fadvise in wal replay

2022-08-05 Thread Andrey Borodin
Hi Bharath, thank you for the suggestion. > On 5 Aug 2022, at 16:02, Bharath Rupireddy > wrote: > > On Thu, Aug 4, 2022 at 9:48 PM Andrey Borodin wrote: >> >>> On 18 Jul 2022, at 22:55, Robert Haas wrote: >>> >>> On Thu, Jun 23, 2022 at 5:49 AM Jakub Wartak >>> wrote: > > I have a funda

Re: Use fadvise in wal replay

2022-08-05 Thread Bharath Rupireddy
On Thu, Aug 4, 2022 at 9:48 PM Andrey Borodin wrote: > > > On 18 Jul 2022, at 22:55, Robert Haas wrote: > > > > On Thu, Jun 23, 2022 at 5:49 AM Jakub Wartak > > wrote: I have a fundamental question on the overall idea - How beneficial it will be if the process that's reading the current WAL pa

Re: Use fadvise in wal replay

2022-08-04 Thread Andrey Borodin
> On 18 Jul 2022, at 22:55, Robert Haas wrote: > > On Thu, Jun 23, 2022 at 5:49 AM Jakub Wartak wrote: >> Cool. As for GUC I'm afraid there's going to be resistance to adding yet >> another GUC (to avoid many knobs). Ideally it would be nice if we had some >> advanced/deep/hidden parameters

Re: Use fadvise in wal replay

2022-07-18 Thread Robert Haas
On Thu, Jun 23, 2022 at 5:49 AM Jakub Wartak wrote: > Cool. As for GUC I'm afraid there's going to be resistance to adding yet > another GUC (to avoid many knobs). Ideally it would be nice if we had some > advanced/deep/hidden parameters, but there is no such thing. > Maybe another option would

Re: Use fadvise in wal replay

2022-07-18 Thread Andrey Borodin
> On 23 Jun 2022, at 12:50, Jakub Wartak wrote: > > Thoughts? I've looked into the patch one more time. And I propose to change this line + posix_fadvise(readFile, readOff + RACHUNK, RACHUNK, POSIX_FADV_WILLNEED); to + posix_fadvise(readFile, readOff + XLOG_BLCKSZ

Re: Use fadvise in wal replay

2022-06-23 Thread Justin Pryzby
On Thu, Jun 23, 2022 at 09:49:31AM +, Jakub Wartak wrote: > it would be nice if we had some advanced/deep/hidden parameters, but there > is no such thing. There are DEVELOPER_OPTIONS GUCs, although I don't know if this is a good fit for that. -- Justin

RE: Use fadvise in wal replay

2022-06-23 Thread Jakub Wartak
Hey Andrey, > > On 23 Jun 2022, at 13:50, Jakub Wartak > wrote: > > > > Thoughts? > The patch leaves the first 128KB chunk unprefetched. Is it worth adding an extra > branch for 120KB after the first block when readOff==0? > Or maybe do > + posix_fadvise(readFile, readOff + XLOG_BLCKSZ, R

Re: Use fadvise in wal replay

2022-06-23 Thread Andrey Borodin
> On 23 Jun 2022, at 13:50, Jakub Wartak wrote: > > Thoughts? The patch leaves the first 128KB chunk unprefetched. Is it worth adding an extra branch for 120KB after the first block when readOff==0? Or maybe do + posix_fadvise(readFile, readOff + XLOG_BLCKSZ, RACHUNK, POSIX_FADV_WI
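A minimal sketch of the offset variant discussed in this message, under stated assumptions. The chunk-boundary check, the constants `SKETCH_BLCKSZ` (8KB) and `SKETCH_RACHUNK` (128KB), and the function name are illustrative and are not taken from the patch itself; only the `posix_fadvise(..., readOff + XLOG_BLCKSZ, RACHUNK, POSIX_FADV_WILLNEED)` shape comes from the thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L     /* expose posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>

#define SKETCH_BLCKSZ   8192            /* assumed WAL page size */
#define SKETCH_RACHUNK  (128 * 1024)    /* assumed hint chunk size */

/*
 * Hypothetical helper: called before reading the page at read_off.
 * On every chunk boundary (an assumed condition) it hints the range that
 * starts right after the page about to be read, so the first chunk of a
 * segment is covered too, except for the page being read synchronously.
 *
 * Returns -1 if no hint was due, otherwise the posix_fadvise() result
 * (0 on success, a positive error number on failure).
 */
int
sketch_prefetch_ahead(int fd, off_t read_off)
{
    assert(read_off >= 0);

    if (read_off % SKETCH_RACHUNK != 0)
        return -1;              /* not on a chunk boundary: no syscall */

    return posix_fadvise(fd, read_off + SKETCH_BLCKSZ, SKETCH_RACHUNK,
                         POSIX_FADV_WILLNEED);
}
```

With `read_off + SKETCH_RACHUNK` as the start instead, the hint issued at offset 0 would cover the second chunk only, which is the gap the message points out.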

RE: Use fadvise in wal replay

2022-06-23 Thread Jakub Wartak
>> > On 21 Jun 2022, at 16:59, Jakub Wartak wrote: >> Oh, wow, your benchmarks show really impressive improvement. >> >> > I think that 1 additional syscall is not going to be cheap just for >> > non-standard OS configurations >> Also we can reduce number of syscalls by something like >> >> #if

Re: Use fadvise in wal replay

2022-06-22 Thread Andrey Borodin
> On 22 Jun 2022, at 13:26, Pavel Borisov wrote: > > Then I'd guess that your speedup is due to speeding up the first several MB > in many files opened I think in this case Thomas' approach of prefetching the next WAL segment would do better. But Jakub observed the opposite results. Best regards,

Re: Use fadvise in wal replay

2022-06-22 Thread Pavel Borisov
On Wed, Jun 22, 2022 at 2:07 PM Andrey Borodin wrote: > > > > On 21 Jun 2022, at 20:52, Pavel Borisov wrote: > > > > > On 21 Jun 2022, at 16:59, Jakub Wartak > wrote: > > Oh, wow, your benchmarks show really impressive improvement. > > > > FWIW I was trying to speed up long sequential file reads

Re: Use fadvise in wal replay

2022-06-22 Thread Andrey Borodin
> On 21 Jun 2022, at 20:52, Pavel Borisov wrote: > > > On 21 Jun 2022, at 16:59, Jakub Wartak wrote: > Oh, wow, your benchmarks show really impressive improvement. > > FWIW I was trying to speed up long sequential file reads in Postgres using > fadvise hints. I've found no detectable improve

Re: Use fadvise in wal replay

2022-06-21 Thread Pavel Borisov
> > > On 21 Jun 2022, at 16:59, Jakub Wartak wrote: > Oh, wow, your benchmarks show really impressive improvement. > FWIW I was trying to speed up long sequential file reads in Postgres using fadvise hints. I've found no detectable improvements. Then I wrote a 1MB - 1GB sequential read test wit

Re: Use fadvise in wal replay

2022-06-21 Thread Andrey Borodin
> On 21 Jun 2022, at 16:59, Jakub Wartak wrote: Oh, wow, your benchmarks show really impressive improvement. > I think that 1 additional syscall is not going to be cheap just for > non-standard OS configurations Also we can reduce number of syscalls by something like #if defined(USE_POSIX_FA

RE: Use fadvise in wal replay

2022-06-21 Thread Jakub Wartak
> On Tue, Jun 21, 2022 at 10:33 PM Jakub Wartak > wrote: > > > > Maybe the important question is why would the readahead mechanism > > > > be > > > disabled in the first place via /sys | blockdev? > > > > > > Because the database should know better than the OS which data needs to be > > > prefetched and w

Re: Use fadvise in wal replay

2022-06-21 Thread Amit Kapila
On Tue, Jun 21, 2022 at 5:41 PM Bharath Rupireddy wrote: > > On Tue, Jun 21, 2022 at 4:55 PM Amit Kapila wrote: > > > > On Tue, Jun 21, 2022 at 3:18 PM Andrey Borodin wrote: > > > > > > > On 21 Jun 2022, at 12:35, Amit Kapila wrote: > > > > > > > > I wonder if the newly introduced "recovery_pre

Re: Use fadvise in wal replay

2022-06-21 Thread Bharath Rupireddy
On Tue, Jun 21, 2022 at 4:55 PM Amit Kapila wrote: > > On Tue, Jun 21, 2022 at 3:18 PM Andrey Borodin wrote: > > > > > On 21 Jun 2022, at 12:35, Amit Kapila wrote: > > > > > > I wonder if the newly introduced "recovery_prefetch" [1] for PG-15 can > > > help your case? > > > > AFAICS recovery_pre

Re: Use fadvise in wal replay

2022-06-21 Thread Amit Kapila
On Tue, Jun 21, 2022 at 3:18 PM Andrey Borodin wrote: > > > On 21 Jun 2022, at 12:35, Amit Kapila wrote: > > > > I wonder if the newly introduced "recovery_prefetch" [1] for PG-15 can > > help your case? > > AFAICS recovery_prefetch tries to prefetch main fork, but does not try to > prefetch WAL

Re: Use fadvise in wal replay

2022-06-21 Thread Bharath Rupireddy
On Tue, Jun 21, 2022 at 4:22 PM Thomas Munro wrote: > > On Tue, Jun 21, 2022 at 10:33 PM Jakub Wartak wrote: > > > > Maybe the important question is why would the readahead mechanism be > > > disabled in the first place via /sys | blockdev? > > > > > > Because the database should know better than the OS

Re: Use fadvise in wal replay

2022-06-21 Thread Thomas Munro
On Tue, Jun 21, 2022 at 10:33 PM Jakub Wartak wrote: > > > Maybe the important question is why would the readahead mechanism be > > disabled in the first place via /sys | blockdev? > > > > Because the database should know better than the OS which data needs to be > > prefetched and which should not. Big O

RE: Use fadvise in wal replay

2022-06-21 Thread Jakub Wartak
> > Maybe the important question is why would the readahead mechanism be > disabled in the first place via /sys | blockdev? > > Because the database should know better than the OS which data needs to be > prefetched and which should not. Big OS readahead affects index scan > performance. OK fair point, h

Re: Use fadvise in wal replay

2022-06-21 Thread Andrey Borodin
> On 21 Jun 2022, at 13:20, Jakub Wartak wrote: > > Maybe the important question is why would the readahead mechanism be disabled > in the first place via /sys | blockdev? Because the database should know better than the OS which data needs to be prefetched and which should not. Big OS readahead af

RE: Use fadvise in wal replay

2022-06-21 Thread Jakub Wartak
>> > On 21 Jun 2022, at 12:35, Amit Kapila wrote: >> > >> > I wonder if the newly introduced "recovery_prefetch" [1] for PG-15 can >> > help your case? >> >> AFAICS recovery_prefetch tries to prefetch main fork, but does not try to >> prefetch WAL itself before reading it. Kirill is trying to sol

Re: Use fadvise in wal replay

2022-06-21 Thread Andrey Borodin
> On 21 Jun 2022, at 12:35, Amit Kapila wrote: > > I wonder if the newly introduced "recovery_prefetch" [1] for PG-15 can > help your case? AFAICS recovery_prefetch tries to prefetch main fork, but does not try to prefetch WAL itself before reading it. Kirill is trying to solve the problem o

Re: Use fadvise in wal replay

2022-06-21 Thread Amit Kapila
On Tue, Jun 21, 2022 at 1:07 PM Kirill Reshke wrote: > > Recently we faced a problem with one of our production clusters. We use a > cascade replication setup in this cluster, that is: master, standby (r1), and > cascade standby (r2). From time to time, the replication lag on r1 used to > grow,

Use fadvise in wal replay

2022-06-21 Thread Kirill Reshke
Hi hackers! Recently we faced a problem with one of our production clusters. We use a cascade replication setup in this cluster, that is: master, standby (r1), and cascade standby (r2). From time to time, the replication lag on r1 used to grow, while on r2 it did not. Analysis showed that r1 start