On Thu, Sep 01, 2022 at 04:22:20PM +1200, Thomas Munro wrote:
> On Thu, Sep 1, 2022 at 3:08 PM Kyotaro Horiguchi
> <horikyota....@gmail.com> wrote:
> > At Thu, 1 Sep 2022 12:05:36 +1200, Thomas Munro <thomas.mu...@gmail.com> 
> > wrote in
> > > On Thu, Sep 1, 2022 at 2:01 AM Justin Pryzby <pry...@telsasoft.com> wrote:
> > > > < 2022-08-31 08:44:10.495 CDT  >LOG:  checkpoint starting: 
> > > > end-of-recovery immediate wait
> > > > < 2022-08-31 08:44:10.609 CDT  >LOG:  request to flush past end of 
> > > > generated WAL; request 1201/1CAF84F0, current position 1201/1CADB730
> > > > < 2022-08-31 08:44:10.609 CDT  >CONTEXT:  writing block 0 of relation 
> > > > base/16881/2840_vm
> > > > < 2022-08-31 08:44:10.609 CDT  >ERROR:  xlog flush request 
> > > > 1201/1CAF84F0 is not satisfied --- flushed only to 1201/1CADB730
> > > > < 2022-08-31 08:44:10.609 CDT  >CONTEXT:  writing block 0 of relation 
> > > > base/16881/2840_vm
> > > > < 2022-08-31 08:44:10.609 CDT  >FATAL:  checkpoint request failed
> > > >
> > > > I was able to start it with -c recovery_prefetch=no, so it seems like
> > > > prefetch tried to do too much.  The VM runs centos7 under qemu.
> > > > I'm making a copy of the data dir in cases it's needed.
> >
> > Just for information, there was a fixed bug about
> > overwrite-aborted-contrecord feature, which causes this kind of
> > failure (xlog flush request exceeds insertion bleeding edge). If it is
> > that, it has been fixed by 6672d79139 two-days ago.
> 
> Hmm.  Justin, when you built from source, which commit were you at?
> If it's REL_15_BETA3,

No - it's:
commit a2039b1f8e90d26a7e2a115ad5784476bd6deaa2 (HEAD -> REL_15_STABLE, 
origin/REL_15_STABLE)
  
> If it's REL_15_BETA3, any chance you could cherry pick that change and
> check what happens?  And without that, could you show what this logs
> And without that, could you show what this logs
> for good and bad recovery settings?

I wasn't sure what mean by "without that" , so here's a bunch of logs to
sift through:

At a203, with #define XLOGPREFETCHER_DEBUG_LEVEL NOTICE:

[pryzbyj@template0 postgresql]$ sudo -u postgres 
./tmp_install/usr/local/pgsql/bin/postgres -D /mnt/tmp/15/data -c 
logging_collector=no -c port=5678
...
< 2022-08-31 23:31:38.690 CDT  >LOG:  redo starts at 1201/1B931F50
< 2022-08-31 23:31:40.204 CDT  >NOTICE:  suppressing prefetch in relation 
1663/16888/165958212 from block 156 until 1201/1C3965A0 is replayed, which 
truncates the relation
< 2022-08-31 23:31:40.307 CDT  >NOTICE:  suppressing prefetch in relation 
1663/16888/165958523 from block 23 until 1201/1C39CC98 is replayed, which 
truncates the relation
< 2022-08-31 23:31:40.493 CDT  >NOTICE:  suppressing prefetch in relation 
1663/16888/165958523 from block 23 until 1201/1C8643C8 is replayed, because the 
relation is too small
< 2022-08-31 23:31:40.721 CDT  >LOG:  redo done at 1201/1CADB300 system usage: 
CPU: user: 0.41 s, system: 0.23 s, elapsed: 2.03 s
< 2022-08-31 23:31:41.452 CDT  >LOG:  checkpoint starting: end-of-recovery 
immediate wait
< 2022-08-31 23:31:41.698 CDT  >LOG:  request to flush past end of generated 
WAL; request 1201/1CAF84F0, current position 1201/1CADB730
< 2022-08-31 23:31:41.698 CDT  >CONTEXT:  writing block 0 of relation 
base/16881/2840_vm
< 2022-08-31 23:31:41.698 CDT  >ERROR:  xlog flush request 1201/1CAF84F0 is not 
satisfied --- flushed only to 1201/1CADB730
< 2022-08-31 23:31:41.698 CDT  >CONTEXT:  writing block 0 of relation 
base/16881/2840_vm
< 2022-08-31 23:31:41.699 CDT  >FATAL:  checkpoint request failed
< 2022-08-31 23:31:41.699 CDT  >HINT:  Consult recent messages in the server 
log for details.
< 2022-08-31 23:31:41.704 CDT  >LOG:  startup process (PID 25046) exited with 
exit code 1
< 2022-08-31 23:31:41.704 CDT  >LOG:  terminating any other active server 
processes
< 2022-08-31 23:31:41.705 CDT  >LOG:  shutting down due to startup process 
failure
< 2022-08-31 23:31:41.731 CDT  >LOG:  database system is shut down

With your patch:

[pryzbyj@template0 postgresql]$ sudo -u postgres 
./tmp_install/usr/local/pgsql/bin/postgres -D /mnt/tmp/15/data -c 
logging_collector=no -c port=5678
...
< 2022-08-31 23:34:22.897 CDT  >LOG:  redo starts at 1201/1B931F50
< 2022-08-31 23:34:23.146 CDT  >NOTICE:  suppressing prefetch in relation 
1663/16888/165958212 from block 156 until 1201/1C3965A0 is replayed, which 
truncates the relation
< 2022-08-31 23:34:23.147 CDT  >NOTICE:  suppressing prefetch in relation 
1663/16888/165958523 from block 23 until 1201/1C39CC98 is replayed, which 
truncates the relation
< 2022-08-31 23:34:23.268 CDT  >NOTICE:  suppressing prefetch in relation 
1663/16888/165958523 from block 23 until 1201/1C8643C8 is replayed, because the 
relation is too small
< 2022-08-31 23:34:23.323 CDT  >LOG:  redo done at 1201/1CADB300 system usage: 
CPU: user: 0.29 s, system: 0.12 s, elapsed: 0.42 s
< 2022-08-31 23:34:23.323 CDT  >LOG:  point 0: lastRec = 12011cadb300
< 2022-08-31 23:34:23.323 CDT  >LOG:  point 0: endOfLog = 12011cadb730
< 2022-08-31 23:34:23.323 CDT  >LOG:  XXX point 1: EndOfLog = 12011cadb730
< 2022-08-31 23:34:23.386 CDT  >LOG:  XXX point 2: EndOfLog = 12011cadb730
< 2022-08-31 23:34:23.387 CDT  >LOG:  XXX point 3: Insert->CurrBytePos = 
11f39ab82310
< 2022-08-31 23:34:23.565 CDT  >LOG:  XXX point 4: Insert->CurrBytePos = 
11f39ab82310
< 2022-08-31 23:34:23.606 CDT  >LOG:  checkpoint starting: end-of-recovery 
immediate wait
< 2022-08-31 23:34:23.767 CDT  >LOG:  request to flush past end of generated 
WAL; request 1201/1CAF84F0, current position 1201/1CADB730
< 2022-08-31 23:34:23.767 CDT  >CONTEXT:  writing block 0 of relation 
base/16881/2840_vm
< 2022-08-31 23:34:23.767 CDT  >ERROR:  xlog flush request 1201/1CAF84F0 is not 
satisfied --- flushed only to 1201/1CADB730
< 2022-08-31 23:34:23.767 CDT  >CONTEXT:  writing block 0 of relation 
base/16881/2840_vm
< 2022-08-31 23:34:23.768 CDT  >FATAL:  checkpoint request failed

And without prefetch:

[pryzbyj@template0 postgresql]$ sudo -u postgres 
./tmp_install/usr/local/pgsql/bin/postgres -D /mnt/tmp/15/data -c 
logging_collector=no -c port=5678 -c recovery_prefetch=no
...
< 2022-08-31 23:37:08.792 CDT  >LOG:  redo starts at 1201/1B931F50
< 2022-08-31 23:37:09.269 CDT  >LOG:  invalid record length at 1201/1CD91010: 
wanted 24, got 0
< 2022-08-31 23:37:09.269 CDT  >LOG:  redo done at 1201/1CD90E48 system usage: 
CPU: user: 0.35 s, system: 0.11 s, elapsed: 0.47 s
< 2022-08-31 23:37:09.269 CDT  >LOG:  point 0: lastRec = 12011cd90e48
< 2022-08-31 23:37:09.269 CDT  >LOG:  point 0: endOfLog = 12011cd91010
< 2022-08-31 23:37:09.269 CDT  >LOG:  XXX point 1: EndOfLog = 12011cd91010
< 2022-08-31 23:37:09.350 CDT  >LOG:  XXX point 2: EndOfLog = 12011cd91010
< 2022-08-31 23:37:09.350 CDT  >LOG:  XXX point 3: Insert->CurrBytePos = 
11f39ae35b68
< 2022-08-31 23:37:09.420 CDT  >LOG:  XXX point 4: Insert->CurrBytePos = 
11f39ae35b68
< 2022-08-31 23:37:09.552 CDT  >LOG:  checkpoint starting: end-of-recovery 
immediate wait
< 2022-08-31 23:37:12.987 CDT  >LOG:  checkpoint complete: wrote 8030 buffers 
(49.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.403 s, sync=2.841 
s, total=3.566 s; sync files=102, longest=2.808 s, average=0.028 s; 
distance=20860 kB, estimate=20860 kB
< 2022-08-31 23:37:13.077 CDT  >LOG:  database system is ready to accept 
connections

If I revert 6672d79139 (and roll back to the unrecovered state):

[pryzbyj@template0 postgresql]$ sudo -u postgres 
./tmp_install/usr/local/pgsql/bin/postgres -D /mnt/tmp/15/data -c 
logging_collector=no -c port=5678 # -c recovery_prefetch=no
...
< 2022-08-31 23:42:40.592 CDT  >LOG:  redo starts at 1201/1B931F50
< 2022-08-31 23:42:42.168 CDT  >NOTICE:  suppressing prefetch in relation 
1663/16888/165958212 from block 156 until 1201/1C3965A0 is replayed, which 
truncates the relation
< 2022-08-31 23:42:42.238 CDT  >NOTICE:  suppressing prefetch in relation 
1663/16888/165958523 from block 23 until 1201/1C39CC98 is replayed, which 
truncates the relation
< 2022-08-31 23:42:42.405 CDT  >NOTICE:  suppressing prefetch in relation 
1663/16888/165958523 from block 23 until 1201/1C8643C8 is replayed, because the 
relation is too small
< 2022-08-31 23:42:42.602 CDT  >LOG:  redo done at 1201/1CADB300 system usage: 
CPU: user: 0.41 s, system: 0.25 s, elapsed: 2.01 s
< 2022-08-31 23:42:42.602 CDT  >LOG:  point 0: lastRec = 12011cadb300
< 2022-08-31 23:42:42.602 CDT  >LOG:  point 0: endOfLog = 12011cadb730
< 2022-08-31 23:42:42.602 CDT  >LOG:  XXX point 1: EndOfLog = 12011cadb730
< 2022-08-31 23:42:42.830 CDT  >LOG:  XXX point 2: EndOfLog = 12011cadb730
< 2022-08-31 23:42:42.830 CDT  >LOG:  XXX point 3: Insert->CurrBytePos = 
11f39ab82310
< 2022-08-31 23:42:43.194 CDT  >LOG:  XXX point 4: Insert->CurrBytePos = 
11f39ab82310
< 2022-08-31 23:42:43.266 CDT  >LOG:  checkpoint starting: end-of-recovery 
immediate wait
< 2022-08-31 23:42:43.425 CDT  >LOG:  request to flush past end of generated 
WAL; request 1201/1CAF84F0, current position 1201/1CADB730
< 2022-08-31 23:42:43.425 CDT  >CONTEXT:  writing block 0 of relation 
base/16881/2840_vm
< 2022-08-31 23:42:43.425 CDT  >ERROR:  xlog flush request 1201/1CAF84F0 is not 
satisfied --- flushed only to 1201/1CADB730
< 2022-08-31 23:42:43.425 CDT  >CONTEXT:  writing block 0 of relation 
base/16881/2840_vm
< 2022-08-31 23:42:43.425 CDT  >FATAL:  checkpoint request failed
< 2022-08-31 23:42:43.425 CDT  >HINT:  Consult recent messages in the server 
log for details.
< 2022-08-31 23:42:43.431 CDT  >LOG:  startup process (PID 2415) exited with 
exit code 1
< 2022-08-31 23:42:43.431 CDT  >LOG:  terminating any other active server 
processes
< 2022-08-31 23:42:43.432 CDT  >LOG:  shutting down due to startup process 
failure
< 2022-08-31 23:42:43.451 CDT  >LOG:  database system is shut down

If I revert 6672d79139 and disable prefetch:

[pryzbyj@template0 postgresql]$ sudo -u postgres 
./tmp_install/usr/local/pgsql/bin/postgres -D /mnt/tmp/15/data -c 
logging_collector=no -c port=5678 -c recovery_prefetch=no
...
< 2022-08-31 23:43:25.711 CDT  >LOG:  redo starts at 1201/1B931F50
< 2022-08-31 23:43:26.178 CDT  >LOG:  invalid record length at 1201/1CD91010: 
wanted 24, got 0
< 2022-08-31 23:43:26.178 CDT  >LOG:  redo done at 1201/1CD90E48 system usage: 
CPU: user: 0.33 s, system: 0.11 s, elapsed: 0.46 s
< 2022-08-31 23:43:26.178 CDT  >LOG:  point 0: lastRec = 12011cd90e48
< 2022-08-31 23:43:26.178 CDT  >LOG:  point 0: endOfLog = 12011cd91010
< 2022-08-31 23:43:26.178 CDT  >LOG:  XXX point 1: EndOfLog = 12011cd91010
< 2022-08-31 23:43:26.369 CDT  >LOG:  XXX point 2: EndOfLog = 12011cd91010
< 2022-08-31 23:43:26.369 CDT  >LOG:  XXX point 3: Insert->CurrBytePos = 
11f39ae35b68
< 2022-08-31 23:43:26.433 CDT  >LOG:  XXX point 4: Insert->CurrBytePos = 
11f39ae35b68
< 2022-08-31 23:43:26.490 CDT  >LOG:  checkpoint starting: end-of-recovery 
immediate wait
< 2022-08-31 23:43:29.519 CDT  >LOG:  checkpoint complete: wrote 8030 buffers 
(49.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.380 s, sync=2.492 
s, total=3.086 s; sync files=102, longest=2.438 s, average=0.025 s; 
distance=20860 kB, estimate=20860 kB
< 2022-08-31 23:43:29.567 CDT  >LOG:  database system is ready to accept 
connections



Reply via email to