On Fri, Aug 26, 2022 at 6:14 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Tue, Aug 23, 2022 at 12:06 AM Robert Haas <robertmh...@gmail.com> wrote:
>
> > However, if anything
> > did try to look at file #4 it would get confused. Maybe that can
> > happen if this is a streaming standby, where we only write an
> > end-of-recovery record upon promotion, rather than a checkpoint, or
> > maybe if there are cascading standbys someone could try to actually
> > use the 000000020000000000000004 file for something. I'm not sure. But
> > unless I'm missing something, that file is bogus, and our only hope of
> > not having problems is that perhaps no one will ever look at it.

I tried to reproduce the problem with a cascading standby. The setup is
as follows:

pgprimary->pgstandby(archive only)->pgcascade(streaming + archive)

The second node has to be archive-only because the zero-filled gap is
only created in archive-only mode. With this setup, I noticed that when
the cascading standby receives the zero-filled gap, it reports the same
error that we saw with pg_waldump [1], and then it keeps waiting forever
on that file.

I have attached a test case, but the timing in the test is not perfect:
some of the WAL files get removed by pgstandby before the cascading
standby is set up. To work around that for testing, I put a direct
return in RemoveOldXlogFiles() [2]. The problem is resolved by the patch
Robert posted upthread.

[1]
2022-08-25 16:21:26.413 IST [18235] LOG: invalid record length at 0/FFFFEA8: wanted 24, got 0

[2]
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index eb5115f..990a879 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3558,6 +3558,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr,
        XLogSegNo       endlogSegNo;
        XLogSegNo       recycleSegNo;
 
+       return;
        /* Initialize info about where to try to recycle to */
        XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
        recycleSegNo = XLOGfileslop(lastredoptr);

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#!/bin/bash

PG_PRIMARY_PORT=5432
PG_STANDBY_PORT=5433
PG_CASCADE_PORT=5434

# Cleanup anything left over from previous runs.
for d in pgprimary pgstandby pgcascade; do
    if test -d $d; then
        pg_ctl stop -D $d;
        rm -rf $d
    fi
    rm -f $d.log
done
rm -rf wal_archive
rm -rf wal_archive1

# Echo commands from this point onward and exit on failure.
set -ex

# Create empty WAL archives.
mkdir wal_archive
mkdir wal_archive1

# Initialize and start the primary with archiving enabled.
initdb -D pgprimary
cat >> pgprimary/postgresql.auto.conf <<EOM
port=$PG_PRIMARY_PORT
archive_mode=on
archive_command='cp %p `pwd`/wal_archive/%f'
EOM
pg_ctl start -D pgprimary -l pgprimary.log

# Create the archive-only standby.
pg_basebackup -D pgstandby -d "port=$PG_PRIMARY_PORT"
cat >> pgstandby/postgresql.auto.conf <<EOM
port=$PG_STANDBY_PORT
restore_command='cp `pwd`/wal_archive/%f %p'
archive_command='cp %p `pwd`/wal_archive1/%f'
log_min_messages=DEBUG2
EOM
touch pgstandby/recovery.signal

# Generate a lot of WAL so that the standby takes time to come out of
# archive recovery.
psql -d postgres -c "create table test1(a varchar);"
psql -d postgres -c "insert into test1 select repeat('a', 1900) from generate_series(1,100000);"

# Start the standby.
pg_ctl -D pgstandby -c -l pgstandby.log -c start

# Start the cascading standby.
pg_basebackup -D pgcascade -R -d "port=$PG_STANDBY_PORT"
cat >> pgcascade/postgresql.auto.conf <<EOM
port=$PG_CASCADE_PORT
restore_command='cp `pwd`/wal_archive1/%f %p'
EOM
pg_ctl -D pgcascade -c -l pgcascade.log -c start

# Create a table and switch the WAL segment.
# This will create a zero-filled gap in WAL file 00000002000000000000000F
# on pgstandby, which will be sent to pgcascade.
psql -d postgres -c "create table test(a varchar);select pg_switch_wal();"

# Write enough records that the segment switch happens in the middle of
# the last WAL record.
psql -d postgres -c "insert into test select repeat('a', 1900) from generate_series(1,8535);"
sleep 2
pgbench -i -s 1 -p 5433 postgres
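
The segment name in the script comments can be cross-checked against the LSN in the log line [1]. Below is a minimal sketch of that arithmetic, not part of the attached test case. The helper name wal_file_name is hypothetical, and the sketch assumes timeline 2 and the default 16MB wal_segment_size; the name layout (timeline, then the high and low parts of the segment number, each as 8 hex digits) follows the usual WAL file naming.

```shell
# Hypothetical helper: map a timeline ID and an LSN ("hi/lo" in hex) to the
# name of the WAL segment file that contains that position.
# Usage: wal_file_name TLI LSN [SEGMENT_SIZE_BYTES]
# The body runs in a subshell so the variables do not leak to the caller.
wal_file_name() (
    tli=$1
    lsn=$2
    seg_size=${3:-16777216}          # assumed default: 16MB segments

    hi=${lsn%/*}                     # hex digits before the slash
    lo=${lsn#*/}                     # hex digits after the slash

    # Absolute byte position in the WAL stream.
    pos=$(( (0x$hi << 32) + 0x$lo ))

    # Segment number, and how many segments fit in one 4GB "xlog id".
    segno=$(( pos / seg_size ))
    segs_per_id=$(( 0x100000000 / seg_size ))

    printf '%08X%08X%08X\n' "$tli" \
        $(( segno / segs_per_id )) $(( segno % segs_per_id ))
)

# LSN from the log line [1], on timeline 2:
wal_file_name 2 0/FFFFEA8
# prints 00000002000000000000000F
```

Under those assumptions, 0/FFFFEA8 falls in segment 0xF, which matches the file named in the script comments, and it lies 344 bytes before the end of that 16MB segment.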