Re: standby promotion can create unreadable WAL

Dilip Kumar Mon, 29 Aug 2022 03:17:35 -0700

On Fri, Aug 26, 2022 at 6:14 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Tue, Aug 23, 2022 at 12:06 AM Robert Haas <robertmh...@gmail.com> wrote:
> >
> > However, if anything
> > did try to look at file #4 it would get confused. Maybe that can
> > happen if this is a streaming standby, where we only write an
> > end-of-recovery record upon promotion, rather than a checkpoint, or
> > maybe if there are cascading standbys someone could try to actually
> > use the 000000020000000000000004 file for something. I'm not sure. But
> > unless I'm missing something, that file is bogus, and our only hope of
> > not having problems is that perhaps no one will ever look at it.


I tried to reproduce the problem with a cascading standby. The setup
is:

pgprimary -> pgstandby (archive only) -> pgcascade (streaming + archive)

The second node has to be archive-only because the zero-filled gap is
only created in archive-only mode.  With this setup, when the cascading
standby receives the zero-filled gap, it reports the same error we saw
with pg_waldump [1] and then keeps waiting forever on that file.  I
have attached a test case, but the timing in it is not perfect: some of
the WAL files get removed by pgstandby before the cascading standby is
set up, so I put a direct return in RemoveOldXlogFiles() to test
this [2].  The problem is resolved by the patch Robert posted upthread.

[1]
2022-08-25 16:21:26.413 IST [18235] LOG:  invalid record length at 0/FFFFEA8: wanted 24, got 0
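
For reference, the LSN in the message above can be mapped to a WAL
segment file name with a little arithmetic. This is only a sketch:
wal_file_name is a made-up helper, the 16MB segment size is an
assumption (the default wal_segment_size), and timeline 2 is taken from
the comment in the attached script.

```shell
# Sketch only: map a timeline ID and an LSN ("hi/lo", both hex) to a WAL
# segment file name. wal_file_name is a hypothetical helper, and the 16MB
# default below is an assumption about wal_segment_size.
wal_file_name() {
    local tli="$1"
    local lsn="$2"
    local seg_size="${3:-16777216}"

    # Split the LSN at the slash into its high and low 32-bit halves.
    local hi="${lsn%/*}"
    local lo="${lsn#*/}"

    # Absolute byte position in the WAL stream, then the segment number.
    local pos=$(( (0x$hi << 32) + 0x$lo ))
    local segno=$(( pos / seg_size ))

    # Number of segments that fit into one 4GB "xlogid".
    local per_id=$(( 0x100000000 / seg_size ))

    # File name layout: timeline, xlogid, segment within the xlogid.
    printf '%08X%08X%08X\n' "$tli" $(( segno / per_id )) $(( segno % per_id ))
}
```

Under those assumptions, wal_file_name 2 0/FFFFEA8 prints
00000002000000000000000F, which is the file named in the attached
script's comment.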

[2]
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index eb5115f..990a879 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3558,6 +3558,7 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr lastredoptr, XLogRecPtr endptr,
        XLogSegNo       endlogSegNo;
        XLogSegNo       recycleSegNo;

+       return;
        /* Initialize info about where to try to recycle to */
        XLByteToSeg(endptr, endlogSegNo, wal_segment_size);
        recycleSegNo = XLOGfileslop(lastredoptr);

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

#!/bin/bash

PG_PRIMARY_PORT=5432
PG_STANDBY_PORT=5433
PG_CASCADE_PORT=5434

# Cleanup anything left over from previous runs.
for d in pgprimary pgstandby pgcascade; do
    if test -d "$d"; then
        pg_ctl stop -D "$d"
        rm -rf "$d"
    fi
    rm -f "$d.log"
done
rm -rf wal_archive
rm -rf wal_archive1
# Echo commands from this point onward and exit on failure.
set -ex

# Create empty WAL archive.
mkdir wal_archive
mkdir wal_archive1

# Initialize and start primary.
# enable archiving
initdb -D pgprimary
cat >> pgprimary/postgresql.auto.conf <<EOM
port=$PG_PRIMARY_PORT
archive_mode=on
archive_command='cp %p `pwd`/wal_archive/%f'
EOM
pg_ctl start -D pgprimary -l pgprimary.log

# Create archiving only standby
pg_basebackup -D pgstandby -d "port=$PG_PRIMARY_PORT"
cat >> pgstandby/postgresql.auto.conf <<EOM
port=$PG_STANDBY_PORT
restore_command='cp `pwd`/wal_archive/%f %p'
archive_command='cp %p `pwd`/wal_archive1/%f'
log_min_messages=DEBUG2
EOM
touch pgstandby/recovery.signal

# Generate a lot of WAL so that standby takes time to come out of archive recovery
psql -d postgres -c "create table test1(a varchar);"
psql -d postgres -c "insert into test1 select repeat('a', 1900) from generate_series(1,100000);"

# Start the standby.
pg_ctl -D pgstandby -c -l pgstandby.log start

# Set up and start the cascading standby.
pg_basebackup -D pgcascade -R -d "port=$PG_STANDBY_PORT"
cat >> pgcascade/postgresql.auto.conf <<EOM
port=$PG_CASCADE_PORT
restore_command='cp `pwd`/wal_archive1/%f %p'
EOM
pg_ctl -D pgcascade -c -l pgcascade.log start


# Create a table and switch the WAL segment.
# This creates a zero-filled gap in WAL file 00000002000000000000000F on
# pgstandby, which is then sent to pgcascade.
psql -d postgres -c "create table test(a varchar);select pg_switch_wal();"
# Write enough records that the segment switch falls in the middle of the
# last WAL record.
psql -d postgres -c "insert into test select repeat('a', 1900) from generate_series(1,8535);"
sleep 2

pgbench -i -s 1 -p $PG_STANDBY_PORT postgres
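
The script stops after the pgbench run and leaves the check to the
reader. A minimal sketch of that check, assuming the message format
shown in [1] and the pgcascade.log file name that the script passes to
pg_ctl; log_has_invalid_record is a made-up helper, not part of the
attached script:

```shell
# Sketch only: succeed (exit status 0) if the given server log contains
# the "invalid record length" message. log_has_invalid_record is a
# hypothetical helper; the pattern is taken from the log line in [1].
log_has_invalid_record() {
    grep -q 'invalid record length at' "$1"
}

# Possible use once the script has finished (assumed log file name):
#   log_has_invalid_record pgcascade.log && echo "pgcascade hit the gap"
```

If the cascading standby did receive the zero-filled gap, the check
should succeed on pgcascade.log. Running pg_waldump on the same segment
in wal_archive1 is another way to look at the file.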
