I have a primary and a standby running, with the following contents of their
pg_wal directories and the archive directory:

ls -l /mnt/pgsql/archive/
-rw-rw-rw-. 1 root root 16777216 Feb 15 09:39 000000010000000000000001
-rw-rw-rw-. 1 root root 16777216 Feb 15 09:39 000000010000000000000002
-rw-rw-rw-. 1 root root      302 Feb 15 09:39 000000010000000000000002.00000028.backup

pg-hdp-node1.kitchen.local
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres postgres 16777216 Feb 15 09:39 000000010000000000000002
-rw-------. 1 postgres postgres      302 Feb 15 09:39 000000010000000000000002.00000028.backup
-rw-------. 1 postgres postgres 16777216 Feb 15 09:44 000000010000000000000003
-rw-------. 1 postgres postgres 16777216 Feb 15 09:39 000000010000000000000004
drwx------. 2 postgres postgres       96 Feb 15 09:44 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:
-rw-------. 1 postgres postgres 0 Feb 15 09:39 000000010000000000000002.00000028.backup.done
-rw-------. 1 postgres postgres 0 Feb 15 09:39 000000010000000000000002.done

pg-hdp-node2.kitchen.local
/var/lib/pgsql/10/data/pg_wal/:
-rw-------. 1 postgres root     16777216 Feb 15 09:39 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Feb 15 09:44 000000010000000000000003
drwx------. 2 postgres root            6 Feb 15 09:39 archive_status
/var/lib/pgsql/10/data/pg_wal/archive_status:

On the secondary, pg-hdp-node2.kitchen.local, running diff on
/var/lib/pgsql/10/data/pg_wal/000000010000000000000002 and
/mnt/pgsql/archive/000000010000000000000002 reports binary differences. On the
primary, pg-hdp-node1.kitchen.local, the same diff reports no differences, as
expected.
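
To narrow down where the two copies diverge, a comparison along these lines
might help. This is only a sketch: compare_segments is a hypothetical helper
name, and it relies on nothing more than the standard cmp, head and awk
utilities.

```shell
#!/bin/sh
# compare_segments LOCAL ARCHIVED
# Hypothetical helper: prints "identical" if the two files match byte for byte,
# otherwise prints the 1-based offset of the first differing byte.
compare_segments() {
    local_copy=$1
    archived_copy=$2

    # cmp -s is silent and only sets the exit status (0 means identical).
    if cmp -s "$local_copy" "$archived_copy"; then
        echo "identical"
        return 0
    fi

    # cmp -l lists every differing byte as "<offset> <octal> <octal>";
    # the first field of the first line is the first differing offset.
    # If the files differ only in length, cmp -l prints nothing on stdout
    # and the offset is reported as "unknown".
    first_diff=$(cmp -l "$local_copy" "$archived_copy" 2>/dev/null | head -n 1 | awk '{print $1}')
    echo "differ at byte ${first_diff:-unknown}"
    return 1
}
```

On the secondary this could be called as
compare_segments /var/lib/pgsql/10/data/pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002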

After a failover, pg-hdp-node2.kitchen.local tries to archive WAL segment
000000010000000000000002 and fails because that segment has already been
archived:
2019-02-15 09:54:50.518 PST [780] DETAIL:  The failed archive command was:
test ! -f /mnt/pgsql/archive/000000010000000000000002 && cp
pg_wal/000000010000000000000002 /mnt/pgsql/archive/000000010000000000000002

This thread
https://www.postgresql.org/message-id/11b405a6-2176-9510-bf5b-ea9c0e860635%40pgmasters.net
suggests handling this case by reporting success, but in my case the
contents are different. I would think that the previously archived
000000010000000000000002 is the right WAL segment.
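
For reference, the "report success" approach could be sketched as a shell
function like the one below. It assumes success should be reported only when
the archived copy is byte-identical to the local one; archive_wal is a
hypothetical name and this is not taken from the linked thread. With
differing contents, as in my case, it still fails and does not overwrite the
archived file.

```shell
#!/bin/sh
# archive_wal SRC DEST
# Hypothetical archive helper (a sketch, not a tested production script).
#   - DEST missing:            copy SRC to DEST, return the copy's status
#   - DEST identical to SRC:   return 0 (already archived)
#   - DEST differs from SRC:   return 1 and leave DEST untouched
archive_wal() {
    src=$1
    dest=$2

    if [ ! -f "$dest" ]; then
        # Copy under a temporary name and rename, so a partially written
        # file never appears under the final segment name.
        cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
        return $?
    fi

    # cmp -s is silent; exit status 0 means the files are identical.
    if cmp -s "$src" "$dest"; then
        return 0
    fi

    echo "archive_wal: $dest already exists with different contents" >&2
    return 1
}
```

Wrapped in a script that ends with archive_wal "$1" "$2" (the script path
here is made up), it could be referenced as
archive_command = '/usr/local/bin/archive_wal.sh %p /mnt/pgsql/archive/%f'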

So my questions are as follows:

Why do the WAL segments differ?
How should this be resolved on the new primary?
-- 

*Andre Piwoni*
