Hi

We had a failover in which our monitoring watchdog processes promoted the
slave to become the new master.
I restarted the old master database to ensure a clean stop/start and then ran
pg_rewind on the old master to resync it with the new master. However, after a
successful rewind, the old master failed to start as the new slave.
The steps I took were:

1.       Stop all watchdogs

2.       Start/stop the old master

3.       Run 'checkpoint' on new master

4.       Run the pg_rewind on old master to resync with new master

5.       Start the old master (as new slave)
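
For anyone trying to reproduce this, steps 2 to 5 can be sketched roughly as
below. This is only a dry-run sketch that builds the commands without running
them. The data directory and connection string are hypothetical placeholders,
not our real values, and step 1 is omitted because it depends on the watchdog
tooling in use.

```python
import shlex


def rewind_plan(old_pgdata, new_master_conninfo):
    """Return the commands for steps 2-5 as argument lists (nothing is executed)."""
    return [
        # Step 2: start, then cleanly stop, the old master.
        ["pg_ctl", "-D", old_pgdata, "-w", "start"],
        ["pg_ctl", "-D", old_pgdata, "-w", "-m", "fast", "stop"],
        # Step 3: force a checkpoint on the new master.
        ["psql", "-d", new_master_conninfo, "-c", "CHECKPOINT"],
        # Step 4: rewind the old master's data directory against the new master.
        ["pg_rewind",
         "--target-pgdata=" + old_pgdata,
         "--source-server=" + new_master_conninfo],
        # Step 5: start the old master again (now intended to run as the new slave).
        ["pg_ctl", "-D", old_pgdata, "-w", "start"],
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    plan = rewind_plan("/var/lib/pgsql/data", "host=new-master.example dbname=postgres")
    for cmd in plan:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```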

Step 4 (pg_rewind) was successful, with the new slave rewound to the same
timeline as the new master. However, when I restarted the new slave, it failed
to start with the following errors:

80) FATAL:  the database system is starting up
cp: cannot stat '/pg_backup/backup/archive_sync/0000000400000383000000BF': No such file or directory
cp: cannot stat '/pg_backup/backup/archive_sync/0000000300000383000000BF': No such file or directory
cp: cannot stat '/pg_backup/backup/archive_sync/0000000200000383000000BF': No such file or directory
cp: cannot stat '/pg_backup/backup/archive_sync/0000000100000383000000BF': No such file or directory
2018-01-11 23:21:59 ACDT [112235]: [1-1] db=,user= app=,host= LOG:  started streaming WAL from primary at 383/BE000000 on timeline 6
2018-01-11 23:21:59 ACDT [112235]: [2-1] db=,user= app=,host= FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 0000000600000383000000BE has already been removed
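
As a sanity check on the segment name in that error, here is a minimal Python
sketch of how a timeline and start position map to a WAL file name. It assumes
the default 16 MB segment size (consistent with the 16777216-byte files on the
new master) and the segment naming used since PostgreSQL 9.3. The function
name is my own.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default of 16 MB


def wal_file_name(timeline, lsn, segment_size=WAL_SEGMENT_SIZE):
    """Map a timeline and an LSN such as '383/BE000000' to a WAL segment file name."""
    high_hex, low_hex = lsn.split("/")
    high, low = int(high_hex, 16), int(low_hex, 16)
    # Number of segments covered by one value of the high 32 bits of the LSN.
    segments_per_xlogid = 0x100000000 // segment_size
    segno = high * segments_per_xlogid + low // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segments_per_xlogid,
        segno % segments_per_xlogid,
    )


print(wal_file_name(6, "383/BE000000"))  # 0000000600000383000000BE
```

So the slave asking to stream from 383/BE000000 on timeline 6 corresponds
exactly to the segment the new master reports as removed.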

I checked both the archive and pg_xlog directories on the new master and
cannot locate the missing file.

Has anyone experienced this before with pg_rewind?

The earliest WAL files in the archive directory date from just after the
failover occurred.

E.g., in the archive directory on the new master:
$ ls -l
total 15745032
-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000500000383000000C0.partial
-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C0
-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C1
-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C2
-rw-------. 1 postgres postgres 16777216 Jan 11 17:52 0000000600000383000000C

And in the pg_xlog directory on the new master:
-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000080
-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000081
-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000082
-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000083
-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000084
-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000085
-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000086
-rw-------. 1 postgres postgres 16777216 Jan 11 18:57 000000060000038500000087
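
To make the gap explicit, here is a small Python sketch that compares the
requested segment with the complete segment names from the two listings
above. It only compares file names, the helper names are my own, and the
.partial file and any incomplete name are left out.

```python
def parse_wal_name(name):
    """Return (timeline, log, segment) for a complete 24-hex-digit name, else None."""
    if len(name) != 24:
        return None
    try:
        return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)
    except ValueError:
        return None


def oldest_on_timeline(names, timeline):
    """Return the oldest complete segment name on the given timeline, or None."""
    parsed = [(parse_wal_name(n), n) for n in names]
    same_tl = [(p[1:], n) for p, n in parsed if p is not None and p[0] == timeline]
    return min(same_tl)[1] if same_tl else None


def precedes_retained(requested, names):
    """True if the requested segment is older than every retained one on its timeline."""
    req = parse_wal_name(requested)
    if req is None:
        raise ValueError("not a complete WAL segment name: %r" % requested)
    oldest = oldest_on_timeline(names, req[0])
    return oldest is not None and req[1:] < parse_wal_name(oldest)[1:]


# Complete names taken from the archive and pg_xlog listings on the new master.
available = [
    "0000000600000383000000C0", "0000000600000383000000C1",
    "0000000600000383000000C2",
    "000000060000038500000080", "000000060000038500000081",
    "000000060000038500000082", "000000060000038500000083",
    "000000060000038500000084", "000000060000038500000085",
    "000000060000038500000086", "000000060000038500000087",
]
requested = "0000000600000383000000BE"

print(oldest_on_timeline(available, 6))         # 0000000600000383000000C0
print(precedes_retained(requested, available))  # True
```

In other words, the segment the slave asks for (BE) is older than the oldest
timeline 6 segment I can find (C0) in either directory.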

Thanks
Dylan
