On Mon, 23 Sep 2019, 00:46 Shital A, <brightuser2...@gmail.com> wrote:
> Hello,
>
> We have set up an active-passive cluster using streaming replication on
> RHEL 7.5. We are testing Pacemaker for automated failover.
> We are seeing the issues below with the setup:
>
> 1. When a failover is triggered while data is being added to the primary,
> by killing the primary (killall -9 postgres), the standby doesn't come up
> in sync. In Pacemaker, crm_mon -Afr shows the standby in the disconnected
> and HS:alone state.
>
> On postgres, we see the error below:
>
> < 2019-09-20 17:07:46.266 IST > LOG: entering standby mode
> < 2019-09-20 17:07:46.267 IST > LOG: database system was not properly shut down; automatic recovery in progress
> < 2019-09-20 17:07:46.270 IST > LOG: redo starts at 1/680A2188
> < 2019-09-20 17:07:46.370 IST > LOG: consistent recovery state reached at 1/6879D9F8
> < 2019-09-20 17:07:46.370 IST > LOG: database system is ready to accept read only connections
> cp: cannot stat '/var/lib/pgsql/9.6/data/archivedir/000000010000000100000068': No such file or directory
> < 2019-09-20 17:07:46.751 IST > LOG: statement: select pg_is_in_recovery()
> < 2019-09-20 17:07:46.782 IST > LOG: statement: show synchronous_standby_names
> < 2019-09-20 17:07:50.993 IST > LOG: statement: select pg_is_in_recovery()
> < 2019-09-20 17:07:53.395 IST > LOG: started streaming WAL from primary at 1/68000000 on timeline 1
> < 2019-09-20 17:07:53.436 IST > LOG: invalid contrecord length 2662 at 1/6879D9F8
> < 2019-09-20 17:07:53.438 IST > FATAL: terminating walreceiver process due to administrator command
> cp: cannot stat '/var/lib/pgsql/9.6/data/archivedir/00000002.history': No such file or directory
> cp: cannot stat '/var/lib/pgsql/9.6/data/archivedir/000000010000000100000068': No such file or directory
>
> When we restart postgres on the standby using pg_ctl restart, the standby
> starts syncing.
>
>
> 2. After the standby syncs following the pg_ctl restart mentioned above, we
> found that 1-2 records are missing on the standby.
>
> Need help to check:
> 1. Why does the standby fail to start in the first place and complain
> about missing logs?
> 2. Can the record mismatch be related to the failover not being
> successful?
>
> If you have faced this issue or have knowledge of it, please let us know.
>
> Replication is async.
> The recovery.conf file has a restore_command that uses cp.
>
>
> Thanks.
>

Hello Team,

Any ideas?

Thanks.