At Tue, 10 Dec 2019 10:40:53 -0800, Ashwin Agrawal <aagra...@pivotal.io> wrote in
> On Tue, Dec 10, 2019 at 3:06 AM jiankang liu <liujk1...@gmail.com> wrote:
>
> > Start Walreceiver completely before shut down it on standby server.
> >
> > The walreceiver will be shut down, when read an invalid record in the
> > WAL streaming from master.And then, we retry from archive/pg_wal again.
> >
> > After that, we start walreceiver in RequestXLogStreaming(), and read
> > record from the WAL streaming. But before walreceiver starts, we read
> > data from file which be streamed over and present in pg_wal by last
> > time, because of walrcv->receivedUpto > RecPtr and the wal is actually
> > flush on disk. Now, we read the invalid record again, what the next to
> > do? Shut down the walreceiver and do it again.
> >
>
> I am missing something here, if walrcv->receivedUpto > RecPtr, why are we
> getting / reading invalid record?

I bet that the standby is connecting to the wrong master. For example,
something like this happens when the master has been reinitialized from
a backup and has gone through a different history, and the standby was
then initialized from the reborn master while the stale archive files
on the standby were left in place.

Anyway, that cannot happen in a correctly running replication setup.
What to do in that case is to start over from a new base backup of the
master, making sure to erase any stale archive files.

About the proposed fix, it doesn't seem to cause the startup process to
rewind WAL to that LSN. Even if it did, the result would be no better
than a broken database.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
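
For reference, the retry cycle described in the quoted report can be
sketched as a toy model. This is only an illustration in Python, not
PostgreSQL code: the class ToyStandby, its fields, and the helper run()
are all hypothetical names, and LSNs are reduced to small integers. It
assumes, as the report does, that an invalid record is already on disk
below the received-up-to position, so every new attempt reads the same
record from the local file before the walreceiver delivers anything.

```python
from dataclasses import dataclass


@dataclass
class ToyStandby:
    """Toy model of the standby's read loop (hypothetical, not PostgreSQL code)."""
    received_upto: int   # stands in for walrcv->receivedUpto
    rec_ptr: int         # stands in for RecPtr, the next record to read
    bad_lsn: int         # position of the invalid record already in pg_wal
    walreceiver_running: bool = False

    def record_is_valid(self) -> bool:
        # In this model the only invalid record is the one at bad_lsn.
        return self.rec_ptr != self.bad_lsn

    def one_cycle(self) -> str:
        # Streaming is requested, but nothing new has arrived yet.
        self.walreceiver_running = True
        if self.received_upto > self.rec_ptr:
            # The data is already on disk from the previous attempt,
            # so the record is read from the local file.
            if not self.record_is_valid():
                # Invalid record: shut the walreceiver down and retry.
                self.walreceiver_running = False
                return "invalid"
            self.rec_ptr += 1
            return "ok"
        return "wait"


def run(standby: ToyStandby, max_cycles: int) -> list:
    """Run a bounded number of cycles and collect the outcome of each."""
    return [standby.one_cycle() for _ in range(max_cycles)]
```

With received_upto=10, rec_ptr=3 and bad_lsn=5, the model reads two
valid records and then reports "invalid" on every later cycle without
advancing, which is the loop the report describes. It does not model
why the record is invalid in the first place.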