At Thu, 24 Oct 2019 17:37:52 +0800 (CST), Thunder <thund...@126.com> wrote in
> Thanks for reply. I feel confused about snapshot.
>
> At 2019-10-23 11:51:19, "Kyotaro Horiguchi" <horikyota....@gmail.com> wrote:
> >Hello.
> >
> >At Tue, 22 Oct 2019 20:42:21 +0800 (CST), Thunder <thund...@126.com> wrote
> >in
> >> Update the patch.
> >>
> >> 1. The STANDBY_SNAPSHOT_PENDING state is set when we replay the first
> >> XLOG_RUNNING_XACTS and the sub transaction ids are overflow.
> >> 2. When we log XLOG_RUNNING_XACTS in master node, can we assume that all
> >> xact IDS < oldestRunningXid are considered finished?
> >
> >Unfortunately we can't. Standby needs to know that the *standby's*
> >oldest active xid exceeds the pending xmin, not master's. And it is
> >already processed in ProcArrayApplyRecoveryInfo. We cannot assume that
> >the oldest xids are not same on the both side in a replication pair.
>
>
> This issue occurs when master does not commit the transaction which has lots
> of sub transactions, while we restart or create a new standby node.
> The standby node can not provide service because of this issue.
> Can the standby have any active xid while it can not provide service?

The problem is not the xid but the snapshot, that is, the information
on which xids are not yet committed on the master. The standby cannot
determine which rows should be visible without that information. The
xid list is maintained using incoming commit records and vanishes on
restart. So the restarted standby needs a non-subxid-overflown
XLOG_RUNNING_XACTS to make sure the xid list is complete.

> >> 3. If we can assume this, when we replay XLOG_RUNNING_XACTS and change
> >> standbyState to STANDBY_SNAPSHOT_PENDING, can we record oldestRunningXid
> >> to a shared variable, like procArray->oldest_running_xid?
> >> 4. In standby node when call GetSnapshotData if
> >> procArray->oldest_running_xid is valid, can we set xmin to be
> >> procArray->oldest_running_xid?
> >>
> >> Appreciate any suggestion to this issue.

So, somehow we need to complete the KnownAssignedTransactionIds even
if there are any subxid-overflown transactions. As mentioned upthread,
I think we have at least the following choices.

- Send back the complete xid list in response to the START_REPLICATION
  command from walreceiver.

- Have the first XLOG_RUNNING_XACTS after a standby comes in carry the
  complete xid list, even while a subxid-overflown transaction is
  alive.

I think the first is better. Any suggestions?

--
Kyotaro Horiguchi
NTT Open Source Software Center
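
As a rough illustration of the state transitions discussed in this
thread, here is a toy model in C. It is only a sketch under
assumptions, not the code in ProcArrayApplyRecoveryInfo: the Toy*
types and toy_apply_running_xacts() are hypothetical names, xid
wraparound is ignored, and the KnownAssignedXids bookkeeping is left
out. It shows why a standby that starts while a subxid-overflown
transaction is open stays in the pending state until it sees either a
non-overflown XLOG_RUNNING_XACTS or one whose oldestRunningXid has
passed the pending xmin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyXid;        /* simplified: no wraparound handling */

typedef enum
{
    TOY_STANDBY_INITIALIZED,
    TOY_STANDBY_SNAPSHOT_PENDING,
    TOY_STANDBY_SNAPSHOT_READY
} ToyStandbyState;

/* Only the fields of a running-xacts record that matter for this sketch. */
typedef struct
{
    ToyXid  oldest_running_xid; /* oldest xid still running on the master */
    ToyXid  next_xid;           /* next xid the master will assign */
    bool    subxid_overflow;    /* true if the subxid list is incomplete */
} ToyRunningXacts;

typedef struct
{
    ToyStandbyState state;
    ToyXid          pending_xmin;   /* meaningful only while pending */
} ToyStandby;

/* Apply one running-xacts record to the toy standby. */
static void
toy_apply_running_xacts(ToyStandby *s, const ToyRunningXacts *r)
{
    assert(s != NULL && r != NULL);

    switch (s->state)
    {
        case TOY_STANDBY_SNAPSHOT_READY:
            /* Snapshots are already usable; nothing to do here. */
            break;

        case TOY_STANDBY_INITIALIZED:
            if (r->subxid_overflow)
            {
                /*
                 * The xid list is incomplete. Remember the newest xid that
                 * could have been running at this record and wait.
                 */
                s->state = TOY_STANDBY_SNAPSHOT_PENDING;
                s->pending_xmin = r->next_xid - 1;
            }
            else
                s->state = TOY_STANDBY_SNAPSHOT_READY;
            break;

        case TOY_STANDBY_SNAPSHOT_PENDING:
            /*
             * Become ready on a complete record, or once every xid that
             * was running at the first record is known to have finished.
             */
            if (!r->subxid_overflow ||
                s->pending_xmin < r->oldest_running_xid)
                s->state = TOY_STANDBY_SNAPSHOT_READY;
            break;
    }
}
```

In this model, as long as the overflowed transaction stays open on the
master, oldest_running_xid does not advance and every later overflowed
record leaves the toy standby pending, which is the situation reported
in this thread.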