Hi,

The thread below

http://archives.postgresql.org/message-id/f37e975c-908f-858e-707f-058d3b1eb214%402ndquadrant.com

describes an issue in logical decoding that arises because the contents of xl_running_xacts aren't necessarily coherent with the contents of the WAL: RecordTransactionCommit() / RecordTransactionAbort() don't have any interlock against the procarray. That means xl_running_xacts can list transactions as running whose commit/abort records have already been WAL-logged.
I think that's problematic not just for logical decoding, but also for Hot Standby. Consider the following:

ProcArrayApplyRecoveryInfo() gets an xl_running_xacts record that's not suboverflowed, and thus changes to STANDBY_SNAPSHOT_READY. In that case it populates the KnownAssignedXids machinery using KnownAssignedXidsAdd(). Once in STANDBY_SNAPSHOT_READY, CheckRecoveryConsistency() signals the postmaster to allow connections.

For HS, a snapshot is built by GetSnapshotData() using KnownAssignedXidsGetAndSetXmin(), which in turn uses the transactions currently known to be running to populate the snapshot.

Now, if transactions committed before the xl_running_xacts record (in the "earlier LSN" sense), ExpireTreeKnownAssignedTransactionIds() in xact_redo_commit() will already have run for them. The xl_running_xacts record then adds them via KnownAssignedXidsAdd() anyway, which means we'll assume already-committed transactions are still running. In other words, the snapshot is corrupted. Luckily this self-corrects over time, via ExpireOldKnownAssignedTransactionIds().

Am I missing something that protects against the above scenario?

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)