On Wed, May 26, 2010 at 1:24 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:
> On 26/05/10 20:10, Kevin Grittner wrote:
>>
>> Heikki Linnakangas<heikki.linnakan...@enterprisedb.com> wrote:
>>
>>> One way to do that would be to refrain from flushing the commit
>>> record to disk on the master until the standby has acknowledged
>>> it.
>>
>> I'm not clear on the benefit of doing that, versus flushing the
>> commit record and then waiting for responses. Either way some
>> databases will commit before others -- what is the benefit of having
>> the master lag?
>
> Hmm, I was going to answer that that way no other transactions can see the
> transaction as committed before it has been safely replicated, but I now
> realize that you could also flush, but refrain from releasing the entry from
> procarray until the standby acknowledges the commit, so the transaction
> would look like in-progress to other transactions in the master until that.
>
> Although, if the master crashes at that point, and quickly recovers, you
> could see the last transactions committed on the master before they're
> replicated to the standby.

No matter what you do, there are going to be corner cases where one node thinks the transaction committed and the other node doesn't know. At any given time, we're either in a state where a crash and restart on the master will replay the commit record, or we're not. And also, but somewhat independently, we're either in a state where a crash on the standby will replay the commit record, or we're not. Each of these depends on a disk write, and there's no way to guarantee that both of those disk writes succeed or both of them fail.

Now, in theory, maybe you could have a system where we don't have a fixed definition of who the master is. If either server crashes or if they lose communication, both crash. If they both come back up, they agree on who has the higher LSN on disk and both roll forward to that point, then designate one server to be the master. If one comes back up and can't reach the other, it appeals to the clusterware for help. The clusterware is then responsible for shooting one node in the head and telling the other node to carry on as the sole survivor. When, eventually, the dead node is resurrected, it *discards* any WAL written after the point from which the new master restarted.

Short of that, I don't think "abort the transaction" is a recovery mechanism for when we can't get hold of a standby. We're going to have to commit locally first, and then we can decide how long to wait for an ACK that a standby has also committed the same transaction remotely. We can wait not at all, forever, or for a while and then declare the other guy dead.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
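
For illustration only, the "commit locally first, then decide how long to wait for an ACK" idea from the message above could be sketched roughly like this. It is a toy Python model, not PostgreSQL code: the names `AckWait`, `Outcome`, and `commit_then_wait`, the `flush_local` callback, and the use of a `threading.Event` to stand in for the standby's acknowledgement are all assumptions made up for this sketch.

```python
import threading
from enum import Enum
from typing import Callable, Optional


class AckWait(Enum):
    """Hypothetical names for the three waiting policies in the message."""
    NONE = "none"        # wait not at all
    FOREVER = "forever"  # wait until the standby acknowledges
    TIMEOUT = "timeout"  # wait for a while, then declare the standby dead


class Outcome(Enum):
    LOCAL_ONLY = "committed locally, standby not awaited"
    REPLICATED = "committed locally and acknowledged by standby"
    STANDBY_DECLARED_DEAD = "committed locally, standby declared dead"


def commit_then_wait(
    flush_local: Callable[[], None],
    standby_acked: threading.Event,
    policy: AckWait,
    timeout_s: Optional[float] = None,
) -> Outcome:
    # The local commit record is flushed first, unconditionally. Nothing
    # below can undo it, which is why "abort" is not an available outcome.
    flush_local()

    if policy is AckWait.NONE:
        return Outcome.LOCAL_ONLY

    if policy is AckWait.FOREVER:
        standby_acked.wait()  # blocks until the event is set
        return Outcome.REPLICATED

    if timeout_s is None:
        raise ValueError("AckWait.TIMEOUT requires timeout_s")
    # Event.wait returns True if the event was set, False on timeout.
    if standby_acked.wait(timeout_s):
        return Outcome.REPLICATED
    return Outcome.STANDBY_DECLARED_DEAD
```

In every branch the transaction is already committed locally by the time the function returns; the policy only changes how long the caller is held before being told so, and whether the standby gets written off in the meantime.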