On Mon, Jul 9, 2012 at 1:30 PM, Shaun Thomas <stho...@optionshouse.com> wrote:
>
> 1. Slave wants to be synchronous with master. Master wants replication on at least one slave. They have this, and are happy.
> 2. For whatever reason, slave crashes or becomes unavailable.
> 3. Master notices no more slaves are available, and operates in standalone mode, accumulating WAL files until a suitable slave appears.
> 4. Slave finishes rebooting/rebuilding/upgrading/whatever, and re-subscribes to the feed.
> 5. Slave stays in degraded sync (asynchronous) mode until it is caught up, and then switches to synchronous. This makes both master and slave happy, because *intent* of synchronous replication is fulfilled.
>
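(For reference, the knobs in play on the primary today; a minimal sketch, and the standby name is only illustrative:

    # postgresql.conf on the primary
    synchronous_standby_names = 'standby1'   # COMMIT waits for this standby to confirm flush
    synchronous_commit = on                  # the wait applies to every commit by default

    # the only "degraded" mode available right now is a manual toggle, e.g.:
    # synchronous_commit = local             # stop waiting on the standby, keep local durability

What steps 3-5 ask for is, in effect, an automatic and temporary version of that last toggle.)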
So if I get this straight, what you are saying is that the mode you want is "be asynchronous unless a standby is around, in which case be synchronous". I think if your goal is zero transaction loss then you would want to rethink this, because zero loss was the goal of sync rep: two copies, no matter what, before COMMIT returns from the primary.

However, I think there is something you are stating here that puts a finer point on it: right now, there is no graceful way to attenuate the speed of commit on a primary to ensure bounded lag of an *asynchronous* standby. This is a pretty tricky thing to define: consider bringing a standby on-line from archive replay; it shows up in streaming with very high lag and stops all commit traffic on the primary until it gets back inside the window of "acceptable" lag. That sounds pretty terrible, too. How does DRBD handle this? Its catchup phase might be interesting prior art.

On first inspection, the best I can come up with is something like "if the standby is making progress but failing to converge, attenuate the primary's speed of COMMIT until convergence is projected to occur within an acceptable time", or something like that.

Relatedly, this touches one of the ugliest problems I have with continuous archiving: there is no graceful way to attenuate the speed of operations to prevent a backlog that can fill up the disk containing pg_xlog. It also makes it very hard to strictly bound the amount of WAL that can remain outstanding and unarchived. To get around this, I was planning on very carefully making use of the status messages that tell synchronous replication when to block and unblock operations, but perhaps a less strained interface is possible with some kind of cooperation from Postgres.

--
fdr
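P.S. For illustration, the kind of signal an external throttle could watch; a sketch only, and the use of 9.2's pg_xlog_location_diff() plus the idea of gating bulk write traffic on the result are my assumptions, not anything Postgres does for you today:

    -- Run on the primary: bytes of WAL each connected standby has yet to replay.
    -- An external agent could slow or pause bulk writers once this exceeds a budget.
    SELECT application_name,
           state,
           sync_state,
           pg_xlog_location_diff(pg_current_xlog_location(),
                                 replay_location) AS replay_lag_bytes
      FROM pg_stat_replication;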