* Simon Riggs <[EMAIL PROTECTED]> [080910 06:18]:
> We have a number of choices, at the point of failure:
> * Does the whole primary server stay up (probably)?
The only sane choice is the one the admin makes. Any "predetermined"
choice PG makes can (and will) be wrong in some situations.

> * Do we continue to allow new transactions in degraded mode? (which
> increases the risk of transaction loss if we continue at that time).
> (The answer sounds like it will be "of course, stupid" but this cluster
> may be part of an even higher level HA mechanism, so the answer isn't
> always clear).

The only sane choice is the one the admin makes. Any "predetermined"
choice PG makes can (and will) be wrong in some situations.

> * For each transaction that is trying to commit: do we want to wait
> forever? If not, how long? If we stop waiting, do we throw ERROR, or do
> we say, lets get on with another transaction.

The only sane choice is the one the admin makes. Any "predetermined"
choice PG makes can (and will) be wrong in some situations.

> If the server is up, yet all connections in a session pool are stuck
> waiting for their last commits to complete then most sysadmins would
> agree that the server is actually "down". Since no useful work is
> happening, or can be initiated - even read only. We don't need to
> address that issue in the same way for all transactions, is all I'm
> saying.

Sorry to sound like a broken record here, but the whole point is to
guarantee data safety. You can only start trading ACID for HA if you
have the ACID guarantees in the first place (and for replication, that
means across the cluster, including slaves).

So in that light, I think it's pretty obvious that if a slave is
considered part of an active synchronous replication cluster, then in
the face of "network lag", or even network failure, the master *must*
pretty much halt all new commits in their tracks until that slave
acknowledges the commit. Yes, that's going to cause a backup. That's
the cost of synchronous replication.

But that means the admin has to be able to control whether a slave is
part of an active synchronous replication cluster or not. I hope that
control eventually is a lot more than a GUC that says "when a slave is
X seconds behind, abandon him". I'd dream of a "replication" interface
where I could add new slaves on the fly (and a nice tool that does the
pg_start_backup()/sync/apply-WAL dance to sync a slave and then
subscribe it), get slave status (maybe syncing/active/abandoned), get
some average latency figure (i.e. something like svctm of iostat on
your WAL disk), and have some way to control the slave's degradation
from active to abandoned (like the above GUC, or maybe a
callout/hook/script that runs when latency > X, etc., or both).

And for async replication, you just have a "proxy" slave which does
nothing but subscribe to your master, always acknowledge WAL right
away so the master doesn't wait, and keep a local backlog of WAL it's
sending out to many clients. This proxy slave doesn't slow down the
master, but can feed clients across slow WAN links (that may not have
the burst bandwidth to keep up with bursty master writes, but have
aggregate bandwidth to keep pretty close to the master), or networks
that drop out for a period, etc.
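To make the "admin picks the policy" point concrete, here is a minimal
sketch of a master's commit path, in Go rather than PostgreSQL's C
internals, and purely illustrative: waitForAck, Policy, and ackCh are
invented names, not anything in PG. The admin, not the server, decides
between waiting forever, throwing ERROR after a timeout, or abandoning
the slave:

package main

import (
	"errors"
	"fmt"
	"time"
)

// Policy is the admin's choice of what happens when a sync slave
// stops acknowledging commits; PG should not predetermine this.
type Policy int

const (
	WaitForever  Policy = iota // strict sync: no ack, no commit, ever
	ErrorOut                   // throw ERROR after the timeout
	AbandonSlave               // degrade: drop the slave, commit anyway
)

var errSlaveTimeout = errors.New("slave did not acknowledge commit in time")

// waitForAck blocks a committing backend until the slave acknowledges
// the WAL flush, applying the admin-configured policy on timeout.
func waitForAck(ackCh <-chan struct{}, timeout time.Duration, p Policy) error {
	if p == WaitForever {
		<-ackCh // data safety first: the commit waits forever
		return nil
	}
	select {
	case <-ackCh:
		return nil
	case <-time.After(timeout):
		if p == ErrorOut {
			return errSlaveTimeout
		}
		// AbandonSlave: a callout/hook could fire here so the
		// admin learns the cluster has degraded to async.
		return nil
	}
}

func main() {
	ackCh := make(chan struct{}, 1)
	// Simulate a slave acking after 50ms of "network lag".
	go func() { time.Sleep(50 * time.Millisecond); ackCh <- struct{}{} }()
	if err := waitForAck(ackCh, 200*time.Millisecond, ErrorOut); err != nil {
		fmt.Println("commit failed:", err)
		return
	}
	fmt.Println("commit acknowledged by slave")
}

Flip the policy to WaitForever and the backend simply blocks when the
slave goes quiet, which is exactly the "server looks down" scenario
Simon describes; the point is that only the admin knows whether that
is the right trade.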
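And a similarly hedged sketch of the proxy slave: it acks the master
immediately and lets a slow WAN client drain a local backlog at its
own pace. The buffered channel stands in for the proxy's on-disk WAL
backlog; all the names here are again made up for illustration:

package main

import (
	"fmt"
	"sync"
	"time"
)

// relay acks each WAL segment back to the master as soon as it is
// queued locally, so the master's commit never waits on the WAN.
func relay(fromMaster <-chan []byte, ackToMaster chan<- struct{}, downstream []chan []byte) {
	for segment := range fromMaster {
		ackToMaster <- struct{}{}
		for _, d := range downstream {
			d <- segment // the buffered channel is the local backlog
		}
	}
	for _, d := range downstream {
		close(d)
	}
}

func main() {
	fromMaster := make(chan []byte)
	ackToMaster := make(chan struct{}, 16)
	// A big buffer stands in for the proxy's on-disk WAL backlog.
	wan := make(chan []byte, 1024)

	go relay(fromMaster, ackToMaster, []chan []byte{wan})

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // a slow WAN client draining the backlog at its own pace
		defer wg.Done()
		for seg := range wan {
			time.Sleep(10 * time.Millisecond)
			fmt.Printf("WAN client applied %s\n", seg)
		}
	}()

	for i := 0; i < 5; i++ {
		fromMaster <- []byte(fmt.Sprintf("wal-%03d", i))
		<-ackToMaster // bursty master writes are acked immediately
	}
	close(fromMaster)
	wg.Wait()
}

The backlog is what makes this work: the master's commit latency is
bounded by the local queueing on the proxy, while the WAN client only
needs enough aggregate bandwidth to keep pretty close to the master.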
--
Aidan Van Dyk                                             Create like a god,
[EMAIL PROTECTED]                                       command like a king,
http://www.highrise.ca/                                   work like a slave.