Greg Smith <g...@2ndquadrant.com> writes:
> I don't see this as needing any implementation any more complicated than
> the usual way such timeouts are handled.  Note how long you've been
> trying to reach the standby.  Default to -1 for forever.  And if you hit
> the timeout, mark the standby as degraded and force them to do a proper
> resync when they disconnect.  Once that's done, then they can re-enter
> sync rep mode again, via the same process a new node would have done so.

Well, actually, that's *considerably* more complicated than just a timeout.
How are you going to "mark the standby as degraded"?  The standby can't keep
that information, because it's not even connected when the master makes the
decision.  ISTM that this requires

1. a unique identifier for each standby (not just role names that
   multiple standbys might share);

2. state on the master associated with each possible standby -- not just
   the ones currently connected.

Both of those are perhaps possible, but the sense I have of the discussion
is that people want to avoid them.

Actually, #2 seems rather difficult even if you want it.  Presumably you'd
like to keep that state in reliable storage, so it survives master crashes.
But how you gonna commit a change to that state, if you just lost every
standby (suppose master's ethernet cable got unplugged)?  Looks to me like
it has to be reliable non-replicated storage.  Leaving aside the question
of how reliable it can really be if not replicated, it's still the case
that we have noplace to put such information given the
WAL-is-across-the-whole-cluster design.

			regards, tom lane
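
To make the two requirements above concrete, here is a minimal sketch of the master-side bookkeeping that "mark the standby as degraded" would seem to imply. It is illustrative Python, not PostgreSQL code: StandbyRegistry, StandbyRecord, and every method name are hypothetical, and the state lives only in memory, so the question of where to store it reliably is deliberately left open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class SyncState(Enum):
    SYNC = "sync"          # allowed to act as a synchronous standby
    DEGRADED = "degraded"  # timed out; must resync before re-entering sync rep


@dataclass
class StandbyRecord:
    # Requirement 1: a unique identifier per standby (hypothetical field).
    standby_id: str
    state: SyncState = SyncState.SYNC
    # Time at which the master first failed to reach this standby, if any.
    unreachable_since: Optional[float] = None


class StandbyRegistry:
    """Requirement 2: master-side state for every known standby,
    connected or not. In-memory only; persistence is not addressed."""

    def __init__(self, timeout: float = -1.0) -> None:
        self.timeout = timeout  # seconds; -1 means wait forever
        self.records: Dict[str, StandbyRecord] = {}

    def register(self, standby_id: str) -> None:
        self.records.setdefault(standby_id, StandbyRecord(standby_id))

    def note_unreachable(self, standby_id: str, now: float) -> None:
        rec = self.records[standby_id]
        if rec.unreachable_since is None:
            rec.unreachable_since = now

    def note_contact(self, standby_id: str) -> None:
        # Reconnecting clears the timer but does not undo a degraded mark.
        self.records[standby_id].unreachable_since = None

    def check_timeouts(self, now: float) -> List[str]:
        """Mark standbys degraded once the timeout elapses; return their ids."""
        if self.timeout < 0:
            return []
        newly_degraded = []
        for rec in self.records.values():
            if (rec.state is SyncState.SYNC
                    and rec.unreachable_since is not None
                    and now - rec.unreachable_since >= self.timeout):
                rec.state = SyncState.DEGRADED
                newly_degraded.append(rec.standby_id)
        return newly_degraded

    def resync_done(self, standby_id: str) -> None:
        rec = self.records[standby_id]
        rec.state = SyncState.SYNC
        rec.unreachable_since = None

    def may_enter_sync_rep(self, standby_id: str) -> bool:
        return self.records[standby_id].state is SyncState.SYNC
```

Note that the degraded mark has to be looked up by standby_id when the standby reconnects, which is why both the identifier and the record for a disconnected standby are needed. The sketch says nothing about surviving a master crash.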