On 08/08/2014 09:51 AM, Tom Lane wrote: >> AFAIK we don't _have_ a fancy negotiation system in the protocol, with >> back-and-forth exchanges of capabilities information. > > Maybe it's time to invent that. It would be positively foolish to > create any such behavior without a protocol version bump anyway.
I was hoping it'd be easier to sneak a new message type in without a full protocol bump. As you can imagine that makes it a ... rather larger job. Still, if it's unsafe to do it that way... > Although as I said, I don't think embedding knowledge of LSNs at the > protocol level is a good thing to begin with. As I said upthread, it need not necessarily be an LSN. A commit timestamp would do the job too, if information about the last-replayed commit timestamp was accessible on the downstream node. It needs to be a sequence identifier that can be matched against pg_replication_slots / pg_stat_replication or passed to a function on the downstream end to say "wait until we're replayed to this point". For streaming replication there's only one upstream, so there's no need to identify it. For BDR you'd also have to identify the upstream node of interest - probably by slot ID, or by (sysid, tlid, dboid) tuple. In the end, it can be an opaque magic cookie. It really doesn't matter, so long as what the client receives is a value it can pass to another Pg instance and say "wait until you've replayed up to this, please" or "have you replayed up to this yet?". > Is it really necessary that this information be pushed to the client on every commit, as opposed to the client asking for it occasionally? I think so, yes, though I'd be glad to be proved wrong. For the purpose of transparent failover (BDR) at least, the server currently being written to can go away at any moment, and you should know exactly what you're up to in order to make it safe to continue on another server. Consider, for a multi-master configuration where two servers replicate to each other: On a connection to server1: INSERT INTO bird(id, parrot) VALUES (1, 'African Grey'); [client grabs magic cookie for server replay state] INSERT INTO bird(id, parrot) VALUES (2, 'Lorikkeet'); [server1 sends changes to server2, which is behind on replay and still working on catching up] [server1 dies abruptly] [client drops connection to server1, connects to server2] -- Correct spelling UPDATE bird SET parrot = 'Lorikeet' WHERE id = 2; If the INSERT from server1 hasn't replayed on server2 yet this will fail. Other anomalies can be worse and cause lost updates, etc. To protect against this the client needs a way to wait, after connecting to server2, until it's caught up with the state of server1. That's what I'm talking about here. In this case, if you used a periodic progress indicator requested by the client, you'd still get the same error, because you'd wait until the first INSERT but not the second. So yes, the client needs this info at every commit. That means that enabling client-side fail-over won't be free, especially for lots of small transactions. It'll be cheaper if Pg can push the info with the commit confirmation instead of the client having to request it afterwards though. (Note that the client risks waiting forever if server1 didn't send the required commits before it died, but that's where application policy decisions come in). -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers