On Wed, Aug 28, 2013 at 10:58 AM, Ants Aasma <a...@cybertec.at> wrote:
> I currently see the following courses of action:
>
> 1. Do nothing about the inconsistency, use a transient global counter
> for master commit order and commit record LSN for slaves.
> Pro: doesn't change any semantics
> Con: we are not making any progress towards cluster wide snapshots
> or even serializable transactions on slaves.
>
> 2. Create a new WAL record type that is inserted when a transaction
> becomes visible. LSN of this record determines transaction visibility
> order. Async transactions can be optimized to skip this record. This
> record does not need to be flushed.
> Pro: cluster wide consistency, replication method agnostic
> Con: one extra WAL record insertion per writing transaction. (32
> bytes of WAL per tx)
>
> 3. Use a transient global counter on master, send xid-csn pairs to
> slave via a side channel on the replication connection.
> Pro: Less overhead than WAL records
> Con: replication protocol needs (possibly invasive) changes, WAL
> shipping based replication can't use this mechanism, lots of extra
> code required.
>
> 4. Make the choice between 1 and 2 user configurable (it seems to me
> that it could even be changed without a restart).
>
> Thoughts?
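To make the inconsistency behind option 1 concrete, here is a minimal model, assuming a simplified picture in which a synchronous transaction inserts its commit record, waits for the WAL flush, and only then becomes visible, while an asynchronous transaction becomes visible right after inserting its record. The `Txn` type, `visible_seq` field, and both functions are hypothetical illustrations, not PostgreSQL code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    xid: int
    commit_lsn: int   # LSN of the commit record; all a standby can see
    visible_seq: int  # hypothetical master-side counter, bumped when the
                      # transaction actually becomes visible


def master_order(txns):
    """Option 1 on the master: order by the transient global counter."""
    return [t.xid for t in sorted(txns, key=lambda t: t.visible_seq)]


def standby_order(txns):
    """Option 1 on a standby: order by commit record LSN."""
    return [t.xid for t in sorted(txns, key=lambda t: t.commit_lsn)]


# xid 100 (synchronous) writes its commit record first but is still waiting
# for the flush, so xid 101 (asynchronous) becomes visible before it.
txns = [
    Txn(xid=100, commit_lsn=1000, visible_seq=2),
    Txn(xid=101, commit_lsn=1100, visible_seq=1),
]

print(master_order(txns))   # [101, 100]
print(standby_order(txns))  # [100, 101]
```

Under this model the master and the standby disagree about which of the two transactions became visible first, which is why option 1 cannot support cluster-wide snapshots.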
I think approach #2 is dead on arrival, at least as a default policy. It essentially amounts to requiring two commit records per transaction rather than one, and I think that has no chance of being acceptable. It's not just, or even primarily, the *volume* of WAL that concerns me; it's the feeling that hitting WAL twice rather than once at the end of a transaction that may have written only one or two WAL records to begin with is going to slow things down pretty substantially, especially in high-concurrency scenarios.

I wouldn't entirely dismiss the idea of changing the user-visible semantics. In addition to a WAL insertion pointer and a WAL flush pointer, you'd have a WAL snapshot pointer, which could run ahead of the flush pointer if the transactions were all asynchronous, but which for synchronous transactions could not advance faster than the flush pointer. Only users running a mix of synchronous_commit=on and synchronous_commit=off would be harmed (an asynchronous commit positioned after a not-yet-flushed synchronous commit could not become visible until that flush completed), and maybe we could convince ourselves that's OK. Still, there's no doubt that there is a downside there.

Therefore, I'm inclined to suggest that you implement #1. If, at a later time, we want to make progress on the issue of cluster-wide snapshot consistency, you could implement #2 or #3 as an optional feature that can be turned on via some flag. However, I would recommend against trying to do that in the initial patch. I think that either #2 or #3 is really a separate feature, and if you try to incorporate all of that code into the main CSN patch, it's just going to be a distraction from what figures to be a very complicated patch even in minimal form.

If you did choose to implement #2 as an option at some point, it would probably be worth optimizing for the case where commit ordering and visibility ordering match, and trying to find a design where the extra WAL record is needed only when the orderings don't match. I'm not sure exactly how to do that, but it might be worth investigating.
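As a rough way to gauge how often the two orderings would actually diverge, and therefore how often a conditional extra record might be needed, one could replay visibility events against commit LSNs. This is a hypothetical sketch only: `out_of_order_commits` and its input format are illustrative assumptions, not part of any CSN patch. It merely detects where the orderings disagree; it does not address how a standby would learn that a given commit record must wait for a later visibility record.

```python
def out_of_order_commits(visibility_sequence):
    """Given commit record LSNs in the order the transactions became visible
    on the master, return the LSNs of transactions that became visible only
    after some transaction with a later commit record was already visible.

    If commit ordering and visibility ordering match, the result is empty.
    """
    flagged = []
    max_visible_lsn = None
    for lsn in visibility_sequence:
        if max_visible_lsn is not None and lsn < max_visible_lsn:
            flagged.append(lsn)  # the two orderings disagree here
        else:
            max_visible_lsn = lsn
    return flagged


print(out_of_order_commits([1000, 1100, 1200]))  # []
print(out_of_order_commits([1100, 1000, 1200]))  # [1000]
```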
I don't think that's enough to save #2 as a default behavior, but it might make it more palatable as an option.

I agree with what others have said insofar as it would be nifty if we could use the commit LSN as the commit sequence number. But I think you've put your finger on why that's not likely to work out well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers