On Mon, 28 Oct 2024 at 17:51, Peter Eisentraut <pe...@eisentraut.org> wrote:
> This is something I hacked together on the way back from pgconf.eu.
> It's highly experimental.
>
> The idea is to do the equivalent of pg_wal_replay_wait() on the protocol
> level, so that it is ideally fully transparent to the application code.
> The application just issues queries, and they might be serviced by a
> primary or a standby, but there is always a correct ordering of reads
> after writes.

The idea is great; I have been wanting something like this for a long time.

For future-proofing, it might be a good idea not to require the value that is communicated and waited for to be an LSN. In a sharded database, a Lamport timestamp would allow for sequential consistency. A Lamport timestamp is just a monotonically increasing value that is eagerly shared between all communicating participants, including clients. For a single cluster, LSNs work fine for this purpose. But with multiple shards LSNs will not work, unless arranged as a vector clock, which is what I think Matthias proposed.

Even without sharding, LSN might not be the final choice. Right now on the primary the visibility order is not LSN order. So if a connection commits with synchronous_commit = off, the write location is not even going to include the commit. Publishing the end of the commit record would be better. But I assume at some point we would like to have a consistent visibility order, which quite likely means using something other than LSN as the logical clock.

I see the patch names the field LSN, but on the protocol level and for the client library this is just an opaque 127-byte token. So basically I'm thinking the naming could be more generic. And for a complete Lamport timestamp implementation we would need the capability of extracting the last seen value, plus a set-if-greater update operation.

--
Ants Aasma
www.cybertec-postgresql.com
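
As a rough illustration of the two operations mentioned above (extracting the last seen value, and a set-if-greater update), here is a minimal client-side sketch. The class name, the method names, and the use of plain integers as timestamps are all assumptions made for this example; none of them come from the patch or from any existing client library.

```python
import threading


class LogicalClock:
    """Hypothetical holder for the last logical timestamp a client has seen.

    The timestamp is modelled as an int purely for illustration; on the
    wire it would be an opaque token that only needs to be comparable.
    """

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def last_seen(self):
        """Return the newest timestamp observed so far."""
        with self._lock:
            return self._value

    def set_if_greater(self, candidate):
        """Advance the clock to candidate if it is newer.

        Returns True if the clock moved forward, False otherwise.
        The stored value never goes backwards.
        """
        with self._lock:
            if candidate > self._value:
                self._value = candidate
                return True
            return False


# Usage: timestamps reported by different servers are merged into one clock,
# and the merged value is what the client would send with its next query.
clock = LogicalClock()
clock.set_if_greater(120)   # reported after a write on one server
clock.set_if_greater(100)   # an older value from another server is ignored
wait_for = clock.last_seen()
```

With these two operations a client can share one clock across several connections: every server response feeds set_if_greater, and every outgoing request carries last_seen, so a later read never observes a state older than an earlier write.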