Hi,

On 2021-07-22 21:17:56 +0000, Bossart, Nathan wrote:
> AFAICT there are a variety of ways that the aforementioned problem may
> occur:
> 1. Server restarts: As noted in the docs [2], "waiting transactions
>    will be marked fully committed once the primary database
>    recovers." I think there are a few options for handling this,
>    but the simplest would be to simply fail over anytime the primary
>    server shuts down. My proposal may offer other ways of helping
>    with this.
> 2. Backend crashes: If a backend crashes, the postmaster process
>    will restart everything, leading to the same problem described in
>    1. However, this behavior can be prevented with the
>    restart_after_crash parameter [3].
> 3. Client disconnections: During waits for synchronous replication,
>    interrupt processing is turned off, so disconnected clients
>    actually don't seem to cause a problem. The server will still
>    wait for synchronous replication to complete prior to making the
>    transaction visible on the primary.
> 4. Query cancellations and backend terminations: This appears to be
>    the only gap where there is no way to avoid potential data loss,
>    and it is the main target of my proposal.
>
> Instead of blocking query cancellations and backend terminations, I
> think we should allow them to proceed, but we should keep the
> transactions marked in-progress so they do not yet become visible to
> sessions on the primary. Once replication has caught up to the
> necessary point, the transactions can be marked completed, and they
> would finally become visible.
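To make the gap in item 4 concrete: it is easy to reproduce on a primary
configured for synchronous replication. Roughly (the table is made up, and
the server messages are paraphrased from memory, not verbatim):

    -- assuming on the primary, e.g.:
    --   synchronous_standby_names = 'standby1'
    --   synchronous_commit = on
    CREATE TABLE t (v int);    -- made-up example table
    BEGIN;
    INSERT INTO t VALUES (1);
    COMMIT;                    -- backend waits here for the standby's confirmation
    -- cancelling the COMMIT while it is waiting produces, roughly:
    --   WARNING:  canceling wait for synchronous replication due to user request
    --   DETAIL:   The transaction has already committed locally, but might not
    --             have been replicated to the standby.

After the cancel, the transaction is visible on the primary even though no
standby has acknowledged it.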
I think there are two aspects making this proposal problematic:

First, from the user experience side of things, the issue is that this
seems to propose violating read-your-own-writes within a single connection
to a single node. Which imo is *far* worse than seeing writes that haven't
yet been acknowledged as replicated after a query cancel.

Second, on the implementation side, I think this proposal practically
amounts to internally converting plain transaction commits into 2PC
prepare/commit (a SQL-level sketch is at the end of this mail), with all
the associated overhead: two WAL entries/flushes per commit, a separate
set of procarray entries to hold the resources for the
prepared-but-not-yet-committed transactions, and the potential for running
out of those extra procarray slots. What if a user rapidly commits and
cancels in a loop? You'd almost immediately run out of procarray slots to
represent all those "not really committed" transactions.

I think there's benefit in optionally turning all transactions into 2PC
ones, but I don't see that being fast enough to be the only option.

Greetings,

Andres Freund
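As a footnote, the explicit SQL-level analogue of that conversion would be
roughly the following (table name and GID are made up for illustration; it
requires max_prepared_transactions > 0 on the primary):

    BEGIN;
    INSERT INTO t VALUES (1);
    PREPARE TRANSACTION 'syncrep_gid_1';   -- first WAL record/flush;
                                           -- occupies a prepared-xact slot
    -- ... wait until the synchronous standby has confirmed that LSN ...
    COMMIT PREPARED 'syncrep_gid_1';       -- second WAL record/flush;
                                           -- frees the slot
    -- A client that cancels and retries in a tight loop accumulates
    -- prepared-but-not-yet-committed entries until the primary reports
    -- something like:
    --   ERROR:  maximum number of prepared transactions reached
    --   HINT:   Increase max_prepared_transactions.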