On Fri, 2 May 2025 at 15:00, Andrey Borodin <x4...@yandex-team.ru> wrote:
>
> Hi hackers!
>
> I want to revive attempts to fix some old edge cases of physical quorum
> replication.
>
> Please find attached draft patches that demonstrate ideas. These patches are
> not actually proposed code changes, I rather want to have a design consensus
> first.
[...]
> 2. Do not allow to cancel locally written transaction
>
> The problem was discussed many times [0,1,2,3] with some agreement on taken
> approach. But there was concerns that the solution is incomplete without
> first patch in the current thread.

I'm trying to figure out where in the thread you find this "some
agreement". Could you reference the posts you're referring to?

> Problem: user might try to cancel locally committed transaction and if we do
> so we will show non-replicated data as committed. This leads to loosing data
> with UPSERTs.

Could you explain why specifically UPSERTs would lose data (vs any
other user workload) in cancellations during SyncRepWaitForLSN?

> The key change is how we process cancels in SyncRepWaitForLSN().

I personally think we should rather move to CSN-based snapshots on
both primary and replica (with LSN as CSN), and make visibility of
other transactions depend on how much persistence your session wants
(SQL spec permitting, of course).

I.e., if you have synchronous_commit=remote_apply, you wait with
sending the commit success message until you have confirmation that
your commit LSN has been applied on the configured number of replicas,
and snapshots are taken based on the latest LSN that is known to be
applied everywhere. But if you have synchronous_commit=off, you could
read the commits (even those committed in s_c=remote_apply sessions)
immediately after they've been included in the logs (potentially with
some added slack to account for system state updates).

Similarly, all snapshots you create in a backend with
synchronous_commit=remote_apply would use the highest LSN which is
remotely applied according to the applicable rules, while
synchronous_commit=off implies "all transactions which have been
logged as committed". Changing synchronous_commit to a value that
requires a higher persistence level would cause the backend to wait
for its newest snapshot LSN to reach that persistence level; IMO an
acceptable trade-off for switching s_c at runtime.
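
To make the visibility rule above concrete, here is a rough sketch in
Python. It is not PostgreSQL code: Horizons, snapshot_lsn, is_visible
and must_wait_on_switch are hypothetical names, LSNs are plain
integers, and treating the "off" horizon as "logged as committed" is an
assumption of the sketch.

```python
from dataclasses import dataclass, field

# synchronous_commit values, weakest to strongest persistence.
LEVELS = ("off", "local", "remote_write", "on", "remote_apply")


@dataclass
class Horizons:
    """Highest LSN known to have reached each level (hypothetical bookkeeping)."""
    lsn: dict = field(default_factory=lambda: {level: 0 for level in LEVELS})

    def advance(self, level: str, lsn: int) -> None:
        # Horizons only ever move forward.
        self.lsn[level] = max(self.lsn[level], lsn)


def snapshot_lsn(horizons: Horizons, level: str) -> int:
    """With LSN as CSN, a snapshot is the horizon for the session's level."""
    return horizons.lsn[level]


def is_visible(commit_lsn: int, snap_lsn: int) -> bool:
    """A commit is visible once its commit LSN is at or below the snapshot LSN."""
    return commit_lsn <= snap_lsn


def must_wait_on_switch(horizons: Horizons, newest_snap_lsn: int,
                        new_level: str) -> bool:
    """Raising synchronous_commit waits until the new level's horizon has
    caught up with the newest snapshot LSN the backend has already used."""
    return horizons.lsn[new_level] < newest_snap_lsn


h = Horizons()
h.advance("off", 300)           # assumed: logged as committed locally
h.advance("remote_apply", 100)  # assumed: applied on the configured replicas
commit_lsn = 200
print(is_visible(commit_lsn, snapshot_lsn(h, "off")))           # True
print(is_visible(commit_lsn, snapshot_lsn(h, "remote_apply")))  # False
print(must_wait_on_switch(h, 300, "remote_apply"))              # True
```

The same commit at LSN 200 is visible to the s_c=off session but not
yet to the s_c=remote_apply one, and a backend that already used a
snapshot at LSN 300 would have to wait before switching to
remote_apply.
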

This is based on the assumption that if you don't want your commit to
be very durable, you probably also don't care as much about the
durability of the data you can see, and if you want your commits to be
very durable, you probably want to see only very durable data. This
would also unify the commit visibility order between primary and
secondary nodes, and would allow users to have session-level 'wait for
LSN x to be persistent' with much reduced lock times.

(CC-ed to Ants, given his interest in this topic)

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)