On Thu, Aug 4, 2022 at 1:42 PM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Mon, Jul 25, 2022 at 4:20 PM Andrey Borodin <x4...@yandex-team.ru> wrote:
> >
> > > On 25 July 2022, at 14:29, Bharath Rupireddy
> > > <bharath.rupireddyforpostg...@gmail.com> wrote:
> > >
> > > Hm, after thinking for a while, I tend to agree with the above
> > > approach - meaning, query cancel interrupt processing can be completely
> > > disabled in SyncRepWaitForLSN() and the proc die interrupt processed
> > > immediately; this approach requires no GUC, as opposed to the v1 patch
> > > proposed upthread.
> > GUC was proposed here[0] to maintain compatibility with previous behaviour.
> > But I think that having no GUC here is fine too, provided, of course, that
> > we do not allow cancelation of unreplicated backends.
> >
> > >>
> > >> And yes, we need additional complexity - but in some other place.
> > >> A transaction can also be locally committed in the presence of a server
> > >> crash. But this is another difficult problem. A crashed server must not
> > >> allow data queries until the LSN of the timeline end is successfully
> > >> replicated to synchronous_standby_names.
> > >
> > > Hm, that needs to be done anyway. How about doing as proposed
> > > initially upthread [1]? Also, quoting the idea here [2].
> > >
> > > Thoughts?
> > >
> > > [1]
> > > https://www.postgresql.org/message-id/CALj2ACUrOB59QaE6=jf2cfayv1mr7fzd8tr4ym5+oweyg1s...@mail.gmail.com
> > > [2] 2) Wait for sync standbys to catch up upon restart after the crash or
> > > in the next txn after the old locally committed txn was canceled. One
> > > way to achieve this is to let the backend that makes the first
> > > connection wait for sync standbys to catch up in ClientAuthentication()
> > > right after successful authentication. However, I'm not sure this is
> > > the best way to do it at this point.
> >
> > I think ideally the startup process should not allow read-only connections
> > in CheckRecoveryConsistency() until WAL is replicated to a quorum, at least
> > up to the new timeline's LSN.
>
> We can't do it in CheckRecoveryConsistency() unless I'm missing
> something, because the walsenders (required for sending the remaining
> WAL to sync standbys to achieve quorum) can only be started after the
> server reaches a consistent state; after all, walsenders are
> specialized backends.
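The interrupt policy agreed on in the quoted exchange (do not let a query cancel end the wait in SyncRepWaitForLSN(), but act on proc die immediately) can be modelled as a small decision function. This is only a self-contained sketch: the enum and function names are hypothetical, and the two booleans merely mirror the server's ProcDiePending and QueryCancelPending flags rather than reading them.

```c
#include <assert.h>
#include <stdbool.h>

/* What the wait loop should do next (hypothetical names, not PostgreSQL's). */
typedef enum SyncRepWaitAction
{
    SYNC_REP_WAIT_CONTINUE,   /* keep sleeping until the standbys catch up */
    SYNC_REP_WAIT_DONE,       /* sync standbys confirmed the commit LSN */
    SYNC_REP_WAIT_TERMINATE   /* proc die: drop the connection, no commit ack */
} SyncRepWaitAction;

/*
 * One iteration of the proposed policy. 'replicated' means the sync standbys
 * have confirmed the commit LSN; the two flags stand in for ProcDiePending
 * and QueryCancelPending.
 */
SyncRepWaitAction
sync_rep_wait_step(bool replicated, bool proc_die_pending,
                   bool query_cancel_pending)
{
    if (replicated)
        return SYNC_REP_WAIT_DONE;

    /* Proc die is honoured immediately; the client never sees a commit ack. */
    if (proc_die_pending)
        return SYNC_REP_WAIT_TERMINATE;

    /*
     * Query cancel is deliberately not processed here: the transaction is
     * already committed locally, so ending the wait would let the client see
     * a commit that the sync standbys have not confirmed.
     */
    (void) query_cancel_pending;
    return SYNC_REP_WAIT_CONTINUE;
}
```

Note that this shape needs no GUC: there is no configurable branch in which a cancel request ends the wait.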
Continuing on the above thought (I inadvertently clicked the send button previously):

A simple approach would be to check for quorum in PostgresMain() before entering the for (;;) query loop, for non-walsender backends only. A disadvantage is that, in the worst case, all backends will wait here if achieving the sync quorum after restart takes a while. Roughly, we can do the following in PostgresMain(); of course, we need a locking mechanism so that every backend that reaches this point waits for the same LSN:

    /* sync_replication_defined and shmem are placeholders */
    if (sync_replication_defined &&
        shmem->wait_for_sync_repl_upon_restart)
    {
        /* GetFlushRecPtr() is the C-level counterpart of pg_current_wal_flush_lsn() */
        SyncRepWaitForLSN(GetFlushRecPtr(NULL), false);
        shmem->wait_for_sync_repl_upon_restart = false;
    }

Thoughts?

--
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/
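The locking requirement in the PostgresMain() proposal (every backend that reaches the check waits for the same LSN, and the flag is cleared once quorum catches up) could be sketched as follows. This is a minimal sketch under stated assumptions. RestartSyncState, restart_sync_init(), restart_sync_target() and restart_sync_done() are hypothetical names. XLogRecPtr is redefined locally as a stand-in. C11 <stdatomic.h> replaces whatever shared-memory and locking primitives the server would actually use. The actual sleeping would still be done by SyncRepWaitForLSN() and is not modelled here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;              /* local stand-in for the WAL position type */
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical shared state; in the server this would live in shared memory. */
typedef struct RestartSyncState
{
    atomic_bool wait_for_sync_repl_upon_restart; /* armed after crash recovery */
    _Atomic XLogRecPtr target_lsn;               /* the one LSN everyone waits for */
} RestartSyncState;

/* Stand-in for the startup process arming the wait after a crash restart. */
void
restart_sync_init(RestartSyncState *st)
{
    atomic_init(&st->target_lsn, InvalidXLogRecPtr);
    atomic_init(&st->wait_for_sync_repl_upon_restart, true);
}

/*
 * Called by each non-walsender backend before its query loop. Returns the LSN
 * to wait for, or InvalidXLogRecPtr if no wait is needed. The first caller
 * publishes its flush LSN with compare-and-swap; later callers get that same
 * value back, so all backends wait for one LSN.
 */
XLogRecPtr
restart_sync_target(RestartSyncState *st, XLogRecPtr my_flush_lsn)
{
    XLogRecPtr expected = InvalidXLogRecPtr;

    assert(my_flush_lsn != InvalidXLogRecPtr);
    if (!atomic_load(&st->wait_for_sync_repl_upon_restart))
        return InvalidXLogRecPtr;
    if (atomic_compare_exchange_strong(&st->target_lsn, &expected, my_flush_lsn))
        return my_flush_lsn;    /* we won the race: our LSN is the target */
    return expected;            /* on failure the CAS stored the current target here */
}

/*
 * Compares the quorum's flushed LSN with the target. Once the target is
 * reached, clears the flag so that later connections skip the wait entirely.
 */
bool
restart_sync_done(RestartSyncState *st, XLogRecPtr target,
                  XLogRecPtr quorum_flush_lsn)
{
    if (quorum_flush_lsn < target)
        return false;           /* keep waiting */
    atomic_store(&st->wait_for_sync_repl_upon_restart, false);
    return true;
}
```

Using compare-and-swap for the target means no backend ever waits for a later LSN than the first one published, so connections arriving during the catch-up do not push the finish line further out.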