On Wed, 19 Aug 2020 at 21:41, Fujii Masao <masao.fu...@oss.nttdata.com> wrote: > > > > On 2020/08/12 15:32, Masahiko Sawada wrote: > > On Wed, 12 Aug 2020 at 14:06, Asim Praveen <pa...@vmware.com> wrote: > >> > >> > >> > >>> On 11-Aug-2020, at 8:57 PM, Robert Haas <robertmh...@gmail.com> wrote: > >>> > >>> I think this gets to the root of the issue. If we check the flag > >>> without a lock, we might see a slightly stale value. But, considering > >>> that there's no particular amount of time within which configuration > >>> changes are guaranteed to take effect, maybe that's OK. However, there > >>> is one potential gotcha here: if the walsender declares the standby to > >>> be synchronous, a user can see that, right? So maybe there's this > >>> problem: a user sees that the standby is synchronous and expects a > >>> transaction committing afterward to provoke a wait, but really it > >>> doesn't. Now the user is unhappy, feeling that the system didn't > >>> perform according to expectations. > >> > >> Yes, pg_stat_replication reports a standby in sync as soon as walsender > >> updates priority of the standby to something other than 0. > >> > >> The potential gotcha referred above doesn’t seem too severe. What is the > >> likelihood of someone setting synchronous_standby_names GUC with either > >> “*” or a standby name and then immediately promoting that standby? If the > >> standby is promoted before the checkpointer on master gets a chance to > >> update sync_standbys_defined in shared memory, commits made during this > >> interval on master may not make it to standby. Upon promotion, those > >> commits may be lost. > > > > I think that if the standby is quite behind the primary and in case of > > the primary crashes, the likelihood of losing commits might get > > higher. The user can see the standby became synchronous standby via > > pg_stat_replication but commit completes without a wait because the > > checkpointer doesn't update sync_standbys_defined yet. If the primary > > crashes before standby catching up and the user does failover, the > > committed transaction will be lost, even though the user expects that > > transaction commit has been replicated to the standby synchronously. > > And this can happen even without the patch, right? > > I think you're right. This issue can happen even without the patch. > > Maybe we should not mark the standby as "sync" whenever sync_standbys_defined > is false even if synchronous_standby_names is actually set and walsenders have > already detect that?
It seems good. I guess that we can set 'async' to sync_status and 0 to sync_priority when sync_standbys_defined is not true regardless of walsender's actual priority value. We print the message "standby \"%s\" now has synchronous standby priority %u" in SyncRepInitConfig() regardless of sync_standbys_defined but maybe it's fine as the message isn't incorrect and it's DEBUG1 message. > Or we need more aggressive approach; > make the checkpointer update sync_standby_priority values of > all the walsenders? ISTM that the latter looks overkill... I think so too. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services