SyncRepGetSyncStandbysPriority() vs. SIGHUP

Noah Misch Wed, 05 Feb 2020 23:46:40 -0800

Buildfarm runs have triggered the assertion at the end of
SyncRepGetSyncStandbysPriority():


 sysname  │      snapshot       │    branch     │ bfurl
──────────┼─────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────
 hoverfly │ 2019-11-22 12:15:08 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2019-11-22%2012%3A15%3A08
 hoverfly │ 2019-11-07 17:19:12 │ REL9_6_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2019-11-07%2017%3A19%3A12
 nightjar │ 2019-08-13 23:04:41 │ REL_10_STABLE │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2019-08-13%2023%3A04%3A41
 skink    │ 2018-11-28 21:03:35 │ HEAD          │ http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2018-11-28%2021%3A03%3A35

On my development system, this delay injection reproduces the failure:

--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -399,6 +399,8 @@ SyncRepInitConfig(void)
 {
    int         priority;
 
+   pg_usleep(100 * 1000);

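SyncRepInitConfig() is the function that, after SIGHUP, updates the
sync_standby_priority values that SyncRepGetSyncStandbysPriority() consults.
The assertion holds if each walsender's sync_standby_priority (in shared
memory) reflects the latest synchronous_standby_names value.  That ceases to
hold for brief moments after a SIGHUP that changes synchronous_standby_names:
each walsender reloads the configuration and updates its own
sync_standby_priority independently, so until the last one has done so, some
walsenders' values reflect the old setting and others the new one.  The delay
injected above widens that window.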
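To make that window concrete, here is a toy model in C.  It is not PostgreSQL
code: WalSndSlot, priority_in(), reload_one() and priorities_consistent() are
hypothetical stand-ins, synchronous_standby_names is reduced to an ordered
list of names, and a standby's priority is simply its 1-based position in that
list.  Two walsenders react to a SIGHUP that reorders the list, one at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy model, not PostgreSQL code.  Each slot stands for one walsender's
 * entry in shared memory.
 */
typedef struct
{
    const char *app_name;   /* standby's application_name */
    int         priority;   /* 0 means "not a synchronous candidate" */
} WalSndSlot;

/* 1-based position of app_name in the ordered name list, or 0 if absent. */
static int
priority_in(const char *const names[], int nnames, const char *app_name)
{
    for (int i = 0; i < nnames; i++)
        if (strcmp(names[i], app_name) == 0)
            return i + 1;
    return 0;
}

/* One walsender reacting to SIGHUP: it rewrites only its own slot. */
static void
reload_one(WalSndSlot *slot, const char *const names[], int nnames)
{
    assert(slot->app_name != NULL);
    slot->priority = priority_in(names, nnames, slot->app_name);
}

/* True if every slot's priority matches what the given list assigns. */
static bool
priorities_consistent(const WalSndSlot slots[], int nslots,
                      const char *const names[], int nnames)
{
    for (int i = 0; i < nslots; i++)
        if (slots[i].priority != priority_in(names, nnames, slots[i].app_name))
            return false;
    return true;
}
```

If the list changes from "a, b" to "b, a" and only walsender "a" has reloaded,
both slots hold priority 2, a combination that neither the old list (a=1, b=2)
nor the new one (b=1, a=2) would produce.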

I think the way to fix this is to nominate one process to update all
sync_standby_priority values after SIGHUP.  That process should acquire
SyncRepLock once per ProcessConfigFile(), not once per walsender.  If
walsender startup occurs at roughly the same time as a SIGHUP, the new
walsender should avoid computing sync_standby_priority based on a GUC value
different from the one used for the older walsenders.
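A minimal sketch of that proposal, again as a hypothetical model rather than a
patch: a pthread mutex stands in for SyncRepLock, and SharedSyncState,
apply_new_config() and register_walsender() are invented names.  Keeping the
name list itself in shared state, so that a starting walsender computes its
priority from the same list as the older walsenders, is an assumption of this
sketch, one possible way to meet the startup requirement.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_SLOTS 8
#define MAX_NAMES 8

/* Hypothetical model of the proposal; not PostgreSQL code. */
typedef struct
{
    const char *app_name;   /* NULL means the slot is free */
    int         priority;   /* 0 means "not a synchronous candidate" */
} WalSndSlot;

typedef struct
{
    pthread_mutex_t lock;             /* stands in for SyncRepLock */
    int         lock_acquisitions;    /* lets us check once-per-reload */
    const char *names[MAX_NAMES];     /* list every slot was computed from */
    int         nnames;
    WalSndSlot  slots[MAX_SLOTS];
} SharedSyncState;

static void
init_state(SharedSyncState *st)
{
    memset(st, 0, sizeof(*st));
    pthread_mutex_init(&st->lock, NULL);
}

static void
lock_state(SharedSyncState *st)
{
    pthread_mutex_lock(&st->lock);
    st->lock_acquisitions++;
}

/* 1-based position of app_name in the ordered name list, or 0 if absent. */
static int
priority_in(const char *const names[], int nnames, const char *app_name)
{
    for (int i = 0; i < nnames; i++)
        if (strcmp(names[i], app_name) == 0)
            return i + 1;
    return 0;
}

/*
 * Run by the single nominated process after it reloads the configuration:
 * one lock acquisition covers the shared list and every walsender's slot.
 */
static void
apply_new_config(SharedSyncState *st, const char *const names[], int nnames)
{
    assert(nnames <= MAX_NAMES);
    lock_state(st);
    for (int i = 0; i < nnames; i++)
        st->names[i] = names[i];
    st->nnames = nnames;
    for (int i = 0; i < MAX_SLOTS; i++)
        if (st->slots[i].app_name != NULL)
            st->slots[i].priority =
                priority_in(st->names, st->nnames, st->slots[i].app_name);
    pthread_mutex_unlock(&st->lock);
}

/*
 * Run by a starting walsender.  Its priority comes from the shared list,
 * not from whatever value its own process last loaded.  Returns the slot
 * index, or -1 if no slot is free.
 */
static int
register_walsender(SharedSyncState *st, const char *app_name)
{
    int         idx = -1;

    lock_state(st);
    for (int i = 0; i < MAX_SLOTS; i++)
    {
        if (st->slots[i].app_name == NULL)
        {
            st->slots[i].app_name = app_name;
            st->slots[i].priority = priority_in(st->names, st->nnames, app_name);
            idx = i;
            break;
        }
    }
    pthread_mutex_unlock(&st->lock);
    return idx;
}
```

Because every slot changes inside one critical section, a reader holding the
same lock never sees a mix of old and new priorities, and the lock is taken
once per reload regardless of how many walsenders exist.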

Would anyone like to fix this?  I could add it to my queue, but it would wait
a year or more.

Thanks,
nm
