At Wed, 30 Oct 2019 10:45:11 +0900, Michael Paquier <mich...@paquier.xyz> wrote in > On Tue, Oct 29, 2019 at 07:50:01PM +0900, Kyotaro Horiguchi wrote: > > At Fri, 25 Oct 2019 15:18:34 +0800, "Dongming Liu" > > <lingce....@alibaba-inc.com> wrote in > >> I recently discovered two possible bugs about synchronous replication. > >> > >> 1. SyncRepCleanupAtProcExit may delete an element that has been deleted > >> SyncRepCleanupAtProcExit first checks whether the queue is detached, if it > >> is not detached, > >> acquires the SyncRepLock lock and deletes it. If this element has been > >> deleted by walsender, > >> it will be deleted repeatedly, SHMQueueDelete will core with a segment > >> fault. > >> > >> IMO, like SyncRepCancelWait, we should lock the SyncRepLock first and then > >> check > >> whether the queue is detached or not. > > > > I think you're right here. > > Indeed. Looking at the surroundings we expect some code paths to hold > SyncRepLock in exclusive or shared mode but we don't actually check > that the lock is hold. So let's add some assertions while on it.
I looked around closer. If we do that strictly, other functions like SyncRepGetOldestSyncRecPtr need the same Assert()s. I think static functions don't need Assert() and caution in their comments would be enough. On the other hand, the similar-looking code in SyncRepInitConfig and SyncRepUpdateSyncStandbysDefined seems safe since AFAICS it doesn't have (this kind of) racing condition on wirtes. It might need a comment like that. Or, we could go to (apparently) safer-side by applying the same amendment to it. SyncRepReleaseWaiters reads MyWalSnd->sync_standby_priority without holding SyncRepLock, which could lead to a message with wrong priority. I'm not sure it matters, though. > > This is not right. It is in transaction commit so it is in a > > HOLD_INTERRUPTS section. ProcessInterrupt does not respond to > > cancel/die interrupts thus the ereport should return. > > Yeah. There is an easy way to check after that: InterruptHoldoffCount > needs to be strictly positive. > > My suggestions are attached. Any thoughts? Seems reasonable for holdoffs. The same assertion would be needed in more places but it's another issue. By the way while I was looking this, I found a typo. Please find the attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index a21f7d3347..16aee1de4c 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -1065,8 +1065,8 @@ SyncRepUpdateSyncStandbysDefined(void) /* * If synchronous_standby_names has been reset to empty, it's futile - * for backends to continue to waiting. Since the user no longer - * wants synchronous replication, we'd better wake them up. + * for backends to continue waiting. Since the user no longer wants + * synchronous replication, we'd better wake them up. */ if (!sync_standbys_defined) {