On 03.10.2013 00:14, Merlin Moncure wrote:
On Wed, Oct 2, 2013 at 3:43 PM, Ants Aasma <a...@cybertec.at> wrote:
On Wed, Oct 2, 2013 at 10:37 PM, Merlin Moncure <mmonc...@gmail.com> wrote:
On Wed, Oct 2, 2013 at 9:45 AM, Ants Aasma <a...@cybertec.at> wrote:
I haven't reviewed the code in enough detail to say whether there is an
actual race here (I tend to think there probably isn't), but the
specific pattern that I had in mind is the one in the following actual
code:

hm.  I think there *is* a race.  2+ threads could race to the line:

LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;

and simultaneously set the value of LocalRecoveryInProgress to false,
and both engage InitXLOGAccess, which is destructive.  The move
operation is atomic, but I don't think there's any guarantee that the
reads of xlogctl->SharedRecoveryInProgress are ordered between threads
without a lock.

The backend is single-threaded, so this is moot.

InitXLOGAccess only writes backend local variables, so it can't be
destructive. In fact, it calls GetRedoRecPtr(), which does a spinlock
acquisition cycle, which should be a full memory barrier. A read
barrier after accessing SharedRecoveryInProgress is good enough, and
it seems to be necessary to avoid a race condition for
XLogCtl->ThisTimeLineID.

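To spell out the pattern I have in mind, here is a rough reader-side
sketch (illustrative only, not a proposed patch; it assumes the
pg_read_barrier() primitive from storage/barrier.h):

		/* use volatile pointer to prevent code rearrangement */
		volatile XLogCtlData *xlogctl = XLogCtl;

		LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;

		/*
		 * Order the flag read before any later reads of state the startup
		 * process published before clearing the flag (e.g. ThisTimeLineID).
		 */
		pg_read_barrier();

		if (!LocalRecoveryInProgress)
			InitXLOGAccess();
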
Admittedly the race is so improbable that in practice it probably
doesn't matter. I just wanted to spell out the correct way to do
unlocked accesses, as it can get quite confusing. I found Herb Sutter's
"atomic<> weapons" talk very useful; thinking about the problem in
terms of acquire and release makes it much clearer to me.
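
As a standalone illustration of that acquire/release pairing (plain C11
atomics, nothing PostgreSQL-specific; the names here are made up):

#include <stdatomic.h>
#include <stdbool.h>

static long timeline = 0;                                /* ordinary data */
static atomic_bool in_recovery = ATOMIC_VAR_INIT(true);  /* guard flag */

/* writer (think: startup process): publish data, then release the flag */
void
finish_recovery(void)
{
	timeline = 42;                               /* plain store */
	atomic_store_explicit(&in_recovery, false,
						  memory_order_release); /* release: publish */
}

/* reader (think: backend): acquire the flag, then the data is visible */
long
read_timeline_if_done(void)
{
	if (!atomic_load_explicit(&in_recovery, memory_order_acquire))
		return timeline;	/* guaranteed to see 42, not a stale value */
	return -1;				/* still "in recovery" */
}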

If we don't care about multiple calls to InitXLOGAccess() (for a
backend), then we can get away with just a barrier.  That's a pretty
weird assumption though (even if 'improbable'), and I think it should
be well documented in the code, particularly since previous versions
went through such trouble to do a proper check->set->init.

I'm still leaning toward my earlier idea (recovery4.patch) since it
keeps the old semantics by exploiting the fact that the call site of
interest (and only that one) does not require a correct answer (that
is, it tolerates the backend thinking it's in recovery when it's not).
That just defers the heap prune for a while, and you don't have to mess
around with barriers at all or be concerned about present or future
issues caused by spurious extra calls to InitXLOGAccess().  The other
code paths to RecoveryInProgress() appear not to be interesting from a
spinlock perspective.

Hmm. All callers of RecoveryInProgress() must be prepared to handle the case where RecoveryInProgress() returns true even though the system is no longer in recovery. No matter what locking we do in RecoveryInProgress(), the startup process might finish recovery just after RecoveryInProgress() has returned.

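For instance, a hypothetical caller in the style of the heap-prune path
mentioned upthread copes with a stale 'true' by simply doing nothing for
now (names are illustrative, not actual code):

	/*
	 * Pruning is not allowed during recovery.  If RecoveryInProgress()
	 * returns a stale 'true' just after recovery has ended, we merely skip
	 * the prune this time; a later visit to the page will do it.
	 */
	if (RecoveryInProgress())
		return;
	prune_page(buffer);
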
What about the attached? It reads the shared variable without a lock or barrier. If it returns 'true' but the system has in fact just exited recovery, that's OK; as explained above, all the callers must tolerate that anyway. But if it returns 'false', it performs a full memory barrier, which should ensure that it sees any other shared variables as they are after the startup process cleared SharedRecoveryInProgress (notably, XLogCtl->ThisTimeLineID).

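For reference, this is meant to pair with the write side in
StartupXLOG(), which clears the flag under the info_lck spinlock, so the
spinlock release should act as the matching barrier. Roughly (from
memory, so double-check against the actual code):

	{
		/* use volatile pointer to prevent code rearrangement */
		volatile XLogCtlData *xlogctl = XLogCtl;

		SpinLockAcquire(&xlogctl->info_lck);
		xlogctl->SharedRecoveryInProgress = false;
		SpinLockRelease(&xlogctl->info_lck);
	}
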
- Heikki
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
old mode 100644
new mode 100755
index a95149b..a9bfdfc
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7370,10 +7370,7 @@ RecoveryInProgress(void)
 		/* use volatile pointer to prevent code rearrangement */
 		volatile XLogCtlData *xlogctl = XLogCtl;
 
-		/* spinlock is essential on machines with weak memory ordering! */
-		SpinLockAcquire(&xlogctl->info_lck);
 		LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
-		SpinLockRelease(&xlogctl->info_lck);
 
 		/*
 		 * Initialize TimeLineID and RedoRecPtr when we discover that recovery
@@ -7382,7 +7379,15 @@ RecoveryInProgress(void)
 		 * this, see also LocalSetXLogInsertAllowed.)
 		 */
 		if (!LocalRecoveryInProgress)
+		{
+			/*
+			 * If we just exited recovery, make sure we read any other global
+			 * state after SharedRecoveryInProgress (for machines with weak
+			 * memory ordering).
+			 */
+			pg_memory_barrier();
 			InitXLOGAccess();
+		}
 
 		return LocalRecoveryInProgress;
 	}