Robert Haas <robertmh...@gmail.com> writes: > On Sun, Aug 7, 2011 at 1:47 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> Any protocol of that sort has *obviously* got a race condition between >> changes of the latch state and changes of the separate flag state, if run >> in a weak-memory-ordering machine.
> On the flip side, I'm not sure that anyone ever really expected that a > latch could be safely used this way. Normally, I'd expect the flag to > be some piece of state protected by an LWLock, and I think that ought > to be OK provided that the lwlock is released before setting or > checking the latch. Maybe we should try to document the contract here > a little better; I think it's just that there must be a full memory > barrier (such as LWLockRelease) in both processes involved in the > interaction. Heikki argued the same thing in http://archives.postgresql.org/message-id/4cea5a0f.1030...@enterprisedb.com but I don't think anyone has actually done the legwork to verify that current uses are safe. So [ ... warms up grep ... ] Currrent callers of WaitLatch(OrSocket): XLogPageRead waits on XLogCtl->recoveryWakeupLatch latch is set by WakeupRecovery, which is called by process' own signal handlers (OK) and XLogWalRcvFlush. The latter is OK because there's a spinlock protecting the transmitted data. pgarch_MainLoop waits on mainloop_latch non-issue because latch is process-local WalSndLoop waits on MyWalSnd->latch latch is set by signal handlers and WalSndWakeup. The latter is OK because it's called right after XLogFlush (which certainly locks whatever it touches) or a spinlock release. SyncRepWaitForLSN waits on MyProc->waitLatch latch is set by signal handlers and SyncRepWakeQueue. The latter appears unsafe at first glance, because it sets the shared variable (thisproc->syncRepState) and immediately sets the latch. However, the receiver does this curious dance: syncRepState = MyProc->syncRepState; if (syncRepState == SYNC_REP_WAITING) { LWLockAcquire(SyncRepLock, LW_SHARED); syncRepState = MyProc->syncRepState; LWLockRelease(SyncRepLock); } which I believe makes it safe, though rather underdocumented; if a race does occur, the receiver will acquire the lock and recheck, and the lock acquisition will block until the caller of SyncRepWakeQueue has released SyncRepLock, and that release will flush any pending writes to shared memory. Conclusions: (1) 9.1 does not have a problem of this sort. (2) Breathing hard on any of this code could break it. (3) I'm not going to hold still for not inserting a memory barrier into SetLatch, once we have the code. Unsubstantiated performance worries are not a sufficient argument not to --- as a wise man once said, you can make the code arbitrarily fast if it doesn't have to give the right answer. (4) In the meantime, we'd better document that it's caller's responsibility to ensure that the flag variable is adequately protected by locking. I think SyncRepWaitForLSN could use a warning comment, too. I will go do that. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers