Pursuant to the recent discussion about bugs 6200 and 6245, I went trawling through all the WAL redo routines looking for bugs such as inadequate locking or transiently trashing shared buffers. Here's what I found:
* As we already knew, RestoreBkpBlocks is broken because it transiently trashes a shared buffer, which another process could be accessing while holding only a pin. * seq_redo() has the same disease, since we allow SELECT * FROM sequences. * Everything else seems to be free of that specific issue; in particular the index-related replay routines are at fairly low risk since we don't have any coding rules allowing index pages to be examined without holding a buffer lock. * There are assorted replay routines that assume they can whack fields of ShmemVariableCache around without any lock. However, it's pretty inconsistent; about half do it like that, while the other half assume that they can read ShmemVariableCache without lock but should acquire lock to modify it. I think the latter coding rule is a whole lot safer in the presence of Hot Standby and should be adopted throughout. * Same goes for MultiXactSetNextMXact and MultiXactAdvanceNextMXact. It's not entirely clear to me that no read-only transaction can ever examine the shared-memory variables they change. In any case, if there is in fact no other process examining those variables, there can be no contention for the lock so it should be cheap to get. * Not exactly a race condition, but: tblspc_redo does ereport(ERROR) if it fails to clean out tablespace directories. This seems to me to be the height of folly, especially when the failure is more or less an expected case. If the error occurs the database is dead in the water, because that error is actually a PANIC and will recur on subsequent restart attempts. Therefore there is no way to recover short of manual intervention to clean out the non-empty directory. And why are we pulling the fire alarm like this? Well, uh, it's because we might fail to recover some disk space in the dropped tablespace. Seems to me to be a lot better to just elog(LOG) and move on. This is quite analogous to the case of failing to unlink a file after commit --- wasting disk space might be bad, but it's very much the lesser evil compared to this. Barring objections I'm going to fix all this stuff and back-patch as far as 9.0 where hot standby was added. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers