On Tue, Aug 20, 2019 at 5:02 AM Thomas Munro <thomas.mu...@gmail.com> wrote: > 3. UndoLogDiscard() uses DiscardBuffer() to invalidate any currently > unpinned buffers, and marks as BM_DISCARDED any that happen to be > pinned right now, so they can't be immediately invalidated. Such > buffers are never written back and are eligible for reuse on the next > clock sweep, even if they're written into by a backend that managed to > do that when we were trying to discard.
This is definitely more concurrent, but it might be *too much* concurrency. Suppose that backend #1 is inserting a row and updating the transaction header for the previous transaction; meanwhile, backend #2 is discarding the previous transaction. It could happen that backend #1 locks the transaction header for the previous transaction and is all set to log the insertion ... but then gets context-switched out. Now backend #2 swoops in and logs the discard. Backend #1 now wakes up and finishes logging a change to a page that, according to the logic of the WAL stream, no longer exists. It's probably possible to make this work by ignoring WAL references to discarded pages during replay, but that seems a bit dangerous. At least, it loses some sanity checking that you might like to have. It seems to me that you can avoid this if you require that a backend that wants to set BM_DISCARDED to acquire at least a shared content lock before doing so. If you do that, then once a backend acquires content lock(s) on the page(s) containing the transaction header for the purposes of updating it, it can notice that the BM_DISCARDED flag is set and choose not to update those pages after all. I think that would be a smart design choice. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company