On Fri, Nov 10, 2023 at 2:28 AM Andres Freund <and...@anarazel.de> wrote: > > On 2023-11-08 13:10:34 +0530, Bharath Rupireddy wrote: > > > > + /* > > > > + * The fact that we acquire WALBufMappingLock while > > > > reading the WAL > > > > + * buffer page itself guarantees that no one else > > > > initializes it or > > > > + * makes it ready for next use in > > > > AdvanceXLInsertBuffer(). However, we > > > > + * need to ensure that we are not reading a page that > > > > just got > > > > + * initialized. For this, we look at the needed page > > > > header. > > > > + */ > > > > + phdr = (XLogPageHeader) page; > > > > + > > > > + /* Return, if WAL buffer page doesn't look valid. */ > > > > + if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC && > > > > + phdr->xlp_pageaddr == (ptr - (ptr % > > > > XLOG_BLCKSZ)) && > > > > + phdr->xlp_tli == tli)) > > > > + break; > > > > > > I don't think this code should ever encounter a page where this is not the > > > case? We particularly shouldn't do so silently, seems that could hide all > > > kinds of problems. > > > > I think it's possible to read a "just got initialized" page with the > > new approach to read WAL buffer pages without WALBufMappingLock if the > > page is read right after it is initialized and xlblocks is filled in > > AdvanceXLInsertBuffer() but before actual WAL is written. > > I think the code needs to make sure that *never* happens. That seems unrelated > to holding or not holding WALBufMappingLock. Even if the page header is > already valid, I don't think it's ok to just read/parse WAL data that's > concurrently being modified. > > We can never allow WAL being read that's past > XLogBytePosToRecPtr(XLogCtl->Insert->CurrBytePos) > as it does not exist.
Agreed. Erroring out in XLogReadFromBuffers() if passed in WAL is past the CurrBytePos is an option. Another cleaner way is to just let the caller decide what it needs to do (retry or error out) - fill an error message in XLogReadFromBuffers() and return as-if nothing was read or return a special negative error code like XLogDecodeNextRecord so that the caller can deal with it. Also, reading CurrBytePos with insertpos_lck spinlock can come in the way of concurrent inserters. A possible way is to turn both CurrBytePos and PrevBytePos 64-bit atomics so that XLogReadFromBuffers() can read CurrBytePos without any lock atomically and leave it to the caller to deal with non-existing WAL reads. > And if the to-be-read LSN is between > XLogCtl->LogwrtResult->Write and XLogBytePosToRecPtr(Insert->CurrBytePos) > we need to call WaitXLogInsertionsToFinish() before copying the data. Agree to wait for all in-flight insertions to the pages we're about to read to finish. But, reading XLogCtl->LogwrtRqst.Write requires either XLogCtl->info_lck spinlock or WALWriteLock. Maybe turn XLogCtl->LogwrtRqst.Write a 64-bit atomic and read it without any lock, rely on WaitXLogInsertionsToFinish()'s return value i.e. if WaitXLogInsertionsToFinish() returns a value >= Insert->CurrBytePos, then go read that page from WAL buffers. Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com