Hi, On 2021-08-30 15:51:54 -0400, alvhe...@alvh.no-ip.org wrote: > On 2021-Aug-28, Andres Freund wrote: > > > While rebasing the aio patchset ontop of HEAD I noticed that this commit > > added > > another atomic operation to XLogWrite() with archiving enabled. The WAL > > write > > path is really quite hot, and at least some of the > > NotifySegmentsReadyForArchive() calls are done while WALWriteLock is held. > > > > I think we should at least try to make the fast-path where no segment > > boundaries were crossed use no atomic operations. > > I think the best way to achieve this is is to rely completely on > walwriter doing the segment notification, so that the WAL write done by > backend would only do a latch set.
When were you thinking of doing the latch sets? Adding a latch set for every XLogWrite() wouldn't be cheap either. Both because syscalls under a lock aren't free and because waking up walsender even more often isn't free (we already have a few threads about reducing the signalling frequency). There's also the question of what to do with single user mode. We shouldn't just skip creating .ready files there... Although, the more I think about, the more I am confused about the trailing if (XLogArchivingActive()) NotifySegmentsReadyForArchive(LogwrtResult.Flush); in XLogWrite(). Shouldn't that at the very least be inside the "If asked to flush, do so" branch? Outside that and the finishing_seg branch LogwrtResult.Flush won't have moved, right? So the call to NotifySegmentsReadyForArchive() can't do anything, no? Nor does it seem like we'd ever need to call NotifySegmentsReadyForArchive() if we started writing on the current page - flushRecPtr can't move across a segment boundary in that case. I hadn't yet realized that this commit doesn't just make XLogWrite() more expensive, it also makes XLogInsertRecord() more expensive :(. Adding two divisions to XLogInsertRecord() isn't nice, especially as it happens even if !XLogArchivingActive(). I can't really convince myself this deals correctly with multiple segment spanning records and with records spanning more than one segment? It'd be easier to understand if the new XLogCtlData variables were documented... If there's one record from segment s0 to s1 and one from s1 to s4, and wal_buffers is big enough to contain them all, the first record will set earliestSegBoundary = s1 the second latestSegBoundary = s4. When s1 is fully written out, NotifySegmentsReadyForArchive() will set earliestSegBoundary = latestSegBoundary = s4 and create .reaady for .s1 - ok. But when when s2 is flushed, we'll afaict happily create .ready files for s2, s3 despite s4 not yet being written, because earliestSegBoundary is now s4. I think there's other issues as well. The more I look at this commit, the less I believe it's right. The whole approach here of delaying .ready creation for these types of segments seems wrong to me. Doesn't the exact same problem also exist for streaming rep - which one can also use to maintain a PITR archive? walsender sends up to the flush location, and pg_receivewal's FindStreamingStart() will afaict just continue receiving from after that point. Greetings, Andres Freund