On 8/19/21, 5:42 AM, "Dipesh Pandit" <dipesh.pan...@gmail.com> wrote:
>> Should we have XLogArchiveNotify(), writeTimeLineHistory(), and
>> writeTimeLineHistoryFile() enable the directory scan instead? Else,
>> we have to exhaustively cover all such code paths, which may be
>> difficult to maintain. Another reason I am bringing this up is that
>> my patch for adjusting .ready file creation [0] introduces more
>> opportunities for .ready files to be created out-of-order.
>
> XLogArchiveNotify() notifies Archiver when a log segment is ready for
> archival by creating a .ready file. This function is being called for each
> log segment and placing a call to enable directory scan here will result
> in directory scan for each log segment.

Could we have XLogArchiveNotify() check the archiver state and only
trigger a directory scan if we detect that we are creating an
out-of-order .ready file?

> There is one possible scenario where it may run into a race condition. If
> archiver has just finished archiving all .ready files and the next
> anticipated log segment is not available then in this case archiver takes
> the fall-back path to scan directory. It resets the flag before it begins
> directory scan. Now, if a directory scan is enabled by a timeline switch or
> .ready file created out of order in parallel to the event that the archiver
> resets the flag then this might result in a race condition. But in this case
> also archiver is eventually going to perform a directory scan and the
> desired file will be archived as part of directory scan. Apart of this I
> can't think of any other scenario which may result into a race condition
> unless I am missing something.

What do you think about adding an upper limit to the number of files
we can archive before doing a directory scan? The more I think about
the directory scan flag, the more I believe it is a best-effort tool
that will remain prone to race conditions. If we have a guarantee
that a directory scan will happen within the next N files, there's
probably less pressure to make sure that it's 100% correct.

On an unrelated note, do we need to add some extra handling for backup
history files and partial WAL files?

Nathan