On Wed, Oct 19, 2022 at 8:58 AM Kyotaro Horiguchi <horikyota....@gmail.com> wrote:
>
> Unbounded number of sequential crash-restarts itself is a more serious
> problem..
Agreed. The basic_archive module currently leaves temp files around
even for normal restarts of the cluster, which is bad IMO.

> An archive module could clean them up at startup or at the first call,
> but that might be dangerous (since the archive directory is, I think,
> regarded as an external resource).

The archive module must be responsible for cleaning up the temp files
that it creates. One way to do that is in the archive module's shutdown
callback; this covers most of the cases, but not all.

> Honestly, I'm not sure about a reasonable scenario where simultaneous
> archiving of the same file is acceptable, though. I feel that we
> should not allow simultaneous archiving of the same segment, by some
> kind of interlocking. In other words, we should perhaps serialize
> duplicate archiving of the same file.

In a typical production environment, there's some kind of locking (for
instance, lease files) that allows or disallows file
creation/deletion/writes/reads and guarantees that the same file isn't
put into a directory (which can be the archive location) many times.
And, as you rightly said, the archive_directory is something external
to postgres, so we really can't deal with concurrent writers creating
or writing the same files. Even if we somehow tried to, it would make
things complicated. The same is true of the PGDATA directories.
However, archive module implementers can choose to define such a
locking strategy.

> In another direction, the current code allows duplicate simultaneous
> copying to temporary files with different names, and then the latest
> renaming wins. We could reach almost the same result (on Linux (or
> POSIX?)) by unlinking the existing temp file first, then creating a
> new one with the same name, and continuing. Even if the temp file
> were left alone after a crash, it would be unlinked at the next
> attempt to archive that file. But I'm not sure how this could be done
> on Windows.. In the first place, I'm not sure whether the
> latest-unconditionally-wins policy is appropriate, though.

We really can't just unlink the temp file, because its name contains a
pid and a timestamp, so it's hard to determine which temp file we
created earlier. As far as the basic_archive module is concerned, we
ought to keep it simple. I still think the simplest thing we can do is
to use basic_archive's shutdown_cb to delete (in most of the cases,
but not all) the leftover temp file that the module is dealing with at
that moment, and to add a note that users must deal with concurrent
writers to basic_archive.archive_directory, like the attached v2 patch
does.

> > A simple server restart while the basic_archive module is copying
> > to/from the temp file would leave the file behind, see [1]. I think
> > we can fix this by defining shutdown_cb for the basic_archive
> > module, like the attached patch. While this helps in most of the
> > crashes, it doesn't cover all of them. However, it is better than
> > what we have right now.
>
> ShutdownWalRecovery() does a similar thing, but as you say, this one
> covers narrower cases than that, since RestoreArchiveFile() finally
> overwrites the left-alone files at the next call for that file.

We're using unique temp file names in the basic_archive module, so we
can't overwrite the same file upon restart.

> # The patch seems to forget to clear the tempfilepath *after* a
> # successful renaming.

It does so at the beginning of basic_archive_file(), which is
sufficient.

> And I don't see how the callback is called.

call_archive_module_shutdown_callback() -> basic_archive_shutdown().
Please see the attached v2 patch. To make the discussion a bit more
concrete, some rough sketches follow.
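For reference, basic_archive builds the temporary file name from the
segment name plus the archiver's pid and a timestamp, roughly like
this (paraphrased from basic_archive.c; temp, basic_archive_dir, file
and epoch are the surrounding locals):

    /* Temporary file name shall be unique to this copy attempt. */
    snprintf(temp, sizeof(temp), "%s/archtemp.%s.%d.%lld",
             basic_archive_dir, file, MyProcPid, (long long) epoch);

Two concurrent archivers therefore never collide on the temp name, but
by the same token a later process can't reconstruct the name a crashed
process used, which is why an unlink-before-create scheme doesn't work
here.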
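The cleanup itself is small. A minimal sketch of the shutdown callback
(not the exact v2 patch; it assumes a file-scope tempfilepath that
basic_archive_file() fills in before starting a copy and clears once
the durable rename has succeeded):

    /* Sketch only; the v2 patch may differ in details. */
    static char tempfilepath[MAXPGPATH + 256]; /* current temp file, if any */

    static void
    basic_archive_shutdown(void)
    {
        /* No copy in flight, so nothing to clean up. */
        if (tempfilepath[0] == '\0')
            return;

        /* Best effort: the file may already be gone after a rename. */
        if (unlink(tempfilepath) < 0 && errno != ENOENT)
            ereport(WARNING,
                    (errcode_for_file_access(),
                     errmsg("could not remove file \"%s\": %m",
                            tempfilepath)));

        tempfilepath[0] = '\0';
    }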
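Wiring it up is just the usual module init; with the v15 archive
module API that looks like:

    void
    _PG_archive_module_init(ArchiveModuleCallbacks *cb)
    {
        cb->check_configured_cb = basic_archive_configured;
        cb->archive_file_cb = basic_archive_file;
        cb->shutdown_cb = basic_archive_shutdown;
    }

The archiver process then invokes the callback through
call_archive_module_shutdown_callback() before it exits.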
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
v2-0001-Remove-leftover-temporary-files-via-basic_archive.patch