On 12/4/18, 1:36 AM, "Michael Paquier" <mich...@paquier.xyz> wrote: > Okay, here is an updated patch for this stuff, which does the following: > - Check for a WAL segment if it has a ".ready" status file, an orphaned > status file is removed only on ENOENT. > - If durable_unlink fails, retry 3 times. If there are too many > failures, the archiver gives up on the orphan status file removal. If > the removal works correctly, the archiver moves on to the next file.
Thanks for the updated patch! The code looks good to me, the patch applies cleanly and builds without warnings, and it seems to work well in my manual tests. I just have a few wording suggestions. + /* + * In the event of a system crash, archive status files may be + * left behind as their removals are not durable, cleaning up + * orphan entries here is the cheapest method. So check that + * the segment trying to be archived still exists. If it does + * not, its orphan status file is removed. + */ I would phrase this comment this way: Since archive_status files are not durably removed, a system crash could leave behind .ready files for WAL segments that have already been recycled or removed. In this case, simply remove the orphan status file and move on. + ereport(WARNING, + (errmsg("removed orphan archive status file %s", + xlogready))); I think we should put quotes around the file name like we do elsewhere in pgarch_ArchiverCopyLoop(). + ereport(WARNING, + (errmsg("failed removal of \"%s\" too many times, will try again later", + xlogready))); I'd suggest mirroring the log statement for failed archiving commands and saying something like, "removing orphan archive status file \"%s\" failed too many times, will try again later." IMO that makes it clearer what is failing and why we are removing it in the first place. Nathan