I picked up the patch and verified both fixes on 8.3.7.

In one test, handles to two different WAL files were being held by two 
different backends.  The WAL files were renamed to .deleted after I forced an 
xlog switch.  Eventually the .deleted files disappeared.  In one case the 
backend exited.  In the other, the backend moved on to the latest WAL file.

In another test, I opened a WAL file so that it could not be renamed or 
deleted.  The appropriate error was logged and the .done file remained.  The 
error is logged quite frequently.  When I released the WAL file, it was soon 
deleted.

If you get into a case where the rename works but the unlink fails (I don't see 
how this could happen in real life, except possibly for a race condition with 
AV software), you will have a situation where there is a .done file that does 
not match any WAL file, and you will have a .deleted file that won't get 
cleaned up.

I couldn't reproduce this, so I faked it by adding a .done file back into the 
archive_status folder after it was deleted.  The orphaned .done file doesn't 
cause any trouble.  It doesn't get cleaned up, it doesn't generate any log 
messages, and it doesn't interfere with WAL file recycling or removal (unlike 
the trouble that is caused by orphaned .ready files).
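
For reference, the rename-then-unlink sequence discussed in this thread can be 
sketched roughly as follows.  This is only an illustration, not the committed 
code: remove_segment() and its return codes are hypothetical, and it uses 
POSIX rename()/unlink() rather than the Windows code path.  It shows why a 
failed unlink after a successful rename leaves both a .deleted file and a 
.done file behind.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the PostgreSQL function): remove a WAL segment by
 * renaming it to "<name>.deleted" first and then unlinking it.  The .done
 * status file is removed only if both steps succeed.
 *
 * Returns 0 on success, -1 if the rename failed, -2 if the unlink failed.
 */
int
remove_segment(const char *wal_path, const char *done_path)
{
    char tmp_path[1024];
    int  n;

    n = snprintf(tmp_path, sizeof(tmp_path), "%s.deleted", wal_path);
    if (n < 0 || (size_t) n >= sizeof(tmp_path))
        return -1;              /* path too long, nothing was touched */

    if (rename(wal_path, tmp_path) != 0)
    {
        fprintf(stderr, "could not rename \"%s\": %s\n",
                wal_path, strerror(errno));
        return -1;              /* WAL file and .done file both remain */
    }

    if (unlink(tmp_path) != 0)
    {
        fprintf(stderr, "could not remove \"%s\": %s\n",
                tmp_path, strerror(errno));
        return -2;              /* orphaned .deleted and .done files remain */
    }

    /* Only now is it safe to drop the archive status file. */
    unlink(done_path);
    return 0;
}
```

The -2 return is the case I could not reproduce; the -1 return corresponds to 
the test where I held the WAL file open and the .done file remained.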

The patch looks good.

Thank you,

-Luke

> -----Original Message-----
> From: Heikki Linnakangas [mailto:heikki.linnakan...@enterprisedb.com]
> Sent: Thursday, September 10, 2009 5:44 AM
> Cc: Tom Lane; Luke Koops; pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.
>
> Heikki Linnakangas wrote:
> > Tom Lane wrote:
> >> Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> writes:
> >>> No, it's a backend that's holding the file open, with
> >>> FILE_SHARE_DELETE.
> >> If that's the only case we care about covering, then rename might be
> >> enough.  I was just wondering what it would take to solve the more
> >> general problem of something holding it open with the wrong flags at
> >> the time we want to get rid of it.
> >
> > Yes, that's a separate problem, and I think we should address that too.
> > That's what I thought was going on in OP's case at first, the patch I
> > posted in my first reply should address that.
> >
> > I'll try to reproduce that case too, and verify that the patch fixes it.
>
> Ok, I've committed a patch along those lines. The file is now renamed
> before unlinking (on Windows), and the return code of rename() and
> unlink() is checked, so that we don't delete the .done file if the WAL
> file deletion failed. This fixes both scenarios, the one OP reported
> with another backend keeping the file open, and the one where a
> different process keeps a file open without FILE_SHARE_DELETE.
>
> I considered making failure to rename or delete a WARNING instead of
> ERROR, so that RemoveOldXLogFiles() would still clean up any other old
> WAL files. However, when a file is recycled, we throw an error anyway if
> the rename fails in InstallXLogFileSegment(), so it doesn't seem like it
> would buy us much.
>
> BTW, it seems that errno is not set on Windows when rename fails, but we
> still try to print the OS error message in InstallXLogFileSegment().
> When I tested the case where another process is keeping the file locked,
> for example, I got this:
>
> ERROR:  could not rename file "pg_xlog/000000010000000100000073" to
> "pg_xlog/000000010000000100000092" (initialization of log file 1,
> segment 146): No such file or directory
>
> even though the file clearly exists, it's just locked. I'm not sure
> where errno is coming from in that case, and if we should do something
> about that, but that exceeds my appetite for fixing Windows issues right
> now.
>
> --
>   Heikki Linnakangas
>   EnterpriseDB   http://www.enterprisedb.com
>

