On 09/05/2012 09:42 PM, Bruce Momjian wrote:
On Wed, Sep 5, 2012 at 09:07:05PM -0400, Andrew Dunstan wrote:
OK, I worked with Andrew on this issue, and have applied the attached
patch which explains what is happening in this case. Andrew's #ifndef
WIN32 was the correct fix. I consider this issue closed.
It looks like we still have problems in this area :-( see
<http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=pitta&dt=2012-09-05%2023%3A05%3A16>
Now it looks like somehow the fopen on the log file that isn't
commented out is failing. But the identical code worked on the same
machine on HEAD. SO this does rather look like a timing issue.
Investigating ...
Yes, that is very odd. It is also right after the code we just changed
to use binary mode to split the pg_dumpall file, split_old_dump().
The code is doing pg_ctl -w stop, then starting a new postmaster with
pg_ctl -w start. Looking at the pg_ctl.c code (that you wrote), what
pg_ctl -w stop does is to wait for the postmaster.pid file to disappear,
then it returns complete. I suppose it is possible that the pid file is
getting removed, pg_ctl is returning done, but the pg_ctl binary is
still running, holding open those log files.
I guess the buildfarm is showing us the problems in pg_upgrade, as it
should. I think you might be right that we need to add a sleep(1) at
the end of stop_postmaster on Windows, and document it is to give the
postmaster time to release its log files.
Icky. I wish there were some nice portable flock() mechanism we could use.
I just re-ran the test on the same machine, same code, same everything
as the reporte3d failure, and it passed, so it definitely looks like
it's a timing issue.
I'd be inclined to put a loop around that fopen() to try it once every
second for, say, 5 seconds.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers