On 09/05/2012 10:07 PM, Bruce Momjian wrote:
On Wed, Sep 5, 2012 at 10:04:07PM -0400, Andrew Dunstan wrote:
On 09/05/2012 09:42 PM, Bruce Momjian wrote:
On Wed, Sep 5, 2012 at 09:07:05PM -0400, Andrew Dunstan wrote:
OK, I worked with Andrew on this issue, and have applied the attached
patch which explains what is happening in this case. Andrew's #ifndef
WIN32 was the correct fix. I consider this issue closed.
It looks like we still have problems in this area :-( see
<http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=pitta&dt=2012-09-05%2023%3A05%3A16>
Now it looks like somehow the fopen on the log file that isn't
commented out is failing. But the identical code worked on the same
machine on HEAD. SO this does rather look like a timing issue.
Investigating ...
Yes, that is very odd. It is also right after the code we just changed
to use binary mode to split the pg_dumpall file, split_old_dump().
The code is doing pg_ctl -w stop, then starting a new postmaster with
pg_ctl -w start. Looking at the pg_ctl.c code (that you wrote), what
pg_ctl -w stop does is to wait for the postmaster.pid file to disappear,
then it returns complete. I suppose it is possible that the pid file is
getting removed, pg_ctl is returning done, but the pg_ctl binary is
still running, holding open those log files.
I guess the buildfarm is showing us the problems in pg_upgrade, as it
should. I think you might be right that we need to add a sleep(1) at
the end of stop_postmaster on Windows, and document it is to give the
postmaster time to release its log files.
Icky. I wish there were some nice portable flock() mechanism we could use.
I just re-ran the test on the same machine, same code, same
everything as the reporte3d failure, and it passed, so it definitely
looks like it's a timing issue.
I'd be inclined to put a loop around that fopen() to try it once
every second for, say, 5 seconds.
Yes, good idea.
Suggested patch attached.
cheers
andrew
diff --git a/contrib/pg_upgrade/exec.c b/contrib/pg_upgrade/exec.c
index 99f5006..f84d857 100644
--- a/contrib/pg_upgrade/exec.c
+++ b/contrib/pg_upgrade/exec.c
@@ -63,7 +63,25 @@ exec_prog(const char *log_file, const char *opt_log_file,
if (written >= MAXCMDLEN)
pg_log(PG_FATAL, "command too long\n");
- if ((log = fopen_priv(log_file, "a")) == NULL)
+#ifdef WIN32
+ {
+ /*
+ * Try to open the log file a few times in case the
+ * server takes a bit longer than we'd like to release it.
+ */
+ int iter;
+ for (iter = 0; iter < 5; iter++)
+ {
+ log = fopen_priv(log_file, "a");
+ if (log != NULL || iter == 4)
+ break;
+ sleep(1);
+ }
+ }
+#else
+ log = fopen_priv(log_file, "a");
+#endif
+ if (log == NULL)
pg_log(PG_FATAL, "cannot write to log file %s\n", log_file);
#ifdef WIN32
fprintf(log, "\n\n");
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers