Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

David Steele Thu, 27 Jul 2023 01:19:05 -0700

Hi Thomas,

On 7/26/23 06:06, Thomas Munro wrote:

While chatting to Robert and Andres about all this, a new idea came
up.  Or, rather, one of the first ideas that was initially rejected,
now resurrected to try out a suggestion of Andres’s on how to
de-pessimise it.  Unfortunately, it also suffers from Windows-specific
problems that I originally mentioned at the top of this thread but
had since repressed.  Arrrghgh.


First, the good news:

We could write out a whole new control file, and durable_rename() it
into place.  We don’t want to do that in general, because we don’t
want to slow down UpdateMinRecoveryPoint().  The new concept is to do
that only if a backup is in progress.  That requires a bit of
interlocking with backup start/stop (ie when runningBackups is
changing in shmem, we don’t want to overlap with UpdateControlFile()'s
decision on how to do it).  Here is a patch to try that out.  No more
weasel wording needed for the docs; basebackup and low-level file
system backup should always see an atomic control file (and
occasionally also copy a harmless pg_control.tmp file).  Then we only
need the gross retry-until-stable hack for front-end programs.

I like the approach in these patches better than the last patch set. Myonly concern would be possible performance regression on standbys (whendoing backup from standby) since pg_control can be written veryfrequently to update min recovery point.

I've made a first pass through the patches and they look generallyreasonable (and back patch-able).


One thing:

+ sendFileWithContent(sink, XLOG_CONTROL_FILE,
+                     (char *) control_file, sizeof(*control_file),
+                     &manifest);

I wonder if we should pad pg_control out to 8k so it remains the samesize as now? Postgres doesn't care, but might look odd to users, and isarguably a change in behavior that should not be back patched.

And the bad news:

Provided we can reasonably address the Windows issues this seems to bethe way to go.


Regards,
-David

Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

Reply via email to