Re: standby promotion can create unreadable WAL

Robert Haas Wed, 24 Aug 2022 05:14:06 -0700

On Wed, Aug 24, 2022 at 4:40 AM Kyotaro Horiguchi
<horikyota....@gmail.com> wrote:
> Me, too.  There are two ways to deal with this, I think. One is to
> start writing new records from abortedRecPtr as if it did not
> exist. Another is to copy the WAL file up to missingContrecPtr. Since
> the first segment of the new timeline doesn't need to be identical to
> the last one of the previous timeline, I think the former way is
> cleaner.


I agree, mostly because that gets us back to the way all of this
worked before the contrecord stuff went in. This case wasn't broken
then, because the breakage had to do with it being unsafe to back up
and rewrite WAL that might have already been shipped someplace, and
that's not an issue when we're first creating a totally new timeline.
It seems safer to me to go back to the way this worked before the fix
went in than to change over to a new system.

Honestly, in a vacuum, I might prefer to get rid of this thing where
the WAL segment gets copied over from the old timeline to the new, and
just always switch TLIs at segment boundaries. And while we're at it,
I'd also like TLIs to be 64-bit random numbers instead of integers
assigned in ascending order. But those kinds of design changes seem
best left for a future master-only development effort. Here, we need
to back-patch the fix, and should try to just unbreak what's currently
broken.
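
To illustrate the "switch TLIs at segment boundaries" idea, here is a
minimal stand-alone sketch of the arithmetic involved. It is not
PostgreSQL code: next_segment_start() is a hypothetical helper, the
XLogRecPtr typedef is a simplified stand-in, and the 16MB segment size
is just the default, assumed here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's WAL position type. */
typedef uint64_t XLogRecPtr;

/* Assumption: the default 16MB WAL segment size. */
#define WAL_SEGMENT_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * Hypothetical helper: return the first WAL position at or after 'lsn'
 * that lies exactly on a segment boundary. If timelines always switched
 * at such a position, no partial segment would need to be copied from
 * the old timeline to the new one.
 */
XLogRecPtr
next_segment_start(XLogRecPtr lsn)
{
    uint64_t offset = lsn % WAL_SEGMENT_SIZE;

    if (offset == 0)
        return lsn;             /* already on a boundary */

    XLogRecPtr result = lsn + (WAL_SEGMENT_SIZE - offset);

    assert(result % WAL_SEGMENT_SIZE == 0);
    return result;
}
```

With these assumptions, a promotion at 0/3000028 would start the new
timeline at 0/4000000 and leave the rest of the old segment unused.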

> XLogInitNewTimeline or somewhere near it seems to me to be the place
> for the fix. Would clearing abortedRecPtr and missingContrecPtr just
> before the call to findNewestTimeLine work?

Hmm, yeah, that seems like a good approach.
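
For concreteness, the effect of that proposal can be modeled roughly as
below. This is a stand-alone sketch, not the actual patch: the two
variables mirror the names used in this thread, but the typedef, the
InvalidXLogRecPtr definition, and all three helper functions are
simplified or hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Model of the aborted-contrecord state tracked during recovery. */
static XLogRecPtr abortedRecPtr = InvalidXLogRecPtr;
static XLogRecPtr missingContrecPtr = InvalidXLogRecPtr;

/* Hypothetical: recovery hit a record whose continuation is missing. */
void
note_aborted_contrecord(XLogRecPtr aborted, XLogRecPtr missing)
{
    assert(aborted != InvalidXLogRecPtr);
    assert(missing != InvalidXLogRecPtr);
    abortedRecPtr = aborted;
    missingContrecPtr = missing;
}

/*
 * Hypothetical: what "clearing" the pointers amounts to. On a brand-new
 * timeline nothing has been shipped yet, so the aborted record can
 * simply be overwritten and need not be remembered.
 */
void
forget_aborted_contrecord(void)
{
    abortedRecPtr = InvalidXLogRecPtr;
    missingContrecPtr = InvalidXLogRecPtr;
}

/* Hypothetical: would end-of-recovery still treat the record specially? */
bool
aborted_contrecord_pending(void)
{
    return missingContrecPtr != InvalidXLogRecPtr;
}
```

In this model, calling forget_aborted_contrecord() when promotion picks
a new timeline means aborted_contrecord_pending() returns false
afterwards, so new records would just be written over the aborted one.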

-- 
Robert Haas
EDB: http://www.enterprisedb.com
