Re: Race condition in recovery?

Dilip Kumar Thu, 27 May 2021 00:18:02 -0700

On Thu, May 27, 2021 at 12:09 PM Kyotaro Horiguchi wrote:
>
> At Thu, 27 May 2021 11:44:47 +0530, Dilip Kumar wrote:
> > Maybe we can somehow achieve that without a broken archive command,
> > but I am not sure how just deleting WAL from pg_wal is enough. I
> > mean, my original case was that:
> > 1. Got the new history file from the archive but did not yet get the
> > WAL file that contains the checkpoint after the TL switch.
> > 2. So standby2 tries to stream from the new primary using the old TL
> > and sets the wrong TL in expectedTLEs.
> >
> > But if you are doing nothing to stop WAL files from being archived,
> > or to guarantee that the WAL has reached the archive, and you just
> > deleted those files, then I am not sure how we are reproducing the
> > original problem.
>
> Thanks for the reply!
>
> We're writing at the very beginning of the switching segment at
> promotion time. So it is guaranteed that the first segment of the
> newer timeline won't be archived until the remaining (almost 16MB of
> the) segment is consumed or someone explicitly causes a segment switch
> (including archive timeout).


I agree
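To put rough numbers on that, here is a minimal sketch of the arithmetic, assuming the default 16MB wal_segment_size. The helper names are hypothetical (they are not PostgreSQL functions), and the LSN in the usage line is made up for illustration. Since only completed segments are handed to archive_command, a segment whose first few bytes were just written at promotion still has almost the whole 16MB to go before it becomes eligible for archiving.

```python
# Hypothetical helpers; assumes the default 16MB wal_segment_size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 for 16MB


def parse_lsn(lsn: str) -> int:
    """Convert an 'X/Y' LSN string into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, lsn: int) -> str:
    """Name of the segment file holding `lsn` on timeline `tli`
    (TLI, then segment number split into high/low parts)."""
    segno = lsn // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        tli, segno // SEGMENTS_PER_XLOGID, segno % SEGMENTS_PER_XLOGID)


def bytes_until_segment_complete(lsn: int) -> int:
    """Bytes left to write before the segment containing `lsn` is full
    and could be archived without a forced switch."""
    return WAL_SEGMENT_SIZE - (lsn % WAL_SEGMENT_SIZE)


# Made-up promotion point near the start of a segment on timeline 2.
promo = parse_lsn("0/3000060")
print(wal_file_name(2, promo), bytes_until_segment_complete(promo))
```

With these inputs the segment is 000000020000000000000003 and only 0x60 bytes have been written, so nearly all of it remains before the archiver would pick it up, unless a switch is forced (e.g. by archive_timeout).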

> > BTW, I have also tested your script and found the log below, which
> > shows that standby2 successfully selects timeline 2, so it is not
> > reproducing the issue. Am I missing something?
>
> standby_2? My last one, 026_timeline_issue_2.pl, doesn't use that name
> and uses "standby_1" and "cascade". In the earlier ones, standby_4 and
> 5 (or 3 and 4 in the later versions) are used in the additional tests.
>
> So I think it might be something different?

Yeah, I had tested with the version of your patch that had a different
test case. With "026_timeline_issue_2.pl", I am able to reproduce the issue.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
