> Although I've not tested it because I don't have a good way to reproduce
> the problem

I use GDB to reproduce the issue by killing the primary node with kill -9
while the standby's flushed LSN is at the beginning of a WAL segment.
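To be concrete about the reproduction window: "at the beginning of a WAL
segment" means the flushed LSN has a zero offset within its segment, so a
GDB breakpoint in the walreceiver can be made conditional on a check like
the following. This is an illustration only, not part of the patch;
XLogSegmentOffset() and wal_segment_size are the real macro and variable
from access/xlog_internal.h and access/xlog.h, but AtSegmentBoundary() is
just a made-up helper name for this note.

    #include "access/xlog.h"            /* wal_segment_size */
    #include "access/xlog_internal.h"   /* XLogSegmentOffset() */

    /* true when the LSN sits exactly on a WAL segment boundary */
    static bool
    AtSegmentBoundary(XLogRecPtr flushed_lsn)
    {
        return XLogSegmentOffset(flushed_lsn, wal_segment_size) == 0;
    }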
> However, this causes the standby to request the primary to stream data
> from the requested location again, even if the standby already has valid
> data. It could be wasteful, for example, when applying a WAL record is
> delayed due to a conflict with recovery or recovery_min_apply_delay.

I had overlooked this problem; thanks for pointing it out. I think that
comparing walrcv->receiveStart with recptr would be a better approach:
if walrcv->receiveStart is greater than recptr, reset walrcv->flushedUpto.
This way, we can avoid fetching WAL data from the primary when the standby
already has valid data. A sketch of the idea follows.
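Here is a minimal sketch of the idea against the existing initialization
in RequestXLogStreaming() (src/backend/replication/walreceiverfuncs.c);
the attached v2 patch may differ in detail:

    /*
     * Reset flushedUpto not only on the first startup of the walreceiver
     * on this timeline, but also when the previous streaming attempt
     * started past the newly requested location, since flushedUpto may
     * then point at WAL that was never actually flushed on this standby.
     */
    if (walrcv->receiveStart == 0 ||
        walrcv->receiveStart > recptr ||
        walrcv->receivedTLI != tli)
    {
        walrcv->flushedUpto = recptr;
        walrcv->receivedTLI = tli;
        walrcv->latestChunkStart = recptr;
    }
    walrcv->receiveStart = recptr;
    walrcv->receiveStartTLI = tli;

With this, a retry that restarts streaming at or after the previous start
point leaves flushedUpto alone, so the standby does not re-request WAL it
already holds.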
A new patch has been submitted.

Regards,
Pixian Shi

On Wed, Oct 9, 2024 at 14:46, Yugo NAGATA <nag...@sraoss.co.jp> wrote:

> On Mon, 30 Sep 2024 15:14:54 +0800
> px shi <spxlyy...@gmail.com> wrote:
>
> > Thanks for responding.
> >
> > > It is odd that the standby server crashes when
> > > replication fails because the standby would keep retrying to get the
> > > next record even in such a case.
> >
> > As I mentioned earlier, when replication fails, the standby retries to
> > establish streaming replication. At this point, the value of
> > walrcv->flushedUpto is not necessarily data actually flushed to disk.
> > However, the startup process mistakenly believes that the latest
> > flushed LSN is walrcv->flushedUpto and attempts to open the
> > corresponding WAL file, which doesn't exist, leading to a file open
> > failure and causing the startup process to PANIC.
>
> Although I've not tested it because I don't have a good way to reproduce
> the problem, the patch seems to fix the issue by always rewinding
> walrcv->flushedUpto to the requested WAL location when streaming is
> requested to start. However, this causes the standby to request the
> primary to stream data from the requested location again, even if the
> standby already has valid data. It could be wasteful, for example, when
> applying a WAL record is delayed due to a conflict with recovery or
> recovery_min_apply_delay.
>
> It might be better if we could notice that the requested record is not
> on the primary's timeline before requesting the streaming, but I don't
> have a good solution for now.
>
> Regards,
> Yugo Nagata
>
> > Regards,
> > Pixian Shi
> >
> > On Mon, Sep 30, 2024 at 13:47, Yugo NAGATA <nag...@sraoss.co.jp> wrote:
> >
> > > On Wed, 21 Aug 2024 09:11:03 +0800
> > > px shi <spxlyy...@gmail.com> wrote:
> > >
> > > > On Wed, Aug 21, 2024 at 00:49, Yugo Nagata <nag...@sraoss.co.jp>
> > > > wrote:
> > > >
> > > > > Is s1 a cascading standby of s2? If instead s1 and s2 are each
> > > > > standbys of the primary server, it is not surprising that s2 has
> > > > > progressed further than s1 when the primary fails. I believe this
> > > > > is the case where you should use pg_rewind. Even if flushedUpto
> > > > > is reset as proposed in your patch, s2 might already have applied
> > > > > a WAL record that s1 has not processed yet, and there would be no
> > > > > guarantee that subsequent applies succeed.
> > > >
> > > > Thank you for your response. In my scenario, s1 and s2 are each
> > > > standbys of the primary server; s1 is a synchronous standby and s2
> > > > is an asynchronous standby. You mentioned that if s2's replay
> > > > progress is ahead of s1's, pg_rewind should be used. However, what
> > > > I'm trying to address is an issue where s2 crashes during replay
> > > > after s1 has been promoted to primary, even though s2's progress
> > > > hasn't surpassed s1's.
> > >
> > > I understood your point. It is odd that the standby server crashes
> > > when replication fails because the standby would keep retrying to
> > > get the next record even in such a case.
> > >
> > > Regards,
> > > Yugo Nagata
> > >
> > > > Regards,
> > > > Pixian Shi
> > >
> > > --
> > > Yugo NAGATA <nag...@sraoss.co.jp>
>
> --
> Yugo NAGATA <nag...@sraoss.co.jp>
v2-0001-Fix-walreceiver-set-incorrect-flushedUpto-when-sw.patch
Description: Binary data