I once encountered a case where recovery could not reach the latest LSN because WAL files were missing from the archive. The root cause was a series of failovers between the primary and the standby. During one of them, the primary crashed before it had finished archiving all of its WAL files. When the standby was then promoted to primary, it began archiving WAL files for the new timeline, which left a gap between the archived WAL files of the two timelines. Moreover, no base backup was taken during this period.

On Wed, Aug 13, 2025 at 10:11, Ron Johnson <ronljohnso...@gmail.com> wrote:
> How often does your primary node crash, and then not recover due to WAL
> corruption or WAL files not existing?
>
> If it's _ever_ happened, you should _fix that_ instead of rolling your own
> WAL archival process.
>
> On Tue, Aug 12, 2025 at 10:05 PM px shi <spxlyy...@gmail.com> wrote:
>
>> Hi, Adrian
>>
>>> Given that you are using a less than capable storage solution (S3), why do
>>> you think pushing the WAL from the standby to S3 would perform any
>>> better than what is happening with the primary WAL?
>>>
>>
>> I mean that archive_mode is set to on on the primary and to always on the
>> standby.
>> This way, even if the primary crashes, the standby can still archive WAL
>> files that the primary did not archive.
>>
>>> The solution is to use a more capable storage platform.
>>>
>>
>> However, I believe that even if we use a more capable storage platform,
>> it is still impossible to archive WAL files in real time. As long as
>> real-time archiving cannot be achieved, there will always be some WAL files
>> that are not archived if the primary node crashes.
>>
>> On Wed, Aug 13, 2025 at 00:14, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
>>
>>> On 8/12/25 01:24, px shi wrote:
>>> >
>>> > 1) What is the current archiving setup on the primary and why is it
>>> > lagging?
>>> >
>>> > The archive command uses pgBackRest to archive to S3. Because it is
>>> > uploaded to S3, the archiving speed is slow, which has caused lagging.
>>> >
>>> > 2) Have you looked at archiving off the standby node while it is in
>>> > standby per:
>>> >
>>> > Yes, archiving on the standby node is disabled. Is it recommended to
>>> > share the WAL archive between the primary and standby nodes to avoid
>>> > interruptions in archiving?
>>>
>>> Given that you are using a less than capable storage solution (S3), why do
>>> you think pushing the WAL from the standby to S3 would perform any
>>> better than what is happening with the primary WAL?
>>>
>>> The solution is to use a more capable storage platform.
>>>
>>> >
>>> > On Fri, Aug 8, 2025 at 23:23, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
>>> >
>>>
>>> --
>>> Adrian Klaver
>>> adrian.kla...@aklaver.com
>>>
>>
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>
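The gap described at the top of this message (the old primary's last segments never reached the archive, while the promoted standby started archiving on the new timeline) can be spotted by listing the archived segment names and looking for holes in the segment sequence; the same name arithmetic also measures archiver lag, e.g. `pg_stat_archiver.last_archived_wal` against `pg_walfile_name(pg_current_wal_lsn())`. This is only a sketch under stated assumptions: it assumes the default 16 MB `wal_segment_size` and plain 24-character segment names (pgBackRest stores WAL under its own layout with checksum suffixes, so names would have to be extracted first), and the helpers `parse_wal_name`, `segment_lag` and `find_gaps` are hypothetical. It also merges all timelines into one sequence, which is a simplification: real recovery takes earlier segments from the old timeline and later ones from the new timeline.

```python
import re

# Assumption: default 16 MB wal_segment_size, i.e. 256 segments per 4 GB "log" id.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE

# Segment file name: timeline, log id and segment, each 8 uppercase hex digits.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def parse_wal_name(name: str) -> tuple[int, int]:
    """Return (timeline, absolute segment number) for a WAL segment file name."""
    m = WAL_NAME.match(name)
    if m is None:
        raise ValueError(f"not a WAL segment name: {name!r}")
    tli, log, seg = (int(g, 16) for g in m.groups())
    return tli, log * SEGMENTS_PER_XLOGID + seg


def segment_lag(last_archived: str, current: str) -> int:
    """Number of segments between the last archived WAL file and the current one."""
    return parse_wal_name(current)[1] - parse_wal_name(last_archived)[1]


def find_gaps(names: list[str]) -> list[tuple[int, int]]:
    """Return inclusive (first, last) ranges of segment numbers missing from the archive.

    Non-segment files (.history, .partial, .backup) are ignored, and all
    timelines are merged into a single sequence (a simplification).
    """
    segnos = sorted({parse_wal_name(n)[1] for n in names if WAL_NAME.match(n)})
    return [(a + 1, b - 1) for a, b in zip(segnos, segnos[1:]) if b - a > 1]
```

For instance, an archive holding segments 1 to 5 on timeline 1 followed by segments 8 and 9 on timeline 2 yields `[(6, 7)]`: exactly the kind of hole a crashed primary leaves behind when the promoted standby resumes archiving on a new timeline.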
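On the question of sharing one WAL archive between the primary and a standby running with `archive_mode = always`: the PostgreSQL documentation points out that with a shared archive the same segment can be archived twice, so the archive command must return success when an identical file already exists and must refuse to overwrite a file with the same name but different contents. A minimal sketch of such a command follows, assuming a shared filesystem directory rather than S3 or pgBackRest; the `ARCHIVE_DIR` path and the functions `archive_wal` and `main` are hypothetical.

```python
import filecmp
import os
import shutil
import sys
import tempfile

# Hypothetical shared archive location visible to both primary and standby.
ARCHIVE_DIR = "/mnt/wal_archive"


def archive_wal(src_path: str, archive_dir: str, wal_name: str) -> int:
    """Copy one WAL file into a shared archive directory.

    Returns 0 when the file was archived or an identical copy already exists,
    and 1 when a file with the same name but different contents is present.
    """
    dest = os.path.join(archive_dir, wal_name)
    if os.path.exists(dest):
        # Already archived by the other node: succeed only if contents match.
        return 0 if filecmp.cmp(src_path, dest, shallow=False) else 1

    # Write to a temporary name first and fsync it, so a crash never leaves a
    # partially written segment under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=wal_name + ".tmp.")
    try:
        with os.fdopen(fd, "wb") as out, open(src_path, "rb") as src:
            shutil.copyfileobj(src, out)
            out.flush()
            os.fsync(out.fileno())
        # os.link never overwrites: it raises FileExistsError if the other
        # node archived the same segment in the meantime.
        os.link(tmp_path, dest)
    except FileExistsError:
        return 0 if filecmp.cmp(src_path, dest, shallow=False) else 1
    finally:
        os.unlink(tmp_path)  # dest (if linked) keeps the data
    return 0


def main(argv: list[str]) -> int:
    """Entry point for an archive_command that passes %p and %f."""
    if len(argv) != 3:
        print("usage: archive_wal.py <%p path> <%f name>", file=sys.stderr)
        return 2  # nonzero: PostgreSQL keeps the segment and retries later
    return archive_wal(argv[1], ARCHIVE_DIR, argv[2])
```

As a standalone script this would end with `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard and be wired up as something like `archive_command = 'python3 /path/to/archive_wal.py %p %f'` on both nodes. Linking instead of renaming is the key design choice: a rename would silently replace a segment the other node had just archived, while `os.link` fails and forces the content comparison.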