Re: Questions about the continuity of WAL archiving

px shi Tue, 12 Aug 2025 19:24:37 -0700

>
> How often does your primary node crash, and then not recover due to WALs
> corruption or WALs not existing?
>
> If it's _ever_ happened, you should _fix that_ instead of rolling your own
> WAL archival process.
>


I once encountered a case where recovery could not reach the latest LSN
because WAL files were missing from the archive. The root cause was a series
of failovers between the primary and the standby. During one of them, the
primary crashed before it had finished archiving all of its WAL files. When
the standby was promoted, it began archiving WAL files for the new timeline,
which left a gap between the WAL files of the two timelines. Moreover, no
base backup was taken during this period.
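
To make the gap concrete, here is a minimal sketch of how such a hole could be
spotted from a listing of archived segment names. It is only an illustration,
not what pgBackRest does: find_gaps and parse_wal_name are hypothetical helper
names, the sketch assumes plain 24-hex-digit WAL file names (no compression or
checksum suffix) and the default 16 MB wal_segment_size, and it treats a
segment number as present if it exists on any timeline.

```python
import re

# Assumption: default 16 MB wal_segment_size, i.e. 256 segments per "log" id.
SEGMENTS_PER_LOG = 256

# A WAL segment name is 24 hex digits: timeline, log id, segment-within-log.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


def parse_wal_name(name):
    """Return (timeline, segment_number), or None if name is not a WAL segment."""
    m = WAL_NAME.match(name)
    if m is None:
        return None
    tli, log, seg = (int(g, 16) for g in m.groups())
    return tli, log * SEGMENTS_PER_LOG + seg


def find_gaps(names):
    """Return (first_missing, last_missing) segment-number ranges.

    Timelines are ignored on purpose: a segment counts as archived if it
    exists on any timeline, which is a simplification.
    """
    parsed = (parse_wal_name(n) for n in names)
    segnos = sorted({p[1] for p in parsed if p is not None})
    gaps = []
    for prev, cur in zip(segnos, segnos[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


# Hypothetical archive listing: timeline 1 stops at segment 2,
# timeline 2 only starts at segment 5.
archived = [
    "000000010000000000000001",
    "000000010000000000000002",
    "00000002.history",
    "000000020000000000000005",
    "000000020000000000000006",
]
print(find_gaps(archived))
```

In this made-up listing, segments 3 and 4 were never archived on either
timeline, so recovery cannot replay past segment 2.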


On Wed, Aug 13, 2025 at 10:11, Ron Johnson <[email protected]> wrote:

> How often does your primary node crash, and then not recover due to WALs
> corruption or WALs not existing?
>
> If it's _ever_ happened, you should _fix that_ instead of rolling your own
> WAL archival process.
>
> On Tue, Aug 12, 2025 at 10:05 PM px shi <[email protected]> wrote:
>
>> Hi, Adrian
>>
>>> Given that you are using a less than capable storage solution (S3), why do
>>> you think pushing the WAL from the standby to S3 would perform any
>>> better than what is happening with the primary WAL?
>>>
>>
>> I mean that archive_mode is set to on in primary and set to always in
>> standby.
>> This way, even if the primary crashes, the standby can still archive WAL
>> files that the primary did not archive.
>>
>>> The solution is to use a more capable storage platform.
>>>
>>
>> However, I believe that even if we use a more capable storage platform,
>> it is still impossible to archive WAL files in real time. As long as
>> real-time archiving cannot be achieved, there will always be some WAL files
>> that are not archived if the primary node crashes.
>>
>> On Wed, Aug 13, 2025 at 00:14, Adrian Klaver <[email protected]> wrote:
>>
>>> On 8/12/25 01:24, px shi wrote:
>>> >
>>> >     1) What is the current archiving setup on the primary and why is
>>> >     it lagging?
>>> >
>>> >   The archive command uses pgBackRest to archive to S3. Because it is
>>> > uploaded to S3, the archiving speed is slow, which has caused lagging.
>>> >
>>> >     2) Have you looked at archiving off the standby node while it is in
>>> >     standby per:
>>> >
>>> > Yes, archiving on the standby node is disabled. Is it recommended to
>>> > share the WAL archive between the primary and standby nodes to avoid
>>> > interruptions in archiving?
>>>
>>> Given that you are using a less than capable storage solution (S3), why do
>>> you think pushing the WAL from the standby to S3 would perform any
>>> better than what is happening with the primary WAL?
>>>
>>> The solution is to use a more capable storage platform.
>>>
>>> >
>>> > On Fri, Aug 8, 2025 at 23:23, Adrian Klaver <[email protected]
>>> > <mailto:[email protected]>> wrote:
>>> >
>>>
>>> --
>>> Adrian Klaver
>>> [email protected]
>>>
>>
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>
