> On Wed, Aug 13, 2025 at 10:51, Justin <zzzzz.g...@gmail.com> wrote:
>
>> On Tue, Aug 12, 2025 at 10:24 PM px shi <spxlyy...@gmail.com> wrote:
>>
>>>> How often does your primary node crash, and then not recover due to
>>>> WAL corruption or WAL files not existing?
>>>>
>>>> If it's _ever_ happened, you should _fix that_ instead of rolling your
>>>> own WAL archival process.
>>>
>>> I once encountered a case where the recovery process failed to restore
>>> to the latest LSN due to missing WAL files in the archive. The root
>>> cause was multiple failovers between primary and standby. During one of
>>> the switchovers, the primary crashed before completing the archiving of
>>> all WAL files. When the standby was promoted to primary, it began
>>> archiving WAL files for the new timeline, resulting in a gap between
>>> the WAL files of the two timelines. Moreover, no base backup was taken
>>> during this period.
>>
>> I am not sure what the problem is here either, other than something
>> seriously wrong with the configuration of PostgreSQL and pgBackRest.
>>
>> The replica should be receiving WAL via streaming replication over a
>> replication slot, meaning the primary keeps the WAL until the replica
>> has caught up. If the replica becomes disconnected and falls behind
>> because max_slot_wal_keep_size (or wal_keep_size) is exceeded, the
>> replica's restore_command can take over and fetch from the WAL archive
>> to catch the replica up. This assumes hot_standby_feedback is on, so
>> WAL replay won't be delayed by snapshot conflicts on the replica.
>>
>> If all of the above is true, the replica should never lag behind unless
>> its disk I/O layer is badly undersized compared to the primary's. S3 is
>> being talked about, so it makes me wonder about the disk I/O
>> configuration on the primary vs. the replica. I have seen this cause
>> lag under high load where the replica's I/O layer is the bottleneck.
>>
>> If pgBackRest can't keep up with WAL archiving, then, as others have
>> stated, you need to configure asynchronous archiving. The number of
>> workers depends on the load. I have one server running 8 parallel
>> workers to archive 1 TB of WAL daily, and another that generates around
>> 10,000 WAL files in about 2 hours during maintenance tasks using 6
>> pgBackRest workers, all to S3 buckets.
>>
>> The above makes me wonder whether some kind of high-availability
>> monitor, such as pg_auto_failover, is running, promoting a replica and
>> then converting the former primary into a replica of the newly promoted
>> node.
>>
>> If that matches what is happening, it is very easy to mess up the
>> configuration for WAL archiving and backups. Part of the process of
>> promoting a replica is making sure WAL archiving is working. After
>> being promoted, the replica immediately kicks off autovacuum to rebuild
>> things like the free space map, which generates a lot of WAL files.
>>
>> If you are losing WAL files, the configuration is wrong somewhere.
>>
>> There is just not enough information on the sequence of events and the
>> configuration to tell what the root cause is, other than
>> misconfiguration.
>>
>> Thanks
>> Justin

On Wed, Aug 13, 2025 at 1:48 AM px shi <spxlyy...@gmail.com> wrote:
> Here’s a scenario: The latest WAL file on the primary node is
> 0000000100000000000000AF, and the standby node has also received up to
> 0000000100000000000000AF. However, the latest WAL file that has been
> successfully archived from the primary is only 0000000100000000000000A1
> (WAL files from A2 to AE have not yet been archived). If the primary
> crashes at this point, triggering a failover, the new primary will start
> generating and archiving WAL on a new timeline (2), beginning with
> 0000000200000000000000AF. It will not backfill the missing WAL files
> from timeline 1 (0000000100000000000000A2 to 0000000100000000000000AE).
> As a result, while the new primary does not have any local WAL gaps, the
> archive directory will contain a gap in that WAL range.
>
> I’m not sure if I explained it clearly.

This will happen if the replica is lagging and is promoted before it has
had a chance to catch up. That is working correctly, per the design
intent. There are several tools available to tell us whether the replica
is in sync before promoting. In the above case a lagging replica was
promoted; it stops looking at the previous timeline and will NOT look for
the missing WAL files from that timeline. The replica does not even know
they exist anymore. The data in the previous timeline is no longer
accessible from the promoted replica; it is working on a new timeline.
The only place the old timeline's missed WAL files are accessible is on
the crashed primary, which never archived or streamed them to the
replica. Promoting an out-of-sync/lagging replica will result in loss of
data.

Does this answer the question here?
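
For illustration only, the kind of pre-promotion check described above
could look something like this on the primary (a minimal sketch; the
column alias names are just for this example and nothing here is taken
from the setup discussed in this thread):

    -- Per-standby streaming lag; non-zero replay lag means the standby
    -- has not caught up and should not be promoted yet.
    SELECT application_name,
           pg_current_wal_lsn() AS primary_lsn,
           replay_lsn,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
    FROM pg_stat_replication;

    -- Archiver progress; a large gap between the current WAL file and
    -- last_archived_wal means the archive has fallen behind.
    SELECT pg_walfile_name(pg_current_wal_lsn()) AS current_wal,
           last_archived_wal,
           last_failed_wal
    FROM pg_stat_archiver;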
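
And a minimal sketch of the asynchronous pgBackRest archiving mentioned
earlier, assuming an S3 repository; the stanza name "demo", the paths,
and the worker count are placeholders, not values from this thread:

    # /etc/pgbackrest/pgbackrest.conf (illustrative values only)
    [global]
    repo1-type=s3                      # backup/archive repository in S3
    archive-async=y                    # spool WAL locally, push in parallel
    spool-path=/var/spool/pgbackrest   # local spool used by async archiving
    process-max=6                      # parallel archive-push/archive-get workers

    [demo]
    pg1-path=/var/lib/postgresql/data

    # postgresql.conf on the primary
    archive_mode = on
    archive_command = 'pgbackrest --stanza=demo archive-push %p'

    # on the standby, so restore_command can take over if streaming falls behind
    restore_command = 'pgbackrest --stanza=demo archive-get %f "%p"'

With archive-async enabled, archive_command returns as soon as the
segment is queued in the spool directory, and the parallel workers push
segments to the repository in the background, which is how WAL-heavy
workloads like the ones described above keep the archive from falling
behind.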