Hi folks, I am facing a strange issue, Pgbackrest backup fails for DIFF or INCR backups but not Full backup, with the * error WAL file cannot be archived before 60000 ms timeout.*
The pgbackrest " *stanza check* " command *sometimes succeeds, but sometimes fails.* I don't know why * PG is unable to* *copy WAL files from pg_wal to /data/myarchive_di*r* in real time*. I always observed a delay of a few minutes for a wal file from pg_wal to appear in /data/my_archive_dir. I'hv observed in the postgresql.conf (checkpoint_timeout = 5 m, max_wal_size = 16 GB, wal_keep_size=15GB, min_wal_size=80MB etc.) and I found pg_wal dir size is 16 GB on disk (du -h -d 1 ) and /data/archive = 1.2 T /dev/mapper/rhel_bcga68-data 5.0T 1.8T 3.3T 35% /data Can we suspect the 5 M is the reason for the WAL archiving delay ? Backup to a remote RepoServer for INCR or DIFF backup always fails, I found a full backup always succeeds!!! What is the ideal value needed to be set for "*checkpoint_timeout*" ? *Or this doesn't have any impact on pgbackrest failure ?* *archive_command = 'pgbackrest --stanza=My_Repo archive-push %p && cp %p /data/archive/%f' * >From postgresql logs I am seeing this .. ERROR: [082]: unable to push WAL file '000000010000026300000002' to the archive asynchronously after 60 second(s) HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for errors. INFO: archive-push command end: aborted with exception [082] 2025-05-02 12:15:17 IST LOG: archive command failed with exit code 82 2025-05-02 12:15:17 IST DETAIL: The failed archive command was: pgbackrest --stanza=My_Repo archive-push pg_wal/000000010000026300000002 && cp pg_wal/000000010000026300000002 /data/archive/000000010000026300000002 INFO: archive-push command begin 2.52.1: [pg_wal/000000010000026300000002] --archive-async --compress-type=zst --exec-id=2848559-384cf49c --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data --pg-version-force=16 --process-max=6 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo top output on DB cluster: top - 12:37:00 up 66 days, 17:24, 2 users, load average: 4.04, 4.72, 4.56 Tasks: 902 total, 4 running, 897 sleeping, 0 stopped, 1 zombie %Cpu(s): 7.4 us, 1.7 sy, 0.0 ni, 89.9 id, 0.4 wa, 0.2 hi, 0.4 si, 0.0 st MiB Mem : 31837.6 total, 706.1 free, 15243.0 used, 24741.0 buff/cache MiB Swap: 8060.0 total, 6634.0 free, 1426.0 used. 16608.9 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2839363 postgre+ 20 0 8965608 7.2g 7.1g S 70.2 23.0 2:02.61 postgres 2864108 postgre+ 20 0 8967848 7.1g 7.1g S 64.9 22.8 0:30.04 postgres 2865547 postgre+ 20 0 8965432 7.1g 7.1g S 39.1 22.8 0:32.30 postgres 2865752 postgre+ 20 0 8964352 6.9g 6.9g S 16.6 22.3 0:32.94 postgres Model name: Intel(R) Xeon(R) Gold 6430 BIOS Model name: Intel(R) Xeon(R) Gold 6430 CPU family: 6 Model: 143 Thread(s) per core: 1 Core(s) per socket: 16 These are vCPUs (16 nos) , OS RHEL 9, postgres 16, pgbackrest 2.52.1 *Any hints most welcome to find the root cause / troubleshoot the pgbackrest failures for DIFF/ INCR backups.* Thank you Krishane *For More Inputs: * For more info : you can see the full backup success here. but a diff backup fails. full backup: 20250505-070204F timestamp start/stop: 2025-05-05 07:02:04+05:30 / 2025-05-05 22:11:23+05:30 wal start/stop: 000000010000026F00000066 / 000000010000027300000045 database size: 503.1GB, database backup size: 503.1GB repo1: backup size: 79GB When I try diff backup it always fails. [root@Repo ~]# tail -f /var/log/pgbackrest/My_Repo-backup.log stack trace: command/archive/find.c:walSegmentFind:191:(this: {WalSegmentFind}, walSegment: {"000000010000027B0000006C"}) command/backup/backup.c:backupArchiveCheckCopy:(backupData: {BackupData}, manifest: {Manifest}) command/backup/backup.c:cmdBackup:(void) main.c:main:(debug log level required for parameters) -------------------------------------------------------------------- 2025-05-07 15:47:49.760 P00 INFO: backup command end: aborted with exception [082] 2025-05-07 15:47:49.760 P00 DEBUG: command/exit::exitSafe: => 82 2025-05-07 15:47:49.860 P00 DEBUG: main::main: => 82 ^C [ root@Repo ~ ~]# date Wednesday 07 May 2025 04:06:37 PM IST *The postgres log says* =316781-61d82f85 --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data --pg-version-force=16 --process-max=3 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo INFO: pushed WAL file '000000010000027B00000082' to the archive asynchronously INFO: archive-push command end: completed successfully (37506ms) INFO: archive-push command begin 2.52.1: [pg_wal/000000010000027B00000083] --archive-async --compress-type=zst --exec-id=317334-27a30b57 --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data --pg-version-force=16 --process-max=3 --repo1-host=10.x.y.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo ERROR: [082]: unable to push WAL file '000000010000027B00000083' to the archive asynchronously after 60 second(s) HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for errors. INFO: archive-push command end: aborted with exception [082] 2025-05-07 16:22:56 IST LOG: archive command failed with exit code 82 [ root@Repo ~ ~]#