checkpoint_timeout parameter & WAL archive delay, pgbackrest fails

KK CHN Fri, 02 May 2025 00:14:44 -0700

Hi folks,
My pgbackrest backup to one of my RepoServers fails intermittently. It
sometimes fails with the error that the WAL file cannot be archived before
the 60000 ms timeout.


The pgbackrest stanza check command sometimes succeeds and sometimes fails.

I don't know why PG is unable to copy WAL files from pg_wal to
/data/myarchive_dir in real time. I have consistently observed a delay of
around 10 minutes before a WAL file in pg_wal appears in /data/my_archive_dir.
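One way to put a number on that delay is to compare each segment's modification time in pg_wal with the modification time of its copy in the archive directory. The sketch below is a hypothetical helper, not part of PostgreSQL or pgbackrest. It assumes the archive directory holds plain, uncompressed copies named exactly like the segments (which is what a `cp %p <dir>/%f` step produces) and that file mtimes are a usable proxy for "segment finished" and "segment archived". The 60 s threshold simply mirrors the timeout in the error message.

```python
import sys
from pathlib import Path


def is_wal_segment(name: str) -> bool:
    # WAL segment names are 24 uppercase hex characters (timeline, log, segment).
    return len(name) == 24 and all(c in "0123456789ABCDEF" for c in name)


def archive_delays(wal_dir: str, archive_dir: str) -> dict[str, float]:
    """Return {segment: seconds between its mtime in wal_dir and in archive_dir}.

    Hypothetical diagnostic: only segments present in BOTH directories are
    compared; archive_status/, .history and .backup files are skipped.
    """
    delays = {}
    for wal in Path(wal_dir).iterdir():
        if not is_wal_segment(wal.name):
            continue
        copy = Path(archive_dir) / wal.name
        if copy.exists():
            delays[wal.name] = copy.stat().st_mtime - wal.stat().st_mtime
    return delays


def report(delays: dict[str, float], timeout_s: float = 60.0) -> list[str]:
    """Segments whose archive delay exceeds timeout_s (60 s, as in the error)."""
    return sorted(name for name, d in delays.items() if d > timeout_s)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are examples): python wal_delay.py <pg_wal dir> <archive dir>
    found = archive_delays(sys.argv[1], sys.argv[2])
    for seg in sorted(found):
        print(f"{seg}  {found[seg]:8.1f} s")
    print("slower than 60 s:", report(found))
```

Because pg_wal segments are recycled after archiving, only recent segments can be compared this way; the result is a rough indicator rather than an exact measurement.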

On investigation, I found that our DB admin has set
checkpoint_timeout = 10min in the postgresql.conf file.

I think this causes the WAL archiving delay, and consequently my pgbackrest
backup of the DB to a remote RepoServer fails.

What is the ideal value for checkpoint_timeout to overcome this issue? I
don't want pgbackrest backups to fail because of this parameter. Is it
possible to set a very small value for checkpoint_timeout? What is the
minimum value, or can I set it to 0?


archive_command = 'pgbackrest --stanza=My_Repo archive-push %p && cp %p /data/archive/%f'
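To see what PostgreSQL actually runs for each segment, the placeholders can be expanded by hand. Per the PostgreSQL documentation, `%p` is replaced by the segment's path (relative to the data directory), `%f` by the file name only, and `%%` by a literal `%`. The function below is a hypothetical illustration of that substitution, not PostgreSQL's own code; other `%` sequences are left unchanged here as a simplification.

```python
def expand_archive_command(template: str, wal_path: str) -> str:
    """Expand %p, %f and %% in an archive_command template (illustrative only)."""
    file_name = wal_path.rsplit("/", 1)[-1]
    replacements = {"p": wal_path, "f": file_name, "%": "%"}
    out, i = [], 0
    while i < len(template):
        nxt = template[i + 1] if i + 1 < len(template) else ""
        if template[i] == "%" and nxt in replacements:
            out.append(replacements[nxt])  # known placeholder: substitute it
            i += 2
        else:
            out.append(template[i])  # ordinary character (or unknown %x)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Prints the same command that appears in the "DETAIL" log line.
    print(expand_archive_command(
        "pgbackrest --stanza=My_Repo archive-push %p && cp %p /data/archive/%f",
        "pg_wal/000000010000026300000002",
    ))
```

Because the two steps are joined with `&&`, the `cp` runs only when `archive-push` exits with status 0. When `archive-push` exits with 82, the whole command fails, nothing is copied to /data/archive, and PostgreSQL keeps the segment in pg_wal and retries archiving it later. A segment therefore reaches /data/archive only after pgbackrest has accepted it.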


From the PostgreSQL logs, I see the following:

ERROR: [082]: unable to push WAL file '000000010000026300000002' to the archive asynchronously after 60 second(s)
       HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for errors.
INFO: archive-push command end: aborted with exception [082]
2025-05-02 12:15:17 IST LOG:  archive command failed with exit code 82
2025-05-02 12:15:17 IST DETAIL:  The failed archive command was: pgbackrest --stanza=My_Repo archive-push pg_wal/000000010000026300000002 && cp pg_wal/000000010000026300000002 /data/archive/000000010000026300000002
INFO: archive-push command begin 2.52.1: [pg_wal/000000010000026300000002] --archive-async --compress-type=zst --exec-id=2848559-384cf49c --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data   --pg-version-force=16 --process-max=6 --repo1-host=10.50.12.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo

top output on the DB cluster:

top - 12:37:00 up 66 days, 17:24,  2 users,  load average: 4.04, 4.72, 4.56
Tasks: 902 total,   4 running, 897 sleeping,   0 stopped,   1 zombie
%Cpu(s):  7.4 us,  1.7 sy,  0.0 ni, 89.9 id,  0.4 wa,  0.2 hi,  0.4 si,  0.0 st
MiB Mem :  31837.6 total,    706.1 free,  15243.0 used,  24741.0 buff/cache
MiB Swap:   8060.0 total,   6634.0 free,   1426.0 used.  16608.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2839363 postgre+  20   0 8965608   7.2g   7.1g S  70.2  23.0   2:02.61 postgres
2864108 postgre+  20   0 8967848   7.1g   7.1g S  64.9  22.8   0:30.04 postgres
2865547 postgre+  20   0 8965432   7.1g   7.1g S  39.1  22.8   0:32.30 postgres
2865752 postgre+  20   0 8964352   6.9g   6.9g S  16.6  22.3   0:32.94 postgres



Model name:            Intel(R) Xeon(R) Gold 6430
    BIOS Model name:     Intel(R) Xeon(R) Gold 6430
    CPU family:          6
    Model:               143
    Thread(s) per core:  1
    Core(s) per socket:  16

These are vCPUs (16 in total); the OS is RHEL 9 and PostgreSQL is version 16.

Any hints on how to make pgbackrest take backups reliably would be much
appreciated.


Thanks,
Krishane
