restore_command return code behaviour

Jean-Christophe Arnu Thu, 24 Jul 2025 14:18:48 -0700

Dear hackers,

We encountered an issue with restore_command when using scp on PostgreSQL
v16. When scp cannot connect to a host, it exits with return code 255
(I won't discuss the decision to use such a return code). The return code of
the restore_command is tested at [1] by calling wait_result_is_any_signal()
[2]. If the code is greater than 125 (or 128, depending on the second
argument passed to that function), ereport in [1] raises a FATAL error,
which leads to the standby shutting down [3]. This is perfectly fine for us.
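
To make the threshold concrete, here is a minimal sketch of the check as we
read it in [2]. The function name restore_failure_is_fatal is hypothetical
and chosen for illustration only; the logic is our understanding of
wait_result_is_any_signal(), not a copy of the PostgreSQL source, and it
assumes a POSIX system where <sys/wait.h> is available.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical helper modelled on our reading of
 * wait_result_is_any_signal(): returns true when the wait status of the
 * command should be treated as "killed by a signal or failed in the shell".
 *
 * wait_status is the raw value returned by system() or waitpid(), not the
 * plain exit code.
 */
bool
restore_failure_is_fatal(int wait_status, bool include_command_not_found)
{
    /* The command was terminated by a signal. */
    if (WIFSIGNALED(wait_status))
        return true;

    /*
     * The command exited normally but with a high exit code. Shells
     * conventionally use 126 and 127 for "cannot execute" and "command not
     * found", and 128+N for "killed by signal N".
     */
    if (WIFEXITED(wait_status) &&
        WEXITSTATUS(wait_status) > (include_command_not_found ? 125 : 128))
        return true;

    return false;
}
```

Under this reading, an scp exit code of 255 is above either threshold and so
falls in the same category as a signal or a shell error, while an ordinary
failure such as exit code 1 does not.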


However, the documentation [4] states the following:
> It is important for the command to return a zero exit status only if it
> succeeds. The command *will* be asked for file names that are not present
> in the archive; it must return nonzero when so asked. (...)
> An exception is that if the command was terminated by a signal (other
> than SIGTERM, which is used as part of a database server shutdown) or an
> error by the shell (such as command not found), then recovery will abort
> and the server will not start up.

Could we perhaps improve the documentation by stating that return codes
greater than 125, or at least greater than 128, will also cause recovery to
abort and the server not to start?

This may help people better understand the behaviour of the restore_command
and quickly solve these kinds of issues without having to examine the
source code.

If you agree, we can suggest a patch for the documentation.

Any thoughts would be much appreciated!

Thank you


[1] https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/access/transam/xlogarchive.c#L268
[2] https://github.com/postgres/postgres/blob/master/src/common/wait_error.c#L126
[3] https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/utils/error/elog.c#L560-L591
[4] https://www.postgresql.org/docs/18/runtime-config-wal.html#GUC-RESTORE-COMMAND
-- 
Jean-Christophe Arnu
