Dear hackers,

We encountered an issue with restore_command when using scp on PostgreSQL 16. When scp cannot connect to a host, it exits with return code 255 (I won't discuss the decision to use such a return code).

The return code of restore_command is tested at [1] by calling wait_result_is_any_signal() [2]. If the code is greater than 125 (or 128, depending on that function's include_command_not_found argument), ereport() in [1] reports the failure at FATAL level, which leads to the standby shutting down [3]. This is perfectly fine for us.
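
To illustrate the check described above, here is a minimal Python sketch of how we read wait_result_is_any_signal() in [2]. It is not the PostgreSQL source: the names restore_failure_is_fatal and exited are hypothetical, the default include_command_not_found=True reflects our understanding of the call in [1], and the SIGTERM special case is not modelled. The exited() helper assumes the usual Linux wait-status encoding.

```python
import os


def restore_failure_is_fatal(wait_status: int,
                             include_command_not_found: bool = True) -> bool:
    """Hypothetical mirror of wait_result_is_any_signal() as we read it in [2].

    Returns True when a restore_command wait status would be treated as
    "killed by a signal or shell-level error", i.e. the FATAL case.
    """
    # The command was terminated directly by a signal.
    if os.WIFSIGNALED(wait_status):
        return True
    # POSIX shells use 126 ("found but not executable") and 127 ("not found"),
    # and codes above 128 when the child died from a signal.
    threshold = 125 if include_command_not_found else 128
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) > threshold


def exited(code: int) -> int:
    """Build a wait status for a normal exit (assumes Linux-style encoding)."""
    return (code & 0xFF) << 8


if __name__ == "__main__":
    # 255 is what scp returns when it cannot connect to the host.
    for code in (1, 125, 126, 127, 255):
        print(code, restore_failure_is_fatal(exited(code)))
```

Under these assumptions, scp's return code 255 falls into the FATAL branch, whereas an ordinary failure such as return code 1 does not.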

However, the documentation [4] states the following:

> It is important for the command to return a zero exit status only if it succeeds. The command *will* be asked for file names that are not present in the archive; it must return nonzero when so asked.

(...)

> An exception is that if the command was terminated by a signal (other than SIGTERM, which is used as part of a database server shutdown) or an error by the shell (such as command not found), then recovery will abort and the server will not start up.

Could we perhaps improve the documentation by stating that return codes over 125 or (at least) 128 will lead to the server not starting? This may help people better understand the behaviour of the restore_command and quickly solve these kinds of issues without having to examine the source code.

If you agree, we can suggest a patch for the documentation.

Any thoughts would be much appreciated!

Thank you

[1] https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/access/transam/xlogarchive.c#L268
[2] https://github.com/postgres/postgres/blob/master/src/common/wait_error.c#L126
[3] https://github.com/postgres/postgres/blob/REL_16_STABLE/src/backend/utils/error/elog.c#L560-L591
[4] https://www.postgresql.org/docs/18/runtime-config-wal.html#GUC-RESTORE-COMMAND

--
Jean-Christophe Arnu