Hi,

This should probably go to pgsql-hackers, but https://www.postgresql.org/list/ 
says 'You must try elsewhere first!', and this list seemed the next best place...


I wanted to point you to this GitHub issue:

https://github.com/wal-g/wal-g/issues/1126


Basically, Postgres only distinguishes 3 kinds of restore_command exit codes:

0: No problem, next WAL file...

1-125: End of timeline? OK, let's stop recovery and go online

>=126: Ouch, big problem. Better not proceed, but error out with a FAIL instead
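As a rough illustration, the three cases above can be written as a tiny classifier. This is only a sketch of the behaviour as I described it, not PostgreSQL's actual source; the names `classify_restore_exit` and `RestoreOutcome` are made up for the example. It also assumes Python's `subprocess` convention that a negative return code means the command was killed by a signal, and treats that as fatal too.

```python
from enum import Enum


class RestoreOutcome(Enum):
    """Hypothetical labels for how recovery reacts to a restore_command exit."""
    NEXT_WAL = "file restored, continue with the next WAL file"
    NOT_AVAILABLE = "end of timeline, stop recovery and go online"
    FATAL = "big problem, abort recovery with an error"


def classify_restore_exit(returncode: int) -> RestoreOutcome:
    """Map a restore_command exit status to the three cases described above.

    `returncode` follows Python's subprocess convention: -N means the
    command was killed by signal N (assumed here to be fatal as well).
    """
    if returncode == 0:
        return RestoreOutcome.NEXT_WAL
    if 1 <= returncode <= 125:
        return RestoreOutcome.NOT_AVAILABLE
    # 126 and above (not executable, command not found, 128+N for signals)
    # and negative values are all treated as hard failures.
    return RestoreOutcome.FATAL


if __name__ == "__main__":
    for code in (0, 1, 78, 125, 126, 127, 130, -2):
        print(code, classify_restore_exit(code).name)
```

Note that 78, the code I suggest below, currently falls into the NOT_AVAILABLE bucket, which is exactly why a tool has to reach for 126 or higher today.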


According to https://tldp.org/LDP/abs/html/exitcodes.html, exit codes above 125 
are all OS-related.

For example, 126 means 'Permission problem or command is not an executable', 
and 130 means 'Control-C is fatal error signal 2' (130 = 128 + 2).
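A quick way to see the ambiguity: with a POSIX shell, running a command that does not exist yields 127, and a script can return the very same 127 on purpose with `exit 127`. This sketch assumes a Unix-like system where `subprocess.run(..., shell=True)` goes through `/bin/sh`; the helper `shell_status` and the command name `no_such_command_xyz` are made up and assumed not to exist on the PATH.

```python
import subprocess


def shell_status(command: str) -> int:
    """Run `command` through /bin/sh and return its exit status."""
    result = subprocess.run(
        command,
        shell=True,                 # let the shell resolve (or fail to resolve) it
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,  # hide the shell's "not found" message
    )
    return result.returncode


if __name__ == "__main__":
    # Both lines report 127, so the caller cannot tell "command not found"
    # apart from a deliberate, user-defined exit code.
    print(shell_status("no_such_command_xyz"))
    print(shell_status("exit 127"))
```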


I think exit code 78 would be a better choice for signalling restore_command 
errors that are not OS-related but should still end in 'Ouch, big problem. 
Better not proceed, but error out with a FAIL instead', provided Postgres 
treated it that way.


I plan to work on a fix for wal-g so that its exit codes distinguish these cases 
better, but since the only way to make recovery fail today is to exit with a 
code >= 126, I wanted to bring this to the Postgres community too.

Furthermore, this goes beyond wal-g: it applies to basically everything that 
runs as a restore_command...

Would you consider adding another exit code to the list, so that 
restore_commands don't need to exit with error codes that were meant to signal 
OS-level issues?


I'd like to end with this quote from the TLDP exit-codes page linked above:

Ending a script with exit 127 would certainly cause confusion when 
troubleshooting (is the error code a "command not found" or a user-defined 
one?).

However, many scripts use an exit 1 as a general bailout-upon-error.

Since exit code 1 signifies so many possible errors, it is not particularly 
useful in debugging.

To me, that applies not just to 127, but to all exit codes above 125...


Thanks.
