When using pg_basebackup with WAL streaming (-X stream), we have observed a number of times in production that the streaming child exited prematurely (through no fault of the code, it seems; most likely due to network middleboxes), which causes the backup to fail, but only after it has run to completion. On long-running backups this can waste a lot of time before the failure is noticed.
By trapping the failure of the streaming process we can instead exit early, allowing the user to fix the problem and/or restart the backup. The attached patch adds a SIGCHLD handler on Unix, and catches the return value from the Windows thread, in order to break out of the main loop early. It still needs a test, and proper testing on Windows, but early feedback on the approach would be appreciated.

--
Daniel Gustafsson
https://vmware.com/
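For readers who want a concrete picture of the Unix side of the approach described above, here is a minimal standalone sketch. It is not taken from the attached patch and is not pg_basebackup code: the names run_backup_loop, sleep_ms and child_exited are hypothetical, the forked child merely sleeps and exits to simulate a streaming process that dies, and the "main loop" is a stand-in that sleeps 10 ms per iteration.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Set from the signal handler; sig_atomic_t is safe to write there. */
static volatile sig_atomic_t child_exited = 0;

static void
sigchld_handler(int signo)
{
    (void) signo;
    child_exited = 1;
}

/* Sleep for roughly ms milliseconds; may return early if a signal arrives. */
static void
sleep_ms(int ms)
{
    struct timespec ts;

    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long) (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/*
 * Hypothetical stand-in for the backup main loop.  Forks a child that
 * exits after child_lifetime_ms (simulating a streaming process that
 * dies), then "processes" up to max_chunks chunks of 10 ms each.
 * Returns the number of chunks processed before the loop ended, or -1
 * if setup failed.
 */
int
run_backup_loop(int max_chunks, int child_lifetime_ms)
{
    struct sigaction sa;
    pid_t       child;
    int         done = 0;

    child_exited = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    if (sigaction(SIGCHLD, &sa, NULL) != 0)
        return -1;

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
    {
        /* Child: pretend to stream for a while, then fail. */
        sleep_ms(child_lifetime_ms);
        _exit(1);
    }

    /* Main loop: stop as soon as the handler has flagged the child's exit. */
    while (done < max_chunks && !child_exited)
    {
        sleep_ms(10);           /* stand-in for receiving one chunk */
        done++;
    }

    /* If the child is still running, stop it; either way, reap it. */
    if (!child_exited)
        kill(child, SIGTERM);
    while (waitpid(child, NULL, 0) < 0 && errno == EINTR)
        ;

    return done;
}
```

Calling run_backup_loop(200, 50) in this sketch should return well before all 200 iterations have run, because the simulated child exits after about 50 ms. A real program would report an error and exit at that point rather than just return a count. The Windows side (checking the thread's return value) is not covered here.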
0001-Quick-exit-on-log-stream-child-exit-in-pg_basebackup.patch