Hi,

Here are some improvements we can make to pg_receivewal that came out of working with it in production environments:

1) As a user, I sometimes want pg_receivewal to start streaming from an LSN that I provide as input (i.e. a startpos), instead of having it calculate the stream start position (a) from its target directory, (b) from its replication slot's restart_lsn, or (c) by sending IDENTIFY_SYSTEM to the primary. This is particularly useful when the primary has been down for some time (for whatever reason) and the WAL files that pg_receivewal needs may have been removed by the primary (I know this situation is a bit messy, but it is quite possible in production environments). In that case, pg_receivewal calculates the start position from its target directory and requests WAL from the primary starting at that position, which the primary may no longer have. I then have to intervene, manually delete or move the WAL files in the pg_receivewal target directory, and restart pg_receivewal so that it can continue. If pg_receivewal accepted a startpos option instead, it could just go ahead and stream from the primary at that position.

2) Currently, RECONNECT_SLEEP_TIME is 5 sec, but I may want a longer reconnect interval. In production environments the primary can go down at any time for whatever reason, and it can take a while to bring it back up. I don't want to waste compute cycles on the node where pg_receivewal is running in the meantime, so I should be able to set the interval to a higher value, say 5 min or so, after which pg_receivewal calls StreamLog() again and attempts to connect to the primary.

Thoughts?

Regards,
Bharath Rupireddy.
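
For illustration only, the two proposed options could be handled roughly as sketched below. This is a minimal standalone sketch, not a patch against pg_receivewal: the names lsn_t, parse_lsn, choose_start_pos and parse_reconnect_seconds are hypothetical, lsn_t stands in for PostgreSQL's XLogRecPtr, and the one-day upper bound on the reconnect interval is an arbitrary assumption. The fallback order follows the one listed in item 1.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's XLogRecPtr (64-bit WAL position). */
typedef uint64_t lsn_t;
#define INVALID_LSN ((lsn_t) 0)

/*
 * Parse an LSN written as "hi/lo" in hexadecimal, e.g. "0/3000000".
 * Returns true and stores the position in *out on success.
 */
static bool
parse_lsn(const char *str, lsn_t *out)
{
    unsigned int hi;
    unsigned int lo;
    char         junk;

    /* The trailing %c only matches if there is garbage after the LSN. */
    if (sscanf(str, "%X/%X%c", &hi, &lo, &junk) != 2)
        return false;

    *out = ((lsn_t) hi << 32) | lo;
    return true;
}

/*
 * Pick the stream start position. A user-supplied startpos wins; otherwise
 * fall back to the target directory, then the slot's restart_lsn, then the
 * position reported by the server (IDENTIFY_SYSTEM).
 */
static lsn_t
choose_start_pos(lsn_t user_pos, lsn_t dir_pos, lsn_t slot_pos, lsn_t server_pos)
{
    if (user_pos != INVALID_LSN)
        return user_pos;
    if (dir_pos != INVALID_LSN)
        return dir_pos;
    if (slot_pos != INVALID_LSN)
        return slot_pos;
    return server_pos;
}

/*
 * Parse a reconnect interval given in whole seconds.
 * Accepts 1..86400; the upper bound is an assumption made for this sketch.
 */
static bool
parse_reconnect_seconds(const char *str, int *out)
{
    char *end;
    long  val;

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0' || val < 1 || val > 86400)
        return false;

    *out = (int) val;
    return true;
}
```

With something along these lines, a startpos of 0/5000000 would override whatever position the target directory implies, and a reconnect interval of 300 would replace the fixed 5-second sleep between connection attempts.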