Hi,

Currently, the server shuts down with a FATAL error (added by commit [1]) when the recovery target isn't reached. This can cause a server availability problem, especially in disaster recovery (geo restores), where the primary is down and the user is doing a PITR on a server in another region that never received the last few WAL files required to reach the recovery target. In this case, users might prefer an available server over no server at all. With commit [1], there is no way to achieve that.
There can be many reasons why the last few WAL files don't reach the target server where the user is performing the PITR. The primary may have gone down before archiving the last few WAL files to the archive location, the archive command may fail for whatever reason, there may be network latency from the primary to the archive location or from the archive location to the target server, the recovery command on the target server may fail, or users may have chosen a wrong or future recovery target, etc.

If the PITR fails with a FATAL error, we may ask users to restart the server, but imagine the waste of compute resources: if there is 1 TB of WAL to be replayed and just the last 16MB WAL file is missing, everything has to be replayed from the beginning.

Here's a proposal (and a patch) to add a GUC so that users can choose, when the recovery target isn't reached, either to emit a warning and promote or to shut down with a FATAL error (the default). In practice, users can choose to shut down with a FATAL error if strict consistency is a necessity; otherwise they can choose promotion if availability is preferred.

There is some discussion around this idea in [2].

Thoughts?

[1] - commit dc788668bb269b10a108e87d14fefd1b9301b793
Author: Peter Eisentraut <pe...@eisentraut.org>
Date:   Wed Jan 29 15:43:32 2020 +0100

    Fail if recovery target is not reached

    Before, if a recovery target is configured, but the archive ended
    before the target was reached, recovery would end and the server would
    promote without further notice.  That was deemed to be pretty wrong.
    With this change, if the recovery target is not reached, it is a fatal
    error.

    Based-on-patch-by: Leif Gunnar Erlandsen <l...@lako.no>
    Reviewed-by: Kyotaro Horiguchi <horikyota....@gmail.com>
    Discussion: https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b65383...@lako.no

[2] - https://www.postgresql.org/message-id/b334d61396e6b0657a63dc38e16d429703fe9b96.camel%40j-davis.com

Regards,
Bharath Rupireddy.
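P.S. To make the proposed behavior concrete, here is a minimal standalone sketch of the decision described above. It is not code from the attached patch: the enum names, the function name, and the two-valued setting are all hypothetical, chosen only for illustration, and the real GUC name and values may differ.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical setting values; the actual GUC in the patch may use
 * different names.
 */
typedef enum TargetNotReachedAction
{
	TARGET_NOT_REACHED_FATAL,	/* default: behavior since commit [1] */
	TARGET_NOT_REACHED_PROMOTE	/* emit a WARNING and promote */
} TargetNotReachedAction;

/* Hypothetical outcomes at the end of archive recovery. */
typedef enum EndOfRecoveryOutcome
{
	OUTCOME_UNCHANGED,				/* this proposal does not apply */
	OUTCOME_PROMOTE_WITH_WARNING,	/* availability preferred */
	OUTCOME_FATAL_SHUTDOWN			/* strict consistency preferred */
} EndOfRecoveryOutcome;

/*
 * Decide what happens when WAL replay runs out of WAL.  The setting only
 * matters when a recovery target was configured and was not reached.
 */
EndOfRecoveryOutcome
decide_end_of_recovery(bool target_configured, bool target_reached,
					   TargetNotReachedAction action)
{
	if (!target_configured || target_reached)
		return OUTCOME_UNCHANGED;

	if (action == TARGET_NOT_REACHED_PROMOTE)
		return OUTCOME_PROMOTE_WITH_WARNING;

	return OUTCOME_FATAL_SHUTDOWN;
}

/* Small self-check showing the intended behavior of the sketch. */
void
example_checks(void)
{
	assert(decide_end_of_recovery(true, false, TARGET_NOT_REACHED_FATAL)
		   == OUTCOME_FATAL_SHUTDOWN);
	assert(decide_end_of_recovery(true, false, TARGET_NOT_REACHED_PROMOTE)
		   == OUTCOME_PROMOTE_WITH_WARNING);
}
```

Keeping FATAL as the default in this sketch mirrors the proposal: existing setups keep the behavior from commit [1], and only users who explicitly opt in get promotion with a warning.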
v1-0001-Allow-users-to-choose-what-happens-when-recovery-.patch