Hello, Leif, Peter.

At Thu, 21 Nov 2019 12:50:18 +0000, "Leif Gunnar Erlandsen" <l...@lako.no> 
wrote in 
> Adding another patch which is not only for recovery_target_time but also for 
> xid, name and lsn.
> 
> > After studying this a bit more, I think the current behavior is totally 
> > bogus and needs a serious
> > rethink.
> > 
> > If you specify a recovery target and it is reached, recovery pauses 
> > (depending on
> > recovery_target_action).
> > 
> > If you specify a recovery target and it is not reached when the end of the 
> > archive is reached
> > (i.e., restore_command fails), then recovery ends and the server is 
> > promoted, without any further
> > information. This is clearly wrong in multiple ways.
> 
> Yes, that is why I have created the patch.

The current behavior seems to be premised on use in repeated
trial-and-error recovery by experienced operators. In that usage, I
think the target is moved back gradually over the repetitions, so each
repetition has to start from a clean backup anyway. Unintended
promotion does no harm in that case.

From this perspective, I don't think the behavior is totally wrong,
but raising FATAL when end of WAL is reached before the target seems
like a good thing to do.

> > I think what we should do is if we specify a recovery target and we don't 
> > reach it, we should
> > ereport(FATAL). Somewhere around
> > 
> If recovery pauses or a FATAL error is reported, is not important, as long as 
> it is possible to get some more WAL and continue recovery. Pause has the 
> benefit of the possibility to inspect tables in the database.
> 
> > in StartupXLOG(), where we already check for other conditions that are 
> > undesirable at the end of
> > recovery. Then a user can make fixes either by getting more WAL files to 
> > restore and adjusting the
> > recovery target and starting again. I don't think pausing is the right 
> > behavior, but perhaps an
> > argument could be made to offer it as a nondefault behavior.
> 
> Pausing was chosen in the patch as pause was the expected behavior if 
> target was reached.
> 
> And the patch does not interfere with any other functionality as far as I 
> know.

With the current behavior, if the server promotes without stopping as
directed by the recovery_target_action setting, that is a sign that
something is wrong. But if the server pauses before reaching the
target, operators may overlook the message if they don't know about
that behavior. And if the server pauses in that case, I think there is
nothing to do from there.

So +1 for FATAL.
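
To make the comparison concrete, here is a toy model of the three
options discussed above. It is only an illustration, not PostgreSQL
code: the Policy enum and end_of_recovery_outcome() are made-up names,
and it ignores details such as whether pausing is actually possible in
a given configuration.

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical labels for the behaviors discussed in this thread."""
    CURRENT = "current"          # today: promote silently at end of WAL
    PATCH_PAUSE = "patch_pause"  # the posted patch: pause at end of WAL
    FATAL = "fatal"              # the proposal: FATAL at end of WAL


def end_of_recovery_outcome(target_set, target_reached, target_action, policy):
    """Return what the server does when recovery stops.

    target_action stands in for recovery_target_action and is one of
    'pause', 'promote' or 'shutdown'.
    """
    if not target_set:
        # No recovery target: replay to end of WAL and promote.
        return "promote"
    if target_reached:
        # Target reached: recovery_target_action decides.
        return target_action
    # A target was set, but end of WAL was reached first.
    if policy is Policy.CURRENT:
        return "promote"
    if policy is Policy.PATCH_PAUSE:
        return "pause"
    return "fatal"


if __name__ == "__main__":
    for policy in Policy:
        outcome = end_of_recovery_outcome(True, False, "pause", policy)
        print(f"{policy.value}: target not reached -> {outcome}")
```

Only the last branch differs between the options. With FATAL, the
unreached target cannot be mistaken for a normal pause or a normal
promotion.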

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

