> On Feb 26, 2019, at 09:26, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Tue, Feb 26, 2019 at 12:20 PM Fujii Masao <masao.fu...@gmail.com> wrote:
>> So, let me clarify the situations:
>>
>> (1) If backup_label and recovery.signal exist, the recovery starts safely.
>> This is the normal case of recovery from the base backup.
>>
>> (2) If backup_label.pending and recovery.signal exist, as described above,
>> a PANIC error happens at the start of recovery. This case can happen
>> if the operator forgets to rename backup_label.pending, i.e.,
>> an operator mistake. So, after the PANIC, the operator needs to fix her or
>> his mistake (i.e., rename backup_label.pending) and restart
>> the recovery.
>>
>> (3) If backup_label.pending exists but recovery.signal doesn't, the server
>> ignores (or removes) backup_label.pending and does the recovery
>> starting from pg_control's REDO location. This case can happen if
>> the server crashes while an exclusive backup is in progress.
>> So, under this idea, a crash in the middle of a backup doesn't
>> prevent the recovery from starting.
>
> The if-conditions for 1 and 2 appear to be the same, which is confusing.

I believe #1 is when backup_label (no .pending) exists, #2 is when
backup_label.pending (with .pending) exists.

At the absolute minimum, this discussion has convinced me that we need to
create a wiki page to accurately describe the failure scenarios for both
exclusive and non-exclusive backups, and the recovery actions for them. If it
exists already, my search attempts weren't successful. If it doesn't, I'm
happy to start one.
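
As a starting point for such a page, here is a rough sketch of the three
cases quoted above as a decision function. It is only an illustration of the
proposal as I read it: backup_label.pending is the file name proposed in this
thread, the function and enum names are made up, and any combination the three
cases don't mention is reported as "not covered" rather than guessed at.

```python
from enum import Enum


class Action(Enum):
    # Case (1): normal recovery from a base backup.
    RECOVER_FROM_BACKUP_LABEL = "start recovery using backup_label"
    # Case (2): operator forgot to rename backup_label.pending.
    PANIC = "PANIC; operator must rename backup_label.pending and restart"
    # Case (3): server crashed while an exclusive backup was in progress.
    CRASH_RECOVERY = "ignore backup_label.pending; start from pg_control REDO"
    # Anything the three cases in the thread do not describe.
    NOT_COVERED = "combination not described by cases (1)-(3)"


def startup_action(backup_label: bool,
                   backup_label_pending: bool,
                   recovery_signal: bool) -> Action:
    """Hypothetical helper mapping file presence to the proposed behavior."""
    if backup_label and backup_label_pending:
        # The thread does not say what happens when both files exist.
        return Action.NOT_COVERED
    if backup_label and recovery_signal:
        return Action.RECOVER_FROM_BACKUP_LABEL
    if backup_label_pending and recovery_signal:
        return Action.PANIC
    if backup_label_pending and not recovery_signal:
        return Action.CRASH_RECOVERY
    return Action.NOT_COVERED


if __name__ == "__main__":
    for files in [(True, False, True), (False, True, True), (False, True, False)]:
        print(files, "->", startup_action(*files).name)
```

Written this way, the difference between #1 and #2 is just which of the two
file names is present alongside recovery.signal.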
--
-- Christophe Pettus
x...@thebuild.com