Re: Stronger safeguard for archive recovery not to miss data

Fujii Masao Tue, 30 Mar 2021 10:12:09 -0700



On 2021/03/26 22:14, David Steele wrote:

On 3/25/21 9:23 PM, Fujii Masao wrote:


On 2021/03/25 23:21, David Steele wrote:

On 1/25/21 3:55 AM, Laurenz Albe wrote:

On Mon, 2021-01-25 at 08:19 +0000, [email protected] wrote:

I think you should pst another patch where the second, now superfluous, error
message is removed.


Updated. This patch showed no failure during regression tests
and has been aligned by pgindent.


Looks good to me.
I'll set it to "ready for committer" again.


Fujii, does the new patch in [1] address your concerns?


No. I'm still not sure if this patch is good idea... I understand
why this safeguard is necessary. OTOH I'm afraid it increases
a bit the risk that users get unstartable database, i.e., lose whole database.
But maybe I'm concerned about rare case and my opinion is minority one.
So I'd like to hear more opinions about this patch.


After reviewing the patch I am +1. I think allowing corruption in recovery by 
default is not a good idea. There is currently a warning but I don't think that 
is nearly strong enough and is easily missed.

Also, "data may be missing" makes this sound like an acceptable situation. What 
is really happening is corruption is being introduced that may make the cluster unusable 
or at the very least lead to errors during normal operation.


Ok, now you, Osumi-san and Laurenz agree to this change
while I'm only the person not to like this. So unless we can hear
any other objections to this change, probably we should commit the patch.

If we want to allow recovery to play past this point I think it would make more 
sense to have a GUC (like ignore_invalid_pages [1]) that allows recovery to 
proceed and emits a warning instead of fatal.

Looking the patch, I see a few things:

1) Typo in the tests:

This protection shold apply to recovery mode

should be:

This protection should apply to recovery mode

2) I don't think it should be the job of this patch to restructure the if 
conditions, even if it is more efficient. It just obscures the purpose of the 
patch.

+1

So, I would revert all the changes in xlog.c except changing the warning to an 
error:

-        ereport(WARNING,
-                (errmsg("WAL was generated with wal_level=minimal, data may be 
missing"),
-                 errhint("This happens if you temporarily set wal_level=minimal 
without taking a new base backup.")));
+            ereport(FATAL,
+                    (errmsg("WAL was generated with wal_level=minimal, cannot 
continue recovering"),
+                     errdetail("This happens if you temporarily set 
wal_level=minimal on the server."),
+                     errhint("Run recovery again from a new base backup taken after 
setting wal_level higher than minimal")));

I guess that users usually encounter this error because they have not
taken base backups yet after setting wal_level to higher than minimal
and have to use the old base backup for archive recovery. So I'm not sure
how much only this HINT is helpful for them. Isn't it better to append
something like "If there is no such backup, recover to the point in time
before wal_level is set to minimal even though which cause data loss,
to start the server." into HINT?

The following error will never be thrown because of the patch.
So IMO the following code should be removed.

                if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
                        ereport(ERROR,
                                        (errmsg("hot standby is not possible because wal_level 
was not set to \"replica\" or higher on the primary server"),
                                         errhint("Either set wal_level to 
\"replica\" on the primary, or turn off hot_standby here.")));

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Stronger safeguard for archive recovery not to miss data

Reply via email to