Re: Offline enabling/disabling of data checksums

Fabien COELHO Thu, 21 Mar 2019 00:18:49 -0700


Bonjour Michaël,

On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:

I think that the motivation/risks should appear before the solution. "As xyz
..., ...", or at least the logical link should be outlined.

It is not clear to me whether the following sentences, which seem specific
to "pg_rewind", are linked to the previous advice, which seems rather to
refer to streaming replication?


Do you have a better idea of formulation?

I can try, but I must admit that I'm fuzzy about the actual issue. Is
there a problem on a streaming replication with inconsistent checksum
settings, or not?

You seem to suggest that the issue is more about how some commands or
backup tools operate on a cluster.


I'll reread the thread carefully and will make a proposal.

Imagine for example a primary-standby with checksums disabled: [...]


Yep, that's cool.

Should not disabling in reverse order be safe? The checksums are not checked
afterwards?

I don't quite understand your comment about the ordering. If all the
standbys are destroyed first, then enabling/disabling checksums happens
at a single place.

Sure. I was suggesting that disabling on replicated clusters is possibly
safer, but I do not know the details of replication & checksumming with
enough precision to be that sure about it.

After the reboot, some data files are not fully updated with their
checksums, although the control file says that they are. It should then
fail after a restart when a no-checksum page is loaded?

What am I missing?


Please note that we do that in other tools as well and we live fine
with that, pg_basebackup and pg_rewind just to name two.

The fact that other commands are exposed to the same potential risk is not
a very good argument not to fix it.

I am not saying that it is not a problem in some cases, but I am saying
that this is not a problem that this patch should solve.

As solving the issue involves exchanging two lines and turning one boolean
parameter to true, I do not see why it should not be done. Fixing the
issue takes much less time than writing about it...


And if other commands can be improved, fine with me.

If we were to do something about that, it could make sense to make
fsync_pgdata() smarter so that the control file is flushed last there, or
define flush strategies there.

ISTM that this would not work: The control file update can only be done
*after* the fsync to describe the cluster's actual status, otherwise it is
just a question of luck whether the cluster is corrupt on a crash while
fsyncing. The enforced order of operation, with a barrier in between, is
the important thing here.
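
To make the ordering concrete, here is a minimal sketch in Python. It is not
the patch and not how the actual tool is written: the function names, the
file names and the "control" file format are all hypothetical, chosen only
to show the data-first, control-file-last sequence with an fsync barrier in
between.

```python
import os
import tempfile


def write_durably(path, payload):
    """Write payload to path and fsync it before returning."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def enable_checksums(datadir, pages):
    """Hypothetical stand-in for the offline tool; returns the step order."""
    steps = []

    # 1. Rewrite every data file and make each one durable.
    for name, payload in pages.items():
        write_durably(os.path.join(datadir, name), payload)
        steps.append("data:" + name)

    # 2. Barrier: every data file above has been fsynced at this point.

    # 3. Only now record the new state in the (hypothetical) control file
    #    and flush it, so it never claims more than what is on disk.
    write_durably(os.path.join(datadir, "control"), b"checksums=on")
    steps.append("control")
    return steps


def demo():
    with tempfile.TemporaryDirectory() as datadir:
        return enable_checksums(datadir, {"rel_1": b"page-a", "rel_2": b"page-b"})
```

With this order, a crash during step 1 leaves a control file that still
describes the old state, so the operation can simply be run again. A real
implementation would presumably also need to fsync the containing
directories, which this sketch leaves out.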


--
Fabien.
