Hi,

On Friday, 22.03.2019, at 17:37 +0900, Michael Paquier wrote:
> On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote:
> > Don't we need a big warning that the cluster must not be started during
> > operation of pg_checksums as well, now that we don't disallow it?
>
> The same applies to pg_rewind and pg_basebackup, so I would classify
> that as a pilot error.

How would it apply to pg_basebackup? The cluster is running while the
base backup is taken, and I believe the control file is written at the
end, so you can't start another instance off the backup directory until
the base backup has finished.

It would apply to pg_rewind, but pg_rewind's runtime does not scale with
cluster size, does it? pg_checksums will run for hours on large clusters,
so the window for errors is much larger and I don't think you can easily
compare the two.

> How would you formulate that in the docs if you add it.

(I would try to make sure you can't start the cluster, but that seems off
the table for now.)

How about this:

+ <refsect1>
+  <title>Notes</title>
+  <para>
+   When enabling checksums in a cluster, the operation can potentially take a
+   long time if the data directory is large.  During this operation, the
+   cluster or other programs that write to the data directory must not be
+   started or else data loss will occur.
+  </para>
+
+  <para>
+   When disabling or enabling checksums in a cluster of multiple instances,
[...]

Also, the following is not very clear to me:

+   If the event of a crash of the operating system while enabling or

s/If/In/

+   disabling checksums, the data folder may have checksums in an inconsistent
+   state, in which case it is recommended to check the state of checksums
+   in the data folder.

How is the user supposed to check the state of checksums? Do you mean
that if the user intended to enable checksums and the box dies in
between, they should check whether checksums are actually enabled and
re-run if not? Because it could also mean running pg_checksums --check
on the cluster, which wouldn't work in that case as the control file has
not been updated yet.

Maybe it could be formulated like "If pg_checksums is aborted or killed
in its operation while enabling or disabling checksums, the cluster will
have the same state with respect to checksums as before the operation
and pg_checksums needs to be restarted."?

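On the question of how a user could check the state of checksums: one
option is to read the control file with pg_controldata, which reports a
"Data page checksum version" field (0 when checksums are disabled). Below
is a minimal Python sketch of that check. It assumes pg_controldata is on
PATH; the helper names parse_checksum_version and checksums_enabled are
hypothetical and not part of any patch in this thread.

```python
import os
import re
import subprocess


def parse_checksum_version(controldata_output: str) -> int:
    """Return the value of the 'Data page checksum version' field
    found in the text printed by pg_controldata."""
    match = re.search(
        r"^Data page checksum version:\s*(\d+)\s*$",
        controldata_output,
        flags=re.MULTILINE,
    )
    if match is None:
        raise ValueError("no 'Data page checksum version' line found")
    return int(match.group(1))


def checksums_enabled(datadir: str) -> bool:
    """Hypothetical helper: run pg_controldata on datadir and report
    whether the control file says checksums are enabled.

    LC_ALL=C is forced so the field label is not translated.
    """
    result = subprocess.run(
        ["pg_controldata", datadir],
        check=True,
        capture_output=True,
        text=True,
        env={**os.environ, "LC_ALL": "C"},
    )
    # A non-zero version means checksums are enabled in the control file.
    return parse_checksum_version(result.stdout) != 0
```

If the control file is only updated at the end of the operation, as
assumed above, an interrupted enable run would still report version 0
here, which would be the signal to re-run pg_checksums.
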
Michael

--
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the following provisions:
https://www.credativ.de/datenschutz