Hi,

On Wednesday, 10.02.2021, at 15:06 +0900, Michael Paquier wrote:
> On Tue, Feb 09, 2021 at 10:54:50AM +0200, Heikki Linnakangas wrote:
> > (I may have said this before, but) My overall high-level impression of this
> > patch is that it's really complex for a feature that you use maybe once in
> > the lifetime of a cluster. I'm happy to review but I'm not planning to
> > commit this myself. I don't object if some other committer picks this up
> > (Magnus?).
>
> I was just looking at the latest patch set as a matter of curiosity,
> and I have a shared feeling.

I think this would still be a useful feature, not least for online
deactivation: having to shut down the instance is sometimes just not an
option in production, even for just a few seconds.

However, there is also the shoot-the-whole-database-into-WAL issue (at
least, that is what happens, AIUI), which has not been discussed much
either. The patch allows throttling, but I think the impact on actual
production workloads is not very clear yet.

> I think that this is a lot of complication in-core for what would be a
> one-time operation, particularly knowing that there are other ways to
> do it already with the offline checksum tool, even if that is more
> costly:
> - Involve logical replication after initializing the new instance with
> --data-checksums, or in an upgrade scenario with pg_upgrade.

Logical replication is still somewhat impractical for such a (possibly)
routine task, and I don't understand your pg_upgrade scenario; can you
expand on that a bit?

> - Involve physical replication: stop the standby cleanly, enable
> checksums on it and do a switchover.

I would like to focus on this, so I changed the subject in order not to
derail the online activation patch thread.

If this is something we support, then we should document it.

I have to admit that this possibility escaped me when we first committed
offline (de)activation; it was brought to my attention via
https://twitter.com/samokhvalov/status/1281312586219188224 and the
following discussion.

So if we think this (to recap: shut down the standby, run pg_checksums
on it, start it up again, wait until it is back in sync, then
switchover) is a safe way to activate checksums on a streaming
replication setup, then we should document it, I think. However, I have
only seen sorta hand-waving on this so far and no deeper analysis of
what could possibly go wrong (but doesn't). Has anybody done further
work/tests on this, and/or does anybody have something written up to
contribute to the documentation?
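
To make the recap above concrete, here is a minimal sketch in Python that
only lists the steps as commands and executes nothing. The data directory
path, the `Step` type and the function name are hypothetical, and the
last two steps are left as manual placeholders because how to wait for
sync and how to switch over is site-specific. It illustrates the order of
operations only; whether that order is actually safe is the open question.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    description: str
    argv: Optional[List[str]]  # None for steps that are not one local command


def standby_checksum_plan(standby_pgdata: str) -> List[Step]:
    """Return the recap as an ordered list of steps; nothing is executed."""
    return [
        Step("shut down the standby cleanly",
             ["pg_ctl", "stop", "-D", standby_pgdata, "-m", "fast"]),
        Step("enable checksums offline on the stopped standby",
             ["pg_checksums", "--enable", "--progress", "-D", standby_pgdata]),
        Step("start the standby again",
             ["pg_ctl", "start", "-D", standby_pgdata]),
        # Assumption: sync is judged by watching pg_stat_replication on the primary.
        Step("wait until the standby is back in sync", None),
        # Assumption: the switchover itself is done with the site's own tooling.
        Step("switch over to the standby", None),
    ]


if __name__ == "__main__":
    plan = standby_checksum_plan("/path/to/standby/data")  # hypothetical path
    for number, step in enumerate(plan, start=1):
        command = " ".join(step.argv) if step.argv else "(manual step)"
        print(f"{number}. {step.description}: {command}")
```

The remaining standbys, if any, would each need the same treatment, which
is another thing documentation would have to spell out.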

Or do we think this is not appropriate to document? I think once we
agree this is safe, it is no more complicated than the
rsync-the-standby-after-pg_upgrade recipe we did document.

> Another thing we could do is to improve pg_checksums with a parallel
> mode. The main design question would be how to distribute the I/O,
> and that would mean balancing at least across tablespaces.

Right. I thought about this a while ago, but haven't had time to work on
it so far.


Michael

-- 
Michael Banck
Project Lead / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
Trompeterallee 108, 41189 Mönchengladbach