Re: Offline enabling/disabling of data checksums

2019-03-29 Thread Michael Paquier
On Tue, Mar 26, 2019 at 01:41:38PM +0100, Fabien COELHO wrote: >> I am not sure that "checksum status" is a correct term. It seems to >> me that "same configuration for data checksums as before the tool ran" >> or something like that would be more correct. > > Possibly, I cannot say. I have put

Re: Offline enabling/disabling of data checksums

2019-03-26 Thread Fabien COELHO
Bonjour Michaël, Here is an attempt at improving the Notes. [...] So, the ordering of the notes for each paragraph is as follows: 1) Replication issues when mixing different checksum setups across nodes. 2) Consistency of the operations if killed. 3) Don't start Postgres while the operatio

Re: Offline enabling/disabling of data checksums

2019-03-26 Thread Michael Paquier
On Sat, Mar 23, 2019 at 02:14:02PM +0100, Fabien COELHO wrote: > Here is an attempt at improving the Notes. > > Mostly it is a reordering from more important (cluster corruption) to less > important (if interrupted a restart is needed), some reordering from problem > to solutions instead of soluti

Re: Offline enabling/disabling of data checksums

2019-03-23 Thread Fabien COELHO
Bonjour Michaël, Here is an attempt at improving the Notes. Mostly it is a reordering from more important (cluster corruption) to less important (if interrupted a restart is needed), some reordering from problem to solutions instead of solution/problem/solution, some sentence simplification.

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 03:18:26PM +0100, Fabien COELHO wrote: > Attached is a quick patch about "pg_rewind", so that the control file is > updated after everything else is committed to disk. Could you start a new thread about that please? This one has already been used for too many things. -- Mi

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Sat, Mar 23, 2019 at 08:16:07AM +0900, Michael Paquier wrote: > And committed the main part. I'll look after the --no-sync part in a > bit. --no-sync is committed as well now. -- Michael signature.asc Description: PGP signature

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 07:02:36PM +0100, Fabien COELHO wrote: > Indeed it does, and it is done in update_controlfile if the last argument is > true. Basically update_controlfile latest version always fsync the control > file, unless explicitely told not to do so. The options to do that are > reall

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 02:59:31PM +0100, Fabien COELHO wrote: > On write(), the error message is not translatable whereas it is for all > others. Fixed. > I agree that a BIG STRONG warning is needed about not to start the cluster > under pain of possible data corruption. I still think that preve

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Fabien COELHO
Hello Christoph, - pg_log(PG_PROGRESS, "syncing target data directory\n"); - syncTargetDirectory(); Doesn't the control file still need syncing? Indeed it does, and it is done in update_controlfile if the last argument is true. Basically update_controlfile latest version alway

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Christoph Berg
Re: Fabien COELHO 2019-03-22 > Attached is a quick patch about "pg_rewind", so that the control file is > updated after everything else is committed to disk. > update_controlfile(datadir_target, progname, &ControlFile_new, do_sync); > > - pg_log(PG_PROGRESS, "syncing target data direc

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Fabien COELHO
Done the switch for this case. For pg_rewind actually I think that this is an area where its logic could be improved a bit. So first the data folder is synced, and then the control file is updated. Attached is a quick patch about "pg_rewind", so that the control file is updated after every

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Fabien COELHO
Bonjour Michaël, Does that look fine to you? Mostly. Patch v9 part 1 applies cleanly, compiles, global and local check ok, doc build ok. On write(), the error message is not translatable whereas it is for all others. I agree that a BIG STRONG warning is needed about not to start the cl

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 10:04:02AM +0100, Michael Banck wrote: > How about this: > > + > + Notes > + > + When enabling checksums in a cluster, the operation can potentially take a > + long time if the data directory is large. During this operation, the > + cluster or other programs tha

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Banck
Hi, Am Freitag, den 22.03.2019, 17:37 +0900 schrieb Michael Paquier: > On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote: > > Don't we need a big warning that the cluster must not be started during > > operation of pg_checksums as well, now that we don't disallow it? > > The same appl

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Paquier
On Fri, Mar 22, 2019 at 09:13:43AM +0100, Michael Banck wrote: > Don't we need a big warning that the cluster must not be started during > operation of pg_checksums as well, now that we don't disallow it? The same applies to pg_rewind and pg_basebackup, so I would classify that as a pilot error.

Re: Offline enabling/disabling of data checksums

2019-03-22 Thread Michael Banck
Hi, Am Freitag, den 22.03.2019, 09:27 +0900 schrieb Michael Paquier: > I have added in the docs a warning about a host crash while doing the > operation, with a recommendation to check the state of the checksums > on the data folder should it happen, and the previous portion of the > docs about c

Re: Offline enabling/disabling of data checksums

2019-03-21 Thread Michael Paquier
On Thu, Mar 21, 2019 at 08:17:32AM +0100, Fabien COELHO wrote: > I can try, but I must admit that I'm fuzzy about the actual issue. Is there > a problem on a streaming replication with inconsistent checksum settings, or > not? > > You seem to suggest that the issue is more about how some commands

Re: Offline enabling/disabling of data checksums

2019-03-21 Thread Fabien COELHO
Anyway, as this stuff is very useful for upgrade scenarios a-la-pg_upgrade, for backup validation and as it does not produce false positives, I would really like to get something committed for v12 in its simplest form... Fine with me, the detailed doc is not a showstopper and can be improve

Re: Offline enabling/disabling of data checksums

2019-03-21 Thread Fabien COELHO
Bonjour Michaël, On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote: I think that the motivation/risks should appear before the solution. "As xyz ..., ...", or there at least the logical link should be outlined. It is not clear for me whether the following sentences, which seems sp

Re: Offline enabling/disabling of data checksums

2019-03-21 Thread Michael Paquier
On Thu, Mar 21, 2019 at 07:59:24AM +0900, Michael Paquier wrote: > Please note that we do that in other tools as well and we live fine > with that as pg_basebackup, pg_rewind just to name two. I am not > saying that it is not a problem in some cases, but I am saying that > this is not a problem th

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Michael Paquier
On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote: > I think that the motivation/risks should appear before the solution. "As xyz > ..., ...", or there at least the logical link should be outlined. > > It is not clear for me whether the following sentences, which seems specific > to "p

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Fabien COELHO
Michaël-san, I think that a clear warning not to run any cluster command in parallel, under pain of possible cluster corruption, and possibly other caveats about replication, should appear in the documentation. I still have the following extra documentation in my notes: Ok, it should have b

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Michael Paquier
On Wed, Mar 20, 2019 at 10:38:36AM +0100, Fabien COELHO wrote: > Hmmm… so nothing:-) The core of the feature is still here, fortunately. > I think that a clear warning not to run any cluster command in parallel, > under pain of possible cluster corruption, and possibly other caveats about > repli

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Fabien COELHO
Michaël-san, In short, you keep the main feature with: - No tweaks with postmaster.pid. - Rely just on the control file indicating an instance shutdown cleanly. - No tweaks with the system ID. - No renaming of the control file. Hmmm… so nothing:-) I think that this feature is useful, in comp

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Fabien COELHO
Hallo Andres, [...] pg_upgrade in link mode intentionally wants to *permanently* disable a cluster. And it explicitly writes a log message about it. That's not a case to draw inferrence for this case. Ok. My light knowledge of pg_upgrade inner working does not extend to this level of prec

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Andres Freund
Hi, On 2019-03-20 07:55:39 +0100, Fabien COELHO wrote: > > And you're basically adding it because Fabien doesn't like > > postmaster.pid and wants to invent another lockout mechanism in this > > thread. > > I did not suggest to rename the control file, but as it is already done by > another comma

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Fabien COELHO
Hallo Andres, And you're basically adding it because Fabien doesn't like postmaster.pid and wants to invent another lockout mechanism in this thread. I did not suggest to rename the control file, but as it is already done by another command it did not look like a bad idea in itself, or at le

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Paquier
On Wed, Mar 20, 2019 at 08:09:07AM +0900, Michael Paquier wrote: > In short, you keep the main feature with: > - No tweaks with postmaster.pid. > - Rely just on the control file indicating an instance shutdown > cleanly. > - No tweaks with the system ID. > - No renaming of the control file. FWIW,

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Paquier
On Tue, Mar 19, 2019 at 09:47:17AM -0700, Andres Freund wrote: > I'm not sure it needs to be this patch's responsibility to come up with > a scheme here at all however. pg_rewind, pg_resetwal, pg_upgrade all > don't really have a lockout mechanism, and it hasn't caused a ton of > problems. I think

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Andres Freund
Hi, On 2019-03-19 17:30:16 +0100, Michael Banck wrote: > Am Dienstag, den 19.03.2019, 09:13 -0700 schrieb Andres Freund: > > On 2019-03-19 17:08:17 +0100, Michael Banck wrote: > > > Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund: > > > > On 2019-03-19 16:55:12 +0100, Michael Banck

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Banck
Hi, Am Dienstag, den 19.03.2019, 09:13 -0700 schrieb Andres Freund: > On 2019-03-19 17:08:17 +0100, Michael Banck wrote: > > Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund: > > > On 2019-03-19 16:55:12 +0100, Michael Banck wrote: > > > > Am Dienstag, den 19.03.2019, 08:36 -0700 sch

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Andres Freund
Hi, On 2019-03-19 17:08:17 +0100, Michael Banck wrote: > Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund: > > On 2019-03-19 16:55:12 +0100, Michael Banck wrote: > > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund: > > > > On 2019-03-18 17:13:01 +0900, Michael Paquie

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Banck
Hi, Am Dienstag, den 19.03.2019, 09:00 -0700 schrieb Andres Freund: > On 2019-03-19 16:55:12 +0100, Michael Banck wrote: > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund: > > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote: > > > > +/* > > > > + * Locations of persistent an

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Andres Freund
On 2019-03-19 16:55:12 +0100, Michael Banck wrote: > Hi, > > Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund: > > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote: > > > +/* > > > + * Locations of persistent and temporary control files. The control > > > + * file gets renamed in

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Banck
Hi, Am Dienstag, den 19.03.2019, 08:36 -0700 schrieb Andres Freund: > On 2019-03-18 17:13:01 +0900, Michael Paquier wrote: > > +/* > > + * Locations of persistent and temporary control files. The control > > + * file gets renamed into a temporary location when enabling checksums > > + * to preven

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Andres Freund
Hi, On 2019-03-18 17:13:01 +0900, Michael Paquier wrote: > +/* > + * Locations of persistent and temporary control files. The control > + * file gets renamed into a temporary location when enabling checksums > + * to prevent a parallel startup of Postgres. > + */ > +#define CONTROL_FILE_PATH

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Fabien COELHO
Ok, this might not work, because of the following, less likely, race condition: postmaster opens control file RW pg_checksums moves control file, postmater open file handle follows ... So ISTM that we really need some locking to have something clean. We are talking about complicating a m

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Banck
Hi, Am Dienstag, den 19.03.2019, 11:48 +0100 schrieb Fabien COELHO: > Moving the controlfile looks like an effective way to prevent any > concurrent start, as the fs operation is probably atomic and especially if > external tools uses the same trick. However this is not the case yet, eg > "pg_r

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Michael Paquier
On Tue, Mar 19, 2019 at 11:48:25AM +0100, Fabien COELHO wrote: > Moving the controlfile looks like an effective way to prevent any concurrent > start, as the fs operation is probably atomic and especially if external > tools uses the same trick. However this is not the case yet, eg > "pg_resetwal"

Re: Offline enabling/disabling of data checksums

2019-03-19 Thread Fabien COELHO
Bonjour Michaël, Please find attached an updated patch set, I have rebased that stuff on top of my recent commits to refactor the control file updates. Patch applies cleanly, compiles, make check-world seems ok, doc build ok. It would help if the patch includes a version number. I assume tha

Re: Offline enabling/disabling of data checksums

2019-03-18 Thread Michael Paquier
On Fri, Mar 15, 2019 at 01:37:27PM +0100, Michael Banck wrote: > Am Freitag, den 15.03.2019, 21:23 +0900 schrieb Michael Paquier: >> Perhaps having them under --verbose makes more sense? > > Well if we think it is essential in order to tell the user what happened > in the case of an error, it shou

Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO
Bonjour Michaël, Here are my notes about the fixes: Thanks for the fixes. - pg_resetwal got broken because the path to the control file was incorrect. Running tests of pg_upgrade or the TAP tests of pg_resetwal showed the failure. Hmmm… I thought I had done that with "make check-world":-

Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote: > I kept the initial no-parameter function which calls the new one with 4 > parameters, though, because it looks more homogeneous this way in the > backend code. This is debatable. From a compatibility point of view, your position actu

Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO
You have one error at the end of update_controlfile(), where close() could issue a frontend-like error for the backend, calling exit() on the way. That's not good. (No need to send a new patch, I'll fix it myself.) Indeed. I meant to merge the "if (close(fd))", but ended merging the error

Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 12:44:39PM +0100, Fabien COELHO wrote: > I could remove the two "catalog/" includes from pg_resetwal, I assume that > you meant these ones. Not exactly. What I meant is that if you try to call directly fsync_fname and fsync_parent_path from file_utils.h, then you get into

Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO
Michaël-san, The issue here is that trying to embed directly the fsync routines from file_utils.c into pg_resetwal.c messes up the inclusions because pg_resetwal.c includes backend-side includes, which themselves touch fd.h :( In short your approach avoids some extra mess with the include depe

Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Michael Paquier
On Sun, Mar 17, 2019 at 10:01:20AM +0100, Fabien COELHO wrote: > The implementation basically removes a lot of copy paste and calls the new > update_controlfile function instead. I like removing useless code:-) Yes, I spent something like 10 minutes looking at that code yesterday and I agree that

Re: Offline enabling/disabling of data checksums

2019-03-17 Thread Fabien COELHO
Bonjour Michaël-san, Yes, that would be nice, for now I have focused. For pg_resetwal yes we could do it easily. Would you like to send a patch? Here is a proposal for "pg_resetwal". The implementation basically removes a lot of copy paste and calls the new update_controlfile function ins

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi, Am Freitag, den 15.03.2019, 21:23 +0900 schrieb Michael Paquier: > On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote: > > 1. There's a typo in line 578 which makes it fail to compile: > > > > > src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first > > > use in t

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote: > 1. There's a typo in line 578 which makes it fail to compile: > > |src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use > in this function) > | }y I am wondering where you got this one. My local branch doe

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi, Am Freitag, den 15.03.2019, 11:50 +0900 schrieb Michael Paquier: > I have been able to grab some time to incorporate the feedback gathered > on this thread, and please find attached a new version of the patch to > add --enable/--disable. Some more feedback: 1. There's a typo in line 578 whi

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi, Am Mittwoch, den 13.03.2019, 12:24 +0100 schrieb Magnus Hagander: > On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov wrote: > > One new question from me: how about replication? > > Case: primary+replica, we shut down primary and enable checksum, and > > "started streaming WAL from primary" wi

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 09:52:11AM +0100, Magnus Hagander wrote: > As I said, that's a big hammer. I'm all for having a better solution. But I > don't think it's acceptable not to have *any* defense against it, given how > bad corruption it can lead to. Hm... It looks that my arguments are not co

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 4:26 PM Michael Banck wrote: > Hi, > > Am Donnerstag, den 14.03.2019, 15:26 +0100 schrieb Magnus Hagander: > > Given that the failure is data corruption, I don't think big fat > > warning is enough. We should really make it impossible to start up the > > postmaster by mist

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 4:54 PM Michael Banck wrote: > Hi, > > Am Donnerstag, den 14.03.2019, 15:32 +0100 schrieb Magnus Hagander: > > On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg wrote: > > > Re: Magnus Hagander 2019-03-14 zmb8qck7ndmchey5...@mail.gmail.com> > > > > Are you suggesting we sho

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Magnus Hagander
On Fri, Mar 15, 2019 at 1:49 AM Michael Paquier wrote: > On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote: > > Are you suggesting we should support running with a master with checksums > > on and a standby with checksums off in the same cluster? That seems.. > Very > > fragile. > >

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 09:04:51AM +0100, Michael Banck wrote: > ISTM this would not run fsync_parent_path() unless the first fsync fails > which is not the intended use. I guess we need two ifs here? Yes, let's do that. Let's see if others have input to offer about the patch. This thread is gat

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi, Am Freitag, den 15.03.2019, 11:50 +0900 schrieb Michael Paquier: > On Thu, Mar 14, 2019 at 04:26:20PM +0100, Michael Banck wrote: > > Am Donnerstag, den 14.03.2019, 15:26 +0100 schrieb Magnus Hagander: > > > One big-hammer method could be similar to what pg_upgrade does -- > > > temporarily re

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Paquier
On Fri, Mar 15, 2019 at 11:50:27AM +0900, Michael Paquier wrote: > - Rename the control file when beginning the enabling operation, with > a callback to rename the file back if the operation is interrupted. > > Does this make sense? Just before I forget... Please note that this handles interrupt

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Paquier
On Thu, Mar 14, 2019 at 04:26:20PM +0100, Michael Banck wrote: > Am Donnerstag, den 14.03.2019, 15:26 +0100 schrieb Magnus Hagander: >> One big-hammer method could be similar to what pg_upgrade does -- >> temporarily rename away the controlfile so postgresql can't start, and >> when done, put it ba

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Paquier
On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote: > Are you suggesting we should support running with a master with checksums > on and a standby with checksums off in the same cluster? That seems.. Very > fragile. Well, saying that it is supported is a too big term for that. What I

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Banck
Hi, Am Donnerstag, den 14.03.2019, 15:32 +0100 schrieb Magnus Hagander: > On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg wrote: > > Re: Magnus Hagander 2019-03-14 > > > > > Are you suggesting we should support running with a master with checksums > > > on and a standby with checksums off in the

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Banck
Hi, Am Donnerstag, den 14.03.2019, 15:26 +0100 schrieb Magnus Hagander: > Given that the failure is data corruption, I don't think big fat > warning is enough. We should really make it impossible to start up the > postmaster by mistake during the checksum generation. People don't > read the docume

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg wrote: > Re: Magnus Hagander 2019-03-14 zmb8qck7ndmchey5...@mail.gmail.com> > > Are you suggesting we should support running with a master with checksums > > on and a standby with checksums off in the same cluster? That seems.. > Very > > fragile. >

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Christoph Berg
Re: Magnus Hagander 2019-03-14 > Are you suggesting we should support running with a master with checksums > on and a standby with checksums off in the same cluster? That seems.. Very > fragile. The case "shut down master and standby, run pg_checksums on both, and start them again" should be sup

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 5:39 AM Michael Paquier wrote: > On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote: > > On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote: > > > The attached patch should do the above, on top of Michael's last > > > patchset. > > > > What you are

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 1:23 AM Michael Paquier wrote: > On Wed, Mar 13, 2019 at 12:24:21PM +0100, Magnus Hagander wrote: > > Enabling or disabling the checksums offline on the master quite clearly > > requires a rebuild of the standby, there is no other way (this is one of > > the reasons for th

Re: Offline enabling/disabling of data checksums

2019-03-14 Thread Michael Banck
Hi, Am Mittwoch, den 13.03.2019, 17:54 +0100 schrieb Magnus Hagander: > On Wed, Mar 13, 2019 at 4:51 PM Michael Banck > wrote: > > Am Mittwoch, den 13.03.2019, 12:43 +0100 schrieb Magnus Hagander: > > > I think this is dangerous enough that it needs to be enforced and not > > > documented. > >

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 08:56:33PM +0900, Michael Paquier wrote: > On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote: > > The attached patch should do the above, on top of Michael's last > > patchset. > > What you are doing here looks like a good defense in itself. More thoughts on th

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 12:24:21PM +0100, Magnus Hagander wrote: > Enabling or disabling the checksums offline on the master quite clearly > requires a rebuild of the standby, there is no other way (this is one of > the reasons for the online enabling in that patch, so I still hope we can > get tha

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote: > Seems good. And I think we need backpath this check to pg11. similar > to cross-version compatibility checks +fprintf(stderr, _("%s: data directory block size %d is different to compiled-in block size %d.\n"), +pr

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 4:51 PM Michael Banck wrote: > Hi, > > Am Mittwoch, den 13.03.2019, 12:43 +0100 schrieb Magnus Hagander: > > I think this is dangerous enough that it needs to be enforced and not > > documented. > > Changing the cluster ID might have some other side-effects, I think > ther

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 4:46 PM Michael Banck wrote: > Hi, > > Am Mittwoch, den 13.03.2019, 12:24 +0100 schrieb Magnus Hagander: > > On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov wrote: > > > One new question from me: how about replication? > > > Case: primary+replica, we shut down primary an

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
Hi, Am Mittwoch, den 13.03.2019, 12:43 +0100 schrieb Magnus Hagander: > I think this is dangerous enough that it needs to be enforced and not > documented. Changing the cluster ID might have some other side-effects, I think there are several cloud-native 3rd party solutions that use the cluster I

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
Hi, Am Mittwoch, den 13.03.2019, 12:24 +0100 schrieb Magnus Hagander: > On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov wrote: > > One new question from me: how about replication? > > Case: primary+replica, we shut down primary and enable checksum, and > > "started streaming WAL from primary" wi

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 02:43:39PM +0300, Sergei Kornilov wrote: > Seems good. And I think we need backpath this check to pg11. similar > to cross-version compatibility checks Good point raised, a backpatch looks adapted. It would be nice to get into something more dynamic, but pg_checksum_block

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 12:09:24PM +0100, Michael Banck wrote: > The attached patch should do the above, on top of Michael's last > patchset. What you are doing here looks like a good defense in itself. -- Michael signature.asc Description: PGP signature

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Sergei Kornilov
Hi, >>  > Also we support ./configure --with-blocksize=(not equals 8)? make >>  > check on HEAD fails for me. If we support this - i think we need >>  > recheck BLCKSZ between compiled pg_checksum and used in PGDATA >> >>  You mean if the backend and pg_checksums is built with different >>  blocks

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 12:40 PM Sergei Kornilov wrote: > Hi > > >> One new question from me: how about replication? > >> Case: primary+replica, we shut down primary and enable checksum, and > "started streaming WAL from primary" without any issue. I have master with > checksums, but replica with

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Sergei Kornilov
Hi >> One new question from me: how about replication? >> Case: primary+replica, we shut down primary and enable checksum, and >> "started streaming WAL from primary" without any issue. I have master with >> checksums, but replica without. >> Or cluster with checksums, then disable checksums on

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
Hi, Am Mittwoch, den 13.03.2019, 12:24 +0100 schrieb Magnus Hagander: > > Also we support ./configure --with-blocksize=(not equals 8)? make > > check on HEAD fails for me. If we support this - i think we need > > recheck BLCKSZ between compiled pg_checksum and used in PGDATA > > You mean if the b

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov wrote: > Hi > > One new question from me: how about replication? > Case: primary+replica, we shut down primary and enable checksum, and > "started streaming WAL from primary" without any issue. I have master with > checksums, but replica without. >

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
Hi, Am Mittwoch, den 13.03.2019, 11:47 +0100 schrieb Magnus Hagander: > On Wed, Mar 13, 2019 at 11:41 AM Michael Banck > wrote: > > I propose we re-read the control file for the enable case after we > > finished operating on all files and (i) check the instance is still > > offline and (ii) upda

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Fabien COELHO
Hallo Michael, I propose we re-read the control file for the enable case after we finished operating on all files and (i) check the instance is still offline and (ii) update the checksums version from there. That should be a small but worthwhile change that could be done anyway. That looks li

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Sergei Kornilov
Hi One new question from me: how about replication? Case: primary+replica, we shut down primary and enable checksum, and "started streaming WAL from primary" without any issue. I have master with checksums, but replica without. Or cluster with checksums, then disable checksums on primary, but st

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Magnus Hagander
On Wed, Mar 13, 2019 at 11:41 AM Michael Banck wrote: > Am Mittwoch, den 13.03.2019, 18:31 +0900 schrieb Michael Paquier: > > On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote: > > > I'm not sure of the punctuation logic on the help line: the first > sentence > > > does not end with a

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Fabien COELHO
Hello, Yep. That is the issue I think is preventable by fsyncing updated data *then* writing & syncing the control file, and that should be done by pg_checksums. Well, pg_rewind works similarly: control file gets updated and then the whole data directory gets flushed. So it is basically pro

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Banck
Am Mittwoch, den 13.03.2019, 18:31 +0900 schrieb Michael Paquier: > On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote: > > I'm not sure of the punctuation logic on the help line: the first sentence > > does not end with a ".". I could not find an instance of this style in other > > help

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 10:44:03AM +0100, Fabien COELHO wrote: > Yep. That is the issue I think is preventable by fsyncing updated data > *then* writing & syncing the control file, and that should be done by > pg_checksums. Well, pg_rewind works similarly: control file gets updated and then the wh

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Fabien COELHO
I do not think it is a good thing that two commands can write to the data directory at the same time, really. We don't prevent either a pg_resetwal and a pg_basebackup to run in parallel. That would be... Interesting. Yep, I'm trying again to suggest that this kind of thing should be pre

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 10:08:33AM +0100, Fabien COELHO wrote: > I'm not sure of the punctuation logic on the help line: the first sentence > does not end with a ".". I could not find an instance of this style in other > help on pg commands. I'd suggest "check data checksums (default)" would work >

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Fabien COELHO
Michaël-san, Now the set of patches is: - 0001, add --enable and --disable. I have tweaked a bit the patch so as "action" is replaced by "mode" which is more consistent with other tools like pg_ctl. pg_indent was also complaining about one of the new enum structures. Patch applies cleanly,

Re: Offline enabling/disabling of data checksums

2019-03-13 Thread Michael Paquier
On Wed, Mar 13, 2019 at 07:18:32AM +0100, Fabien COELHO wrote: > I probably can do that before next Monday. I'll prioritize reviewing the > latest instance of this patch, though. Thanks. The core code of the feature has not really changed with the last reviews, except for the tweaks in the variab

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Fabien COELHO
Bonjour Michaël, Yes, that would be nice, for now I have focused. For pg_resetwal yes we could do it easily. Would you like to send a patch? I probably can do that before next Monday. I'll prioritize reviewing the latest instance of this patch, though. This seem contradictory to me: you

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Michael Paquier
On Tue, Mar 12, 2019 at 09:44:03PM +0900, Michael Paquier wrote: > Yes, it does not matter much in practice, but other tools just don't > do that. Note that changing it can be actually annoying for a > backpatch if we don't have the --enable/--disable part, because git is > actually smart enough t

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Michael Paquier
On Tue, Mar 12, 2019 at 10:08:19PM +0100, Fabien COELHO wrote: > This refactoring patch is ok for me: applies, compiles, check is ok. > However, Am I right in thinking that the change should propagate to other > tools which manipulate the control file, eg pg_resetwal, postmaster… So that > there wo

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Fabien COELHO
Bonjour Michaël, Here is a partial review: - 0001 if a patch to refactor the routine for the control file update. I have made it backend-aware, and we ought to be careful with error handling, use of fds and such, something that v4 was not very careful about. This refactoring patch is ok for

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Michael Paquier
On Tue, Mar 12, 2019 at 11:13:46AM +0100, Michael Banck wrote: > I have a feeling it is project policy to return 0 from main(), and > exit(1) if a program aborts with an error. Yes, it does not matter much in practice, but other tools just don't do that. Note that changing it can be actually anno

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Sergei Kornilov
Hello Thank you for explain. I thought so. PS: I am not sure for now about patch status in CF app. Did not changed status regards, Sergei

Re: Offline enabling/disabling of data checksums

2019-03-12 Thread Michael Banck
Hi, Am Montag, den 11.03.2019, 14:11 + schrieb Sergei Kornilov: > > if (badblocks > 0) > > return 1; > > Small question: why return 1 instead of exit(1)? I have a feeling it is project policy to return 0 from main(), and exit(1) if a program aborts with an error. In the above case, the

  1   2   >