On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <mag...@hagander.net> wrote:
>
> > On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
> > <tomas.von...@2ndquadrant.com> wrote:
> >
> > > On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> > > > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> > > > <tomas.von...@2ndquadrant.com> wrote:
> > > >
> > > > > ...
> > > > >
> > > > > I do think just waiting for all running transactions to complete
> > > > > is fine, and it's not the first place where we use it - CREATE
> > > > > SUBSCRIPTION does pretty much exactly the same thing (and CREATE
> > > > > INDEX CONCURRENTLY too, to some extent). So we have a precedent /
> > > > > working code we can copy.
> > > >
> > > > Thinking again, I don't think it should be done as part of
> > > > BuildRelationList(). We should just do it once in the launcher
> > > > before starting, that'll be both easier and cleaner. Anything
> > > > started after that will have checksums on it, so we should be fine.
> > > >
> > > > PFA one that does this.
> > >
> > > Seems fine to me. I'd however log waitforxid, not the oldest one. If
> > > you're a DBA and you want to make the checksumming proceed, knowing
> > > the oldest running XID is useless for that. If we log waitforxid, it
> > > can be used to query pg_stat_activity and interrupt the sessions
> > > somehow.
> >
> > Yeah, makes sense. Updated.
> >
> > > > > And if you try this with a temporary table (not hidden in a
> > > > > transaction, so the bgworker can see it), the worker will fail
> > > > > with this:
> > > > >
> > > > >     ERROR: cannot access temporary tables of other sessions
> > > > >
> > > > > But of course, this is just another way to crash without updating
> > > > > the result for the launcher, so checksums may end up being
> > > > > enabled anyway.
> > > > Yeah, there will be plenty of side-effect issues from that
> > > > crash-with-wrong-status case. Fixing that will at least make things
> > > > safer -- in that checksums won't be enabled when not put on all
> > > > pages.
> > >
> > > Sure, the outcome with checksums enabled incorrectly is a consequence
> > > of bogus status, and fixing that will prevent that. But that wasn't
> > > my main point here - not articulated very clearly, though.
> > >
> > > The bigger question is how to handle temporary tables gracefully, so
> > > that it does not terminate the bgworker like this at all. This might
> > > be an even bigger issue than dropped relations, considering that
> > > temporary tables are a pretty common part of applications (and it
> > > also includes CREATE/DROP).
> > >
> > > For some clusters it might mean the online checksum enabling would
> > > crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> > >
> > > Unfortunately, try_relation_open() won't fix this, as the error comes
> > > from ReadBufferExtended. And it's not a matter of simply creating a
> > > ReadBuffer variant without that error check, because temporary tables
> > > use local buffers.
> > >
> > > I wonder if we could just go and set the checksums anyway, ignoring
> > > the local buffers. If the other session does some changes, it'll
> > > overwrite our changes, this time with the correct checksums. But it
> > > seems pretty dangerous (I mean, what if they're writing stuff while
> > > we're updating the checksums? Considering the various short-cuts for
> > > temporary tables, I suspect that would be a boon for race
> > > conditions.)
> > >
> > > Another option would be to do something similar to running
> > > transactions, i.e. wait until all temporary tables (that we've seen
> > > at the beginning) disappear. But we're starting to wait on more and
> > > more stuff.
> > >
> > > If we do this, we should clearly log which backends we're waiting
> > > for, so that the admins can go and interrupt them manually.
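Just to illustrate the "interrupt them manually" part - a sketch of what the admin could then run (backend_xid / backend_xmin are ordinary pg_stat_activity columns; the terminate step is of course optional and the pid is situation-specific):

```sql
-- sessions that still hold an open transaction or snapshot, i.e. the
-- ones that can keep the checksum launcher waiting
SELECT pid, datname, usename, state, xact_start, backend_xid, backend_xmin
  FROM pg_stat_activity
 WHERE backend_xid IS NOT NULL
    OR backend_xmin IS NOT NULL
 ORDER BY xact_start;

-- and if one of them is in the way, interrupt it:
-- SELECT pg_terminate_backend(pid);
```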
> > > > Yeah, waiting for all transactions at the beginning is pretty
> > > > simple. Making the worker simply ignore temporary tables would
> > > > also be easy.
> > > >
> > > > One of the bigger issues here is that temporary tables are
> > > > *session* scope and not transaction, so we'd actually need the
> > > > other session to finish, not just the transaction.
> > > >
> > > > I guess what we could do is something like this:
> > > >
> > > > 1. Don't process temporary tables in the checksumworker, period.
> > > > Instead, build a list of any temporary tables that existed when the
> > > > worker started in this particular database (basically anything that
> > > > we got in our scan). Once we have processed the complete database,
> > > > keep re-scanning pg_class until those particular tables are gone
> > > > (search by oid).
> > > >
> > > > That means that any temporary tables that are created *while* we
> > > > are processing a database are ignored, but they should already be
> > > > receiving checksums.
> > > >
> > > > It definitely leads to a potential issue with long running temp
> > > > tables. But as long as we look at the *actual tables* (by oid), we
> > > > should be able to handle long-running sessions once they have
> > > > dropped their temp tables.
> > > >
> > > > Does that sound workable to you?
> > >
> > > Yes, that's pretty much what I meant by 'wait until all temporary
> > > tables disappear'. Again, we need to make it easy to determine which
> > > OIDs we are waiting for, and which sessions may need the DBA's
> > > attention.
> > >
> > > I don't think it makes sense to log OIDs of the temporary tables.
> > > There can be many of them, and in most cases the connection/session
> > > is managed by the application, so the only thing you can do is kill
> > > the connection.
> >
> > Yeah, agreed. I think it makes sense to show the *number* of temp
> > tables. That's also a predictable amount of information -- logging all
> > temp tables may, as you say, lead to an insane amount of data.
> >
> > PFA a patch that does this. I've also added some docs for it.
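FWIW the "re-scan pg_class until the recorded temp tables are gone" check is cheap - conceptually something like this (relpersistence 't' marks temp relations; the worker would of course match against its recorded OIDs rather than just count):

```sql
-- temporary tables (of all sessions) still present in the current database
SELECT count(*)
  FROM pg_class
 WHERE relpersistence = 't'
   AND relkind = 'r';
```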
> > And I also noticed pg_verify_checksums wasn't installed, so fixed
> > that too.
>
> PFA a rebase on top of the just committed verify-checksums patch.
This seems OK in terms of handling errors in the worker and passing them
to the launcher. I haven't managed to do any crash testing today, but
code-wise it seems sane.

It however still fails to initialize the attempts field after allocating
the db entry in BuildDatabaseList, so if you try running with
-DRANDOMIZE_ALLOCATED_MEMORY it'll get initialized to values like this:

    WARNING: attempts = -1684366952
    WARNING: attempts = 1010514489
    WARNING: attempts = -1145390664
    WARNING: attempts = 1162101570

I guess those are not the droids we're looking for?

Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  database "template0" does not allow connections
    HINT:  Allow connections using ALTER DATABASE and try again.

    test=# alter database template0 allow_connections = true;
    ALTER DATABASE

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: already running

    test=# select pg_disable_data_checksums();
     pg_disable_data_checksums
    ---------------------------

    (1 row)

    test=# select pg_enable_data_checksums(10, 10);
    ERROR:  could not start checksumhelper: has been cancelled

At which point the only thing you can do is to restart the cluster, which
seems somewhat unnecessary. But perhaps it's intentional?

Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 123638b..b0d082a 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -257,7 +257,7 @@
   </para>
 
   <para>
-   When attempting to recover from corrupt data it may be necessarily to bypass the checksum
+   When attempting to recover from corrupt data it may be necessary to bypass the checksum
    protection in order to recover data. To do this, temporarily set the configuration parameter
    <xref linkend="guc-ignore-checksum-failure" />.
   </para>
@@ -287,15 +287,17 @@
    be visible to the process enabling checksums. It will also, for each database,
    wait for all pre-existing temporary tables to get removed before it finishes.
    If long-lived temporary tables are used in the application it may be necessary
-   to terminate these application connections to allow the checksummer process
-   to complete.
+   to terminate these application connections to allow the process to complete.
+   Information about open transactions and connections with temporary tables is
+   written to log.
   </para>
 
   <para>
    If the cluster is stopped while in <literal>inprogress</literal> mode, for any reason,
    then this process must be restarted manually. To do this, re-execute the function
    <function>pg_enable_data_checksums()</function>
-   once the cluster has been restarted.
+   once the cluster has been restarted. It is not possible to resume the work,
+   the process has to start from scratch.
   </para>
 
   <note>
@@ -317,6 +319,7 @@
   <para>
    <literal>template0</literal> is by default not accepting connections, to
    enable checksums you'll need to temporarily make it accept connections.
+   See <xref linkend="sql-alterdatabase" /> for details.
   </para>
  </note>
 
diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 67b9cd8..b76b268 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -700,6 +700,11 @@ pg_backup_start_time(PG_FUNCTION_ARGS)
 	PG_RETURN_DATUM(xtime);
 }
 
+/*
+ * Disables checksums for the cluster, unless already disabled.
+ *
+ * Has immediate effect - the checksums are set to off right away.
+ */
 Datum
 disable_data_checksums(PG_FUNCTION_ARGS)
 {
@@ -718,6 +723,12 @@ disable_data_checksums(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/*
+ * Enables checksums for the cluster, unless already enabled.
+ *
+ * Supports vacuum-like cost-based throttling, to limit system load.
+ * Starts a background worker that updates checksums on existing data.
+ */
 Datum
 enable_data_checksums(PG_FUNCTION_ARGS)
 {
diff --git a/src/backend/postmaster/checksumhelper.c b/src/backend/postmaster/checksumhelper.c
index 84a8cc8..e1c34e8 100644
--- a/src/backend/postmaster/checksumhelper.c
+++ b/src/backend/postmaster/checksumhelper.c
@@ -653,6 +653,8 @@ BuildDatabaseList(void)
 
 		db->dboid = HeapTupleGetOid(tup);
 		db->dbname = pstrdup(NameStr(pgdb->datname));
+		elog(WARNING, "attempts = %d", db->attempts);
+		db->attempts = 0;
 
 		DatabaseList = lappend(DatabaseList, db);