Hi, I had a closer look at v3 of the patch now.
On Sat, Mar 03, 2018 at 07:23:31PM +0100, Magnus Hagander wrote: > Attached is a rebased patch which removes this optimization, updates the > pg_proc entry for the new format, and changes pg_verify_checksums to use -r > instead of -o for relfilenode. The patch applies fine with minimal fuzz and compiles with no warnings; make check and the added isolation tests, as well as the added checksum tests pass. If I blindly run "SELECT pg_enable_data_checksums();" on new cluster, I get: |postgres=# SELECT pg_enable_data_checksums(); | pg_enable_data_checksums |-------------------------- | t |(1 row) | |postgres=# SHOW data_checksums ; | data_checksums |---------------- | inprogress |(1 row) However, inspecting the log one sees: |2018-03-10 14:15:57.702 CET [3313] ERROR: Database template0 does not allow connections. |2018-03-10 14:15:57.702 CET [3313] HINT: Allow connections using ALTER DATABASE and try again. |2018-03-10 14:15:57.702 CET [3152] LOG: background worker "checksum helper launcher" (PID 3313) exited with exit code 1 and the background worker is no longer running without any obvious hint to the client. I am aware that this is discussed already, but as-is the user experience is pretty bad, I think pg_enable_data_checksums() should either bail earlier if it cannot connect to all databases, or it should be better documented. Otherwise, if I allow connections to template0, the patch works as expected, I have not managed to break it so far. Some further review comments: > diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml > index f4bc2d4161..89afecb341 100644 > --- a/doc/src/sgml/wal.sgml > +++ b/doc/src/sgml/wal.sgml > @@ -230,6 +230,73 @@ [...] > + <para> > + Checksums can be enabled or disabled online, by calling the appropriate > + <link linkend="functions-admin-checksum">functions</link>. > + Disabling of checksums takes effect immediately when the function is > called. > + </para> > + > + <para> > + Enabling checksums will put the cluster in <literal>inprogress</literal> > mode. > + During this time, checksums will be written but not verified. In > addition to > + this, a background worker process is started that enables checksums on > all > + existing data in the cluster. Once this worker has completed processing > all > + databases in the cluster, the checksum mode will automatically switch to > + <literal>on</literal>. > + </para> > + > + <para> > + If the cluster is stopped while in <literal>inprogress</literal> mode, > for > + any reason, then this process must be restarted manually. To do this, > + re-execute the function <function>pg_enable_data_checksums()</function> > + once the cluster has been restarted. > + </para> Maybe document the above issue here, unless it is clear that the templat0-needs-to-allow-connections issue will go away before the patch is pushed. > diff --git a/src/backend/access/transam/xlog.c > b/src/backend/access/transam/xlog.c > index 47a6c4d895..56aaa88de1 100644 > --- a/src/backend/access/transam/xlog.c > +++ b/src/backend/access/transam/xlog.c [...] > +void > +SetDataChecksumsOn(void) > +{ > + if (!DataChecksumsEnabledOrInProgress()) > + elog(ERROR, "Checksums not enabled or in progress"); > + > + LWLockAcquire(ControlFileLock, LW_EXCLUSIVE); > + > + if (ControlFile->data_checksum_version != > PG_DATA_CHECKSUM_INPROGRESS_VERSION) > + { > + LWLockRelease(ControlFileLock); > + elog(ERROR, "Checksums not in in_progress mode"); The string used in "SHOW data_checksums" is "inprogress", not "in_progress". [...] > @@ -7769,6 +7839,16 @@ StartupXLOG(void) > CompleteCommitTsInitialization(); > > /* > + * If we reach this point with checksums in inprogress state, we notify > + * the user that they need to manually restart the process to enable > + * checksums. > + */ > + if (ControlFile->data_checksum_version == > PG_DATA_CHECKSUM_INPROGRESS_VERSION) > + ereport(WARNING, > + (errmsg("checksum state is \"in progress\" with > no worker"), > + errhint("Either disable or enable checksums by > calling the pg_disable_data_checksums() or pg_enable_data_checksums() > functions."))); Again, string is "inprogress", not "in progress", not sure if that matters. > diff --git a/src/backend/postmaster/checksumhelper.c > b/src/backend/postmaster/checksumhelper.c > new file mode 100644 > index 0000000000..6aa71bcf30 > --- /dev/null > +++ b/src/backend/postmaster/checksumhelper.c > @@ -0,0 +1,631 @@ > +/*------------------------------------------------------------------------- > + * > + * checksumhelper.c > + * Backend worker to walk the database and write checksums to pages Backend or Background? [...] > +typedef struct ChecksumHelperShmemStruct > +{ > + pg_atomic_flag launcher_started; > + bool success; > + bool process_shared_catalogs; > + /* Parameter values set on start */ double space. [...] > +static bool > +ProcessSingleRelationFork(Relation reln, ForkNumber forkNum, > BufferAccessStrategy strategy) > +{ > + BlockNumber numblocks = RelationGetNumberOfBlocksInFork(reln, forkNum); > + BlockNumber b; > + > + for (b = 0; b < numblocks; b++) > + { > + Buffer buf = ReadBufferExtended(reln, forkNum, b, > RBM_NORMAL, strategy); > + Page page; > + PageHeader pagehdr; > + uint16 checksum; > + > + /* Need to get an exclusive lock before we can flag as dirty */ > + LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE); > + > + /* Do we already have a valid checksum? */ > + page = BufferGetPage(buf); > + pagehdr = (PageHeader) page; > + checksum = pg_checksum_page((char *) page, b); This looks like it does not take the segment number into account; however it is also unclear to me what the purpose of this is, as checksum is never validated against the pagehdr, and nothing is done with it. Indeed, I even get a compiler warning about pagehdr and checksum: git/postgresql/build/../src/backend/postmaster/checksumhelper.c: In function ‘ProcessSingleRelationFork’: git/postgresql/build/../src/backend/postmaster/checksumhelper.c:155:11: warning: variable ‘checksum’ set but not used [-Wunused-but-set-variable] uint16 checksum; ^~~~~~~~ git/postgresql/build/../src/backend/postmaster/checksumhelper.c:154:14: warning: variable ‘pagehdr’ set but not used [-Wunused-but-set-variable] PageHeader pagehdr; ^~~~~~~ I guess the above block running pg_checksum_page() is a leftover from previous versions of the patch and should be removed... > + /* > + * Mark the buffer as dirty and force a full page write. > + * We have to re-write the page to wal even if the checksum > hasn't > + * changed, because if there is a replica it might have a > slightly > + * different version of the page with an invalid checksum, > caused > + * by unlogged changes (e.g. hintbits) on the master happening > while > + * checksums were off. This can happen if there was a valid > checksum > + * on the page at one point in the past, so only when checksums > + * are first on, then off, and then turned on again. > + */ > + START_CRIT_SECTION(); > + MarkBufferDirty(buf); > + log_newpage_buffer(buf, false); > + END_CRIT_SECTION(); ... seeing how MarkBufferDirty(buf) is now run unconditionally. [...] > + /* > + * Initialize a connection to shared catalogs only. > + */ > + BackgroundWorkerInitializeConnection(NULL, NULL); > + > + /* > + * Set up so first run processes shared catalogs, but not once in every > db > + */ Comment should have a full stop like the above and below ones? > + ChecksumHelperShmem->process_shared_catalogs = true; > + > + /* > + * Create a database list. We don't need to concern ourselves with > + * rebuilding this list during runtime since any new created database > will > + * be running with checksums turned on from the start. > + */ [...] > + DatabaseList = BuildDatabaseList(); > + > + /* > + * If there are no databases at all to checksum, we can exit immediately > + * as there is no work to do. > + */ > + if (DatabaseList == NIL || list_length(DatabaseList) == 0) > + return; > + > + while (true) > + { > + List *remaining = NIL; > + ListCell *lc, > + *lc2; > + List *CurrentDatabases = NIL; > + > + foreach(lc, DatabaseList) > + { > + ChecksumHelperDatabase *db = (ChecksumHelperDatabase *) > lfirst(lc); > + > + if (ProcessDatabase(db)) > + { > + pfree(db->dbname); > + pfree(db); > + > + if > (ChecksumHelperShmem->process_shared_catalogs) > + > + /* > + * Now that one database has completed > shared catalogs, we > + * don't have to process them again . Stray space. > + */ > + > ChecksumHelperShmem->process_shared_catalogs = false; > + } > + else > + { > + /* > + * Put failed databases on the remaining list. > + */ > + remaining = lappend(remaining, db); > + } > + } > + list_free(DatabaseList); > + > + DatabaseList = remaining; > + remaining = NIL; > + > + /* > + * DatabaseList now has all databases not yet processed. This > can be > + * because they failed for some reason, or because the database > was > + * DROPed between us getting the database list and trying to > process DROPed looks wrong, and there's no other occurence of it in the source tree. DROPped looks even weirder, so maybe just "dropped"? > + * it. Get a fresh list of databases to detect the second case > with. That sentence looks unfinished or at least is unclear to me. > + * Any database that still exists but failed we retry for a > limited I'm not a native speaker, but this looks wrong to me as well, maybe "We retry any database that still exists but failed for a limited [...]"? > diff --git a/src/bin/pg_upgrade/controldata.c > b/src/bin/pg_upgrade/controldata.c > index 0fe98a550e..4bb2b7e6ec 100644 > --- a/src/bin/pg_upgrade/controldata.c > +++ b/src/bin/pg_upgrade/controldata.c > @@ -591,6 +591,15 @@ check_control_data(ControlData *oldctrl, > */ > > /* > + * If checksums have been turned on in the old cluster, but the > + * checksumhelper have yet to finish, then disallow upgrading. The user > + * should either let the process finish, or turn off checksums, before > + * retrying. > + */ > + if (oldctrl->data_checksum_version == 2) > + pg_fatal("transition to data checksums not completed in old > cluster\n"); > + > + /* > * We might eventually allow upgrades from checksum to no-checksum > * clusters. > */ See below about src/bin/pg_upgrade/pg_upgrade.h having data_checksum_version be a bool. I checked pg_ugprade (from master to master though), and could find no off-hand issues, i.e. it reported all issues correctly. > --- /dev/null > +++ b/src/bin/pg_verify_checksums/Makefile [...] > +check: > + $(prove_check) > + > +installcheck: > + $(prove_installcheck) If I run "make check" in src/bin/pg_verify_checksums, I get a fat perl error: |src/bin/pg_verify_checksums$ LANG=C make check |rm -rf '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install |/bin/mkdir -p '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install/log |make -C '../../..' DESTDIR='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install install >'/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build'/tmp_install/log/install.log 2>&1 |rm -rf '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums'/tmp_check |/bin/mkdir -p '/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums'/tmp_check |cd /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums && TESTDIR='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums' PATH="/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/tmp_install//bin:$PATH" LD_LIBRARY_PATH="/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/tmp_install//lib" PGPORT='65432' PG_REGRESS='/home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/src/bin/pg_verify_checksums/../../../src/test/regress/pg_regress' /usr/bin/prove -I /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/test/perl/ -I /home/mba/Projekte/OSS/PostgreSQL/git/postgresql/build/../src/bin/pg_verify_checksums t/*.pl |Cannot detect source of 't/*.pl'! at /usr/share/perl/5.24/TAP/Parser/IteratorFactory.pm line 261. | TAP::Parser::IteratorFactory::detect_source(TAP::Parser::IteratorFactory=HASH(0x55eed10df3e8), TAP::Parser::Source=HASH(0x55eed10bd358)) called at /usr/share/perl/5.24/TAP/Parser/IteratorFactory.pm line 211 | TAP::Parser::IteratorFactory::make_iterator(TAP::Parser::IteratorFactory=HASH(0x55eed10df3e8), TAP::Parser::Source=HASH(0x55eed10bd358)) called at /usr/share/perl/5.24/TAP/Parser.pm line 472 | TAP::Parser::_initialize(TAP::Parser=HASH(0x55eed10df328), HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Object.pm line 55 | TAP::Object::new("TAP::Parser", HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Object.pm line 130 | TAP::Object::_construct(TAP::Harness=HASH(0x55eed09176b0), "TAP::Parser", HASH(0x55eed0ea2e90)) called at /usr/share/perl/5.24/TAP/Harness.pm line 852 | TAP::Harness::make_parser(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Scheduler::Job=HASH(0x55eed0fdc708)) called at /usr/share/perl/5.24/TAP/Harness.pm line 651 | TAP::Harness::_aggregate_single(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Aggregator=HASH(0x55eed091e520), TAP::Parser::Scheduler=HASH(0x55eed0fdc6a8)) called at /usr/share/perl/5.24/TAP/Harness.pm line 743 | TAP::Harness::aggregate_tests(TAP::Harness=HASH(0x55eed09176b0), TAP::Parser::Aggregator=HASH(0x55eed091e520), "t/*.pl") called at /usr/share/perl/5.24/TAP/Harness.pm line 558 | TAP::Harness::__ANON__() called at /usr/share/perl/5.24/TAP/Harness.pm line 571 | TAP::Harness::runtests(TAP::Harness=HASH(0x55eed09176b0), "t/*.pl") called at /usr/share/perl/5.24/App/Prove.pm line 546 | App::Prove::_runtests(App::Prove=HASH(0x55eed090b0c8), HASH(0x55eed0d79cf0), "t/*.pl") called at /usr/share/perl/5.24/App/Prove.pm line 504 | App::Prove::run(App::Prove=HASH(0x55eed090b0c8)) called at /usr/bin/prove line 13 |Makefile:39: recipe for target 'check' failed |make: *** [check] Error 2 > diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c > b/src/bin/pg_verify_checksums/pg_verify_checksums.c > new file mode 100644 > index 0000000000..a4bfe7284d > --- /dev/null > +++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c [...] > + if (DataDir == NULL) > + { > + if (optind < argc) > + DataDir = argv[optind++]; > + else > + DataDir = getenv("PGDATA"); > + } > + > + /* Complain if any arguments remain */ > + if (optind < argc) > + { > + fprintf(stderr, _("%s: too many command-line arguments (first > is \"%s\")\n"), > + progname, argv[optind]); > + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), > + progname); > + exit(1); > + } > + > + if (DataDir == NULL) > + { > + fprintf(stderr, _("%s: no data directory specified\n"), > progname); > + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), > progname); > + exit(1); > + } Those two if (DataDir == NULL) checks could maybe be put together into one block. > diff --git a/src/include/postmaster/checksumhelper.h > b/src/include/postmaster/checksumhelper.h > new file mode 100644 > index 0000000000..7f296264a9 > --- /dev/null > +++ b/src/include/postmaster/checksumhelper.h > @@ -0,0 +1,31 @@ > +/*------------------------------------------------------------------------- > + * > + * checksumhelper.h > + * header file for checksum helper deamon "deamon" is surely wrong (it'd be "daemon"), but maybe "(background) worker" is better? > diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h > index 85dd10c45a..bd46bf2ce6 100644 > --- a/src/include/storage/bufpage.h > +++ b/src/include/storage/bufpage.h > @@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader; > */ > #define PG_PAGE_LAYOUT_VERSION 4 > #define PG_DATA_CHECKSUM_VERSION 1 > +#define PG_DATA_CHECKSUM_INPROGRESS_VERSION 2 I am not very sure about the semantics of PG_DATA_CHECKSUM_VERSION being 1, but I assumed it was a version, like, if we ever decide to use a different checksumming algorithm, we'd bump it to 2. Now PG_DATA_CHECKSUM_INPROGRESS_VERSION is defined to 2, which I agree is convenient, but is there some strategy what to do about this in case the PG_DATA_CHECKSUM_VERSION needs to be increased? In any case, src/bin/pg_upgrade/pg_upgrade.h has bool data_checksum_version; in the ControlData struct, which might need updating? That's all for now. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.ba...@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer