Hi,
The patch looks good to me at a high level. Some notes below. I didn't
read through the whole thread, so sorry if some of these have been
discussed already.
+void
+ShutdownChecksumHelperIfRunning(void)
+{
+ if
(pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+ {
+ /* Launcher not started, so nothing to shut down */
+ return;
+ }
+
+ ereport(ERROR,
+ (errmsg("checksum helper is currently running, cannot
disable checksums"),
+ errhint("Restart the cluster or wait for the worker to
finish.")));
+}
Is there no way to stop the checksum helper once it's started? That
seems rather user-unfriendly. I can imagine it being a pretty common
mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to
realize that you forgot to set the cost limit, and that it's hurting
queries too much. At that point, you want to abort.
+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we need to ever
+ * rebuild this list, which simplifies the coding.
I can't parse this sentence.
+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);
I find the name of the DataChecksumsEnabledOrInProgress() function a bit
long. And doing this in PageIsVerified looks a bit weird:
> if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())
I think I'd prefer functions like:
/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()
+ <literal>template0</literal> is by default not accepting connections, to
+ enable checksums you'll need to temporarily make it accept connections.
This was already discussed, and I agree with the other comments that
it's bad.
+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+ cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+ RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+ PARALLEL RESTRICTED;
pg_[enable|disable]_checksums() functions return a bool. It's not clear
to me when they would return what. I'd suggest marking them as 'void'
instead.
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
*/
#define PG_PAGE_LAYOUT_VERSION 4
#define PG_DATA_CHECKSUM_VERSION 1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION 2
This seems like a weird place for these PG_DATA_CHECKSUM_* constants.
They're not actually stored in the page header, as you might think. I
think the idea was to keep PG_DATA_CHECKSUM_VERSION close to
PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk
format. I think it was a bit weird even before this patch, but now it's
worse. At least a better comment would be in order, or maybe move these
elsewhere.
Feature request: it'd be nice to update the 'ps status' with
set_ps_display, to show a rudimentary progress indicator. Like the name
and block number of the relation being processed. It won't tell you how
much is left to go, but at least it will give a warm fuzzy feeling to
the DBA that something is happening.
I didn't review the offline tool, pg_verify_checksums.
- Heikki