Hi,

The patch looks good to me at a high level. Some notes below. I didn't read through the whole thread, so sorry if some of these have been discussed already.

+void
+ShutdownChecksumHelperIfRunning(void)
+{
+       if 
(pg_atomic_unlocked_test_flag(&ChecksumHelperShmem->launcher_started))
+       {
+               /* Launcher not started, so nothing to shut down */
+               return;
+       }
+
+       ereport(ERROR,
+                       (errmsg("checksum helper is currently running, cannot 
disable checksums"),
+                        errhint("Restart the cluster or wait for the worker to 
finish.")));
+}

Is there no way to stop the checksum helper once it's started? That seems rather user-unfriendly. I can imagine it being a pretty common mistake to call pg_enable_data_checksums() on a 10 TB cluster, only to realize that you forgot to set the cost limit, and that it's hurting queries too much. At that point, you want to abort.

+ * This is intended to create the worklist for the workers to go through, and
+ * as we are only concerned with already existing databases we need to ever
+ * rebuild this list, which simplifies the coding.

I can't parse this sentence.

+extern bool DataChecksumsEnabledOrInProgress(void);
+extern bool DataChecksumsInProgress(void);
+extern bool DataChecksumsDisabled(void);

I find the name of the DataChecksumsEnabledOrInProgress() function a bit long. And doing this in PageIsVerified looks a bit weird:

> if (DataChecksumsEnabledOrInProgress() && !DataChecksumsInProgress())

I think I'd prefer functions like:

/* checksums should be computed on write? */
bool DataChecksumNeedWrite()
/* checksum should be verified on read? */
bool DataChecksumNeedVerify()


+     <literal>template0</literal> is by default not accepting connections, to
+     enable checksums you'll need to temporarily make it accept connections.

This was already discussed, and I agree with the other comments that it's bad.

+CREATE OR REPLACE FUNCTION pg_enable_data_checksums (
+        cost_delay int DEFAULT 0, cost_limit int DEFAULT 100)
+  RETURNS boolean STRICT VOLATILE LANGUAGE internal AS 'enable_data_checksums'
+  PARALLEL RESTRICTED;

pg_[enable|disable]_checksums() functions return a bool. It's not clear to me when they would return what. I'd suggest marking them as 'void' instead.

--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -194,6 +194,7 @@ typedef PageHeaderData *PageHeader;
  */
 #define PG_PAGE_LAYOUT_VERSION         4
 #define PG_DATA_CHECKSUM_VERSION       1
+#define PG_DATA_CHECKSUM_INPROGRESS_VERSION            2

This seems like a weird place for these PG_DATA_CHECKSUM_* constants. They're not actually stored in the page header, as you might think. I think the idea was to keep PG_DATA_CHECKSUM_VERSION close to PG_PAGE_LAYOUT_VERSION, because the checksums affected the on-disk format. I think it was a bit weird even before this patch, but now it's worse. At least a better comment would be in order, or maybe move these elsewhere.

Feature request: it'd be nice to update the 'ps status' with set_ps_display, to show a rudimentary progress indicator. Like the name and block number of the relation being processed. It won't tell you how much is left to go, but at least it will give a warm fuzzy feeling to the DBA that something is happening.

I didn't review the offline tool, pg_verify_checksums.

- Heikki


Reply via email to