On Wed, Mar 11, 2020 at 08:18:23AM +0100, Julien Rouhaud wrote: > The cfbot reported a build failure, so here's a rebased v2 which also contains > the pg_stat_report_failure() call and extra log info.
+ * - if a block is not found in shared_buffers, the LWLock is relased and the + * block is read from disk without taking any lock. If an error is detected, + * the read block will be discarded and retrieved again while holding the + * LWLock. This is because an error due to concurrent write is possible but + * very unlikely, so it's better to have an optimistic approach to limit + * locking overhead This can be risky with false positives, no? With a large amount of shared buffer eviction you actually increase the risk of torn page reads. Instead of a logic relying on partition mapping locks, which could be unwise on performance grounds, did you consider different approaches? For example a kind of pre-emptive lock on the page in storage to prevent any shared buffer operation to happen while the block is read from storage, that would act like a barrier. + * Vacuum's GUCs are used to avoid consuming too much resources while running + * this tool. Shouldn't this involve separate GUCs instead of the VACUUM ones? I guess that this leads to the fact that this function may be better as a contrib module, with the addition of some better-suited APIs in core (see paragraph above). +Run + make check +or + make installcheck Why is installcheck mentioned here? I don't think that it is appropriate to place the SQL-callable part in the existing checksum.c. I would suggest instead a new file, say checksumfuncs.c in src/backend/utils/adt/, holding any SQL functions for checksums. -SUBDIRS = perl regress isolation modules authentication recovery subscription +SUBDIRS = perl regress isolation modules authentication check_relation \ + recovery subscription It seems to me that this test would be a good fit for src/test/modules/test_misc/. +static void +check_all_relations(TupleDesc tupdesc, Tuplestorestate *tupstore, + ForkNumber forknum) Per the argument of bloat, I think that I would remove check_all_relation() as this function could take a very long time to run, and just make the SQL function strict. + * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to + * disk either before the end of the next checkpoint or during recovery in + * case of unsafe shutdown Wouldn't it actually be a good thing to check that the page on storage is fine in this case? This depends on the system settings and the checkpoint frequency, but checkpoint_timeout can be extended up to 1 day. And plenty of things could happen to the storage in one day, including a base backup that includes a corrupted page on storage, that this function would not be able to detect. + * - if a block is otherwise found in shared_buffers, an IO lock is taken on + * the block and the block is then read from storage, ignoring the block in + * shared_buffers Yeah, I think that you are right here to check the page on storage anyway. + * we detect if a block is in shared_buffers or not. See get_buffer() + * comments for more details about the locking strategy. get_buffer() does not exist in your patch, check_get_buffer() does. + * - if a block is not found in shared_buffers, the LWLock is relased and the [...] + * To avoid torn page and possible false postives when reading data, and Typos. + if (!DataChecksumsEnabled()) + elog(ERROR, "Data checksums are not enabled"); Note that elog() is for the class of errors which are never expected, and here a caller of pg_check_relation() with checksums disabled can trigger that. So you need to call ereport() with ERRCODE_FEATURE_NOT_SUPPORTED. + * - if a block is dirty in shared_buffers, it's ignored as it'll be flushed to + * disk either before the end of the next checkpoint or during recovery in + * case of unsafe shutdown Not sure that the indentation is going to react well on that part of the patch, perhaps it would be better to add some "/*-------" at the beginning and end of the comment block to tell pgindent to ignore this part? Based on the feedback gathered on this thread, I guess that you should have a SRF returning the list of broken blocks, as well as NOTICE messages. Another thing to consider is the addition of a range argument to only check a certain portion of the blocks, say one segment file at a time, etc. Fine by me to not include in the first flavor of the patch. The patch needs documentation. -- Michael
signature.asc
Description: PGP signature