On Mon, Sep 07, 2020 at 09:38:30AM +0200, Julien Rouhaud wrote: > Did you mean creating a new checksumfuncs.c file? I don't find any > such file in the current tree.
Your patch adds checksumfuncs.c, so the subroutines grabbing a given block could just be moved there. > I'm not sure I understand. Unless I missed something this approach > *cannot* raise a false positive. What it does is force a 2nd check > with stronger lock *to make sure it's actually a corruption*, so we > don't raise false positive. The only report that can happen in this > 1st loop is if smgread raises an error, which AFAICT can only happen > (at least with mdread) if the whole block couldn't be read, which is a > sign of a very bad problem. This should clearly be reported, as this > cannot be caused by the locking heuristics used here. We don't know how much this optimization matters though? Could it be possible to get an idea of that? For example, take the case of one relation with a fixed size in a read-only workload and a read-write workload (as long as autovacuum and updates make the number of relation blocks rather constant for the read-write case), doing a checksum verification in parallel of multiple clients working on the relation concurrently. Assuming that the relation is fully in the OS cache, we could get an idea of the impact with multiple (shared_buffers / relation size) rates to make the eviction more aggressive? The buffer partition locks, knowing that NUM_BUFFER_PARTITIONS caps that, should be the bottleneck, still it seems to me that it would be good to see if we have a difference. What do you think? -- Michael
signature.asc
Description: PGP signature