On Fri, Oct 30, 2020 at 11:30:28AM +0900, Michael Paquier wrote:
> Playing with dd and generating random pages, this detects random
> corruptions, making use of a wait/retry loop if a failure is detected.
> As mentioned upthread, this is a double-edged sword: increasing the
> number of retries reduces the chances of false positives, at the cost
> of making regression tests longer. This stuff uses up to 5 retries
> with 100ms of sleep for each page. (I am aware of the fact that the
> commit message of the main patch is not written yet.)
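For reference, the wait/retry behavior quoted above boils down to
something like the sketch below. This is not the actual patch code:
MAX_RETRIES, RETRY_SLEEP_US and verify_page_with_retries() are made-up
names, and the caller is assumed to hold the segment file open;
pg_checksum_page(), pg_usleep(), PageIsNew(), PageHeader and BLCKSZ are
the usual backend facilities.

#define MAX_RETRIES	5				/* up to 5 retries per page */
#define RETRY_SLEEP_US	(100 * 1000L)	/* 100ms of sleep between attempts */

static bool
verify_page_with_retries(int fd, off_t offset, char *page, BlockNumber blkno)
{
	for (int attempt = 0; attempt <= MAX_RETRIES; attempt++)
	{
		/* New (all-zero) pages carry no checksum, nothing to verify. */
		if (PageIsNew(page))
			return true;

		if (pg_checksum_page(page, blkno) == ((PageHeader) page)->pd_checksum)
			return true;	/* checksum matches, page is fine */

		if (attempt == MAX_RETRIES)
			break;			/* still failing, report the page as corrupted */

		/*
		 * The mismatch may come from a concurrent write tearing the
		 * page while we were reading it, so sleep and re-read before
		 * concluding anything.  A hot page rewritten often enough can
		 * still look torn on every attempt, which is where the false
		 * positives discussed below come from.
		 */
		pg_usleep(RETRY_SLEEP_US);
		if (pread(fd, page, BLCKSZ, offset) != BLCKSZ)
			return false;	/* short read or I/O error */
	}
	return false;
}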
So, I have done much more testing of this patch using an instance with
a small shared buffer pool and pgbench running in parallel to force a
high page eviction rate, and I cannot convince myself that this
behavior is acceptable. My laptop easily got constrained on I/O, and
within a total of 2000 base backups or so I have seen some 5 backup
failures even with correct detection logic. The rate is low here, but
even at 1~2% that could be annoying for users.

Couldn't we take a different approach and remove this feature instead?
This would still require the grammar to remain present in
back-branches, but as things stand we have a feature that fails to
deliver on its promise, and that also wastes resources for nothing on
each base backup taken :/
--
Michael