Greetings, * Magnus Hagander (mag...@hagander.net) wrote: > On Fri, Mar 30, 2018 at 5:35 AM, David Steele <da...@pgmasters.net> wrote: > > > On 3/24/18 10:32 AM, Michael Banck wrote: > > > Am Freitag, den 23.03.2018, 17:43 +0100 schrieb Michael Banck: > > >> Am Freitag, den 23.03.2018, 10:54 -0400 schrieb David Steele: > > >>> In my experience actual block errors are relatively rare, so there > > >>> aren't likely to be more than a few in a file. More common are > > >>> overwritten or transposed files, rogue files, etc. These produce a lot > > >>> of output. > > >>> > > >>> Maybe stop after five? > > > > > > The attached patch does that, and outputs the total number of > > > verification failures of that file after it got sent. > > > > > >> I'm on board with this, but I have the feeling that this is not a very > > >> common pattern in Postgres, or might not be project style at all. I > > >> can't remember even seen an error message like that. > > >> > > >> Anybody know whether we're doing this in a similar fashion elsewhere? > > > > > > I tried to have look around and couldn't find any examples, so I'm not > > > sure that patch should go in. On the other hand, we abort on checksum > > > failures usually (in pg_dump e.g.), so limiting the number of warnings > > > does makes sense. > > > > > > I guess we need to see what others think. > > > > Well, at this point I would say silence more or less gives consent. > > > > Can you provide a rebased patch with the validation retry and warning > > limiting logic added? I would like to take another pass through it but I > > think this is getting close. > > I was meaning to mention it, but ran out of cycles. > > I think this is the right way to do it, except the 5 should be a #define > and not an inline hardcoded value :) We could argue whether it should be "5 > total" or "5 per file". When I read the emails I thought it was going to be > 5 total, but I see the implementation does 5 / file. In a super-damanged > system that will still lead to horrible amounts of logging, but I think > maybe if your system is in that bad shape, then it's a lost cause anyway.
5/file seems reasonable to me as well. > I also think the "total number of checksum errors" should be logged if > they're >0, not >5. And I think *that* one should be logged at the end of > the entire process, not per file. That'd be the kind of output that would > be the most interesting, I think (e.g. if I have it spread out with 1 block > each across 4 files, I want that logged at the end because it's easy to > otherwise miss one or two of them that may have happened a long time apart). I definitely like having a total # of checksum errors included at the end, if there are any at all. When someone is looking to see why the process returned a non-zero exit code, they're likely to start looking at the end of the log, so having that easily available and clear as to why the backup failed is definitely valuable. > I don't think we have a good comparison elsewhere, and that is as David > mention because other codepaths fail hard when they run into something like > that. And we explicitly want to *not* fail hard, per previous discussion. Agreed. Thanks! Stephen
signature.asc
Description: PGP signature