On Thu, Nov 19, 2020 at 1:50 PM Mark Dilger <mark.dil...@enterprisedb.com> wrote:
> It makes sense to me to have a "don't run through minefields" option, and a
> "go ahead, run through minefields" option for pg_amcheck, given that users in
> differing situations will have differing business consequences to bringing
> down the server in question.

This kind of framing suggests zero-risk bias to me:

https://en.wikipedia.org/wiki/Zero-risk_bias

It's simply not helpful to think of the risks as "running through a minefield" versus "not running through a minefield". I also dislike this framing because in reality nobody runs through a minefield, unless maybe it's a battlefield and the alternative is probably even worse. Risks are not discrete -- they're continuous. And they're situational.

I accept that there are certain reasonable gradations in the degree to which a segfault is bad, even in contexts in which pg_amcheck runs into actual serious problems. And as Robert points out, experience suggests that on average people care about availability the most when push comes to shove (though I hasten to add that that's not the same thing as considering a once-off segfault to be the greater evil here).

Even still, I firmly believe that it's a mistake to assign *infinite* weight to not having a segfault. That is likely to have certain unintended consequences that could be even worse than a segfault, such as not detecting pernicious corruption over many months because our can't-segfault version of core functionality fails to have the same bugs as the actual core functionality (and thus fails to detect a problem in the core functionality).

The problem with giving infinite weight to any one bad outcome is that it makes it impossible to draw reasonable distinctions between it and some other extreme bad outcome. For example, I would really not like to get infected with Covid-19. But I also think that it would be much worse to get infected with Ebola. It follows that Covid-19 must not be infinitely bad, because if it is then I can't make this useful distinction -- which might actually matter. If somebody hears me say this, and takes it as evidence of my lackadaisical attitude towards Covid-19, I can live with that. I care about avoiding criticism as much as the next person, but I refuse to prioritize it over all other things.

> I doubt other backend hardening is any more likely to get committed.

I suspect you're right about that, because of the risks of causing real harm to users.

The backend code is obviously *not* written with the assumption that data cannot be corrupt. There are lots of specific ways in which it is hardened (e.g., there are many defensive "can't happen" elog() statements). I really don't know why you insist on this black and white framing.

--
Peter Geoghegan