On Mon, Jul 13, 2020 at 2:12 PM Robert Haas <robertmh...@gmail.com> wrote: > 1. There's nothing to identify the tuple that has the problem, and no > way to know how many more of them there might be. Back-patching > b61d161c146328ae6ba9ed937862d66e5c8b035a would help with the first > part of this.
I am in favor of backpatching such changes in cases where senior community members feel that it could help with hypothetical undiscovered data corruption issues -- if they're willing to take responsibility for the change. It certainly wouldn't be the first time. A "defense in depth" mindset seems like the right one when it comes to data corruption bugs. Early detection is really important. > Moreover, not everyone is as > interested in an extended debugging exercise as they are in getting > the system working again, and VACUUM failing repeatedly is a pretty > serious problem. That's absolutely consistent with my experience. Most users want to get back to business as usual now, while letting somebody else do the hard work of debugging. > Therefore, one of my colleagues has - at my request - created a couple > of functions called heap_force_kill() and heap_force_freeze() which > take an array of TIDs. > So I have these questions: > > - Do people think it would me smart/good/useful to include something > like this in PostgreSQL? I'm in favor of it. > - If so, how? I would propose a new contrib module that we back-patch > all the way, because the VACUUM errors were back-patched all the way, > and there seems to be no advantage in making people wait 5 years for a > new version that has some kind of tooling in this area. I'm in favor of it being *possible* to backpatch tooling that is clearly related to correctness in a fundamental way. Obviously this would mean that we'd be revising our general position on backpatching to allow some limited exceptions around corruption. I'm not sure that this meets that standard, though. It's hardly something that we can expect all that many users to be able to use effectively. I may be biased, but I'd be inclined to permit it in the case of something like amcheck, or pg_visibility, on the grounds that they're more or less the same as the new VACUUM errcontext instrumentation you mentioned. The same cannot be said of something like this new heap_force_kill() stuff. > - Any ideas for additional things we should include, or improvements > on the sketch above? Clearly you should work out a way of making it very hard to accidentally (mis)use. For example, maybe you make the functions check for the presence of a sentinel file in the data directory. -- Peter Geoghegan