On Fri, Sep 2, 2022 at 6:14 PM Michael Paquier <mich...@paquier.xyz> wrote: > This has been a problem for years, and still for years to come with > libc updates. I am not much into this stuff, but does running VACUUM > in this case help with the state of the index that used a past, > now-invalid, collation (be it libc or ICU) to get a bit cleaned up?
Yes -- nbtree VACUUM generally can cope quite well, even when the index is corrupt. It should mostly manage to do what is expected here, even with a misbehaving opclass, because it relies as little as possible on user-defined opclass code. Even without the hardening in place, nbtree VACUUM will still do a *surprisingly* good job of recovering when the opclass is broken in some way: VACUUM just needs the insertion scankey operator class code to initially determine roughly where to look for the to-be-deleted page's downlink, one level up in the tree. Even when an operator class is wildly broken (e.g. the comparator gives a result that it determines at random), we still won't see problems in nbtree VACUUM most of the time -- because even being roughly correct is good enough in practice! You have to be quite unlucky to hit this, even when the opclass is wildly broken (which is probably much less common than "moderately broken"). > When written like that, this surely sounds extremely bad and this > would need more complex chirurgy (or just running with a build that > includes this patch?). The patch will fix the case in question, which I have seen internal AWS reports about -- though the initial fix that went into 14 wasn't driven by any complaint from any user. I just happened to notice that we were throwing an ERROR in nbtree VACUUM for no good reason, which is something that should be avoided on general principle. In theory there could be other ways in which you'd run into the same basic problem (in any index AM). The important point is that we're better off not throwing any errors in the first place, but if we must then they had better not be errors that will be repeated again and again, without any chance of the problem going away naturally. (Not that it never makes sense to just throw an error; there are meaningful gradations of "totally unacceptable problem".) -- Peter Geoghegan