Hi Dilip!

> On 17 July 2020, at 15:46, Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> The attached patch allows the vacuum to continue by emitting a WARNING
> for the corrupted tuple instead of immediately erroring out, as discussed
> at [1].
>
> Basically, it provides a new GUC called vacuum_tolerate_damage to
> control whether to continue the vacuum or to stop on the occurrence of
> a corrupted tuple. So if vacuum_tolerate_damage is set, then in
> all the cases in heap_prepare_freeze_tuple where a corrupted xid is
> detected, it will emit a warning and return that nothing is changed in
> the tuple, and 'tuple_totally_frozen' will also be set to false.
> Since we are returning false, the caller will not try to freeze such a
> tuple, and tuple_totally_frozen is also set to false so that the
> page will not be marked as all-frozen even if all other tuples in the
> page are frozen.
>
> Alternatively, we can try to freeze the other XIDs in the tuple which are
> not corrupted, but I don't think we will gain anything from this,
> because if one of the xmin or xmax is wrong, then the next time we
> run the vacuum we are going to get the same WARNING or ERROR.
> Is there any other opinion on this?

FWIW, AFAIK this ERROR was the reason why we had to use older versions of
heap_prepare_freeze_tuple() in our recovery kit [0]. So +1 from me.

But I do not think that just ignoring corruption here is sufficient. Soon
after this freeze problem, the user will probably have to deal with missing
CLOG. I think this GUC is only one part of a still-incomplete solution.

Personally, I'd be happy if this were backported: our recovery kit would be
much smaller. But that does not seem like a valid reason for a backport.

Thanks!

Best regards, Andrey Borodin.

[0] https://github.com/dsarafan/pg_dirty_hands/blob/master/src/pg_dirty_hands.c#L443