Hello Masahiko-san, I've spent some more time trying to understand the code in lazy_scan_heap function to know under what all circumstances a VACUUM can fail with "found xmin ... before relfrozenxid ..." error for a tuple whose xmin is behind relfrozenxid. Here are my observations:
1) It can fail with this error for a live tuple OR, 2) It can also fail with this error if a tuple (that went through update) is marked as HEAP_HOT_UPDATED or HEAP_ONLY_TUPLE. OR, 3) If there are any concurrent transactions, then the tuple might be marked as HEAPTUPLE_INSERT_IN_PROGRESS or HEAPTUPLE_DELETE_IN_PROGRESS or HEAPTUPLE_RECENTLY_DEAD in which case also VACUUM can fail with this error. Now, AFAIU, as we will be dealing with a damaged table, the chances of point #3 being the cause of this error looks impossible in our case because I don't think we will be doing anything in parallel when performing surgery on a damaged table, in fact we shouldn't be doing any such things. However, it is quite possible that reason #2 could cause VACUUM to fail with this sort of error, but, as we are already skipping redirected item pointers in heap_force_common(), I think, we would never be marking HEAP_HOT_UPDATED tuple as frozen and I don't see any problem in marking HEAP_ONLY_TUPLE as frozen. So, probably, we may not need to handle point #2 as well. Further, I also don't see VACUUM reporting this error for a tuple that has been moved from one partition to another. So, I think we might not need to do any special handling for a tuple that got updated and its new version was moved to another partition. If you feel I am missing something here, please correct me. Thank you. Moreover, while I was exploring on above, I noticed that in lazy_scan_heap(), before we call HeapTupleSatisfiesVacuum() we check for a redirected item pointers and if any redirected item pointer is detected we do not call HeapTupleSatisfiesVacuum(). So, not sure how HeapTupleSatisfiesVacuum would ever return a dead tuple that is marked with HEAP_HOT_UPDATED. I am referring to the following code in lazy_scan_heap(). for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum = OffsetNumberNext(offnum)) { ItemId itemid; itemid = PageGetItemId(page, offnum); ............. ............. /* Redirect items mustn't be touched */ <-- this check would bypass the redirected item pointers from being checked for HeapTupleSatisfiesVacuum. if (ItemIdIsRedirected(itemid)) { hastup = true; /* this page won't be truncatable */ continue; } .............. .............. switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf)) { case HEAPTUPLE_DEAD: if (HeapTupleIsHotUpdated(&tuple) || HeapTupleIsHeapOnly(&tuple) || params->index_cleanup == VACOPT_TERNARY_DISABLED) nkeep += 1; else tupgone = true; /* we can delete the tuple */ .............. .............. } So, the point is, would HeapTupleIsHotUpdated(&tuple) ever be true? -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com On Mon, Aug 17, 2020 at 11:35 AM Ashutosh Sharma <ashu.coe...@gmail.com> wrote: > > Hello Masahiko-san, > > Thanks for the review. Please check the comments inline below: > > On Fri, Aug 14, 2020 at 10:07 AM Masahiko Sawada > <masahiko.saw...@2ndquadrant.com> wrote: > > > Thank you for updating the patch! Here are my comments on v5 patch: > > > > --- a/contrib/Makefile > > +++ b/contrib/Makefile > > @@ -35,6 +35,7 @@ SUBDIRS = \ > > pg_standby \ > > pg_stat_statements \ > > pg_trgm \ > > + pg_surgery \ > > pgcrypto \ > > > > I guess we use alphabetical order here. So pg_surgery should be placed > > before pg_trgm. > > > > Okay, will take care of this in the next version of patch. > > > --- > > + if (heap_force_opt == HEAP_FORCE_KILL) > > + ItemIdSetDead(itemid); > > > > I think that if the page is an all-visible page, we should clear an > > all-visible bit on the visibility map corresponding to the page and > > PD_ALL_VISIBLE on the page header. Otherwise, index only scan would > > return the wrong results. > > > > I think we should let VACUUM do that. Please note that this module is > intended to be used only on a damaged relation and should only be > operated on damaged tuples of such relations. And the execution of any > of the functions provided by this module on a damaged relation must be > followed by VACUUM with DISABLE_PAGE_SKIPPING option on that relation. > This is necessary to bring back a damaged relation to the sane state > once a surgery is performed on it. I will try to add this note in the > documentation for this module. > > > --- > > + /* > > + * We do not mark the buffer dirty or do WAL logging for unmodifed > > + * pages. > > + */ > > + if (!did_modify_page) > > + goto skip_wal; > > + > > + /* Mark buffer dirty before we write WAL. */ > > + MarkBufferDirty(buf); > > + > > + /* XLOG stuff */ > > + if (RelationNeedsWAL(rel)) > > + log_newpage_buffer(buf, true); > > + > > +skip_wal: > > + END_CRIT_SECTION(); > > + > > > > s/unmodifed/unmodified/ > > > > okay, will fix this typo. > > > Do we really need to use goto? I think we can modify it like follows: > > > > if (did_modity_page) > > { > > /* Mark buffer dirty before we write WAL. */ > > MarkBufferDirty(buf); > > > > /* XLOG stuff */ > > if (RelationNeedsWAL(rel)) > > log_newpage_buffer(buf, true); > > } > > > > END_CRIT_SECTION(); > > > > No, we don't need it. We can achieve the same by checking the status > of did_modify_page flag as you suggested. I will do this change in the > next version. > > > --- > > pg_force_freeze() can revival a tuple that is already deleted but not > > vacuumed yet. Therefore, the user might need to reindex indexes after > > using that function. For instance, with the following script, the last > > two queries: index scan and seq scan, will return different results. > > > > set enable_seqscan to off; > > set enable_bitmapscan to off; > > set enable_indexonlyscan to off; > > create table tbl (a int primary key); > > insert into tbl values (1); > > > > update tbl set a = a + 100 where a = 1; > > > > explain analyze select * from tbl where a < 200; > > > > -- revive deleted tuple on heap > > select heap_force_freeze('tbl', array['(0,1)'::tid]); > > > > -- index scan returns 2 tuples > > explain analyze select * from tbl where a < 200; > > > > -- seq scan returns 1 tuple > > set enable_seqscan to on; > > explain analyze select * from tbl; > > > > I am not sure if this is the right use-case of pg_force_freeze > function. I think we should only be running pg_force_freeze function > on a tuple for which VACUUM reports "found xmin ABC from before > relfrozenxid PQR" sort of error otherwise it might worsen the things > instead of making it better. Now, the question is - can VACUUM report > this type of error for a deleted tuple or it would only report it for > a live tuple? AFAIU this won't be reported for the deleted tuples > because VACUUM wouldn't consider freezing a tuple that has been > deleted. > > > Also, if a tuple updated and moved to another partition is revived by > > heap_force_freeze(), its ctid still has special values: > > MovedPartitionsOffsetNumber and MovedPartitionsBlockNumber. I don't > > see a problem yet caused by a visible tuple having the special ctid > > value, but it might be worth considering either to reset ctid value as > > well or to not freezing already-deleted tuple. > > > > For this as well, the answer remains the same as above. > > -- > With Regards, > Ashutosh Sharma > EnterpriseDB:http://www.enterprisedb.com