I have been doing some user-level testing of this feature, apart from sanity test for extension and it's functions
I have tried to corrupt tuples and then able to fix it using heap_force_freeze/kill functions. --corrupt relfrozenxid, cause vacuum failed. update pg_class set relfrozenxid = (relfrozenxid::text::integer + 10)::text::xid where relname = 'test_tbl'; UPDATE 1 insert into test_tbl values (2, 'BBB'); postgres=# vacuum test_tbl; ERROR: found xmin 507 from before relfrozenxid 516 CONTEXT: while scanning block 0 of relation "public.test_tbl" postgres=# select *, ctid, xmin, xmax from test_tbl; a | b | ctid | xmin | xmax ---+-----+-------+------+------ 1 | AAA | (0,1) | 505 | 0 2 | BBB | (0,2) | 507 | 0 (2 rows) --fixed using heap_force_freeze postgres=# select heap_force_freeze('test_tbl'::regclass, ARRAY['(0,2)']::tid[]); heap_force_freeze ------------------- postgres=# vacuum test_tbl; VACUUM postgres=# select *, ctid, xmin, xmax from test_tbl; a | b | ctid | xmin | xmax ---+-----+-------+------+------ 1 | AAA | (0,1) | 505 | 0 2 | BBB | (0,2) | 2 | 0 (2 rows) --corrupt table headers in base/oid. file, cause table access failed. postgres=# select ctid, * from test_tbl; ERROR: could not access status of transaction 4294967295 DETAIL: Could not open file "pg_xact/0FFF": No such file or directory. --removed corrupted tuple using heap_force_kill postgres=# select heap_force_kill('test_tbl'::regclass, ARRAY['(0,2)']::tid[]); heap_force_kill ----------------- (1 row) postgres=# select ctid, * from test_tbl; ctid | a | b -------+---+----- (0,1) | 1 | AAA (1 row) I will be continuing with my testing with the latest patch and update here if found anything. Thanks & Regards, Rajkumar Raghuwanshi On Thu, Aug 6, 2020 at 1:42 PM Masahiko Sawada < masahiko.saw...@2ndquadrant.com> wrote: > On Wed, 5 Aug 2020 at 22:42, Ashutosh Sharma <ashu.coe...@gmail.com> > wrote: > > > > Hi Robert, > > > > Thanks for the review. Please find my comments inline: > > > > On Sat, Aug 1, 2020 at 12:18 AM Robert Haas <robertmh...@gmail.com> > wrote: > > > > > > On Fri, Jul 31, 2020 at 8:52 AM Ashutosh Sharma <ashu.coe...@gmail.com> > wrote: > > > > Attached is the new version of patch that addresses the comments > from Andrey and Beena. > > > > > > +PGFILEDESC = "pg_surgery - perform surgery on the damaged heap table" > > > > > > the -> a > > > > > > I also suggest: heap table -> relation, because we might want to > > > extend this to other cases later. > > > > > > > Corrected. > > > > > +-- toast table. > > > +begin; > > > +create table ttab(a text); > > > +insert into ttab select string_agg(chr(floor(random() * 26)::int + > > > 65), '') from generate_series(1,10000); > > > +select * from ttab where xmin = 2; > > > + a > > > +--- > > > +(0 rows) > > > + > > > +select heap_force_freeze('ttab'::regclass, ARRAY['(0, 1)']::tid[]); > > > + heap_force_freeze > > > +------------------- > > > + > > > +(1 row) > > > + > > > > > > I don't understand the point of this. You're not testing the function > > > on the TOAST table; you're testing it on the main table when there > > > happens to be a TOAST table that is probably getting used for > > > something. But that's not really relevant to what is being tested > > > here, so as written this seems redundant with the previous cases. > > > > > > > Yeah, it's being tested on the main table, not on a toast table. I've > > removed this test-case and also restricted direct access to the toast > > table using heap_force_kill/freeze functions. I think we shouldn't be > > using these functions to do any changes in the toast table. We will > > only use these functions with the main table and let VACUUM remove the > > corresponding data chunks (pointed by the tuple that got removed from > > the main table). > > > > Another option would be to identify all the data chunks corresponding > > to the tuple (ctid) being killed from the main table and remove them > > one by one. We will only do this if the tuple from the main table that > > has been marked as killed has an external storage. We will have to add > > a bunch of code for this otherwise we can let VACUUM do this for us. > > Let me know your thoughts on this. > > > > > +-- test pg_surgery functions with the unsupported relations. Should > fail. > > > > > > Please name the specific functions being tested here in case we add > > > more in the future that are tested separately. > > > > > > > Done. > > > > > +++ b/contrib/pg_surgery/heap_surgery_funcs.c > > > > > > I think we could drop _funcs from the file name. > > > > > > > Done. > > > > > +#ifdef PG_MODULE_MAGIC > > > +PG_MODULE_MAGIC; > > > +#endif > > > > > > The #ifdef here is not required, and if you look at other contrib > > > modules you'll see that they don't have it. > > > > > > > Okay, done. > > > > > I don't like all the macros at the top of the file much. CHECKARRVALID > > > is only used in one place, so it seems to me that you might as well > > > just inline it and lose the macro. Likewise for SORT and ARRISEMPTY. > > > > > > > Done. > > > > > Once you do that, heap_force_common() can just compute the number of > > > array elements once, instead of doing it once inside ARRISEMPTY, then > > > again inside SORT, and then a third time to initialize ntids. You can > > > just have a local variable in that function that is set once and then > > > used as needed. Then you are only doing ARRNELEMS once, so you can get > > > rid of that macro too. The same technique can be used to get rid of > > > ARRPTR. So then all the macros go away without introducing any code > > > duplication. > > > > > > > Done. > > > > > +/* Options to forcefully change the state of a heap tuple. */ > > > +typedef enum HTupleForceOption > > > +{ > > > + FORCE_KILL, > > > + FORCE_FREEZE > > > +} HTupleForceOption; > > > > > > I suggest un-abbreviating HTuple to HeapTuple and un-abbreviating the > > > enum members to HEAP_FORCE_KILL and HEAP_FORCE_FREE. > > > > Done. > > > > Also, how about > > > option -> operation? > > > > > > > I think both look okay to me. > > > > > + return heap_force_common(fcinfo, FORCE_KILL); > > > > > > I think it might be more idiomatic to use PG_RETURN_DATUM here. I > > > know it's the same thing, though, and perhaps I'm even wrong about the > > > prevailing style. > > > > > > > Done. > > > > > + Assert(force_opt == FORCE_KILL || force_opt == FORCE_FREEZE); > > > > > > I think this is unnecessary. It's an enum with 2 values. > > > > > > > Removed. > > > > > + if (ARRISEMPTY(ta)) > > > + ereport(ERROR, > > > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > > + errmsg("empty tid array"))); > > > > > > I don't see why this should be an error. Why can't we just continue > > > normally and if it does nothing, it does nothing? You'd need to change > > > the do..while loop to a while loop so that the end condition is tested > > > at the top, but that seems fine. > > > > > > > I think it's okay to have this check. So, just left it as-is. We do > > have such checks in other contrib modules as well wherever the array > > is being passed as an input to the function. > > > > > + rel = relation_open(relid, AccessShareLock); > > > > > > Maybe we should take RowExclusiveLock, since we are going to modify > > > rows. Not sure how much it matters, though. > > > > > > > I don't know how it would make a difference, but still as you said > > replaced AccessShare with RowExclusive. > > > > > + if (!superuser() && GetUserId() != rel->rd_rel->relowner) > > > + ereport(ERROR, > > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > > + errmsg("must be superuser or object owner to use %s.", > > > + force_opt == FORCE_KILL ? "heap_force_kill()" : > > > + "heap_force_freeze()"))); > > > > > > This is the wrong way to do a permissions check, and it's also the > > > wrong way to write an error message about having failed one. To see > > > the correct method, grep for pg_class_aclcheck. However, I think that > > > we shouldn't in general trust the object owner to do this, unless the > > > super-user gave permission. This is a data-corrupting operation, and > > > only the boss is allowed to authorize it. So I think you should also > > > add REVOKE EXECUTE FROM PUBLIC statements to the SQL file, and then > > > have this check as a backup. Then, the superuser is always allowed, > > > and if they choose to GRANT EXECUTE on this function to some users, > > > those users can do it for their own relations, but not others. > > > > > > > Done. > > > > > + if (rel->rd_rel->relam != HEAP_TABLE_AM_OID) > > > + ereport(ERROR, > > > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > > + errmsg("only heap AM is supported"))); > > > + > > > + check_relation_relkind(rel); > > > > > > Seems like these checks are in the wrong order. > > > > I don't think there is anything wrong with the order. I can see the > > same order in other contrib modules as well. > > > > Also, maybe you could > > > call the function something like check_relation_ok() and put the > > > permissions test, the relkind test, and the relam test all inside of > > > it, just to tighten up the code in this main function a bit. > > > > > > > Yeah, I've added a couple of functions named sanity_check_relation and > > sanity_check_tid_array and shifted all the sanity checks there. > > > > > + if (noffs > maxoffset) > > > + ereport(ERROR, > > > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > > > + errmsg("number of offsets specified for block %u exceeds the max > > > offset number %u", > > > + blkno, maxoffset))); > > > > > > Hmm, this doesn't seem quite right. The actual problem is if an > > > individual item pointer's offset number is greater than maxoffset, > > > which can be true even if the total number of offsets is less than > > > maxoffset. So I think you need to remove this check and add a check > > > inside the loop which follows that offnos[i] is in range. > > > > > > > Agreed and done. > > > > > The way you've structured that loop is actually problematic -- I don't > > > think we want to be calling elog() or ereport() inside a critical > > > section. You could fix the case that checks for an invalid force_opt > > > by just doing if (op == HEAP_FORCE_KILL) { ... } else { Assert(op == > > > HEAP_FORCE_FREEZE); ... }, or by using a switch with no default. The > > > NOTICE case you have here is a bigger problem. > > > > Done. > > > > You really cannot > > > modify the buffer like this and then decide, oops, never mind, I think > > > I won't mark it dirty or write WAL for the changes. If you do that, > > > the buffer is still in memory, but it's now been modified. A > > > subsequent operation that modifies it will start with the altered > > > state you created here, quite possibly leading to WAL that cannot be > > > correctly replayed on the standby. In other words, you've got to > > > decide for certain whether you want to proceed with the operation > > > *before* you enter the critical section. You also need to emit any > > > messages before or after the critical section. So you could: > > > > > > > This is still not clear. I think Robert needs to respond to my earlier > comment. > > > > > I believe this violates our guidelines on message construction. Have > > > two completely separate messages -- and maybe lose the word "already": > > > > > > "skipping tid (%u, %u) because it is dead" > > > "skipping tid (%u, %u) because it is unused" > > > > > > The point of this is that it makes it easier for translators. > > > > > > > Done. > > > > > I see very little point in what verify_tid() is doing. Before using > > > each block number, we should check that it's less than or equal to a > > > cached value of RelationGetNumberOfBlocks(rel). That's necessary in > > > any case to avoid funny errors; and then the check here against > > > specifically InvalidBlockNumber is redundant. For the offset number, > > > same thing: we need to check each offset against the page's > > > PageGetMaxOffsetNumber(page); and if we do that then we don't need > > > these checks. > > > > > > > Done. > > > > Please check the attached patch for the changes. > > I also looked at this version patch and have some small comments: > > + Oid relid = PG_GETARG_OID(0); > + ArrayType *ta = PG_GETARG_ARRAYTYPE_P_COPY(1); > + ItemPointer tids; > + int ntids; > + Relation rel; > + Buffer buf; > + Page page; > + ItemId itemid; > + BlockNumber blkno; > + OffsetNumber *offnos; > + OffsetNumber offno, > + noffs, > + curr_start_ptr, > + next_start_ptr, > + maxoffset; > + int i, > + nskippedItems, > + nblocks; > > You declare all variables at the top of heap_force_common() function > but I think we can declare some variables such as buf, page inside of > the do loop. > > --- > + if (offnos[i] > maxoffset) > + { > + ereport(NOTICE, > + errmsg("skipping tid (%u, %u) because it > contains an invalid offset", > + blkno, offnos[i])); > + continue; > + } > > If all tids on a page take the above path, we will end up logging FPI > in spite of modifying nothing on the page. > > --- > + /* XLOG stuff */ > + if (RelationNeedsWAL(rel)) > + log_newpage_buffer(buf, true); > > I think we need to set the returned LSN by log_newpage_buffer() to the > page lsn. > > Regards, > > -- > Masahiko Sawada http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > >