Thanks Robert for the review. Please find my comments inline below:
On Fri, Aug 7, 2020 at 9:21 PM Robert Haas <[email protected]> wrote:
>
> On Thu, Aug 6, 2020 at 9:23 AM Ashutosh Sharma <[email protected]> wrote:
> > Attached v4 patch fixes the latest comments from Robert and Masahiko-san.
>
> Compiler warning:
>
> heap_surgery.c:136:13: error: comparison of unsigned expression < 0 is
> always false [-Werror,-Wtautological-compare]
> if (blkno < 0 || blkno >= nblocks)
> ~~~~~ ^ ~
>
Fixed.
> There's a certain inconsistency to these messages:
>
> rhaas=# create table foo (a int);
> CREATE TABLE
> rhaas=# insert into foo values (1);
> INSERT 0 1
> rhaas=# select heap_force_kill('foo'::regclass, array['(0,2)'::tid]);
> NOTICE: skipping tid (0, 2) because it contains an invalid offset
> heap_force_kill
> -----------------
>
> (1 row)
>
> rhaas=# select heap_force_kill('foo'::regclass, array['(1,0)'::tid]);
> ERROR: invalid item pointer
> LOCATION: tids_same_page_fetch_offnums, heap_surgery.c:347
> rhaas=# select heap_force_kill('foo'::regclass, array['(1,1)'::tid]);
> ERROR: block number 1 is out of range for relation "foo"
>
> From a user perspective it seems like I've made three very similar
> mistakes: in the first case, the offset is too high, in the second
> case it's too low, and in the third case the block number is out of
> range. But in one case I get a NOTICE and in the other two cases I get
> an ERROR. In one case I get the relation name and in the other two
> cases I don't. The two complaints about an invalid offset are phrased
> completely differently from each other. For example, suppose you do
> this:
>
> ERROR: tid (%u, %u) is invalid for relation "%s" because the block
> number is out of range (%u..%u)
> ERROR: tid (%u, %u) is invalid for relation "%s" because the item
> number is out of range for this block (%u..%u)
> ERROR: tid (%u, %u) is invalid for relation "%s" because the item is unused
> ERROR: tid (%u, %u) is invalid for relation "%s" because the item is dead
>
Corrected.
> I think I misled you when I said to use pg_class_aclcheck. I think it
> should actually be pg_class_ownercheck.
>
okay, I've changed it to pg_class_ownercheck.
> I think the relkind sanity check should permit RELKIND_MATVIEW also.
>
Yeah, actually we should allow MATVIEW, don't know why I thought of
blocking it earlier.
> It's unclear to me why the freeze logic here shouldn't do this part
> what heap_prepare_freeze_tuple() does when freezing xmax:
>
> frz->t_infomask2 &= ~HEAP_HOT_UPDATED;
> frz->t_infomask2 &= ~HEAP_KEYS_UPDATED;
>
Yeah, we should have these changes when freezing the xmax.
> Likewise, why should we not freeze or invalidate xvac in the case
> where tuple->t_infomask & HEAP_MOVED, as heap_prepare_freeze_tuple()
> would do?
>
Again, we should have this as well.
Apart from above, this time I've also added the documentation on
pg_surgery module and added a few more test-cases.
Attached patch with above changes.
--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com
From 2aa56b9632cf1d291f4433afd972dc647d354dcb Mon Sep 17 00:00:00 2001
From: ashu <[email protected]>
Date: Wed, 12 Aug 2020 14:38:14 +0530
Subject: [PATCH] Add contrib/pg_surgery to perform surgery on a damaged heap
table.
This contrib module basically adds a couple of functions named
heap_force_kill and heap_force_freeze that can be used in the scripts
to perform surgery on the damaged heap tables.
Ashutosh Sharma.
---
contrib/Makefile | 1 +
contrib/pg_surgery/Makefile | 23 ++
contrib/pg_surgery/expected/pg_surgery.out | 161 +++++++++++++
contrib/pg_surgery/heap_surgery.c | 375 +++++++++++++++++++++++++++++
contrib/pg_surgery/pg_surgery--1.0.sql | 18 ++
contrib/pg_surgery/pg_surgery.control | 5 +
contrib/pg_surgery/sql/pg_surgery.sql | 89 +++++++
doc/src/sgml/contrib.sgml | 1 +
doc/src/sgml/filelist.sgml | 1 +
doc/src/sgml/pgsurgery.sgml | 94 ++++++++
10 files changed, 768 insertions(+)
create mode 100644 contrib/pg_surgery/Makefile
create mode 100644 contrib/pg_surgery/expected/pg_surgery.out
create mode 100644 contrib/pg_surgery/heap_surgery.c
create mode 100644 contrib/pg_surgery/pg_surgery--1.0.sql
create mode 100644 contrib/pg_surgery/pg_surgery.control
create mode 100644 contrib/pg_surgery/sql/pg_surgery.sql
create mode 100644 doc/src/sgml/pgsurgery.sgml
diff --git a/contrib/Makefile b/contrib/Makefile
index 1846d41..07d5734 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -35,6 +35,7 @@ SUBDIRS = \
pg_standby \
pg_stat_statements \
pg_trgm \
+ pg_surgery \
pgcrypto \
pgrowlocks \
pgstattuple \
diff --git a/contrib/pg_surgery/Makefile b/contrib/pg_surgery/Makefile
new file mode 100644
index 0000000..ecf2e20
--- /dev/null
+++ b/contrib/pg_surgery/Makefile
@@ -0,0 +1,23 @@
+# contrib/pg_surgery/Makefile
+
+MODULE_big = pg_surgery
+OBJS = \
+ $(WIN32RES) \
+ heap_surgery.o
+
+EXTENSION = pg_surgery
+DATA = pg_surgery--1.0.sql
+PGFILEDESC = "pg_surgery - perform surgery on a damaged relation"
+
+REGRESS = pg_surgery
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/pg_surgery
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/pg_surgery/expected/pg_surgery.out b/contrib/pg_surgery/expected/pg_surgery.out
new file mode 100644
index 0000000..55f8dce
--- /dev/null
+++ b/contrib/pg_surgery/expected/pg_surgery.out
@@ -0,0 +1,161 @@
+create extension pg_surgery;
+--
+-- check that using heap_force_kill and heap_force_freeze functions with the
+-- supported relations passes.
+--
+-- normal heap table.
+begin;
+create table htab(a int);
+insert into htab values (100), (200), (300), (400), (500);
+select * from htab where xmin = 2;
+ a
+---
+(0 rows)
+
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 4)']::tid[]);
+ heap_force_freeze
+-------------------
+
+(1 row)
+
+select ctid, xmax from htab where xmin = 2;
+ ctid | xmax
+-------+------
+ (0,4) | 0
+(1 row)
+
+select heap_force_kill('htab'::regclass, ARRAY['(0, 4)']::tid[]);
+ heap_force_kill
+-----------------
+
+(1 row)
+
+select * from htab where ctid = '(0, 4)';
+ a
+---
+(0 rows)
+
+rollback;
+-- materialized view.
+begin;
+create materialized view mvw as select a from generate_series(1, 3) a;
+select * from mvw where xmin = 2;
+ a
+---
+(0 rows)
+
+select heap_force_freeze('mvw'::regclass, ARRAY['(0, 3)']::tid[]);
+ heap_force_freeze
+-------------------
+
+(1 row)
+
+select * from mvw where xmin = 2;
+ a
+---
+ 3
+(1 row)
+
+select heap_force_kill('mvw'::regclass, ARRAY['(0, 3)']::tid[]);
+ heap_force_kill
+-----------------
+
+(1 row)
+
+select * from mvw where ctid = '(0, 3)';
+ a
+---
+(0 rows)
+
+rollback;
+--
+-- check that using heap_force_kill and heap_force_freeze functions with the
+-- unsupported relations fails.
+--
+-- partitioned tables (the parent table) doesn't contain any tuple.
+create table ptab (a int) partition by list (a);
+select heap_force_kill('ptab'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR: only heap AM is supported
+select heap_force_freeze('ptab'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR: only heap AM is supported
+create index ptab_idx on ptab (a);
+-- indexes are not supported, should fail.
+select heap_force_kill('ptab_idx'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR: only heap AM is supported
+select heap_force_freeze('ptab_idx'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR: only heap AM is supported
+create view vw as select 1;
+-- views are not supported as well. so, all these should fail.
+select heap_force_kill('vw'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR: only heap AM is supported
+select heap_force_freeze('vw'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR: only heap AM is supported
+create sequence seq;
+-- sequences are not supported as well. so, all these functions should fail.
+select heap_force_kill('seq'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR: only heap AM is supported
+select heap_force_freeze('seq'::regclass, ARRAY['(0, 1)']::tid[]);
+ERROR: only heap AM is supported
+--
+-- Some negative test-cases with invalid inputs.
+--
+begin;
+create table htab(a int);
+-- invalid block number, should be skipped.
+select heap_force_kill('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+NOTICE: skipping block 0 for relation "htab" because the block number is out of range
+ heap_force_kill
+-----------------
+
+(1 row)
+
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+NOTICE: skipping block 0 for relation "htab" because the block number is out of range
+ heap_force_freeze
+-------------------
+
+(1 row)
+
+insert into htab values (10);
+-- invalid offset number, should be skipped.
+select heap_force_kill('htab'::regclass, ARRAY['(0, 2)']::tid[]);
+NOTICE: skipping tid (0, 2) for relation "htab" because the item number is out of range for this block
+ heap_force_kill
+-----------------
+
+(1 row)
+
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 2)']::tid[]);
+NOTICE: skipping tid (0, 2) for relation "htab" because the item number is out of range for this block
+ heap_force_freeze
+-------------------
+
+(1 row)
+
+-- dead tuple, should be skipped.
+select heap_force_kill('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+ heap_force_kill
+-----------------
+
+(1 row)
+
+select heap_force_kill('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+NOTICE: skipping tid (0, 1) for relation "htab" because it is marked dead
+ heap_force_kill
+-----------------
+
+(1 row)
+
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+NOTICE: skipping tid (0, 1) for relation "htab" because it is marked dead
+ heap_force_freeze
+-------------------
+
+(1 row)
+
+rollback;
+-- cleanup.
+drop table ptab;
+drop view vw;
+drop sequence seq;
+drop extension pg_surgery;
diff --git a/contrib/pg_surgery/heap_surgery.c b/contrib/pg_surgery/heap_surgery.c
new file mode 100644
index 0000000..20ac4fc
--- /dev/null
+++ b/contrib/pg_surgery/heap_surgery.c
@@ -0,0 +1,375 @@
+/*-------------------------------------------------------------------------
+ *
+ * heap_surgery.c
+ * Functions to perform surgery on the damaged heap table.
+ *
+ * Copyright (c) 2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ * contrib/pg_surgery/heap_surgery.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "access/heapam.h"
+#include "catalog/pg_am_d.h"
+#include "miscadmin.h"
+#include "storage/bufmgr.h"
+#include "utils/acl.h"
+#include "utils/rel.h"
+
+PG_MODULE_MAGIC;
+
+/* Options to forcefully change the state of a heap tuple. */
+typedef enum HeapTupleForceOption
+{
+ HEAP_FORCE_KILL,
+ HEAP_FORCE_FREEZE
+} HeapTupleForceOption;
+
+PG_FUNCTION_INFO_V1(heap_force_kill);
+PG_FUNCTION_INFO_V1(heap_force_freeze);
+
+static int32 tidcmp(const void *a, const void *b);
+static Datum heap_force_common(FunctionCallInfo fcinfo,
+ HeapTupleForceOption heap_force_opt);
+static void sanity_check_tid_array(ArrayType *ta, int *ntids);
+static void sanity_check_relation(Relation rel);
+static BlockNumber tids_same_page_fetch_offnums(ItemPointer tids, int ntids,
+ OffsetNumber *next_start_ptr,
+ OffsetNumber *offnos);
+
+/*-------------------------------------------------------------------------
+ * heap_force_kill()
+ *
+ * Force kill the tuple(s) pointed to by the item pointer(s) stored in the
+ * given tid array.
+ *
+ * Usage: SELECT heap_force_kill(regclass, tid[]);
+ *-------------------------------------------------------------------------
+ */
+Datum
+heap_force_kill(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_DATUM(heap_force_common(fcinfo, HEAP_FORCE_KILL));
+}
+
+/*-------------------------------------------------------------------------
+ * heap_force_freeze()
+ *
+ * Force freeze the tuple(s) pointed to by the item pointer(s) stored in the
+ * given tid array.
+ *
+ * Usage: SELECT heap_force_freeze(regclass, tid[]);
+ *-------------------------------------------------------------------------
+ */
+Datum
+heap_force_freeze(PG_FUNCTION_ARGS)
+{
+ PG_RETURN_DATUM(heap_force_common(fcinfo, HEAP_FORCE_FREEZE));
+}
+
+/*-------------------------------------------------------------------------
+ * heap_force_common()
+ *
+ * Common code for heap_force_kill and heap_force_freeze
+ *-------------------------------------------------------------------------
+ */
+static Datum
+heap_force_common(FunctionCallInfo fcinfo, HeapTupleForceOption heap_force_opt)
+{
+ Oid relid = PG_GETARG_OID(0);
+ ArrayType *ta = PG_GETARG_ARRAYTYPE_P_COPY(1);
+ ItemPointer tids;
+ int ntids,
+ nblocks;
+ Relation rel;
+ OffsetNumber *offnos;
+ OffsetNumber noffs,
+ curr_start_ptr,
+ next_start_ptr;
+
+ if (RecoveryInProgress())
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("recovery is in progress"),
+ errhint("heap surgery functions cannot be executed during recovery.")));
+
+ /* Basic sanity checking. */
+ sanity_check_tid_array(ta, &ntids);
+
+ rel = relation_open(relid, RowExclusiveLock);
+
+ sanity_check_relation(rel);
+
+ tids = ((ItemPointer) ARR_DATA_PTR(ta));
+
+ /*
+ * If there is more than one tid in the array, sort it so that we can
+ * easily fetch all the tids belonging to one particular page from the
+ * array.
+ */
+ if (ntids > 1)
+ qsort((void*) tids, ntids, sizeof(ItemPointerData), tidcmp);
+
+ offnos = (OffsetNumber *) palloc(ntids * sizeof(OffsetNumber));
+ noffs = curr_start_ptr = next_start_ptr = 0;
+ nblocks = RelationGetNumberOfBlocks(rel);
+
+ do
+ {
+ Buffer buf;
+ Page page;
+ BlockNumber blkno;
+ OffsetNumber maxoffset;
+ int i;
+ bool did_modify_page = false;
+
+ /*
+ * Get the offset numbers from the tids belonging to one particular page
+ * and process them one by one.
+ */
+ blkno = tids_same_page_fetch_offnums(tids, ntids, &next_start_ptr,
+ offnos);
+
+ /* Calculate the number of offsets stored in offnos array. */
+ noffs = next_start_ptr - curr_start_ptr;
+
+ /*
+ * Update the current start pointer so that next time when
+ * tids_same_page_fetch_offnums() is called, we can calculate the number
+ * of offsets present in the offnos array.
+ */
+ curr_start_ptr = next_start_ptr;
+
+ /* Check whether the block number is valid. */
+ if (blkno >= nblocks)
+ {
+ ereport(NOTICE,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("skipping block %u for relation \"%s\" because the block number is out of range",
+ blkno, RelationGetRelationName(rel))));
+ continue;
+ }
+
+ CHECK_FOR_INTERRUPTS();
+
+ buf = ReadBuffer(rel, blkno);
+ LockBufferForCleanup(buf);
+
+ page = BufferGetPage(buf);
+
+ maxoffset = PageGetMaxOffsetNumber(page);
+
+ /* No ereport(ERROR) from here until all the changes are logged. */
+ START_CRIT_SECTION();
+
+ for (i = 0; i < noffs; i++)
+ {
+ OffsetNumber offno;
+ ItemId itemid;
+
+ if (offnos[i] == 0 || offnos[i] > maxoffset)
+ {
+ ereport(NOTICE,
+ errmsg("skipping tid (%u, %u) for relation \"%s\" because the item number is out of range for this block",
+ blkno, offnos[i], RelationGetRelationName(rel)));
+ continue;
+ }
+
+ itemid = PageGetItemId(page, offnos[i]);
+
+ /* Follow any redirections until we find something useful. */
+ while (ItemIdIsRedirected(itemid))
+ {
+ offno = ItemIdGetRedirect(itemid);
+ itemid = PageGetItemId(page, offno);
+ CHECK_FOR_INTERRUPTS();
+ }
+
+ /* Nothing to do if the itemid is unused or already dead. */
+ if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
+ {
+ if (!ItemIdIsUsed(itemid))
+ ereport(NOTICE,
+ (errmsg("skipping tid (%u, %u) for relation \"%s\" because it is marked unused",
+ blkno, offnos[i], RelationGetRelationName(rel))));
+ else
+ ereport(NOTICE,
+ (errmsg("skipping tid (%u, %u) for relation \"%s\" because it is marked dead",
+ blkno, offnos[i], RelationGetRelationName(rel))));
+ continue;
+ }
+
+ Assert(ItemIdIsNormal(itemid));
+
+ did_modify_page = true;
+
+ if (heap_force_opt == HEAP_FORCE_KILL)
+ ItemIdSetDead(itemid);
+ else
+ {
+ HeapTupleHeader htup;
+
+ Assert(heap_force_opt == HEAP_FORCE_FREEZE);
+
+ htup = (HeapTupleHeader) PageGetItem(page, itemid);
+
+ /* We might have MOVED_OFF/MOVED_IN tuples in the database */
+ if (htup->t_infomask & HEAP_MOVED)
+ {
+ if (htup->t_infomask & HEAP_MOVED_OFF)
+ HeapTupleHeaderSetXvac(htup, InvalidTransactionId);
+ else
+ HeapTupleHeaderSetXvac(htup, FrozenTransactionId);
+ }
+
+ HeapTupleHeaderSetXmin(htup, FrozenTransactionId);
+ HeapTupleHeaderSetXmax(htup, InvalidTransactionId);
+
+ /*
+ * Clear all the visibility-related bits of this tuple and mark
+ * it as frozen. Also, get rid of HOT_UPDATED and KEYS_UPDATES
+ * bits.
+ */
+ htup->t_infomask &= ~HEAP_XACT_MASK;
+ htup->t_infomask |= (HEAP_XMIN_FROZEN | HEAP_XMAX_INVALID);
+ htup->t_infomask2 &= ~HEAP_HOT_UPDATED;
+ htup->t_infomask2 &= ~HEAP_KEYS_UPDATED;
+ }
+ }
+
+ /*
+ * We do not mark the buffer dirty or do WAL logging for unmodifed
+ * pages.
+ */
+ if (!did_modify_page)
+ goto skip_wal;
+
+ /* Mark buffer dirty before we write WAL. */
+ MarkBufferDirty(buf);
+
+ /* XLOG stuff */
+ if (RelationNeedsWAL(rel))
+ log_newpage_buffer(buf, true);
+
+skip_wal:
+ END_CRIT_SECTION();
+
+ UnlockReleaseBuffer(buf);
+ } while (next_start_ptr != ntids);
+
+ relation_close(rel, RowExclusiveLock);
+
+ pfree(ta);
+ pfree(offnos);
+
+ PG_RETURN_VOID();
+}
+
+/*-------------------------------------------------------------------------
+ * tidcmp()
+ *
+ * Compare two item pointers, return -1, 0, or +1.
+ *
+ * See ItemPointerCompare for details.
+ * ------------------------------------------------------------------------
+ */
+static int32
+tidcmp(const void *a, const void *b)
+{
+ ItemPointer iptr1 = ((const ItemPointer) a);
+ ItemPointer iptr2 = ((const ItemPointer) b);
+
+ return ItemPointerCompare(iptr1, iptr2);
+}
+
+/*-------------------------------------------------------------------------
+ * sanity_check_tid_array()
+ *
+ * Perform sanity check on the given tid array.
+ * ------------------------------------------------------------------------
+ */
+static void
+sanity_check_tid_array(ArrayType *ta, int *ntids)
+{
+ if (ARR_HASNULL(ta) && array_contains_nulls(ta))
+ ereport(ERROR,
+ (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+ errmsg("array must not contain nulls")));
+
+ *ntids = ArrayGetNItems(ARR_NDIM(ta), ARR_DIMS(ta));
+
+ if (*ntids == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("empty tid array")));
+}
+
+/*-------------------------------------------------------------------------
+ * sanity_check_relation()
+ *
+ * Perform sanity check on the given relation.
+ * ------------------------------------------------------------------------
+ */
+static void
+sanity_check_relation(Relation rel)
+{
+ if (rel->rd_rel->relam != HEAP_TABLE_AM_OID)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("only heap AM is supported")));
+
+ if (rel->rd_rel->relkind != RELKIND_RELATION &&
+ rel->rd_rel->relkind != RELKIND_MATVIEW &&
+ rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+ RelationGetRelationName(rel))));
+
+ /* Must be owner of the table or superuser. */
+ if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
+ aclcheck_error(ACLCHECK_NOT_OWNER,
+ get_relkind_objtype(rel->rd_rel->relkind),
+ RelationGetRelationName(rel));
+}
+
+/*-------------------------------------------------------------------------
+ * tids_same_page_fetch_offnums()
+ *
+ * Find out all the tids residing in the same page as tids[next_start_ptr] and
+ * fetch the offset number stored in each of them into a caller-allocated offset
+ * number array.
+ * ------------------------------------------------------------------------
+ */
+static BlockNumber
+tids_same_page_fetch_offnums(ItemPointer tids, int ntids,
+ OffsetNumber *next_start_ptr, OffsetNumber *offnos)
+{
+ int i;
+ BlockNumber prev_blkno,
+ blkno;
+ OffsetNumber offno;
+
+ prev_blkno = blkno = InvalidBlockNumber;
+
+ for (i = *next_start_ptr; i < ntids; i++)
+ {
+ ItemPointerData tid = tids[i];
+
+ blkno = ItemPointerGetBlockNumberNoCheck(&tid);
+ offno = ItemPointerGetOffsetNumberNoCheck(&tid);
+
+ if (i == *next_start_ptr || (prev_blkno == blkno))
+ offnos[i - *next_start_ptr] = offno;
+ else
+ break;
+
+ prev_blkno = blkno;
+ }
+
+ *next_start_ptr = i;
+ return prev_blkno;
+}
\ No newline at end of file
diff --git a/contrib/pg_surgery/pg_surgery--1.0.sql b/contrib/pg_surgery/pg_surgery--1.0.sql
new file mode 100644
index 0000000..2ae7f22
--- /dev/null
+++ b/contrib/pg_surgery/pg_surgery--1.0.sql
@@ -0,0 +1,18 @@
+/* contrib/pg_surgery/pg_surgery--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION pg_surgery" to load this file. \quit
+
+CREATE FUNCTION heap_force_kill(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_kill'
+LANGUAGE C STRICT;
+
+REVOKE EXECUTE ON FUNCTION heap_force_kill(regclass, tid[]) FROM PUBLIC;
+
+CREATE FUNCTION heap_force_freeze(reloid regclass, tids tid[])
+RETURNS VOID
+AS 'MODULE_PATHNAME', 'heap_force_freeze'
+LANGUAGE C STRICT;
+
+REVOKE EXECUTE ON FUNCTION heap_force_freeze(regclass, tid[]) FROM PUBLIC;
\ No newline at end of file
diff --git a/contrib/pg_surgery/pg_surgery.control b/contrib/pg_surgery/pg_surgery.control
new file mode 100644
index 0000000..2bcdad1
--- /dev/null
+++ b/contrib/pg_surgery/pg_surgery.control
@@ -0,0 +1,5 @@
+# pg_surgery extension
+comment = 'extension to perform surgery on a damaged relation'
+default_version = '1.0'
+module_pathname = '$libdir/pg_surgery'
+relocatable = true
diff --git a/contrib/pg_surgery/sql/pg_surgery.sql b/contrib/pg_surgery/sql/pg_surgery.sql
new file mode 100644
index 0000000..4aef380
--- /dev/null
+++ b/contrib/pg_surgery/sql/pg_surgery.sql
@@ -0,0 +1,89 @@
+create extension pg_surgery;
+
+--
+-- check that using heap_force_kill and heap_force_freeze functions with the
+-- supported relations passes.
+--
+
+-- normal heap table.
+begin;
+create table htab(a int);
+insert into htab values (100), (200), (300), (400), (500);
+
+select * from htab where xmin = 2;
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 4)']::tid[]);
+select ctid, xmax from htab where xmin = 2;
+
+select heap_force_kill('htab'::regclass, ARRAY['(0, 4)']::tid[]);
+select * from htab where ctid = '(0, 4)';
+rollback;
+
+-- materialized view.
+begin;
+create materialized view mvw as select a from generate_series(1, 3) a;
+
+select * from mvw where xmin = 2;
+select heap_force_freeze('mvw'::regclass, ARRAY['(0, 3)']::tid[]);
+select * from mvw where xmin = 2;
+
+select heap_force_kill('mvw'::regclass, ARRAY['(0, 3)']::tid[]);
+select * from mvw where ctid = '(0, 3)';
+rollback;
+
+--
+-- check that using heap_force_kill and heap_force_freeze functions with the
+-- unsupported relations fails.
+--
+
+-- partitioned tables (the parent table) doesn't contain any tuple.
+create table ptab (a int) partition by list (a);
+
+select heap_force_kill('ptab'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('ptab'::regclass, ARRAY['(0, 1)']::tid[]);
+
+create index ptab_idx on ptab (a);
+
+-- indexes are not supported, should fail.
+select heap_force_kill('ptab_idx'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('ptab_idx'::regclass, ARRAY['(0, 1)']::tid[]);
+
+create view vw as select 1;
+
+-- views are not supported as well. so, all these should fail.
+select heap_force_kill('vw'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('vw'::regclass, ARRAY['(0, 1)']::tid[]);
+
+create sequence seq;
+
+-- sequences are not supported as well. so, all these functions should fail.
+select heap_force_kill('seq'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('seq'::regclass, ARRAY['(0, 1)']::tid[]);
+
+--
+-- Some negative test-cases with invalid inputs.
+--
+begin;
+create table htab(a int);
+
+-- invalid block number, should be skipped.
+select heap_force_kill('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+
+insert into htab values (10);
+
+-- invalid offset number, should be skipped.
+select heap_force_kill('htab'::regclass, ARRAY['(0, 2)']::tid[]);
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 2)']::tid[]);
+
+-- dead tuple, should be skipped.
+select heap_force_kill('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_kill('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+select heap_force_freeze('htab'::regclass, ARRAY['(0, 1)']::tid[]);
+
+rollback;
+
+-- cleanup.
+drop table ptab;
+drop view vw;
+drop sequence seq;
+drop extension pg_surgery;
diff --git a/doc/src/sgml/contrib.sgml b/doc/src/sgml/contrib.sgml
index 261a559..c82dde2 100644
--- a/doc/src/sgml/contrib.sgml
+++ b/doc/src/sgml/contrib.sgml
@@ -125,6 +125,7 @@ CREATE EXTENSION <replaceable>module_name</replaceable>;
&pgrowlocks;
&pgstatstatements;
&pgstattuple;
+ &pgsurgery;
&pgtrgm;
&pgvisibility;
&postgres-fdw;
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 64b5da0..828396d 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -139,6 +139,7 @@
<!ENTITY pgstandby SYSTEM "pgstandby.sgml">
<!ENTITY pgstatstatements SYSTEM "pgstatstatements.sgml">
<!ENTITY pgstattuple SYSTEM "pgstattuple.sgml">
+<!ENTITY pgsurgery SYSTEM "pgsurgery.sgml">
<!ENTITY pgtrgm SYSTEM "pgtrgm.sgml">
<!ENTITY pgvisibility SYSTEM "pgvisibility.sgml">
<!ENTITY postgres-fdw SYSTEM "postgres-fdw.sgml">
diff --git a/doc/src/sgml/pgsurgery.sgml b/doc/src/sgml/pgsurgery.sgml
new file mode 100644
index 0000000..92b9060
--- /dev/null
+++ b/doc/src/sgml/pgsurgery.sgml
@@ -0,0 +1,94 @@
+<!-- doc/src/sgml/pgsurgery.sgml -->
+
+<sect1 id="pgsurgery" xreflabel="pg_surgery">
+ <title>pg_surgery</title>
+
+ <indexterm zone="pgsurgery">
+ <primary>pg_surgery</primary>
+ </indexterm>
+
+ <para>
+ The <filename>pg_surgery</filename> module provides various functions to
+ perform surgery on a damaged relation.
+ </para>
+
+ <para>
+ By default, only the superusers have <literal>EXECUTE</literal> privilege on
+ the functions provided by this module. After the extension has been installed,
+ users may issue <command>GRANT</command> commands to change the privileges on
+ the functions to allow others to execute them.
+ </para>
+
+ <sect2>
+ <title>Functions</title>
+
+ <variablelist>
+ <varlistentry>
+ <term>
+ <function>heap_force_kill(regclass, tid[]) returns void</function>
+ </term>
+
+ <listitem>
+ <para>
+ <function>heap_force_kill</function> marks a <quote>used</quote> line
+ pointer as <quote>dead</quote> without examining the tuple. This may help
+ users to force-kill a corrupted tuple in the table. It takes the relation
+ oid and an array of item pointers as an input arguments. For example:
+<programlisting>
+test=> select * from t1 where ctid = '(0, 1)';
+-[ RECORD 1 ]
+a | 10
+
+test=# select heap_force_kill('t1'::regclass, ARRAY['(0, 1)']::tid[]);
+-[ RECORD 1 ]---+-
+heap_force_kill |
+
+test=# select * from t1 where ctid = '(0, 1)';
+(0 rows)
+</programlisting>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>
+ <function>heap_force_freeze(regclass, tid[]) returns void</function>
+ </term>
+
+ <listitem>
+ <para>
+ <function>heap_force_freeze</function> marks a live tuple as frozen by
+ setting its xmin to FrozenTransactionId and xmax to InvalidTransactionId.
+ It clears all the visibility related bits in the infomask of the tuple and
+ sets the infomask flag to HEAP_XMIN_FROZEN and HEAP_XMAX_INVALID. This may
+ help users to force-freeze a corrupted tuple in the table. Similar to
+ <function>heap_force_kill</function> function, this also takes the relation
+ oid and an array of item pointers as an input arguments. For example:
+<programlisting>
+test=> select ctid from t1 where xmin = 2;
+(0 rows)
+
+test=# select heap_force_freeze('t1'::regclass, ARRAY['(0, 1)']::tid[]);
+-[ RECORD 1 ]-----+-
+heap_force_freeze |
+
+test=# select ctid from t1 where xmin = 2;
+-[ RECORD 1 ]
+ctid | (0,1)
+</programlisting>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </sect2>
+
+ <sect2>
+ <title>Authors</title>
+
+ <para>
+ Ashutosh Sharma <email>[email protected]</email>
+ </para>
+ </sect2>
+
+</sect1>
--
1.8.3.1