Re: [HACKERS] pageinspect patch, for showing tuple data

Michael Paquier Wed, 28 Oct 2015 09:00:13 -0700

On Sat, Oct 17, 2015 at 1:48 AM, Michael Paquier wrote:
> On Sat, Oct 17, 2015 at 5:15 AM, Nikolay Shaplov wrote:
>> Or it's ready to commit, and just not marked this way?
>
> No, I don't think we have reached this state yet.
>
>> I am going to make report based on this patch in Vienna. It would be
>> nice to have it committed until then :)
>
> This is registered in this November's CF and the current one is not
> completely wrapped up, so I haven't rushed into looking at it.


So, I have finally been able to look at this patch in more details,
resulting in the attached. I noticed a couple of things that should be
addressed, mainly:
- addition of a new routine text_to_bits to perform the reverse
operation of bits_to_text. This was previously part of
tuple_data_split, I think that it deserves its own function.
- split_tuple_data should be static
- t_bits_str should not be a text, rather a char* fetched using
text_to_cstring(PG_GETARG_TEXT_PP(4)). This way there is no need to
perform extra calculations with VARSIZE and VARHDRSZ
- split_tuple_data can directly use the relation OID instead of the
tuple descriptor of the relation
- t_bits was leaking memory. For correctness I think that it is better
to free it after calling split_tuple_data.
- PG_DETOAST_DATUM_COPY allocates some memory, this was leaking as
well in raw_attr actually. I refactored the code such as a bytea* is
used and always freed when allocated to avoid leaks. Removing raw_attr
actually simplified the code a bit.
- I simplified the docs, that was largely too verbose in my opinion.
- Instead of using VARATT_IS_1B_E and VARTAG_EXTERNAL, using
VARATT_IS_EXTERNAL and VARATT_IS_EXTERNAL_ONDISK seems more adapted to
me, those other ones are much more low-level and not really spread in
the backend code.
- Found some typos in the code, the docs and some comments. I reworked
the error messages as well.
- The code format was not really in line with the project guidelines.
I fixed that as well.
- I removed heap_page_item_attrs for now to get this patch in a
bare-bone state. Though I would not mind if this is re-added (I
personally don't think that's much necessary in the module
actually...).
- The calculation of the length of t_bits using HEAP_NATTS_MASK is
more correct as you mentioned earlier, so I let it in its state.
That's actually more accurate for error handling as well.
That's everything I recall I have. How does this look?
-- 
Michael

diff --git a/contrib/pageinspect/Makefile b/contrib/pageinspect/Makefile
index aec5258..91ab119 100644
--- a/contrib/pageinspect/Makefile
+++ b/contrib/pageinspect/Makefile
@@ -5,9 +5,9 @@ OBJS		= rawpage.o heapfuncs.o btreefuncs.o fsmfuncs.o \
 		  brinfuncs.o ginfuncs.o $(WIN32RES)
 
 EXTENSION = pageinspect
-DATA = pageinspect--1.3.sql pageinspect--1.2--1.3.sql \
-	pageinspect--1.1--1.2.sql pageinspect--1.0--1.1.sql \
-	pageinspect--unpackaged--1.0.sql
+DATA = pageinspect--1.4.sql pageinspect--1.3--1.4.sql \
+	pageinspect--1.2--1.3.sql pageinspect--1.1--1.2.sql \
+	pageinspect--1.0--1.1.sql pageinspect--unpackaged--1.0.sql
 PGFILEDESC = "pageinspect - functions to inspect contents of database pages"
 
 ifdef USE_PGXS
diff --git a/contrib/pageinspect/heapfuncs.c b/contrib/pageinspect/heapfuncs.c
index 8d1666c..15fd7f1 100644
--- a/contrib/pageinspect/heapfuncs.c
+++ b/contrib/pageinspect/heapfuncs.c
@@ -27,8 +27,11 @@
 
 #include "access/htup_details.h"
 #include "funcapi.h"
-#include "utils/builtins.h"
+#include "catalog/pg_type.h"
 #include "miscadmin.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/rel.h"
 
 
 /*
@@ -55,6 +58,42 @@ bits_to_text(bits8 *bits, int len)
 
 
 /*
+ * text_to_bits
+ *
+ * Converts a c-string representation of bits into a bits8-array. This is
+ * the reverse operation of previous routine.
+ */
+static bits8 *
+text_to_bits(char *str, int len)
+{
+	bits8	   *bits;
+	int			off = 0;
+	char		byte = 0;
+
+	bits = palloc(len + 1);
+
+	while (off < len)
+	{
+		if (off % 8 == 0)
+			byte = 0;
+
+		if ((str[off] == '0') || (str[off] == '1'))
+			byte = byte | ((str[off] - '0') << off % 8);
+		else
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("illegal character '%c' in t_bits string", str[off])));
+
+		if (off % 8 == 7)
+			bits[off / 8] = byte;
+
+		off++;
+	}
+
+	return bits;
+}
+
+/*
  * heap_page_items
  *
  * Allows inspection of line pointers and tuple headers of a heap page.
@@ -122,8 +161,8 @@ heap_page_items(PG_FUNCTION_ARGS)
 		HeapTuple	resultTuple;
 		Datum		result;
 		ItemId		id;
-		Datum		values[13];
-		bool		nulls[13];
+		Datum		values[14];
+		bool		nulls[14];
 		uint16		lp_offset;
 		uint16		lp_flags;
 		uint16		lp_len;
@@ -154,7 +193,8 @@ heap_page_items(PG_FUNCTION_ARGS)
 			lp_offset + lp_len <= raw_page_size)
 		{
 			HeapTupleHeader tuphdr;
-			int			bits_len;
+			bytea		*tuple_data_bytea;
+			int			tuple_data_len;
 
 			/* Extract information from the tuple header */
 
@@ -168,6 +208,14 @@ heap_page_items(PG_FUNCTION_ARGS)
 			values[9] = UInt32GetDatum(tuphdr->t_infomask);
 			values[10] = UInt8GetDatum(tuphdr->t_hoff);
 
+			/* Copy raw tuple data into bytea attribute */
+			tuple_data_len = lp_len - tuphdr->t_hoff;
+			tuple_data_bytea = (bytea *) palloc(tuple_data_len + VARHDRSZ);
+			SET_VARSIZE(tuple_data_bytea, tuple_data_len + VARHDRSZ);
+			memcpy(VARDATA(tuple_data_bytea), (char *) tuphdr + tuphdr->t_hoff,
+					 tuple_data_len);
+			values[13] = PointerGetDatum(tuple_data_bytea);
+
 			/*
 			 * We already checked that the item is completely within the raw
 			 * page passed to us, with the length given in the line pointer.
@@ -180,11 +228,11 @@ heap_page_items(PG_FUNCTION_ARGS)
 			{
 				if (tuphdr->t_infomask & HEAP_HASNULL)
 				{
-					bits_len = tuphdr->t_hoff -
-						offsetof(HeapTupleHeaderData, t_bits);
+					int	bits_len =
+							((tuphdr->t_infomask2 & HEAP_NATTS_MASK) / 8 + 1) * 8;
 
 					values[11] = CStringGetTextDatum(
-								 bits_to_text(tuphdr->t_bits, bits_len * 8));
+								 bits_to_text(tuphdr->t_bits, bits_len));
 				}
 				else
 					nulls[11] = true;
@@ -208,7 +256,7 @@ heap_page_items(PG_FUNCTION_ARGS)
 			 */
 			int			i;
 
-			for (i = 4; i <= 12; i++)
+			for (i = 4; i <= 13; i++)
 				nulls[i] = true;
 		}
 
@@ -223,3 +271,205 @@ heap_page_items(PG_FUNCTION_ARGS)
 	else
 		SRF_RETURN_DONE(fctx);
 }
+
+/*
+ * split_tuple_data
+ *
+ * Split raw tuple data taken directly from a page into an array of bytea
+ * elements. This routine does a lookup on NULL balues and creates array
+ * elements accordindly. This is a simplified reimplementation of
+ * nocachegetattr() in heaptuple.c simplified for educational purposes.
+ */
+static Datum
+split_tuple_data(Oid relid, char *tupdata,
+				 uint16 tupdata_len, uint16 t_infomask,
+				 uint16 t_infomask2, bits8 *t_bits,
+				 bool do_detoast)
+{
+	ArrayBuildState	   *raw_attrs;
+	int 				nattrs;
+	int					i;
+	int					off = 0;
+	Relation			rel;
+	TupleDesc			tupdesc;
+
+	/* Get tuple descriptor from relation OID */
+	rel = relation_open(relid, NoLock);
+	tupdesc = CreateTupleDescCopyConstr(rel->rd_att);
+	relation_close(rel, NoLock);
+
+	raw_attrs = initArrayResult(BYTEAOID, CurrentMemoryContext, false);
+	nattrs = tupdesc->natts;
+
+	if (nattrs < (t_infomask2 & HEAP_NATTS_MASK))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("number of attributes in tuple header is greater than number of attributes in tuple descriptor")));
+
+	for (i = 0; i < nattrs; i++)
+	{
+		Form_pg_attribute	attr;
+		bool				is_null;
+		bytea			   *attr_data = NULL;
+
+		attr = tupdesc->attrs[i];
+		is_null = (t_infomask & HEAP_HASNULL) && att_isnull(i, t_bits);
+
+		/*
+		 * Tuple header can specify less attributes than tuple descriptor
+		 * as ALTER TABLE ADD COLUMN without DEFAULT keyword does not
+		 * actually change tuples in pages, so attributes with numbers greater
+		 * than (t_infomask2 & HEAP_NATTS_MASK) should be treated as NULL.
+		 */
+		if (i >= (t_infomask2 & HEAP_NATTS_MASK))
+			is_null = true;
+
+		if (!is_null)
+		{
+			int		len;
+
+			if (attr->attlen == -1)
+			{
+				off = att_align_pointer(off, tupdesc->attrs[i]->attalign, -1,
+										tupdata + off);
+				/*
+				 * As VARSIZE_ANY throws an exception if it can't properly detect
+				 * the type of external storage in macros VARTAG_SIZE, this check
+				 * is repeated to have a nicer error handling.
+				 */
+				if (VARATT_IS_EXTERNAL(tupdata + off) &&
+					!VARATT_IS_EXTERNAL_ONDISK(tupdata + off) &&
+					!VARATT_IS_EXTERNAL_INDIRECT(tupdata + off))
+					ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg("first byte of varlen attribute is incorrect for attribute %d", i)));
+
+				len = VARSIZE_ANY(tupdata + off);
+			}
+			else
+			{
+				off = att_align_nominal(off, tupdesc->attrs[i]->attalign);
+				len = attr->attlen;
+			}
+
+			if (tupdata_len < off + len)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATA_CORRUPTED),
+						 errmsg("unexpected end of tuple data")));
+
+			if (attr->attlen == -1 && do_detoast)
+				attr_data = DatumGetByteaPCopy(tupdata + off);
+			else
+			{
+				attr_data = (bytea *) palloc(len + VARHDRSZ);
+				SET_VARSIZE(attr_data, len + VARHDRSZ);
+				memcpy(VARDATA(attr_data), tupdata + off, len);
+			}
+
+			off = att_addlength_pointer(off, tupdesc->attrs[i]->attlen,
+										tupdata + off);
+		}
+
+		raw_attrs = accumArrayResult(raw_attrs, PointerGetDatum(attr_data),
+									 is_null, BYTEAOID, CurrentMemoryContext);
+		if (attr_data)
+			pfree(attr_data);
+	}
+
+	if (tupdata_len != off)
+		ereport(ERROR,
+				(errcode(ERRCODE_DATA_CORRUPTED),
+				 errmsg("end of tuple reached without looking at all its data")));
+
+	return makeArrayResult(raw_attrs, CurrentMemoryContext);
+}
+
+/*
+ * tuple_data_split
+ *
+ * Split raw tuple data taken directly from page into distinct elements
+ * taking into account null values.
+ */
+PG_FUNCTION_INFO_V1(tuple_data_split);
+
+Datum
+tuple_data_split(PG_FUNCTION_ARGS)
+{
+	Oid				relid;
+	bytea		   *raw_data;
+	uint16			t_infomask;
+	uint16			t_infomask2;
+	char		   *t_bits_str;
+	bool			do_detoast = false;
+	bits8		   *t_bits = NULL;
+	Datum			res;
+
+	relid = PG_GETARG_OID(0);
+	raw_data = PG_ARGISNULL(1) ? NULL : PG_GETARG_BYTEA_P(1);
+	t_infomask = PG_GETARG_INT16(2);
+	t_infomask2 = PG_GETARG_INT16(3);
+	t_bits_str = PG_ARGISNULL(4) ? NULL :
+		text_to_cstring(PG_GETARG_TEXT_PP(4));
+
+	if (PG_NARGS() >= 6)
+		do_detoast = PG_GETARG_BOOL(5);
+
+	if (!superuser())
+		ereport(ERROR,
+				(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+				 errmsg("must be superuser to use raw page functions")));
+
+	if (!raw_data)
+		PG_RETURN_NULL();
+
+	/*
+	 * Convert t_bits string back to the bits8 array as represented in the
+	 * tuple header.
+	 */
+	if (t_infomask & HEAP_HASNULL)
+	{
+		int		bits_str_len;
+		int		bits_len;
+
+		bits_len = (t_infomask2 & HEAP_NATTS_MASK) / 8 + 1;
+		if (!t_bits_str)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("argument of t_bits is null, but it is expected to be null and %i character long",
+							bits_len * 8)));
+
+		bits_str_len = strlen(t_bits_str);
+		if ((bits_str_len % 8) != 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("length of t_bits is not a multiple of eight")));
+
+		if (bits_len * 8 != bits_str_len)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("unexpected length of t_bits %u, expected %i",
+							bits_str_len, bits_len * 8)));
+
+		/* do the conversion */
+		t_bits = text_to_bits(t_bits_str, bits_str_len);
+	}
+	else
+	{
+		if (t_bits_str)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATA_CORRUPTED),
+					 errmsg("t_bits string is expected to be NULL, but instead it is %lu bytes length",
+							strlen(t_bits_str))));
+	}
+
+	/* Split tuple data */
+	res = split_tuple_data(relid, (char *) raw_data + VARHDRSZ,
+						   VARSIZE(raw_data) - VARHDRSZ,
+						   t_infomask, t_infomask2, t_bits,
+						   do_detoast);
+
+	if (t_bits)
+		pfree(t_bits);
+
+	PG_RETURN_ARRAYTYPE_P(res);
+}
diff --git a/contrib/pageinspect/pageinspect--1.3--1.4.sql b/contrib/pageinspect/pageinspect--1.3--1.4.sql
new file mode 100644
index 0000000..ea8f356
--- /dev/null
+++ b/contrib/pageinspect/pageinspect--1.3--1.4.sql
@@ -0,0 +1,49 @@
+/* contrib/pageinspect/pageinspect--1.3--1.4.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pageinspect UPDATE TO '1.4'" to load this file. \quit
+
+--
+-- heap_page_items()
+--
+DROP FUNCTION heap_page_items(bytea);
+CREATE FUNCTION heap_page_items(IN page bytea,
+    OUT lp smallint,
+    OUT lp_off smallint,
+    OUT lp_flags smallint,
+    OUT lp_len smallint,
+    OUT t_xmin xid,
+    OUT t_xmax xid,
+    OUT t_field3 int4,
+    OUT t_ctid tid,
+    OUT t_infomask2 integer,
+    OUT t_infomask integer,
+    OUT t_hoff smallint,
+    OUT t_bits text,
+    OUT t_oid oid,
+    OUT t_data bytea)
+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'heap_page_items'
+LANGUAGE C STRICT;
+
+--
+-- tuple_data_split()
+--
+CREATE FUNCTION tuple_data_split(t_oid rel_oid,
+    t_data bytea,
+    t_infomask integer,
+    t_infomask2 integer,
+    t_bits text)
+RETURNS bytea[]
+AS 'MODULE_PATHNAME','tuple_data_split'
+LANGUAGE C;
+
+CREATE FUNCTION tuple_data_split(t_oid rel_oid,
+    t_data bytea,
+    t_infomask integer,
+    t_infomask2 integer,
+    t_bits text,
+    do_detoast bool)
+RETURNS bytea[]
+AS 'MODULE_PATHNAME','tuple_data_split'
+LANGUAGE C;
diff --git a/contrib/pageinspect/pageinspect--1.3.sql b/contrib/pageinspect/pageinspect--1.4.sql
similarity index 88%
rename from contrib/pageinspect/pageinspect--1.3.sql
rename to contrib/pageinspect/pageinspect--1.4.sql
index a99e058..47864f4 100644
--- a/contrib/pageinspect/pageinspect--1.3.sql
+++ b/contrib/pageinspect/pageinspect--1.4.sql
@@ -1,4 +1,4 @@
-/* contrib/pageinspect/pageinspect--1.3.sql */
+/* contrib/pageinspect/pageinspect--1.4.sql */
 
 -- complain if script is sourced in psql, rather than via CREATE EXTENSION
 \echo Use "CREATE EXTENSION pageinspect" to load this file. \quit
@@ -48,12 +48,35 @@ CREATE FUNCTION heap_page_items(IN page bytea,
     OUT t_infomask integer,
     OUT t_hoff smallint,
     OUT t_bits text,
-    OUT t_oid oid)
+    OUT t_oid oid,
+    OUT t_data bytea)
 RETURNS SETOF record
 AS 'MODULE_PATHNAME', 'heap_page_items'
 LANGUAGE C STRICT;
 
 --
+-- tuple_data_split()
+--
+CREATE FUNCTION tuple_data_split(rel_oid oid,
+    t_data bytea,
+    t_infomask integer,
+    t_infomask2 integer,
+    t_bits text)
+RETURNS bytea[]
+AS 'MODULE_PATHNAME','tuple_data_split'
+LANGUAGE C;
+
+CREATE FUNCTION tuple_data_split(rel_oid oid,
+    t_data bytea,
+    t_infomask integer,
+    t_infomask2 integer,
+    t_bits text,
+    do_detoast bool)
+RETURNS bytea[]
+AS 'MODULE_PATHNAME','tuple_data_split'
+LANGUAGE C;
+
+--
 -- bt_metap()
 --
 CREATE FUNCTION bt_metap(IN relname text,
diff --git a/contrib/pageinspect/pageinspect.control b/contrib/pageinspect/pageinspect.control
index a9dab33..68c7d61 100644
--- a/contrib/pageinspect/pageinspect.control
+++ b/contrib/pageinspect/pageinspect.control
@@ -1,5 +1,5 @@
 # pageinspect extension
 comment = 'inspect the contents of database pages at a low level'
-default_version = '1.3'
+default_version = '1.4'
 module_pathname = '$libdir/pageinspect'
 relocatable = true
diff --git a/doc/src/sgml/pageinspect.sgml b/doc/src/sgml/pageinspect.sgml
index b95cc81..5e528c9 100644
--- a/doc/src/sgml/pageinspect.sgml
+++ b/doc/src/sgml/pageinspect.sgml
@@ -93,9 +93,10 @@ test=# SELECT * FROM page_header(get_raw_page('pg_class', 0));
     <listitem>
      <para>
       <function>heap_page_items</function> shows all line pointers on a heap
-      page.  For those line pointers that are in use, tuple headers are also
-      shown. All tuples are shown, whether or not the tuples were visible to
-      an MVCC snapshot at the time the raw page was copied.
+      page.  For those line pointers that are in use, tuple headers as well
+      as tuple raw data are also shown. All tuples are shown, whether or not
+      the tuples were visible to an MVCC snapshot at the time the raw page
+      was copied.
      </para>
      <para>
       A heap page image obtained with <function>get_raw_page</function> should
@@ -112,6 +113,31 @@ test=# SELECT * FROM heap_page_items(get_raw_page('pg_class', 0));
 
    <varlistentry>
     <term>
+     <function>tuple_data_split(rel_oid, t_data bytea, t_infomask integer, t_infomask2 integer, t_bits text [, do_detoast bool]) returs bytea[]</function>
+     <indexterm>
+      <primary>tuple_data_split</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      <function>tuple_data_split</function> splits tuple data into attributes
+      in the same way as backend internals.
+<screen>
+test=# SELECT tuple_data_split('pg_class'::regclass, t_data, t_infomask, t_infomask2, t_bits) FROM heap_page_items(get_raw_page('pg_class', 0));
+</screen>
+      This function should be called with the same arguments as the return
+      attributes of <function>heap_page_items</function>.
+     </para>
+     <para>
+      If <parameter>do_detoast</parameter> is <literal>true</literal>,
+      attribute values that will be detoasted if needed. Default value is
+      <literal>false</literal>.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
+    <term>
      <function>bt_metap(relname text) returns record</function>
      <indexterm>
       <primary>bt_metap</primary>

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pageinspect patch, for showing tuple data

Reply via email to