Hi,

It seems we have a pretty annoying problem with logical decoding when
performing VACUUM FULL / CLUSTER on a table with TOASTed data.

The trouble is that the rewritten heap is WAL-logged using XLOG/FPI
records, while the TOAST data is logged as regular INSERT records.
XLOG/FPI is ignored in logical decoding, and so reorderbuffer never gets
those records. But we do decode the TOAST data, and reorderbuffer stashes
it in the toast_hash hash table, which gets reset only when handling a
row from the main heap, and that row never arrives. So we end up stashing
all the TOAST data in memory :-(
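
To make the failure mode concrete, here is a toy model of the stash.
It is only a sketch, not the reorderbuffer code: ToyTxn and all toy_*
functions are hypothetical names, and the "hash table" is reduced to a
counter of chunks held in memory.

```c
#include <stddef.h>

/* Toy stand-in for the per-transaction TOAST stash (hypothetical). */
typedef struct ToyTxn
{
    size_t      stashed_chunks; /* chunks currently held in memory */
    size_t      peak_chunks;    /* high-water mark of the stash */
} ToyTxn;

/* A decoded TOAST INSERT: stash the chunk until its heap row shows up. */
static void
toy_toast_chunk(ToyTxn *txn)
{
    txn->stashed_chunks++;
    if (txn->stashed_chunks > txn->peak_chunks)
        txn->peak_chunks = txn->stashed_chunks;
}

/* A decoded main-heap row: the stash is consumed and reset. */
static void
toy_heap_row(ToyTxn *txn)
{
    txn->stashed_chunks = 0;
}

/* Ordinary inserts: every group of chunks is followed by its heap row. */
static size_t
toy_regular_inserts(size_t nrows, size_t chunks_per_row)
{
    ToyTxn      txn = {0, 0};

    for (size_t i = 0; i < nrows; i++)
    {
        for (size_t c = 0; c < chunks_per_row; c++)
            toy_toast_chunk(&txn);
        toy_heap_row(&txn);
    }
    return txn.peak_chunks;
}

/* Heap rewrite: heap rows are logged as FPI, so decoding never sees them. */
static size_t
toy_rewrite(size_t nrows, size_t chunks_per_row)
{
    ToyTxn      txn = {0, 0};

    for (size_t i = 0; i < nrows; i++)
    {
        for (size_t c = 0; c < chunks_per_row; c++)
            toy_toast_chunk(&txn);
        /* no toy_heap_row() call here, so nothing ever resets the stash */
    }
    return txn.peak_chunks;
}
```

In this model the peak for ordinary inserts is bounded by the chunks of
a single row, while for a rewrite it grows with the size of the whole
table.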

So run VACUUM FULL (or CLUSTER) on a sufficiently large table, and you're
likely to break any logical replication connection. It does not matter
whether this particular table is replicated.

Luckily enough, the fix can leverage some of the pieces introduced by
e9edc1ba, which was meant to deal with rewrites of system tables. In
raw_heap_insert it added this:

    /*
     * The new relfilenode's relcache entrye doesn't have the necessary
     * information to determine whether a relation should emit data for
     * logical decoding.  Force it to off if necessary.
     */
    if (!RelationIsLogicallyLogged(state->rs_old_rel))
        options |= HEAP_INSERT_NO_LOGICAL;

As raw_heap_insert is used only for heap rewrites, we can simply remove
the if condition and use the HEAP_INSERT_NO_LOGICAL flag for all TOAST
data logged from here.

This does fix the issue: we still decode the TOAST changes, but they
carry no tuple data, and so

    if (change->data.tp.newtuple != NULL)
    {
        dlist_delete(&change->node);
        ReorderBufferToastAppendChunk(rb, txn, relation,
                                      change);
    }

ends up not stashing the change in the hash table. It's imperfect,
because we still decode the changes (and stash them to disk), but ISTM
that can be fixed by tweaking DecodeInsert a bit to just ignore those
changes entirely.
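
The interplay of the two fixes can be sketched roughly like this. It is
a simplified model, not the actual heapam/decode code: all TOY_* and
toy_* names are hypothetical, the bit values are made up, and the
assumption that the insert path sets the "contains new tuple" flag only
when the relation is logically logged and NO_LOGICAL is not requested
is mine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the bit values are not the real ones. */
#define TOY_INSERT_NO_LOGICAL       (1 << 0)    /* insert option */
#define TOY_REC_CONTAINS_NEW_TUPLE  (1 << 0)    /* WAL record flag */

typedef struct ToyInsertRecord
{
    uint8_t     flags;
} ToyInsertRecord;

/*
 * Writer side (assumed behavior): tuple data for logical decoding is
 * attached only if the relation is logically logged and the caller did
 * not pass the NO_LOGICAL option.
 */
static ToyInsertRecord
toy_log_insert(bool rel_logically_logged, int options)
{
    ToyInsertRecord rec = {0};

    if (rel_logically_logged && !(options & TOY_INSERT_NO_LOGICAL))
        rec.flags |= TOY_REC_CONTAINS_NEW_TUPLE;
    return rec;
}

/*
 * Decoder side: ignore insert records without a new tuple before any
 * change is allocated. Returns true if a change would be queued.
 */
static bool
toy_decode_insert(const ToyInsertRecord *rec)
{
    if (!(rec->flags & TOY_REC_CONTAINS_NEW_TUPLE))
        return false;
    return true;
}
```

With the option forced on for every insert coming from a rewrite, the
record never carries the flag, and the early return means no change is
queued at all, neither in memory nor on disk.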

Attached is a PoC patch with these two fixes.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
index c5db75afa1..ce6f9ed117 100644
--- a/src/backend/access/heap/rewriteheap.c
+++ b/src/backend/access/heap/rewriteheap.c
@@ -658,13 +658,8 @@ raw_heap_insert(RewriteState state, HeapTuple tup)
 		if (!state->rs_use_wal)
 			options |= HEAP_INSERT_SKIP_WAL;
 
-		/*
-		 * The new relfilenode's relcache entrye doesn't have the necessary
-		 * information to determine whether a relation should emit data for
-		 * logical decoding.  Force it to off if necessary.
-		 */
-		if (!RelationIsLogicallyLogged(state->rs_old_rel))
-			options |= HEAP_INSERT_NO_LOGICAL;
+		/* do not decode TOAST data for heap rewrites */
+		options |= HEAP_INSERT_NO_LOGICAL;
 
 		heaptup = toast_insert_or_update(state->rs_new_rel, tup, NULL,
 										 options);
diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c
index afb497227e..f23cb120e8 100644
--- a/src/backend/replication/logical/decode.c
+++ b/src/backend/replication/logical/decode.c
@@ -665,6 +665,9 @@ DecodeAbort(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
 static void
 DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 {
+	Size		datalen;
+	char	   *tupledata;
+	Size		tuplelen;
 	XLogReaderState *r = buf->record;
 	xl_heap_insert *xlrec;
 	ReorderBufferChange *change;
@@ -672,6 +675,10 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	xlrec = (xl_heap_insert *) XLogRecGetData(r);
 
+	/* ignore insert records without new tuples */
+	if (!(xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE))
+		return;
+
 	/* only interested in our database */
 	XLogRecGetBlockTag(r, 0, &target_node, NULL, NULL);
 	if (target_node.dbNode != ctx->slot->data.database)
@@ -690,17 +697,13 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
 
 	memcpy(&change->data.tp.relnode, &target_node, sizeof(RelFileNode));
 
-	if (xlrec->flags & XLH_INSERT_CONTAINS_NEW_TUPLE)
-	{
-		Size		datalen;
-		char	   *tupledata = XLogRecGetBlockData(r, 0, &datalen);
-		Size		tuplelen = datalen - SizeOfHeapHeader;
+	tupledata = XLogRecGetBlockData(r, 0, &datalen);
+	tuplelen = datalen - SizeOfHeapHeader;
 
-		change->data.tp.newtuple =
-			ReorderBufferGetTupleBuf(ctx->reorder, tuplelen);
+	change->data.tp.newtuple =
+		ReorderBufferGetTupleBuf(ctx->reorder, tuplelen);
 
-		DecodeXLogTuple(tupledata, datalen, change->data.tp.newtuple);
-	}
+	DecodeXLogTuple(tupledata, datalen, change->data.tp.newtuple);
 
 	change->data.tp.clear_toast_afterwards = true;
 
