On Mon, Nov 19, 2018 at 6:52 AM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote: > > Hi, > > It seems we have pretty annoying problem with logical decoding when > performing VACUUM FULL / CLUSTER on a table with toast-ed data. > > The trouble is that the rewritten heap is WAL-logged using XLOG/FPI > records, the TOAST data is logged as regular INSERT records. XLOG/FPI is > ignored in logical decoding, and so reorderbuffer never gets those > records. But we do decode the TOAST data, and reorderbuffer stashes them > in toast_hash hash table. Which gets reset only when handling a row from > the main heap, and that never arrives. So we end up stashing all the > TOAST data in memory :-( > > So do VACUUM FULL (or CLUSTER) on a sufficiently large table, and you're > likely to break any logical replication connection. And it does not > matter if you replicate this particular table. > > Luckily enough, this can leverage some of the pieces introduced by > e9edc1ba which was meant to deal with rewrites of system tables, and in > raw_heap_insert it added this: > > /* > * The new relfilenode's relcache entrye doesn't have the necessary > * information to determine whether a relation should emit data for > * logical decoding. Force it to off if necessary. > */ > if (!RelationIsLogicallyLogged(state->rs_old_rel)) > options |= HEAP_INSERT_NO_LOGICAL; > > As raw_heap_insert is used only for heap rewrites, we can simply remove > the if condition and use the HEAP_INSERT_NO_LOGICAL flag for all TOAST > data logged from here. >
This fix seems fine to me. > This does fix the issue, because we still decode the TOAST changes but > there are no data and so > > if (change->data.tp.newtuple != NULL) > { > dlist_delete(&change->node); > ReorderBufferToastAppendChunk(rb, txn, relation, > change); > } > > ends up not stashing the change in the hash table. With the below change you proposed can we remove the above condition because toast-insertion changes being processed by the reorder buffer always have a new tuple? If a toast-insertion record doesn't have a new tuple it has already ignored when decoding. > It's imperfect, > because we still decode the changes (and stash them to disk), but ISTM > that can be fixed by tweaking DecodeInsert a bit to just ignore those > changes entirely. > > Attached is a PoC patch with these two fixes. > I think this change is also fine. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center