Hi,

During logical decoding, if there is a large write transaction, some spill files will be written to disk, depending on the setting of max_changes_in_memory.
This behavior effectively avoids OOM, but if the transaction generates many changes before committing, a large number of spill files may fill the disk; updating a TB-scale table, for example. That much is unavoidable. But I noticed an inelegant phenomenon: if the large table being updated is not published, its changes are still written out as a large number of spill files. Look at the example below.

publisher:

```
create table tbl_pub(id int, val1 text, val2 text, val3 text);
create table tbl_t1(id int, val1 text, val2 text, val3 text);
CREATE PUBLICATION mypub FOR TABLE public.tbl_pub;
```

subscriber:

```
create table tbl_pub(id int, val1 text, val2 text, val3 text);
create table tbl_t1(id int, val1 text, val2 text, val3 text);
CREATE SUBSCRIPTION mysub CONNECTION 'host=127.0.0.1 port=5432 user=postgres dbname=postgres' PUBLICATION mypub;
```

publisher:

```
begin;
insert into tbl_t1 select i, repeat('xyzzy', i), repeat('abcba', i), repeat('dfds', i) from generate_series(0, 999999) i;
```

Afterwards you will see a large number of spill files in the "$PGDATA/pg_replslot/mysub/" directory:

```
$ ll -sh
total 4.5G
4.0K -rw------- 1 postgres postgres  200 Nov 30 09:24 state
 17M -rw------- 1 postgres postgres  17M Nov 30 08:22 xid-750-lsn-0-10000000.spill
 12M -rw------- 1 postgres postgres  12M Nov 30 08:20 xid-750-lsn-0-1000000.spill
 17M -rw------- 1 postgres postgres  17M Nov 30 08:23 xid-750-lsn-0-11000000.spill
......
```

Note that table tbl_t1 is not part of mypub, so its changes are never sent downstream: only after the transaction is reassembled does the pgoutput decoding plugin filter out changes to unpublished relations when sending logical changes (see function pgoutput_change).

Given all of the above: if we filtered out changes to unpublished relations after constructing each change but before queuing it into the transaction, would that make logical decoding less laborious and avoid most of the disk growth? This is just an immature idea.
I haven't started implementing it yet. Maybe it was designed this way because of key factors I haven't considered, so I would like to hear everyone's opinions, especially from the designers of logical decoding.