Re: Logical replication, need to reclaim big disk space

Achilleas Mantzios Fri, 16 May 2025 12:34:04 -0700

On 16/5/25 18:45, Moreno Andreo wrote:

Hi,
we are moving our old binary data approach, moving them from bytea fields in a table to external storage (making database smaller and related operations faster and smarter). In short, we have a job that runs in background and copies data from the table to an external file and then sets the bytea field to NULL.
(UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = <uuid>)
This results, at the end of the operations, to a table that's less than one tenth in size. We have a multi-tenant architecture (100s of schemas with identical architecture, all inheriting from public) and we are performing the task on one table per schema.
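
The background job described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual job: Python's built-in sqlite3 module stands in for PostgreSQL so the sketch runs on its own, the table and column names (tbl, id, blob, ref) are taken from the UPDATE statement above, and the function name offload_blobs and the "<id>.bin" file naming are hypothetical. It handles one row per transaction, so a failure stays confined to the row that caused it.

```python
import os
import sqlite3
import tempfile
import uuid


def offload_blobs(conn, out_dir):
    """Copy each non-NULL blob to a file, then NULL it, one row per transaction.

    Hypothetical helper: sqlite3 is a stand-in for the real PostgreSQL
    connection, and the "<id>.bin" file naming is an assumption.
    """
    ids = [r[0] for r in conn.execute("SELECT id FROM tbl WHERE blob IS NOT NULL")]
    moved, failed = 0, []
    for row_id in ids:
        try:
            with conn:  # commits on success, rolls back if an exception escapes
                (data,) = conn.execute(
                    "SELECT blob FROM tbl WHERE id = ?", (row_id,)
                ).fetchone()
                path = os.path.join(out_dir, row_id + ".bin")
                with open(path, "wb") as fh:
                    fh.write(data)
                conn.execute(
                    "UPDATE tbl SET blob = NULL, ref = ? WHERE id = ?",
                    (path, row_id),
                )
            moved += 1
        except (OSError, sqlite3.Error) as exc:
            # Keep the error tied to the record that raised it and carry on.
            failed.append((row_id, str(exc)))
    return moved, failed


def demo():
    """Build a tiny in-memory table and run the job against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id TEXT PRIMARY KEY, blob BLOB, ref TEXT)")
    with conn:
        for payload in (b"first", b"second"):
            conn.execute(
                "INSERT INTO tbl (id, blob) VALUES (?, ?)",
                (str(uuid.uuid4()), payload),
            )
    out_dir = tempfile.mkdtemp()
    moved, failed = offload_blobs(conn, out_dir)
    remaining = conn.execute(
        "SELECT count(*) FROM tbl WHERE blob IS NOT NULL"
    ).fetchone()[0]
    return moved, failed, remaining
```

Calling demo() runs the job against two sample rows. The sketch covers only the copy-and-NULL step; it says nothing about reclaiming the space afterwards or about how the updates behave under replication.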

So? toasted data are kept on separate TOAST tables, unless those bytea cols are selected, you won't even touch them. I cannot understand what you are trying to achieve here.

Years ago, when I made the mistake to go for a coffee and let my developers "improvise", the result was a design similar to what you are trying to achieve. Years after, I am seriously considering moving those data back to PostgreSQL.

The problem is: this is generating BIG table bloat, as you may imagine.
Running a VACUUM FULL on an ex-22GB table on a standalone test server is almost immediate. If I had only one server, I'll process a table a time, with a nightly script, and issue a VACUUM FULL to tables that have already been processed.
But I'm in a logical replication architecture (we are using a multimaster system called pgEdge, but I don't think it will make big difference, since it's based on logical replication), and I'm building a test cluster.

So you use PgEdge, but you wanna lose all the benefits of multi-master, since your binary data won't be replicated ...

I've been instructed to issue VACUUM FULL on both nodes, nightly, but before proceeding I read on docs that VACUUM FULL can disrupt logical replication, so I'm a bit concerned on how to proceed. Rows are cleared one a time (one transaction, one row, to keep errors to the record that issued them)

PgEdge is based on the old pglogical, the old 2ndQuadrant extension, not the native logical replication we have since pgsql 10. But I might be mistaken.

I read about extensions like pg_squeeze, but I wonder if they are still not dangerous for replication.

What's pgEdge take on that, I mean the bytea thing you are trying to achieve here.

Thanks for your help.
Moreno.-
