Re: Logical replication, need to reclaim big disk space

2025-05-16 Thread Achilleas Mantzios

On 16/5/25 18:45, Moreno Andreo wrote:


Hi,
    we are changing our approach to old binary data, moving it from bytea 
fields in a table to external storage (making the database smaller and 
related operations faster and smarter).
In short, we have a job that runs in the background, copies the data from 
the table to an external file, and then sets the bytea field to NULL.

(UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = )

This results, at the end of the operation, in a table that's less than 
one tenth of its original size.
We have a multi-tenant architecture (hundreds of schemas with an identical 
structure, all inheriting from public) and we are performing the task on 
one table per schema.
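
For reference, a minimal sketch of how the per-schema pass can be driven; 
the table name 'tbl' is a placeholder, not our real one:

SELECT n.nspname AS schema_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'tbl'
  AND c.relkind = 'r'
  AND n.nspname NOT IN ('public', 'pg_catalog', 'information_schema');

The job then runs the copy-and-NULL update against each schema in that 
list, one row per transaction.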


So? TOASTed data are kept in separate TOAST tables; unless those bytea 
columns are selected, you won't even touch them. I cannot understand what 
you are trying to achieve here.
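
A quick way to check where the bytea bytes actually live; the schema and 
table names are placeholders:

SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid))           AS heap,
       pg_size_pretty(pg_relation_size(c.reltoastrelid)) AS toast
FROM pg_class c
WHERE c.oid = 'some_schema.tbl'::regclass;

If most of the size sits in the toast column, queries and updates that 
don't touch the bytea field never read those bytes at all.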


Years ago, when I made the mistake of going for a coffee and letting my 
developers "improvise", the result was a design similar to what you are 
trying to achieve. Years later, I am seriously considering moving that 
data back into PostgreSQL.




The problem is: this is generating BIG table bloat, as you may imagine.
Running a VACUUM FULL on a previously 22 GB table on a standalone test 
server is almost immediate.
If I had only one server, I'd process one table at a time with a nightly 
script and issue a VACUUM FULL on the tables that have already been 
processed.
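
Something like this, run from psql, is what I have in mind for the 
single-server case; 'tbl' is again a placeholder, and \gexec (psql 9.6+) 
executes each generated statement:

SELECT format('VACUUM FULL %I.%I', n.nspname, c.relname)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'tbl'
  AND c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
\gexec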


But I'm in a logical replication architecture (we are using a 
multi-master system called pgEdge, but I don't think it makes a big 
difference, since it's based on logical replication), and I'm building 
a test cluster.


So you use pgEdge, but you want to lose all the benefits of multi-master, 
since your binary data won't be replicated ...
I've been instructed to issue VACUUM FULL on both nodes, nightly, but 
before proceeding I read in the docs that VACUUM FULL can disrupt logical 
replication, so I'm a bit concerned about how to proceed. Rows are 
cleared one at a time (one transaction, one row, to confine errors to the 
record that raised them).
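
One thing worth checking before and after such maintenance is how far 
each logical slot lags on the publisher; a minimal sketch using the 
native catalog views (pgEdge/Spock may also expose its own):

SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                      confirmed_flush_lsn)) AS lag
FROM pg_replication_slots
WHERE slot_type = 'logical';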


pgEdge is based on pglogical, the old 2ndQuadrant extension, not the 
native logical replication we have had since PostgreSQL 10. But I might 
be mistaken.
I read about extensions like pg_squeeze, but I wonder if they, too, are 
dangerous for replication.
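
For what it's worth, a minimal sketch of ad-hoc pg_squeeze use; the exact 
argument list of squeeze.squeeze_table() varies between pg_squeeze 
versions, so check \df squeeze.* on your install, and the names below are 
placeholders:

CREATE EXTENSION IF NOT EXISTS pg_squeeze;
-- pg_squeeze itself uses logical decoding, so it needs wal_level = logical
-- and shared_preload_libraries = 'pg_squeeze' (see its README).
-- Rewrites the bloated table online, holding only a brief exclusive lock
-- at the final swap; the third argument is an optional clustering index.
SELECT squeeze.squeeze_table('some_schema', 'tbl', null);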


What's pgEdge's take on that, I mean the bytea thing you are trying to 
achieve here?

Thanks for your help.
Moreno.-







