Re: Logical replication, need to reclaim big disk space

2025-05-20 Thread Moreno Andreo



On 19/05/25 20:49, Achilleas Mantzios wrote:


On 19/5/25 17:38, Moreno Andreo wrote:




On 19/05/25 14:41, Achilleas Mantzios wrote:



On 5/19/25 09:14, Moreno Andreo wrote:


On 16/05/25 21:33, Achilleas Mantzios wrote:

On 16/5/25 18:45, Moreno Andreo wrote:


Hi,
    we are changing our old approach to binary data, moving it from
bytea fields in a table to external storage (making the database
smaller and the related operations faster and easier).
In short, we have a job that runs in the background, copies the data
from the table to an external file, and then sets the bytea field
to NULL.

(UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = )

At the end of the operation, this leaves the table at less than one
tenth of its original size.
We have a multi-tenant architecture (hundreds of schemas with an
identical structure, all inheriting from public), and we are
performing the task on one table per schema.
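For clarity, the SQL side of one batch looks roughly like this (the
schema name, batch size and parameter binding are illustrative; the
copy to external storage itself happens outside the database):

  -- pick a batch of rows in one tenant schema that still hold a blob
  SELECT id FROM some_schema.tbl
   WHERE blob IS NOT NULL
   ORDER BY id
   LIMIT 100;

  -- once the corresponding file has been written successfully:
  UPDATE some_schema.tbl
     SET blob = NULL,
         ref  = 'path/to/file'
   WHERE id = $1;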


So? TOASTed data are kept in separate TOAST tables; unless those
bytea columns are actually selected, you won't even touch them. I
cannot understand what you are trying to achieve here.
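Just to put a number on it, something like this shows how much of a
table's footprint actually sits outside the main heap (the relation
name is a placeholder; pg_table_size includes the TOAST table, FSM
and VM, but not indexes):

  SELECT pg_size_pretty(pg_relation_size('some_schema.tbl'))   AS main_heap,
         pg_size_pretty(pg_table_size('some_schema.tbl')
                        - pg_relation_size('some_schema.tbl')) AS toast_and_maps;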


Years ago, when I made the mistake of going out for a coffee and
letting my developers "improvise", the result was a design similar to
what you are trying to achieve. Years later, I am seriously
considering moving that data back into PostgreSQL.
The "related operations" I was talking about are backups and 
database maintenance when needed, cluster/replica management, etc. 
With a smaller database size they would be easier in timing and 
effort, right?
Ok, but you'll lose replica functionality for those blobs, which 
means you don't care about them, correct me if I am wrong.
I'm not saying I don't care about them; quite the opposite. They are
protected with Object Versioning and soft deletion, which should
ensure good protection against e.g. ransomware if someone manages to
get in there (and if that happens, we'll have bigger troubles than
this).
PostgreSQL has become very popular because of people who care about
their data.
Yeah, it's always been famous for its robustness, and that's why I
chose PostgreSQL more than 10 years ago; in spite of how a "normal"
user treats his PC, we have never had corruption (only where the
filesystem or disk was failing, and that's not PostgreSQL's fault).
We are mostly talking about costs here. To call things by their name,
I'm moving bytea contents (85% of the total data) to files in Google
Cloud Storage buckets, which cost a fraction of the disks holding my
database (on GCE, to be clear).

May I ask the size of the bytea data (uncompressed)?
Single records vary from 150 kB to 80 MB; the grand total is more
than 8.5 TB out of a roughly 10 TB data footprint.
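(For reference, per-schema sizes can be checked with something like
the query below; the schema name is a placeholder, and the stored
size differs from the uncompressed one because of TOAST compression.)

  SELECT count(*)                                   AS blobs,
         pg_size_pretty(sum(pg_column_size(blob)))  AS stored_size,
         pg_size_pretty(sum(octet_length(blob)))    AS uncompressed_size
    FROM some_schema.tbl
   WHERE blob IS NOT NULL;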
This data is not accessed frequently (just by the owner when he needs
it), so there is no need to keep it on expensive hardware.
Over the years I've read that keeping many big bytea fields in a
database is not recommended, but I might have misunderstood that.


OK, I assume this is unimportant data, but let me ask: what is the
longevity / expected lifetime of that data? I haven't worked with
those services, I'm just reading:


https://cloud.google.com/storage/pricing?_gl=1*1b25r8o*_up*MQ..&gclid=CjwKCAjwravBBhBjEiwAIr30VKfaOJytxmk7J29vjG4rBBkk2EUimPU5zPibST73nm3XRL2h0O9SxRoCaogQAvD_BwE&gclsrc=aw.ds#storage-pricing

would you choose e.g. "Anywhere Cache storage"?

Absolutely not, this is *not* unimportant data, and we are using
Standard Storage at $0.02/GB/month plus operations, which compared to
$0.17/GB/month for an SSD (or even more for the Hyperdisks we are
using) is a good price drop.
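Back of the envelope, taking those figures at face value: 8.5 TB is
roughly 8,700 GB, so about 8,700 x $0.17 = ~$1,480/month on SSD
versus 8,700 x $0.02 = ~$174/month on Standard Storage, i.e. on the
order of $1,300/month saved, before operation and egress charges.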

How about hosting your data on your own storage and spending $0/GB/month?
If we could host on our own hardware, I wouldn't be here talking.
Maybe we would have a 10-node full-mesh multimaster architecture with
Barman backups on 2 separate SANs.
But we are a small company that has to balance performance,
consistency, security and, last but not least, costs. And margins are
tightening.




Another way would have been to move these tables to a different
tablespace on cheaper storage, but that would still have cost 3 times
as much as the buckets.


Can you actually mount those Cloud Storage buckets under a supported
FS on Linux and just move the tables to tablespaces backed by that
storage?


Never tried; I mounted one via FUSE and ran some simple operations on
it in the past, but I'm not sure it could handle database workloads
in terms of I/O bandwidth.
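Just to illustrate what it would look like if such a mount existed
(the mount point and names are hypothetical, e.g. a bucket mounted at
/mnt/gcs via gcsfuse); I wouldn't trust a FUSE object-storage mount
with fsync-critical database files, though:

  -- hypothetical: bucket mounted at /mnt/gcs
  CREATE TABLESPACE cold_blobs LOCATION '/mnt/gcs/pg_tblspc';
  -- rewrites the table onto that storage
  ALTER TABLE some_schema.tbl SET TABLESPACE cold_blobs;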



Why are you considering moving the data back into database tables?
Because now, if we need to migrate from cloud to on-premise, or just
upgrade or move the specific server which holds that data, I will
have an extra headache. It is also a single point of failure, or at
best a source of fragmented technology introduced just for the sake
of keeping things out of the DB.
This is managed as a hierarchical disk structure, so the calling
server may be literally anywhere; it just needs an account (or a
service account) to get in there,
and you are locke

Do stuck replication slots prevent autovacuum from running entirely?

2025-05-20 Thread Marcelo Fernandes
Hi there,

I am trying to understand whether a stuck replication slot would be sufficient to
stop an autovacuum from even starting.

Couldn't the autovacuum process start, but fail to remove dead tuples that
are still needed by the replication slot? Why would it prevent autovacuum
from even starting instead?

Thanks.




Re: Do stuck replication slots prevent autovacuum from running entirely?

2025-05-20 Thread Laurenz Albe
On Wed, 2025-05-21 at 16:34 +1200, Marcelo Fernandes wrote:
> I am trying to understand whether a stuck replication slot would be sufficient to
> stop an autovacuum from even starting.
> 
> Couldn't the autovacuum process start, but fail to remove dead tuples that
> are still needed by the replication slot? Why would it prevent autovacuum
> from even starting instead?

I cannot think of a reason why an abandoned replication slot (not sure what
you mean by "stuck") would keep an autovacuum worker from starting.
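
If you want to see what a slot is actually holding on to (retained WAL
via restart_lsn, and any xmin/catalog_xmin that would keep dead tuples
from being cleaned up once VACUUM does run), something like this is a
quick check:

  SELECT slot_name, slot_type, active, xmin, catalog_xmin, restart_lsn
    FROM pg_replication_slots;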

Typical reasons why autovacuum doesn't start running on a table are:

- there are already "autovacuum_max_workers" worker processes running

- the thresholds for dead or inserted tuples have not been crossed

- the statistics collector has a problem and doesn't gather statistics;
  this applies mostly to v14 and older, see
  https://www.cybertec-postgresql.com/en/stale-statistics-cause-table-bloat/

- the parameter "track_counts" was disabled, so that PostgreSQL doesn't
  collect the tuple statistics autovacuum relies on

- the functions "pg_stat_reset" or "pg_stat_reset_single_table_counters"
  are called repeatedly and wipe out table statistics
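
As a quick way to check the counters mentioned above and when autovacuum
last processed a table (the table name is a placeholder):

  SELECT relname, n_live_tup, n_dead_tup, n_ins_since_vacuum,
         last_autovacuum, last_autoanalyze
    FROM pg_stat_user_tables
   WHERE relname = 'your_table';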

Yours,
Laurenz Albe