Postgresql went crazy and flooded all the SSD

2023-11-06 Thread Gabriel Dodan
Not sure exactly what happened, but PostgreSQL flooded all the available SSD
space and then crashed. It had written a lot of data to the pg_wal
directory. Most likely this was caused by replication: the PostgreSQL instance
that crashed was a subscriber and a publisher at the same time. Even so, it
was not write-heavy replication; the data that needed to be
replicated was rarely updated. Has anyone else encountered such an issue?

Best!


Re: Postgresql went crazy and flooded all the SSD

2023-11-06 Thread Andreas Kretschmer



On 6 November 2023 12:11:31 CET, Gabriel Dodan  wrote:
>Not sure exactly what happened, but PostgreSQL flooded all the available SSD
>space and then crashed. It had written a lot of data to the pg_wal
>directory. Most likely this was caused by replication: the PostgreSQL instance
>that crashed was a subscriber and a publisher at the same time. Even so, it
>was not write-heavy replication; the data that needed to be
>replicated was rarely updated. Has anyone else encountered such an issue?
>
>Best!


Check the logs.

Andreas




Re: Postgresql went crazy and flooded all the SSD

2023-11-06 Thread hubert depesz lubaczewski
On Mon, Nov 06, 2023 at 01:11:31PM +0200, Gabriel Dodan wrote:
> Not sure exactly what happened, but PostgreSQL flooded all the available SSD
> space and then crashed. It had written a lot of data to the pg_wal
> directory. Most likely this was caused by replication: the PostgreSQL instance
> that crashed was a subscriber and a publisher at the same time. Even so, it
> was not write-heavy replication; the data that needed to be
> replicated was rarely updated. Has anyone else encountered such an issue?

You might want to read 
https://www.depesz.com/2023/06/18/why-is-my-wal-directory-so-large/
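
For example, two quick checks (just a sketch, using the standard
pg_replication_slots and pg_stat_archiver views) that cover the most common
causes of a growing pg_wal:

  -- an inactive slot, or one whose restart_lsn lags far behind,
  -- forces the server to retain WAL
  SELECT slot_name, slot_type, active,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
           AS retained_wal
  FROM pg_replication_slots;

  -- failing WAL archiving also prevents segments from being recycled
  SELECT archived_count, failed_count, last_failed_wal, last_failed_time
  FROM pg_stat_archiver;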

Best regards,

depesz





Cluster for an appliance-type deployment

2023-11-06 Thread Matthias Leisi
Dear all,

I’m reasonably experienced with Postgres in simple setups (single instance, 
„rebuild and restore“), but I would need some hints on what to look for in a more 
complex situation: deploying Postgres as the backend for a (virtual) 
appliance. 

This appliance can scale horizontally from a single VM to dozens 
(theoretically more, but most installations are small-ish). It is feasible to 
configure VMs for particular purposes (e.g. „you are [also] a DB node“), but 
basically all instances will/should be able to perform their main tasks while 
(also) being a DB node. As the VMs may be installed in very different 
environments, network-based solutions are less feasible and we would prefer a 
DB-level solution. We assume that for most cases, primary/standby 
configurations would be sufficient in terms of availability, latency, and 
throughput. 

We must also assume that there is no person who would be able to touch things 
if an error occurs. Data consistency and (as much as possible) automated 
recovery from error situations („VM down“, „network lost“, …) are therefore more 
important than „n nines“. We can assume that the VMs can talk to each other 
over TCP (e.g. using SSH tunnels, direct Postgres connections, or some other 
suitable protocol). Scripting „around“ the database is available for initializing 
instances and for similar tasks.

Would Postgres’ own log-shipping (file-based + streaming replication, possibly 
with remote_write) be sufficient for such a set of requirements?
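
For orientation, the kind of minimal primary/standby setup I have in mind
(just a sketch; hostnames, replication user and data directory are placeholders):

  # primary: postgresql.conf
  wal_level = replica
  max_wal_senders = 10
  wal_keep_size = 1GB
  # optionally, for a synchronous guarantee:
  # synchronous_commit = remote_write
  # synchronous_standby_names = 'standby1'

  # standby: clone from the primary; -R writes primary_conninfo and
  # standby.signal automatically
  pg_basebackup -h primary.example -U replicator -D /var/postgresql/data -R -P

  # standby: postgresql.conf
  hot_standby = on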

What aspects would you consider important for such a scenario? 

PS: In case it matters, OpenBSD is the base platform of the VMs.

Thanks for your inputs!

— Matthias






Re: Postgres Out Of Memory Crash

2023-11-06 Thread Merlin Moncure
On Thu, Nov 2, 2023 at 4:13 AM Avi Weinberg  wrote:

> Hi experts,
>
>
>
> I'm using Patroni Postgres installation and noticed that twice already
> postgres crashed due to out of memory.  I'm using logical replication with
> around 30-40 active subscribers on this machine.  The machine has 128GB but
> only 32GB is allocated to Postgres.  How can I know what is actually
> causing the out-of-memory issue?  Is it caused by a suboptimal Postgres
> configuration or by something else?
>
>
>
> /usr/lib/postgresql/13/bin/postgres -D
> /home/postgres/pgdata/pgroot/data
> --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432
> --cluster_name=postgres-cluster --wal_level=logical --hot_standby=on
> --max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0
> --max_locks_per_transaction=64 --track_commit_timestamp=on
> --max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on
>

A couple of things here. You don't really allocate memory to Postgres;
rather, you reserve memory for certain kinds of buffering operations via
shared_buffers and other, less important settings. That memory is carefully
managed and is likely not what is underneath your OOM condition.
Ironically, increasing shared_buffers can make your problem more likely to
occur, since you are taking memory away from other tasks.
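
As a rough sanity check, the settings worth looking at first (a sketch; these
are the standard memory-related settings):

  SELECT name, setting, unit
  FROM pg_settings
  WHERE name IN ('shared_buffers', 'work_mem',
                 'maintenance_work_mem', 'max_connections');

Keep in mind that a single query can use several multiples of work_mem (one
per sort or hash node, and per parallel worker), so worst-case usage scales
with the number of active connections.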

>  --max_connections=533

Your problem is probably at least partially here. This number looks
deliberately chosen, which implies you were running out of connections and
had to crank the value up.

If you have a lot of idle connections sitting around (say, if you have
several application servers managing connection pools), a sudden spike in
memory load and/or connection utilization can cause this to occur. I suggest
one or more of:
* lowering shared_buffers
* implementing pgbouncer and lowering max_connections
* increasing physical memory
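
A quick way to see whether idle connections are piling up (standard
pg_stat_activity view):

  SELECT state, count(*)
  FROM pg_stat_activity
  GROUP BY state
  ORDER BY count(*) DESC;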

Also,
> I'm using logical replication with around 30-40 active subscribers on
> this machine.

Hm. Have you considered a streaming replica setup, so that you can attach
read-only processes to the standby?

merlin