Postgresql went crazy and flooded all the SSD
Not sure exactly what happened, but PostgreSQL filled all the available SSD space and, obviously, crashed. It had written a lot of data into the pg_wal directory. Most likely this was caused by replication: the PostgreSQL instance that crashed was both a subscriber and a publisher at the same time. Even so, it was not a write-heavy replication setup; the data that needed to be replicated was rarely updated. Has anyone else encountered such an issue?

Best!
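In case it helps anyone hitting the same thing, a quick sanity check I would run from psql to see how much space WAL is actually taking and which settings bound it (pg_ls_waldir() exists since PostgreSQL 10; wal_keep_size is 13+, older versions use wal_keep_segments):

```sql
-- Total size of files in pg_wal (requires superuser or pg_monitor)
SELECT pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();

-- Settings that bound or force retention of WAL
SELECT name, setting
FROM pg_settings
WHERE name IN ('max_wal_size', 'wal_keep_size', 'archive_mode', 'archive_command');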
Re: Postgresql went crazy and flooded all the SSD
On 6 November 2023 12:11:31 CET, Gabriel Dodan wrote:
>Not sure exactly what happened but Postgresql flooded all the available SSD
>space and obviously crashed. It has written a lot of data in the pg_wal
>folder. Most likely it was caused by replication. The postgresql instance
>that crashed was a subscriber and publisher at the same time. Even so, it
>was not a heavy write replication, I mean the data that needed to be
>replicated was rarely updated. Has anyone else encountered such an issue?
>
>Best!

Check the logs.

Andreas
Re: Postgresql went crazy and flooded all the SSD
On Mon, Nov 06, 2023 at 01:11:31PM +0200, Gabriel Dodan wrote:
> Not sure exactly what happened but Postgresql flooded all the available SSD
> space and obviously crashed. It has written a lot of data in the pg_wal
> folder. Most likely it was caused by replication. The postgresql instance
> that crashed was a subscriber and publisher at the same time. Even so, it
> was not a heavy write replication, I mean the data that needed to be
> replicated was rarely updated. Has anyone else encountered such an issue?

You might want to read
https://www.depesz.com/2023/06/18/why-is-my-wal-directory-so-large/

Best regards,

depesz
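The usual culprit that article describes is an inactive replication slot (or a stalled archiver) forcing the server to retain WAL indefinitely. A query along these lines, run on the publisher, shows how much WAL each slot is holding back:

```sql
SELECT slot_name,
       slot_type,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn))
         AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
```

An inactive slot with a large retained_wal is typically what fills pg_wal; once the subscriber is gone for good, dropping the slot with pg_drop_replication_slot() lets the space be reclaimed at the next checkpoint.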
Cluster for an appliance-type deployment
Dear all,

I'm reasonably experienced with Postgres in simple (single instance, „rebuild and restore“) setups, but I would need some hints on what to look for in a more complex situation: deploying Postgres as the backend for a (virtual) appliance.

This appliance can scale horizontally from a single VM to dozens (theoretically more, but most installations are small-ish). It is feasible to configure VMs for particular purposes (e.g. „you are [also] a DB node“), but basically all instances will/should be able to perform their main tasks besides (also) being a DB node. As the VMs may be installed in very different environments, network-based solutions are less feasible, and we would prefer a DB-level solution.

We assume that for most cases, primary/standby configurations would be sufficient in terms of availability, latency, and throughput. We must also assume that there is no person who would be able to touch things if an error occurs. Data consistency and (as much as possible) automated recovery from error situations („VM down“, „network lost“, …) are therefore more important than „n nines“.

We can assume that the VMs can talk to each other over TCP (e.g. using SSH tunnels, a direct Postgres connection, or some other suitable protocol). Scripting „around“ the database is available to initialize instances and for similar tasks.

Would Postgres' own log shipping (file-based + streaming replication, possibly with synchronous_commit = remote_write) be sufficient for such a set of requirements? What aspects would you consider important for such a scenario?

PS: In case it matters, OpenBSD is the base platform of the VMs.

Thanks for your inputs!

— Matthias
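For reference, the built-in approach we are considering would amount to roughly the following (a sketch only, with hypothetical hostnames and paths; a standby additionally needs an empty standby.signal file, PostgreSQL 12+):

```
# postgresql.conf on the primary (assumed values)
wal_level = replica
max_wal_senders = 5
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

# postgresql.conf on the standby
primary_conninfo = 'host=vm-db1 user=replicator sslmode=require'
restore_command = 'cp /archive/%f %p'

# optionally, on the primary, wait for the standby to receive WAL
# before acknowledging commits:
# synchronous_standby_names = 'vm-db2'
# synchronous_commit = remote_write
```

The open question is less the configuration itself than the scripting needed around it for unattended failover and re-joining a failed node.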
Re: Postgres Out Of Memory Crash
On Thu, Nov 2, 2023 at 4:13 AM Avi Weinberg wrote:
> Hi experts,
>
> I'm using Patroni Postgres installation and noticed that twice already
> postgres crashed due to out of memory. I'm using logical replication with
> around 30-40 active subscribers on this machine. The machine has 128GB but
> only 32GB is allocated to Postgres. How can I know what is actually
> causing the out of memory issue? Is it caused by not optimal postgres
> configuration or something else?
>
> /usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data
> --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf --port=5432
> --cluster_name=postgres-cluster --wal_level=logical --hot_standby=on
> --max_connections=533 --max_wal_senders=90 --max_prepared_transactions=0
> --max_locks_per_transaction=64 --track_commit_timestamp=on
> --max_replication_slots=90 --max_worker_processes=30 --wal_log_hints=on

Couple of things here. You don't really allocate memory to postgres; rather, you set up memory reserved for certain kinds of buffering operations via shared_buffers and other, less important, settings. This memory is carefully managed and is likely not underneath your OOM condition. Ironically, increasing shared_buffers can make your problem more likely to occur, as you are taking memory from other tasks.

> --max_connections=533

Probably your problem is at least partially here. This number was configured specifically, implying you are running out of connections and had to crank this value. If you have a lot of idle connections sitting around (say, if you have several application servers managing connection pools), a sudden spike in memory load and/or connection utilization can cause this to occur. Suggest one or more of:

* lowering shared_buffers
* implementing pgbouncer and lowering max_connections
* increasing physical memory

Also,

> I'm using logical replication with around 30-40 active subscribers on this machine.

Hm. Have you considered a streaming replica setup, so that you can attach read-only processes to the standby?

merlin
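To make the pgbouncer suggestion concrete, a minimal transaction-pooling configuration might look like this (hypothetical names, paths, and numbers; the point is that max_client_conn can stay high while default_pool_size keeps the number of real server backends, and hence max_connections, small):

```
; pgbouncer.ini (sketch, assumed paths and database name)
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 600   ; clients may still open many connections...
default_pool_size = 20  ; ...but only ~20 backends per database/user pair
```

Applications then connect to port 6432 instead of 5432, and max_connections on the server can be dropped well below 533.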