Re: pg_dump/pg_restore --jobs practical limit?
Hi!

When I really want to use all the resources, I set the number of jobs to the number of CPUs plus 1; there is probably no reason to run more jobs than you have CPUs :-). Unfortunately, pg_dump will not allocate more than one worker to a single table, even a huge one (unless it is partitioned), so sometimes there is no point in defining many jobs: if you have one or two big tables and the rest are relatively small, it will make no difference whether you define 4 or 10 jobs. But yes, testing is the best way to find out :-).

Regards,
Tomek

On Thu, Nov 2, 2023 at 02:20 Ron wrote:
> On 11/1/23 20:05, Brad White wrote:
>
> ------------------------------
> *From:* Ron
> *Sent:* Thursday, November 2, 2023 3:01:47 AM
> *To:* pgsql-general@lists.postgresql.org
> *Subject:* Re: pg_dump/pg_restore --jobs practical limit?
>
> On 11/1/23 15:42, Laurenz Albe wrote:
>
> On Wed, 2023-11-01 at 13:09 -0500, Ron wrote:
>
> I will "soon" migrate some 1+ TB database from Pg 9.6.24 on RHEL 6 VMs to
> Pg 14.latest on RHEL 8 VMs. The VMs have 10Gbps "NICs", SAN-based LUNs
> managed by LVM, and are all on ESX blades. nproc count on some is 16 and
> on others is 32.
>
> Does anyone have experience as to the point of diminishing returns?
>
> IOW, can I crank them processes up to --jobs=30, will I see no gain -- or
> even degradation -- after, for example, --jobs=24?
>
> This would be for both pg_dump and pg_restore (which would be run on the
> RHEL 8 VM).
>
> Test, test, test. Theoretical considerations are pretty worthless,
>
> Which is why I asked if anyone has experience.
>
> and it is easy to measure that.
>
> Not necessarily. Our test systems are way too small (only good enough to
> validate that the script works correctly), and there's always something
> (sometimes a lot, sometimes just "some") going on in production, whether
> it's my customer's work, or the SAN (like snapshotting *every* VM and
> then copying the snapshots to the virtual tape device) or something else.
>
> Sure, but are the new systems busy already?
> Ideally you could run tests on them before they are put into production.
>
> Testing pg_restore with different --jobs= values will be easier. pg_dump
> is what's going to be reading from a constantly varying system.
>
> --
> Born in Arizona, moved to Babylonia.
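For reference, a minimal sketch of a parallel dump and restore along these lines; host names, paths, the database name, and the job counts are placeholders, and only the directory format (-Fd) supports pg_dump --jobs:

    # Dump in directory format with 8 parallel workers (placeholder values)
    pg_dump -h old-host -Fd -j 8 -f /backups/mydb.dir mydb

    # Restore into the new cluster, also with 8 parallel workers
    pg_restore -h new-host -d mydb -j 8 /backups/mydb.dir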
Postgres Out Of Memory Crash
Hi experts,

I'm using a Patroni Postgres installation and noticed that twice already postgres crashed due to out of memory. I'm using logical replication with around 30-40 active subscribers on this machine. The machine has 128GB, but only 32GB is allocated to Postgres.

How can I know what is actually causing the out-of-memory issue? Is it caused by a suboptimal Postgres configuration or by something else?

/usr/lib/postgresql/13/bin/postgres -D /home/postgres/pgdata/pgroot/data
  --config-file=/home/postgres/pgdata/pgroot/data/postgresql.conf
  --port=5432 --cluster_name=postgres-cluster --wal_level=logical
  --hot_standby=on --max_connections=533 --max_wal_senders=90
  --max_prepared_transactions=0 --max_locks_per_transaction=64
  --track_commit_timestamp=on --max_replication_slots=90
  --max_worker_processes=30 --wal_log_hints=on

Oct 27 07:05:31 node2 kernel: postgres invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=993
Oct 27 07:05:31 node2 kernel: postgres cpuset=docker-6ae67e04710619972d3b1ab5d4c69c318d001c2da47fecee121cdc60279a14a0.scope mems_allowed=0
Oct 27 07:05:31 node2 kernel: CPU: 6 PID: 15536 Comm: postgres Kdump: loaded Tainted: G T 3.10.0-1160.el7.x86_64 #1
Oct 27 07:05:31 node2 kernel: Hardware name: Kontron MSP8040/4008, BIOS Core: 5.11, MSP804x: 1.57.0943FC77 05/06/2020
Oct 27 07:05:31 node2 kernel: Call Trace:
Oct 27 07:05:31 node2 kernel: [] dump_stack+0x19/0x1b
Oct 27 07:05:31 node2 kernel: [] dump_header+0x90/0x229
Oct 27 07:05:31 node2 kernel: [] ? ep_poll_callback+0xf8/0x220
Oct 27 07:05:31 node2 kernel: [] ? find_lock_task_mm+0x56/0xc0
Oct 27 07:05:31 node2 kernel: [] ? try_get_mem_cgroup_from_mm+0x28/0x60
Oct 27 07:05:31 node2 kernel: [] oom_kill_process+0x2cd/0x490
Oct 27 07:05:31 node2 kernel: [] mem_cgroup_oom_synchronize+0x55c/0x590
Oct 27 07:05:31 node2 kernel: [] ? mem_cgroup_charge_common+0xc0/0xc0
Oct 27 07:05:31 node2 kernel: [] pagefault_out_of_memory+0x14/0x90
Oct 27 07:05:31 node2 kernel: [] mm_fault_error+0x6a/0x157
Oct 27 07:05:31 node2 kernel: [] __do_page_fault+0x491/0x500
Oct 27 07:05:31 node2 kernel: [] do_page_fault+0x35/0x90
Oct 27 07:05:31 node2 kernel: [] page_fault+0x28/0x30
Oct 27 07:05:31 node2 kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-6ae67e0471061997$
Oct 27 07:05:31 node2 kernel: memory: usage 32768000kB, limit 32768000kB, failcnt 144867
Oct 27 07:05:31 node2 kernel: memory+swap: usage 32768000kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice: cache:$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod5984f099_30b0_4506_8730_6c72e7c02b78.slice/docker-$
Oct 27 07:05:31 node2 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Oct 27 07:05:31 node2 kernel: [13159] 0 13159 2391 3 0 -998 pause
Oct 27 07:05:31 node2 kernel: [13322] 0 13322 1095 97 8 0 993 dumb-init
Oct 27 07:05:31 node2 kernel: [13335] 0 13335 1156 171 8 0 993 sh
Oct 27 07:05:31 node2 kernel: [13411] 0 13411 1137 98 8 0 993 runsvdir
Oct 27 07:05:31 node2 kernel: [13438] 0 13438 1099 98 7 0 993 runsv
Oct 27 07:05:31 node2 kernel: [13439] 0 13439 1099 98 7 0 993 runsv
Oct 27 07:05:31 node2 kernel: [13440] 101 1344027026 1186 54 0 993 pgqd
Oct 27 07:05:31 node2 kernel: [13441] 101 13441 155215 8237 101 0 993 patroni
Oct 27 07:05:31 node2 kernel: [19532] 101 19532 174003346817 171 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19542] 101 19542 1767874 6713 121 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19546] 101 19546 1740173 14450313166 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19547] 101 19547 174006920060 171 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19548] 101 19548 1740027 4821 86 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19549] 101 19549 1740283 1011 91 0 993 postgres
Oct 27 07:05:31 node2 kernel: [19549] 101 19549 174028
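As a starting point for the configuration question, a sketch of listing the settings that bound per-backend and per-walsender memory use; these are standard PostgreSQL 13 GUCs, but the psql invocation and connection details are assumptions to adapt to your setup:

    # Inspect memory-related settings; adjust connection options as needed
    psql -c "SELECT name, setting, unit FROM pg_settings
             WHERE name IN ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                            'logical_decoding_work_mem', 'max_wal_senders');"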
Re: pg_checksums?
Hi Nikolay,

> On Nov 2, 2023, at 07:36, Nikolay Samokhvalov wrote:
> There is also a good trick described in
> https://www.crunchydata.com/blog/fun-with-pg_checksums to avoid
> accidental start of Postgres: [...]
> Additionally, I compiled some thoughts about running pg_checksums
> without downtime (Patroni-friendly, of course) here:
> https://twitter.com/samokhvalov/status/1719961485160689993.

These two links are very interesting. Thanks very much.

Cheers
Paul
Re: Postgres Out Of Memory Crash
On Thu, 2023-11-02 at 09:12 +, Avi Weinberg wrote:
> I'm using Patroni Postgres installation and noticed that twice already
> postgres crashed due to out of memory. I'm using logical replication with
> around 30-40 active subscribers on this machine. The machine has 128GB but
> only 32GB is allocated to Postgres. How can I know what is actually causing
> the out of memory issue? Is it caused by not optimal postgres configuration
> or something else?

You should look into the PostgreSQL log. That should show a message like

  LOG:  server process (PID 16024) was terminated by signal 9: Killed
  DETAIL:  Failed process was running: SELECT ...

It is not certain, but often that statement is the one that used up all
that memory. At least it is a starting point for your investigation.

Yours,
Laurenz Albe
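One quick way to look for that message, assuming the logging collector is writing to the default "log" subdirectory of the data directory shown in the original post; adjust the path to match your log_directory and log_destination settings:

    # Search the server logs for the OOM kill message (log path is an assumption)
    grep -R "terminated by signal 9" /home/postgres/pgdata/pgroot/data/log/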