Re: Basebackup fails without useful error message
Hi,

You can enable verbose mode to get more detail about what is happening during the backup:

pg_basebackup -h your_host -U your_user -D /mnt/base_backup/dir/ -v

Also examine the recent Postgres logs, and verify the backup directory permissions.

On Sun, 29 Sept 2024 at 19:49, Koen De Groote wrote:
> Having run a basebackup, I'm getting this output at the very end:
>
> pg_basebackup: child process exited with error 1
> pg_basebackup: removing contents of data directory "/mnt/base_backup/dir/"
>
> Is there a way to get more information as to what exactly happened?
>
> I'd like to look into fixing this or doing whatever is required so that it
> doesn't happen again, but this just isn't enough info. Where do I start
> looking?
>
> Regards,
> Koen De Groote
Re: Failing GSSAPI TCP when connecting to server
Ron Johnson writes:
> On Sun, Sep 29, 2024 at 2:00 PM Peter wrote:
>> My application is trying to connect to the database server, and meanwhile
>> tries to talk to the KDC server for a service ticket.
>> A configuration problem on the machine(s) can be ruled out,
>
> Famous last words.

The TCP trace looks like the client side is timing out too quickly in the unsuccessful case. It's not clear to me how the different Discourse version would lead to the Kerberos library applying a different timeout. Still, it seems like most of the moving parts here are outside of Postgres' control --- I don't think that libpq itself has much involvement in the KDC communication. I concur with looking at the Discourse release notes and maybe asking some questions in that community.

regards, tom lane
Re: Failing GSSAPI TCP when connecting to server
On Sun, Sep 29, 2024 at 2:00 PM Peter wrote:
> My application is trying to connect to the database server, and meanwhile
> tries to talk to the KDC server for a service ticket.
> Earlier these TCP connections did run like this, and were successful:
>
> [snip]
>
> A configuration problem on the machine(s) can be ruled out,

Famous last words.

> because both
> old (working) and new (failing) application are installed on the same
> machine at the same time, using the same network, same hardware, same
> OS, same libgssapi and same postgres client software connecting to the
> same database.
>
> There are no errors logged on the KDC server, it appears to have
> orderly processed the requests (at least at first, it starts to
> complain later when the stale sockets get too many).
>
> The error message on the postgres server is
> FATAL: GSSAPI authentication failed for user "pmc"

Is there a way to test pmc authentication via some other tool, like psql?

> The OS is FreeBSD Release 13.4-p1
> The postgres client library libpq.so.5 is Release 15.7
> The application postgres interface is rubyGem pq Release 1.5.4
> The application is Discourse 2.2.0 (working) and 2.3.1 (failing)
>
> What is going on there?

If *only* the application changed, then by definition it can't be a database problem. *Something* in the application changed; you just haven't found it. Specifically, I'd read the Discourse 2.3.0 and 2.3.1 release notes.
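As a side note on the "test with psql" question above: authentication for the pmc user can be exercised outside the application with psql and a fresh Kerberos ticket. A minimal sketch follows; the hostname and dbname are placeholders, and gssencmode is the standard libpq connection parameter controlling GSSAPI-encrypted connections (the kinit/psql calls are commented out here since they need a reachable KDC and server):

```shell
#!/bin/sh
# Sketch: test GSSAPI authentication for user "pmc" with psql, outside
# the application. Hostname and dbname below are placeholders.
CONNSTR="host=dbserver.example.com user=pmc dbname=postgres gssencmode=prefer"
# kinit pmc                       # obtain a service ticket (needs a KDC)
# psql "$CONNSTR" -c 'SELECT 1;'  # connect (needs a reachable server)
echo "would run: psql \"$CONNSTR\" -c 'SELECT 1;'"
```

If this succeeds where the application fails, the problem is in how the application (or its libraries) drives the GSSAPI handshake, not in the server-side configuration.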
Re: Logical Replication Delay
Hi Justin and Greg,

Thank you for your input and recommendations. We understand your point regarding separating the tables into different publications and subscriptions. However, due to certain business constraints, we are unable to implement this approach at the moment.

We are planning to set up logical replication from a standby to another server. When the primary goes down, there is no issue, as the standby becomes the primary and the logical slots are already present. However, when the standby goes down, these slots are not copied to the third node or the primary by Patroni. Is there an option available to handle this scenario?

Regards,
Ram.

On Wed, 25 Sept 2024 at 20:12, Greg Sabino Mullane wrote:
> On Sat, Sep 21, 2024 at 3:08 PM Ramakrishna m wrote:
>> I would greatly appreciate any suggestions you may have to help avoid
>> logical replication delays, whether through tuning database or operating
>> system parameters, or any other recommendations
>
> In addition to the things already answered:
>
> * What is the use case for logical replication? I assume your local
> replicas are able to keep up just fine.
>
> * Check the nature of the work for problems, e.g. ORM doing
> unnecessary/redundant updates, maintaining indexes that are not really
> needed
>
> * Looks like your wal_segment_size was boosted to 1GB. What drove that
> change?
>
> * Yes, autovacuum could affect things - make sure
> log_autovacuum_min_duration is set
>
> Cheers,
> Greg
Re: Logical Replication Delay
Hi Ram,

On 29 Sep 2024, 12:29, Ramakrishna m wrote:
> We are planning to set up logical replication from a standby to another
> server. When the primary goes down, there is no issue as the standby
> becomes the primary and the logical slots are already present. However,
> when the standby goes down, these slots are not copied to the third node or
> the primary by Patroni. Is there an option available to handle this
> scenario?

You could take a look at the pg_failover_slots extension (https://www.enterprisedb.com/docs/pg_extensions/pg_failover_slots/); it is aimed exactly at cloning the slot information to a standby.

Best,
giovanni
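For reference, extensions like pg_failover_slots are loaded via shared_preload_libraries, which takes effect only after a server restart. A minimal sketch of wiring that up (the PGDATA path here is a stand-in; on a Patroni-managed cluster the setting would normally go through Patroni's configuration instead of editing postgresql.conf by hand):

```shell
#!/bin/sh
# Sketch: append the preload setting for pg_failover_slots to a cluster's
# postgresql.conf. PGDATA below is a stand-in path for demonstration;
# changing shared_preload_libraries requires a server restart.
PGDATA="${PGDATA:-/tmp/pgdata-demo}"
mkdir -p "$PGDATA"
cat >> "$PGDATA/postgresql.conf" <<'EOF'
shared_preload_libraries = 'pg_failover_slots'
EOF
grep shared_preload_libraries "$PGDATA/postgresql.conf"
```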
Basebackup fails without useful error message
Having run a basebackup, I'm getting this output at the very end:

pg_basebackup: child process exited with error 1
pg_basebackup: removing contents of data directory "/mnt/base_backup/dir/"

Is there a way to get more information as to what exactly happened?

I'd like to look into fixing this or doing whatever is required so that it doesn't happen again, but this just isn't enough info. Where do I start looking?

Regards,
Koen De Groote
Re: Basebackup fails without useful error message
> What is the complete command you are using?

The full command is:

pg_basebackup -h localhost -p 5432 -U basebackup_user -D /mnt/base_backup/dir -Ft -z -P

So: output format as tar, gzipped, and with progress being printed.

> Have you looked at the Postgres log?
> Is --verbose being used?

This is straight from the logs; it's the only output besides the % progress counter. Will have a look at --verbose.

Regards,
Koen De Groote

On Sun, Sep 29, 2024 at 5:01 PM Adrian Klaver wrote:
> On 9/29/24 07:48, Koen De Groote wrote:
>> Having run a basebackup, I'm getting this output at the very end:
>>
>> pg_basebackup: child process exited with error 1
>> pg_basebackup: removing contents of data directory "/mnt/base_backup/dir/"
>
> What is the complete command you are using?
>
>> Is there a way to get more information as to what exactly happened?
>
> Have you looked at the Postgres log?
>
> Is --verbose being used?
>
>> I'd like to look into fixing this or doing whatever is required so that
>> it doesn't happen again, but this just isn't enough info. Where do I
>> start looking?
>>
>> Regards,
>> Koen De Groote
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
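A general pattern for the next run: add --verbose and redirect both stdout and stderr into a log file, so the child process's error is preserved even though pg_basebackup removes the data directory. A sketch of the pattern, with `false` standing in for the real command (which needs a running server):

```shell
#!/bin/sh
# Pattern: capture pg_basebackup's full output and exit status.
# 'false' stands in for the real invocation, which would be roughly:
#   pg_basebackup -h localhost -p 5432 -U basebackup_user \
#       -D /mnt/base_backup/dir -Ft -z -P --verbose
LOG=/tmp/basebackup.log
if false > "$LOG" 2>&1; then
    echo "backup succeeded"
else
    # $? here is the exit status of the command in the 'if' condition
    echo "backup failed with status $?; see $LOG"
fi
```

Pairing this with log_min_messages cranked up on the server side usually surfaces the underlying cause.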
Re: Basebackup fails without useful error message
On 9/29/24 08:57, Koen De Groote wrote:
>> What is the complete command you are using?
>
> The full command is:
>
> pg_basebackup -h localhost -p 5432 -U basebackup_user -D /mnt/base_backup/dir -Ft -z -P
>
> So: output format as tar, gzipped, and with progress being printed.
>
>> Have you looked at the Postgres log?
>> Is --verbose being used?
>
> This is straight from the logs; it's the only output besides the % progress counter. Will have a look at --verbose.

When you report back on that, if it still does not show the error, then what is:

Postgres version.
OS and version.
Anything special about the cluster, like tablespaces, extensions, replication, etc.

> Regards,
> Koen De Groote

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Basebackup fails without useful error message
On 9/29/24 07:48, Koen De Groote wrote:
> Having run a basebackup, I'm getting this output at the very end:
>
> pg_basebackup: child process exited with error 1
> pg_basebackup: removing contents of data directory "/mnt/base_backup/dir/"

What is the complete command you are using?

> Is there a way to get more information as to what exactly happened?

Have you looked at the Postgres log?

Is --verbose being used?

> I'd like to look into fixing this or doing whatever is required so that
> it doesn't happen again, but this just isn't enough info. Where do I
> start looking?
>
> Regards,
> Koen De Groote

--
Adrian Klaver
adrian.kla...@aklaver.com
Failing GSSAPI TCP when connecting to server
My application is trying to connect to the database server, and meanwhile tries to talk to the KDC server for a service ticket.

Earlier these TCP connections did run like this, and were successful:

13:57:53.788797 IP6 clientIPv6.54143 > serverIPv6.88: Flags [S], seq 4189109662, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 402503231 ecr 0], length 0
13:57:53.789053 IP6 serverIPv6.88 > clientIPv6.54143: Flags [S.], seq 1604590824, ack 4189109663, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 1505702993 ecr 402503231], length 0
13:57:53.789122 IP6 clientIPv6.54143 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 402503231 ecr 1505702993], length 0
13:57:53.789223 IP6 clientIPv6.54143 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 402503231 ecr 1505702993], length 4
13:57:53.829575 IP6 serverIPv6.88 > clientIPv6.54143: Flags [.], ack 5, win 1035, options [nop,nop,TS val 1505703037 ecr 402503231], length 0
13:57:53.829696 IP6 clientIPv6.54143 > serverIPv6.88: Flags [P.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 402503271 ecr 1505703037], length 1161
13:57:53.832411 IP6 serverIPv6.88 > clientIPv6.54143: Flags [P.], seq 1:5, ack 1166, win 1035, options [nop,nop,TS val 1505703037 ecr 402503271], length 4
13:57:53.832440 IP6 serverIPv6.88 > clientIPv6.54143: Flags [FP.], seq 5:907, ack 1166, win 1035, options [nop,nop,TS val 1505703037 ecr 402503271], length 902
13:57:53.832461 IP6 clientIPv6.54143 > serverIPv6.88: Flags [.], ack 908, win 1021, options [nop,nop,TS val 402503276 ecr 1505703037], length 0
13:57:53.832620 IP6 clientIPv6.54143 > serverIPv6.88: Flags [F.], seq 1166, ack 908, win 1035, options [nop,nop,TS val 402503276 ecr 1505703037], length 0
13:57:53.832847 IP6 serverIPv6.88 > clientIPv6.54143: Flags [.], ack 1167, win 1035, options [nop,nop,TS val 1505703037 ecr 402503276], length 0

After an upgrade of the application they now look like this:

16:53:09.637708 IP6 clientIPv6.49124 > serverIPv6.88: Flags [S], seq 2860816940, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 2664510889 ecr 0], length 0
16:53:09.637990 IP6 serverIPv6.88 > clientIPv6.49124: Flags [S.], seq 2595272492, ack 2860816941, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 3896716896 ecr 2664510889], length 0
16:53:09.638153 IP6 clientIPv6.49124 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 0
16:53:09.638158 IP6 clientIPv6.49124 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 4
16:53:09.638162 IP6 clientIPv6.49124 > serverIPv6.88: Flags [FP.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 2664510889 ecr 3896716896], length 1161
16:53:09.638451 IP6 serverIPv6.88 > clientIPv6.49124: Flags [.], ack 1167, win 1017, options [nop,nop,TS val 3896716896 ecr 2664510889], length 0
16:53:09.638508 IP clientIPv4.58934 > serverIPv4.88: Flags [S], seq 3474302110, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 726603825 ecr 0], length 0
16:53:09.638696 IP serverIPv4.88 > clientIPv4.58934: Flags [S.], seq 2424341350, ack 3474302111, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 2662337458 ecr 726603825], length 0
16:53:09.638729 IP clientIPv4.58934 > serverIPv4.88: Flags [.], ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 0
16:53:09.638733 IP clientIPv4.58934 > serverIPv4.88: Flags [P.], seq 1:5, ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 4
16:53:09.638751 IP clientIPv4.58934 > serverIPv4.88: Flags [FP.], seq 5:1166, ack 1, win 1027, options [nop,nop,TS val 726603825 ecr 2662337458], length 1161
16:53:09.638856 IP6 clientIPv6.46516 > serverIPv6.88: Flags [S], seq 3686126664, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 1901751531 ecr 0], length 0
16:53:09.639054 IP serverIPv4.88 > clientIPv4.58934: Flags [.], ack 1167, win 1009, options [nop,nop,TS val 2662337458 ecr 726603825], length 0
16:53:09.639071 IP6 serverIPv6.88 > clientIPv6.46516: Flags [S.], seq 3830196306, ack 3686126665, win 65535, options [mss 1440,nop,wscale 6,sackOK,TS val 457425497 ecr 1901751531], length 0
16:53:09.639105 IP6 clientIPv6.46516 > serverIPv6.88: Flags [.], ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 0
16:53:09.639130 IP6 clientIPv6.46516 > serverIPv6.88: Flags [P.], seq 1:5, ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 4
16:53:09.639154 IP6 clientIPv6.46516 > serverIPv6.88: Flags [FP.], seq 5:1166, ack 1, win 1035, options [nop,nop,TS val 1901751531 ecr 457425497], length 1161

This is repeated LOTS of times, and what remains is hundreds of sockets in FIN_WAIT_2 that do not appear to disappear quickly.

The error message from the postgres client is:

GSSAPI continuation error: Miscellaneous failure: unable to reach any KDC in realm

(In fact it does reach the K