Hi Yoshiharu,
-----Original Message-----
From: Yoshiharu Mori [mailto:y-m...@sraoss.co.jp]
Sent: 2011. november 25. 14:17
To: The Pacemaker cluster resource manager
Cc: Attila Megyeri
Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed

Hi Attila

> A quick snippet from the corosync.log
>
> Nov 23 05:43:05 psql1 pgsql[2845]: DEBUG: Checking right of master.
> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: My data status=.
> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql1 xlog location : 000000000D000000
> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql2 xlog location : 0000000008000000
>
> As you see, the "my data status" returns an empty string.

My log is the same, but it works.

Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Master is not exist.
Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Checking right of master.
Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: My data status=.
Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm01 xlog location : 0000000005000020
Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm02 xlog location : 0000000005000000

In my log, the following line is output, and it started after checking
the xlog location (3 times).

Nov 18 19:29:39 osspc24-1 pgsql[18720]: INFO: I have a master right.

Please show us more of corosync.log.

===

I can leave it running forever, but it will never show "I have a master
right".

To be honest, I have no idea what should promote the node to master.
What is it that the RA checks, and what could be wrong? I just cannot
find where the problem is.

Right now I am running corosync on node 1 only, as I expect that this
way it will have the most recent xlog and start as a master. But it
never starts.

Here is the output of crm_mon -A:

============
Last updated: Fri Nov 25 13:52:58 2011
Stack: openais
Current DC: psql1 - partition WITHOUT quorum
Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
4 Resources configured.
============

Online: [ psql1 ]
OFFLINE: [ psql2 ]

 Master/Slave Set: msPostgresql [postgresql]
     Slaves: [ psql1 ]
     Stopped: [ postgresql:1 ]
 Clone Set: clnPingCheck [pingCheck]
     Started: [ psql1 ]
     Stopped: [ pingCheck:1 ]

Node Attributes:
* Node psql1:
    + default_ping_set      : 100
    + master-postgresql:0   : -INFINITY
    + pgsql-status          : HS:alone
    + pgsql-xlog-loc        : 0000000012000000

I sent the log directly in private so as not to overload the list.
I did a "resource stop msPostgresql" and "resource start msPostgresql"
around 13:52.
You will see some extra debug messages starting with "ATT" - I added
them to the RA to help my troubleshooting.

Thank you for your help,

Attila

>
> -----Original Message-----
> From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
> Sent: 2011. november 25. 9:28
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>
> Hi Takatoshi,
>
> I have restored the PSQL to run without corosync so I cannot send you
> the crm_mon output now.
>
> What I can tell for sure:
> - The RA never promoted any of the nodes, no matter what the status
>   was. It also did not promote the node when it was the only one.
> - I believe the issue is in the comparison of the xlogs. How could I
>   troubleshoot that? I see from the logs that crm NEVER tried to
>   invoke pgsql with "promote".
> - I previously tried the crm_mon -A option, but there was never a
>   "pgsql-data-status" attribute. The other attributes were there,
>   including the HS:alone.
> - In the corosync log the only relevant RA message I see is "Master is
>   not exist." I never saw a message like "My data is out-of-date".
>
> Thank you!
>
> Attila
>
>
> -----Original Message-----
> From: Takatoshi MATSUO [mailto:matsuo....@gmail.com]
> Sent: 2011. november 25. 8:56
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>
> Hi Attila
>
> 2011/11/24 Attila Megyeri <amegy...@minerva-soft.com>:
> > Hi Takatoshi, All,
> >
> > Thanks for your reply.
> > I see that you have invested significant effort in the development
> > of the RA. I spent the last day trying to set up the RA, but without
> > much success.
> >
> > My infrastructure is very similar to yours, except for the fact that
> > currently I am testing with a single network adapter.
> >
> > Replication works nicely when I start the databases manually, not
> > using corosync.
> >
> > When I try to start using corosync, I see that the ping resources
> > start normally, but the msPostgresql starts on both nodes in slave
> > mode, and I see "HS:alone"
>
> Seeing "HS:alone" is normal.
> The RA compares xlog locations and promotes the PostgreSQL that has
> the newer data.
>
> > In the Wiki you state that if I start on a single node only, PSQL
> > should start in Master mode (PRI), but this is not the case.
>
> If the data is old, the node can't be master.
> To be master, a node needs pgsql-data-status="LATEST" or
> "STREAMING|SYNC".
> Please check it using "crm_mon -A".
>
> Also, becoming a master from stopped takes a few minutes because the
> RA compares xlog locations on monitor.
>
> > The recovery.conf file is created immediately, and from the logs I
> > see no attempt at all to promote the node.
> > In the postgres logs I see that node1, which is supposed to be a
> > master, tries to connect to the vip-rep IP address, which is NOT
> > brought up, because it depends on the Master role...
> >
> > Do you have any idea?
>
> Please check the HA log.
> My RA outputs "My data is out-of-date. status=********" to the log if
> the data is old.
>
> Regards,
> Takatoshi MATSUO
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

--
Yoshiharu Mori <y-m...@sraoss.co.jp>
SRA OSS, Inc Japan
http://www.sraoss.co.jp
TEL: 03-5979-2701 FAX: 03-5979-2702
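
For anyone following the thread, the xlog comparison that Takatoshi describes (the RA promotes the node with the newer data) can be illustrated roughly as follows. This is only a sketch in Python, not the RA's own code: the function names xlog_to_int and newest_node are made up for illustration, and it assumes the xlog location values are plain hexadecimal strings, as they appear in the logs quoted in this thread.

```python
def xlog_to_int(loc):
    """Interpret an xlog location string such as '000000000D000000' as hex.

    Assumption: the value is a plain hexadecimal string, as in the logs.
    """
    return int(loc, 16)


def newest_node(xlog_locs):
    """Return the name of the node whose xlog location is furthest ahead.

    xlog_locs maps a node name to its xlog location string.
    On a tie, the first node in the mapping is returned.
    """
    return max(xlog_locs, key=lambda node: xlog_to_int(xlog_locs[node]))


# Values taken from the corosync.log snippet quoted in this thread.
locs = {"psql1": "000000000D000000", "psql2": "0000000008000000"}
print(newest_node(locs))  # prints "psql1"
```

With the values from Attila's log, psql1 is ahead of psql2, so a comparison of this kind would favour psql1. Whether the node is then actually promoted also depends on the other conditions mentioned in the thread, such as pgsql-data-status.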