Hi Attila

1. Are there /var/lib/pgsql/xlog_note.0, xlog_note.1, xlog_note.2 ... files?
   These files are created while checking an xlog location on monitor.
2. Do these files include lines as below?

---------
pgsql1 0000000019000000
pgsql2 0000000019000000
---------

Regards,
Takatoshi MATSUO

2011-11-26 22:44 Attila Megyeri <amegy...@minerva-soft.com>:
> Hi Yoshiharu, Takatoshi,
>
> Spent another day, without success. :(
>
> I started from scratch, and synchronous replication works nicely when the
> nodes are started outside Pacemaker.
> My PostgreSQL version is 9.1.1.
>
> When I start from Pacemaker, after a while it gets into the following state:
>
> Online: [ psql1 psql2 ]
>
>  Master/Slave Set: msPostgresql [postgresql]
>      Slaves: [ psql1 psql2 ]
>  Clone Set: clnPingCheck [pingCheck]
>      Started: [ psql1 psql2 ]
>
> Node Attributes:
> * Node psql1:
>     + default_ping_set                  : 100
>     + master-postgresql:0               : -INFINITY
>     + pgsql-status                      : HS:alone
>     + pgsql-xlog-loc                    : 0000000019000000
> * Node psql2:
>     + default_ping_set                  : 100
>     + master-postgresql:1               : -INFINITY
>     + pgsql-status                      : HS:alone
>     + pgsql-xlog-loc                    : 0000000019000000
>
>
> The psql status queries return the following:
>
> PSQL1
> ======
> postgres@psql1:/root$ psql -c "select application_name,upper(state),upper(sync_state) from pg_stat_replication"
>  application_name | upper | upper
> ------------------+-------+-------
> (0 rows)
>
> postgres@psql1:/root$ psql -Atc "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
> 0/19000020|0/19000000
>
> PSQL2
> ======
> postgres@psql2:~$ psql -c "select application_name,upper(state),upper(sync_state) from pg_stat_replication"
>  application_name | upper | upper
> ------------------+-------+-------
> (0 rows)
>
> postgres@psql2:~$ psql -Atc "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
> 0/19000000|0/19000000
>
>
> Neither server can connect (obviously) to the master, as the vip_repl is not
> brought up.
>
>
> Could you help me understand WHAT is the action/state/event that should
> promote one of the nodes?
> I see that Pacemaker monitors the servers every X seconds, but nothing
> else happens.
>
> In the log (limited to pgsql) the following sequence is repeated forever:
>
> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: Master is not exist.
> Nov 26 13:36:19 psql1 pgsql[19829]: DEBUG: Checking right of master.
> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: My data status=.
> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql1 xlog location : 0000000019000000
> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql2 xlog location : 0000000019000000
> Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: PostgreSQL is running as a hot standby.
> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: Master is not exist.
> Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: Checking right of master.
> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: My data status=.
> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql1 xlog location : 0000000019000000
> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql2 xlog location : 0000000019000000
> Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: PostgreSQL is running as a hot standby.
> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: Master is not exist.
> Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: Checking right of master.
> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: My data status=.
> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql1 xlog location : 0000000019000000
> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql2 xlog location : 0000000019000000
> Nov 26 13:36:41 psql1 pgsql[20343]: DEBUG: PostgreSQL is running as a hot standby.
>
>
> Any help is appreciated!
>
> Regards,
> Attila
>
>
> -----Original Message-----
> From: Yoshiharu Mori [mailto:y-m...@sraoss.co.jp]
> Sent: 2011. november 25. 14:17
> To: The Pacemaker cluster resource manager
> Cc: Attila Megyeri
> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>
> Hi Attila
>
>> A quick snippet from the corosync.log
>>
>> Nov 23 05:43:05 psql1 pgsql[2845]: DEBUG: Checking right of master.
>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: My data status=.
>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql1 xlog location : 000000000D000000
>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql2 xlog location : 0000000008000000
>>
>> As you see, the "my data status" returns an empty string.
>
> My log is the same, but it works.
>
> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Master is not exist.
> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Checking right of master.
> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: My data status=.
> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm01 xlog location : 0000000005000020
> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm02 xlog location : 0000000005000000
>
> In my log, the following message was output, and it started, after checking
> the xlog location (3 times).
>
> Nov 18 19:29:39 osspc24-1 pgsql[18720]: INFO: I have a master right.
>
> Please show us more of corosync.log.
>
>
>>
>>
>> -----Original Message-----
>> From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
>> Sent: 2011. november 25. 9:28
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>>
>> Hi Takatoshi,
>>
>> I have restored the PSQL to run without corosync, so I cannot send you the
>> crm_mon output now.
>>
>> What I can tell for sure:
>> - The RA never promoted any of the nodes, no matter what the status was.
>>   It also did not promote the node when it was the only one.
>> - I believe the issue is in the comparison of the xlogs. How could I
>>   troubleshoot that? I see from the logs that crm NEVER tried to invoke
>>   pgsql with "promote".
>> - I previously tried the crm_mon -A option, but there was never a
>>   "pgsql-data-status" attribute. The other attributes were there, including
>>   the HS:alone.
>> - In the corosync log the only relevant RA message I see is "Master is not
>>   exist." I never saw a message like "My data is out-of-date".
>>
>> Thank you!
>>
>> Attila
>>
>>
>> -----Original Message-----
>> From: Takatoshi MATSUO [mailto:matsuo....@gmail.com]
>> Sent: 2011. november 25. 8:56
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>>
>> Hi Attila
>>
>> 2011/11/24 Attila Megyeri <amegy...@minerva-soft.com>:
>> > Hi Takatoshi, All,
>> >
>> > Thanks for your reply.
>> > I see that you have invested significant effort in the development of the
>> > RA. I spent the last day trying to set up the RA, but without much success.
>> >
>> > My infrastructure is very similar to yours, except for the fact that
>> > currently I am testing with a single network adapter.
>> >
>> > Replication works nicely when I start the databases manually, not using
>> > corosync.
>> >
>> > When I try to start using corosync, I see that the ping resources start
>> > normally, but the msPostgresql starts on both nodes in slave mode, and I
>> > see "HS:alone"
>>
>> Seeing "HS:alone" is normal.
>> The RA compares xlog locations and promotes the PostgreSQL instance that
>> has the newest data.
>>
>> > In the Wiki you state that if I start on a single node only, PSQL should
>> > start in Master mode (PRI), but this is not the case.
>>
>> If the data is old, the node can't be master.
>> To become master, a node needs pgsql-data-status="LATEST" or "STREAMING|SYNC".
>> Please check it using "crm_mon -A".
>>
>> Also, becoming master from a stopped state takes a few minutes, because the
>> RA compares xlog locations on monitor.
>>
>> > The recovery.conf file is created immediately, and from the logs I see no
>> > attempt at all to promote the node.
>> > In the postgres logs I see that node1, which is supposed to be a master,
>> > tries to connect to the vip-rep IP address, which is NOT brought up,
>> > because it depends on the Master role...
>> >
>> > Do you have any idea?
>>
>> Please check the HA log.
>> My RA outputs "My data is out-of-date. status=********" to the log if the
>> data is old.
>>
>> Regards,
>> Takatoshi MATSUO
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
> --
> Yoshiharu Mori <y-m...@sraoss.co.jp>
> SRA OSS, Inc Japan http://www.sraoss.co.jp
> TEL: 03-5979-2701
> FAX: 03-5979-2702
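
For anyone troubleshooting the xlog comparison discussed in this thread, here is a minimal Python sketch for checking by hand which node holds the newest data. It is not the RA's code. It assumes the "node location" line format shown in the xlog_note sample above, and it assumes (inferred only from the values quoted in this thread, e.g. 0/19000000 next to 0000000019000000) that the 16-digit form is the two hex halves of the psql location, each zero-padded to 8 digits. The function names are hypothetical, and how the RA itself breaks ties or decides to promote is not shown in the thread.

```python
from typing import Dict, List


def xlog_to_hex(location: str) -> str:
    """Convert a psql-style location such as '0/19000020' to 16 hex digits.

    Assumption: each half is hexadecimal and is zero-padded to 8 digits.
    """
    high, low = location.split("/")
    return f"{int(high, 16):08X}{int(low, 16):08X}"


def parse_xlog_note(text: str) -> Dict[str, int]:
    """Parse lines of the form '<node> <16 hex digits>' into integers."""
    locations: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip separators and blank lines
        node, loc = parts
        locations[node] = int(loc, 16)
    return locations


def nodes_with_newest_data(locations: Dict[str, int]) -> List[str]:
    """Return every node whose location equals the maximum (ties included)."""
    if not locations:
        return []
    newest = max(locations.values())
    return sorted(node for node, loc in locations.items() if loc == newest)


if __name__ == "__main__":
    sample = "pgsql1 0000000019000000\npgsql2 0000000019000000\n"
    print(nodes_with_newest_data(parse_xlog_note(sample)))
```

With the Nov 23 values quoted above (psql1 at 000000000D000000, psql2 at 0000000008000000) the sketch reports psql1 alone; with the Nov 26 values both nodes tie, so the comparison by itself cannot single out one node.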