Hi Takatoshi, You were right, changing the shell to bash resolved the problem. The cluster now started in sync mode - thank you very much. I will be testing it in the next couple of days. I did just a very quick test - it seems that psql master failed over to psql2 properly, but when I tried to move it back to psql1 there was some problems starting psql on node 1.
Does it work fine for you in both directions? Thank you very much. Have a nice weekend, Attila -----Original Message----- From: Takatoshi MATSUO [mailto:matsuo....@gmail.com] Sent: 2011. november 27. 6:12 To: The Pacemaker cluster resource manager Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed Hi Attila 2011/11/27 Attila Megyeri <amegy...@minerva-soft.com>: > Hi Takatoshi, > > Thank you for coming back to me so quickly. > > In the /var/lib/pgsql there are the following files: > > PSQL1: > ===== > root@psql1:/var/lib/pgsql# ls -la > total 16 > drwxr-xr-x 2 postgres postgres 4096 Nov 26 18:04 . > drwxr-xr-x 35 root root 4096 Nov 25 22:21 .. > -rw-r--r-- 1 postgres postgres 1 Nov 26 00:17 rep_mode.conf > -rw-r--r-- 1 root root 49 Nov 26 18:04 xlog_note.0 > > root@psql1:/var/lib/pgsql# cat xlog_note.0 -e psql1 0000000019000000 > psql2 0000000019000000 > root@psql1:/var/lib/pgsql# > > PSQL2: > ======= > root@psql2:/var/lib/pgsql# ls -la > total 16 > drwxr-xr-x 2 postgres postgres 4096 Nov 26 18:05 . > drwxr-xr-x 33 root root 4096 Nov 26 00:10 .. > -rw-r--r-- 1 postgres postgres 1 Nov 26 00:24 rep_mode.conf > -rw-r--r-- 1 root root 49 Nov 26 18:05 xlog_note.0 > root@psql2:/var/lib/pgsql# cat xlog_note.0 -e psql1 0000000019000000 > psql2 0000000019000000 > root@psql2:/var/lib/pgsql# It seems that dash's bultin echo command is used because echo with "-e" option dose not function. Perhaps my RA also depends on bash. Can you use a bash instead of a dash? > BTW, postgres is installed under /var/lib/postgresql , but I noticed that > some parts of the RA are referring to the /var/lib/pgsql directory, so I > created that directory and i keep some of the files there. It's no ploblem. If you want to change this path, please specify it using "tmpdir" parameter. Regards, Takatoshi MATSUO > > Thanks, > Attila > > > > -----Original Message----- > From: Takatoshi MATSUO [mailto:matsuo....@gmail.com] > Sent: 2011. november 26. 18:27 > To: The Pacemaker cluster resource manager > Subject: Re: [Pacemaker] Postgresql streaming replication failover - > RA needed > > Hi Attila > > 1. Are there /var/lib/pgsql/xlog_note.0 , xlog_note.1, xlog_note.2 .... files? > These files are created while checking a xlog location on monitor. > > 2. Do these files include lines as below? > --------- > pgsql1 0000000019000000 > pgsql2 0000000019000000 > --------- > > Regards. > Takatoshi MATSUO > > > 2011年11月26日22:44 Attila Megyeri <amegy...@minerva-soft.com>: >> Hi Yoshiharu, Takatoshi, >> >> Spent another day, without success. :( >> >> I started from scratch and synchronous replications works nicely when nodes >> are started outside pacemaker. >> My PostgreSQL version is 9.1.1. >> >> When I start from pacemaker, after a while it gets into the following state: >> >> Online: [ psql1 psql2 ] >> >> Master/Slave Set: msPostgresql [postgresql] >> Slaves: [ psql1 psql2 ] >> Clone Set: clnPingCheck [pingCheck] >> Started: [ psql1 psql2 ] >> >> Node Attributes: >> * Node psql1: >> + default_ping_set : 100 >> + master-postgresql:0 : -INFINITY >> + pgsql-status : HS:alone >> + pgsql-xlog-loc : 0000000019000000 >> * Node psql2: >> + default_ping_set : 100 >> + master-postgresql:1 : -INFINITY >> + pgsql-status : HS:alone >> + pgsql-xlog-loc : 0000000019000000 >> >> >> The psql status queries return the following: >> >> PSQL1 >> ====== >> postgres@psql1:/root$ psql -c "select >> application_name,upper(state),upper(sync_state) from pg_stat_replication" >> application_name | upper | upper >> ------------------+-------+------- >> (0 rows) >> >> postgres@psql1:/root$ psql -Atc "select >> pg_last_xlog_replay_location(),pg_last_xlog_receive_location()" >> 0/19000020|0/19000000 >> >> PSQL2 >> ====== >> postgres@psql2:~$ psql -c "select >> application_name,upper(state),upper(sync_state) from pg_stat_replication" >> application_name | upper | upper >> ------------------+-------+------- >> (0 rows) >> >> postgres@psql2:~$ psql -Atc "select >> pg_last_xlog_replay_location(),pg_last_xlog_receive_location()" >> 0/19000000|0/19000000 >> >> >> Neither server can connect (obviously) to the master, as the vip_repl Is not >> brought up. >> >> >> Could you help me understand WHAT is the action/state/event that sould >> promote one of the nodes? I see that pacemaker monitors the servers every X >> seconds, but nothing else happens. >> >> In the log (limited to pgsql) the following sequence is repeated >> forewer >> >> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: Master is not exist. >> Nov 26 13:36:19 psql1 pgsql[19829]: DEBUG: Checking right of master. >> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: My data status=. >> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql1 xlog location : >> 0000000019000000 Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql2 xlog >> location : 0000000019000000 Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: >> PostgreSQL is running as a hot standby. >> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: Master is not exist. >> Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: Checking right of master. >> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: My data status=. >> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql1 xlog location : >> 0000000019000000 Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql2 xlog >> location : 0000000019000000 Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: >> PostgreSQL is running as a hot standby. >> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: Master is not exist. >> Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: Checking right of master. >> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: My data status=. >> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql1 xlog location : >> 0000000019000000 Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql2 xlog >> location : 0000000019000000 Nov 26 13:36:41 psql1 pgsql[20343]: DEBUG: >> PostgreSQL is running as a hot standby. >> >> >> Any help is appreciated! >> >> Regards, >> Attila >> >> >> >> >> -----Original Message----- >> From: Yoshiharu Mori [mailto:y-m...@sraoss.co.jp] >> Sent: 2011. november 25. 14:17 >> To: The Pacemaker cluster resource manager >> Cc: Attila Megyeri >> Subject: Re: [Pacemaker] Postgresql streaming replication failover - >> RA needed >> >> Hi Attila >> >>> A quick snippet from the corosync.log >>> >>> Nov 23 05:43:05 psql1 pgsql[2845]: DEBUG: Checking right of master. >>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: My data status=. >>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql1 xlog location : >>> 000000000D000000 Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql2 xlog >>> location : 0000000008000000 >>> >>> As you see, the "my data status" returns an empty string. >> >> My log is same. but it works. >> >> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Master is not exist. >> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Checking right of master. >> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: My data status=. >> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm01 xlog location : >> 0000000005000020 Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm02 >> xlog location : 0000000005000000 >> >> In my log, the following logs are outputted and started after checking xlog >> location(3 times). >> >> Nov 18 19:29:39 osspc24-1 pgsql[18720]: INFO: I have a master right. >> >> Please show us more corosync.log. >> >> >>> >>> >>> -----Original Message----- >>> From: Attila Megyeri [mailto:amegy...@minerva-soft.com] >>> Sent: 2011. november 25. 9:28 >>> To: The Pacemaker cluster resource manager >>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - >>> RA needed >>> >>> Hi Takatoshi, >>> >>> I have restored the PSQL to run without corosync so I cannot send you the >>> crm_mon output now. >>> >>> What I can tell for sure: >>> - RA never promoted any of the nodes, no matter what the status was. It >>> also did not promote the node, when it was the only one. >>> - I believe the issue is in the comparison of the xlogs. How could I >>> troubleshoot that? I see from the logs that crm NEVER tried to invoke pgsql >>> with "promote" >>> - I tried previously the crm_mon -A option, but there was never a " >>> pgsql-data-status" attribute. The other attribs were there, >>> including the HS:alone >>> - In the corosync log the only relevant RA message I see is " Master is not >>> exist. " I never saw a message like "My data is out-of-date" >>> >>> Thank you! >>> >>> Attila >>> >>> >>> -----Original Message----- >>> From: Takatoshi MATSUO [mailto:matsuo....@gmail.com] >>> Sent: 2011. november 25. 8:56 >>> To: The Pacemaker cluster resource manager >>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - >>> RA needed >>> >>> Hi Attila >>> >>> 2011/11/24 Attila Megyeri <amegy...@minerva-soft.com>: >>> > Hi Takatoshi, All, >>> > >>> > Thanks for your reply. >>> > I see that you have invested significant effort in the development of the >>> > RA. I spent the last day trying to set up the RA, but without much >>> > success. >>> > >>> > My infrastructure is very similar to yours, except for the fact that >>> > currently I am testing with a single network adapter. >>> > >>> > Replication works nicely when I start the databases manually, not using >>> > corosync. >>> > >>> > When I try to start using corosync,I see that the ping resources start >>> > normally, but the msPostgresql starts on both nodes in slave mode, and I >>> > see "HS:alone" >>> >>> To see "HS:alone" is normal. >>> And RA compares xlog locations and promote the postgresql having new data. >>> >>> > In the Wiki you state, the if I start on a signle node only, PSQL should >>> > start in Master mode (PRI), but this is not the case. >>> >>> If the data is old, the node can't be master. >>> To be master needs pgsql-data-status="LATEST" or "STREAMING|SYNC". >>> Plese check it using "crm_mon -A". >>> >>> >>> >>> >>> And to become a master from stopped takes a few minutes because the RA >>> compares xlog location on monitor. >>> >>> >>> > The recovery.conf file is created immediately, and from the logs I see no >>> > attempt at all to promote the node. >>> > In the postgres logs I see that node1, which is supposed to be a master, >>> > tries to connect to the vip-rep IP address, which is NOT brought up, >>> > because it depends on the Master role... >>> > >>> > Do you have any idea? >>> >>> Please check HA log. >>> My RA outputs "My data is out-of-date. status=********" to log if the data >>> is old. >>> >>> Regards, >>> Takatoshi MATSUO >>> >>> _______________________________________________ >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >>> >>> Project Home: http://www.clusterlabs.org Getting started: >>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >>> Bugs: http://bugs.clusterlabs.org >>> >>> _______________________________________________ >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >>> >>> Project Home: http://www.clusterlabs.org Getting started: >>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >>> Bugs: http://bugs.clusterlabs.org >>> >>> _______________________________________________ >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >>> >>> Project Home: http://www.clusterlabs.org Getting started: >>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >>> Bugs: http://bugs.clusterlabs.org >> >> >> -- >> Yoshiharu Mori <y-m...@sraoss.co.jp> >> SRA OSS, Inc Japan http://www.sraoss.co.jp >> TEL: 03-5979-2701 >> FAX: 03-5979-2702 >> >> _______________________________________________ >> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org >> http://oss.clusterlabs.org/mailman/listinfo/pacemaker >> >> Project Home: http://www.clusterlabs.org Getting started: >> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf >> Bugs: http://bugs.clusterlabs.org >> > > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org Getting started: > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org Getting started: > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org > _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org