Hi Attila

2011/11/27 Attila Megyeri <amegy...@minerva-soft.com>:
> Hi Takatoshi,
>
> You were right, changing the shell to bash resolved the problem.
> The cluster has now started in sync mode - thank you very much.
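For anyone who hits the same symptom: on Debian and Ubuntu, /bin/sh is usually dash. Dash's builtin echo does not accept -e and prints it literally. That appears to be what produced the "-e psql1 0000000019000000" lines in the xlog_note.0 files quoted below.

The helpers below are a sketch only. They assume the note files hold one "<node> <16-digit hex location>" pair per line, which is the format shown in this thread. The function names are hypothetical and not part of the RA. They write such lines with printf, which behaves the same under dash and bash, and they flag a stray "-e".

```shell
#!/bin/sh
# Hypothetical helpers, not part of the pgsql RA. They only illustrate the
# "<node> <xlog location>" line format quoted in this thread.

# Append one "<node> <location>" line. printf behaves the same under dash
# and bash, whereas dash's builtin echo prints "-e" literally.
write_xlog_note() {
    # $1 = note file, $2 = node name, $3 = 16-digit hex xlog location
    printf '%s %s\n' "$2" "$3" >> "$1"
}

# Return 0 if every line is "<node> <16 uppercase hex digits>", else 1.
check_xlog_note() {
    note_file=$1
    # "|| [ -n "$node" ]" also handles a last line without a newline.
    while read -r node loc extra || [ -n "$node" ]; do
        if [ "$node" = "-e" ]; then
            printf "stray '-e' (dash echo?): %s %s %s\n" "$node" "$loc" "$extra" >&2
            return 1
        fi
        if [ -z "$loc" ] || [ -n "$extra" ]; then
            printf 'malformed line: %s %s %s\n' "$node" "$loc" "$extra" >&2
            return 1
        fi
        case $loc in
            *[!0-9A-F]*) printf 'not hex: %s\n' "$loc" >&2; return 1 ;;
        esac
        if [ "${#loc}" -ne 16 ]; then
            printf 'wrong length: %s\n' "$loc" >&2
            return 1
        fi
    done < "$note_file"
    return 0
}
```

Running `readlink -f /bin/sh` shows which shell /bin/sh actually points to.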
You're very welcome.

> I will be testing it in the next couple of days. I did just a very quick
> test - it seems that the psql master failed over to psql2 properly, but
> when I tried to move it back to psql1 there were some problems starting
> psql on node 1.

If the master (psql1) fails, its data may be inconsistent. A PostgreSQL
developer says that this is a feature. Therefore my RA prevents it from
starting automatically if its data is inconsistent.
Please back up psql2's data and restore it to psql1, and remove the
/var/lib/pgsql/PGSQL.lock file before clearing the failcount.
I use rsync to back up and restore in the following way.

-----
# psql -h 192.168.2.114 -U postgres -c "SELECT pg_start_backup('label')"
# rsync -avr --delete --exclude=postmaster.pid 192.168.2.114:/var/lib/pgsql/9.1/data/ /var/lib/pgsql/9.1/data/
# psql -h 192.168.2.114 -U postgres -c "SELECT pg_stop_backup()"
-----

BTW, I fixed some bugs 2 days ago. Please use the newest version.

Thanks,
Takatoshi MATSUO

>
> Does it work fine for you in both directions?
>
> Thank you very much.
>
> Have a nice weekend,
>
> Attila
>
>
> -----Original Message-----
> From: Takatoshi MATSUO [mailto:matsuo....@gmail.com]
> Sent: 2011. november 27. 6:12
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>
> Hi Attila
>
> 2011/11/27 Attila Megyeri <amegy...@minerva-soft.com>:
>> Hi Takatoshi,
>>
>> Thank you for coming back to me so quickly.
>>
>> In /var/lib/pgsql there are the following files:
>>
>> PSQL1:
>> =====
>> root@psql1:/var/lib/pgsql# ls -la
>> total 16
>> drwxr-xr-x 2 postgres postgres 4096 Nov 26 18:04 .
>> drwxr-xr-x 35 root root 4096 Nov 25 22:21 ..
>> -rw-r--r-- 1 postgres postgres 1 Nov 26 00:17 rep_mode.conf
>> -rw-r--r-- 1 root root 49 Nov 26 18:04 xlog_note.0
>>
>> root@psql1:/var/lib/pgsql# cat xlog_note.0
>> -e psql1 0000000019000000
>> psql2 0000000019000000
>> root@psql1:/var/lib/pgsql#
>>
>> PSQL2:
>> =======
>> root@psql2:/var/lib/pgsql# ls -la
>> total 16
>> drwxr-xr-x 2 postgres postgres 4096 Nov 26 18:05 .
>> drwxr-xr-x 33 root root 4096 Nov 26 00:10 ..
>> -rw-r--r-- 1 postgres postgres 1 Nov 26 00:24 rep_mode.conf
>> -rw-r--r-- 1 root root 49 Nov 26 18:05 xlog_note.0
>> root@psql2:/var/lib/pgsql# cat xlog_note.0
>> -e psql1 0000000019000000
>> psql2 0000000019000000
>> root@psql2:/var/lib/pgsql#
>
> It seems that dash's builtin echo command is used, because echo with the
> "-e" option does not function.
>
> Perhaps my RA also depends on bash.
> Can you use bash instead of dash?
>
>> BTW, postgres is installed under /var/lib/postgresql, but I noticed that
>> some parts of the RA are referring to the /var/lib/pgsql directory, so I
>> created that directory and I keep some of the files there.
>
> It's no problem.
> If you want to change this path, please specify it using the "tmpdir"
> parameter.
>
> Regards,
> Takatoshi MATSUO
>
>>
>> Thanks,
>> Attila
>>
>>
>> -----Original Message-----
>> From: Takatoshi MATSUO [mailto:matsuo....@gmail.com]
>> Sent: 2011. november 26. 18:27
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>>
>> Hi Attila
>>
>> 1. Are there /var/lib/pgsql/xlog_note.0, xlog_note.1, xlog_note.2 ....
>> files?
>> These files are created while checking the xlog location on monitor.
>>
>> 2. Do these files include lines as below?
>> ---------
>> pgsql1 0000000019000000
>> pgsql2 0000000019000000
>> ---------
>>
>> Regards.
>> Takatoshi MATSUO
>>
>>
>> 2011/11/26 22:44 Attila Megyeri <amegy...@minerva-soft.com>:
>>> Hi Yoshiharu, Takatoshi,
>>>
>>> I spent another day, without success.
>>> :(
>>>
>>> I started from scratch, and synchronous replication works nicely when the
>>> nodes are started outside Pacemaker.
>>> My PostgreSQL version is 9.1.1.
>>>
>>> When I start from Pacemaker, after a while it gets into the following
>>> state:
>>>
>>> Online: [ psql1 psql2 ]
>>>
>>>  Master/Slave Set: msPostgresql [postgresql]
>>>      Slaves: [ psql1 psql2 ]
>>>  Clone Set: clnPingCheck [pingCheck]
>>>      Started: [ psql1 psql2 ]
>>>
>>> Node Attributes:
>>> * Node psql1:
>>>     + default_ping_set : 100
>>>     + master-postgresql:0 : -INFINITY
>>>     + pgsql-status : HS:alone
>>>     + pgsql-xlog-loc : 0000000019000000
>>> * Node psql2:
>>>     + default_ping_set : 100
>>>     + master-postgresql:1 : -INFINITY
>>>     + pgsql-status : HS:alone
>>>     + pgsql-xlog-loc : 0000000019000000
>>>
>>>
>>> The psql status queries return the following:
>>>
>>> PSQL1
>>> ======
>>> postgres@psql1:/root$ psql -c "select application_name,upper(state),upper(sync_state) from pg_stat_replication"
>>>  application_name | upper | upper
>>> ------------------+-------+-------
>>> (0 rows)
>>>
>>> postgres@psql1:/root$ psql -Atc "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
>>> 0/19000020|0/19000000
>>>
>>> PSQL2
>>> ======
>>> postgres@psql2:~$ psql -c "select application_name,upper(state),upper(sync_state) from pg_stat_replication"
>>>  application_name | upper | upper
>>> ------------------+-------+-------
>>> (0 rows)
>>>
>>> postgres@psql2:~$ psql -Atc "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
>>> 0/19000000|0/19000000
>>>
>>>
>>> Neither server can connect (obviously) to the master, as the vip_repl is
>>> not brought up.
>>>
>>>
>>> Could you help me understand WHAT is the action/state/event that should
>>> promote one of the nodes? I see that Pacemaker monitors the servers every
>>> X seconds, but nothing else happens.
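Two things in this thread bear on that question:

- Takatoshi's rule, given further down, that a node needs pgsql-data-status="LATEST" or "STREAMING|SYNC" to become master. Note that the attribute list above has no pgsql-data-status entry at all.
- The RA's comparison of xlog locations. psql reports these in PostgreSQL's X/Y form (0/19000020), while pgsql-xlog-loc and the RA log use a 16-digit form (0000000019000000).

The hypothetical helpers below are a sketch for checking both by hand. They rest on two assumptions drawn from the values in this thread, not from the RA source: the 16-digit form is each half of X/Y zero-padded to 8 hex digits, and a higher location means newer data.

```shell
#!/bin/sh
# Hypothetical helpers for checking by hand what the thread says decides
# promotion. The 16-digit padding rule is inferred from the values in this
# thread (0/19000000 -> 0000000019000000), not taken from the RA source.

# Convert PostgreSQL's "X/Y" xlog location to 16 uppercase hex digits.
xlog_to_hex16() {
    hi=${1%%/*}   # part before the slash
    lo=${1#*/}    # part after the slash
    printf '%08X%08X\n' "0x$hi" "0x$lo"
}

# Print the name of the node with the higher (newer) location, or "equal".
# Usage: newer_node NAME1 LOC1 NAME2 LOC2, with locations in X/Y form.
# Each half is compared numerically, so no 64-bit overflow is possible.
newer_node() {
    h1=$((0x${2%%/*})); l1=$((0x${2#*/}))
    h2=$((0x${4%%/*})); l2=$((0x${4#*/}))
    if [ "$h1" -eq "$h2" ] && [ "$l1" -eq "$l2" ]; then
        echo equal
    elif [ "$h1" -gt "$h2" ] || { [ "$h1" -eq "$h2" ] && [ "$l1" -gt "$l2" ]; }; then
        echo "$1"
    else
        echo "$3"
    fi
}

# Return 0 if a pgsql-data-status value (as shown by `crm_mon -A`) is one
# of the two values named in this thread as required for becoming master.
can_be_master() {
    case $1 in
        LATEST|"STREAMING|SYNC") return 0 ;;
        *) return 1 ;;
    esac
}
```

Consider the receive locations shown above. `newer_node psql1 0/19000000 psql2 0/19000000` prints `equal`, which is consistent with both nodes reporting the same pgsql-xlog-loc. `can_be_master ""` fails, as it would for a node with no pgsql-data-status attribute.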
>>>
>>> In the log (limited to pgsql) the following sequence is repeated
>>> forever:
>>>
>>> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: Master is not exist.
>>> Nov 26 13:36:19 psql1 pgsql[19829]: DEBUG: Checking right of master.
>>> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: My data status=.
>>> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql1 xlog location : 0000000019000000
>>> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql2 xlog location : 0000000019000000
>>> Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: PostgreSQL is running as a hot standby.
>>> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: Master is not exist.
>>> Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: Checking right of master.
>>> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: My data status=.
>>> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql1 xlog location : 0000000019000000
>>> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql2 xlog location : 0000000019000000
>>> Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: PostgreSQL is running as a hot standby.
>>> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: Master is not exist.
>>> Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: Checking right of master.
>>> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: My data status=.
>>> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql1 xlog location : 0000000019000000
>>> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql2 xlog location : 0000000019000000
>>> Nov 26 13:36:41 psql1 pgsql[20343]: DEBUG: PostgreSQL is running as a hot standby.
>>>
>>>
>>> Any help is appreciated!
>>>
>>> Regards,
>>> Attila
>>>
>>>
>>> -----Original Message-----
>>> From: Yoshiharu Mori [mailto:y-m...@sraoss.co.jp]
>>> Sent: 2011. november 25. 14:17
>>> To: The Pacemaker cluster resource manager
>>> Cc: Attila Megyeri
>>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>>>
>>> Hi Attila
>>>
>>>> A quick snippet from the corosync.log:
>>>>
>>>> Nov 23 05:43:05 psql1 pgsql[2845]: DEBUG: Checking right of master.
>>>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: My data status=.
>>>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql1 xlog location : 000000000D000000
>>>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql2 xlog location : 0000000008000000
>>>>
>>>> As you see, the "my data status" returns an empty string.
>>>
>>> My log is the same, but it works.
>>>
>>> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Master is not exist.
>>> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Checking right of master.
>>> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: My data status=.
>>> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm01 xlog location : 0000000005000020
>>> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm02 xlog location : 0000000005000000
>>>
>>> In my log, the following line is output, and PostgreSQL starts as master
>>> after the xlog location has been checked (3 times).
>>>
>>> Nov 18 19:29:39 osspc24-1 pgsql[18720]: INFO: I have a master right.
>>>
>>> Please show us more of corosync.log.
>>>
>>>>
>>>> -----Original Message-----
>>>> From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
>>>> Sent: 2011. november 25. 9:28
>>>> To: The Pacemaker cluster resource manager
>>>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>>>>
>>>> Hi Takatoshi,
>>>>
>>>> I have restored the PSQL to run without corosync, so I cannot send you
>>>> the crm_mon output now.
>>>>
>>>> What I can tell for sure:
>>>> - The RA never promoted any of the nodes, no matter what the status was.
>>>>   It also did not promote the node when it was the only one.
>>>> - I believe the issue is in the comparison of the xlogs. How could I
>>>>   troubleshoot that? I see from the logs that crm NEVER tried to invoke
>>>>   pgsql with "promote".
>>>> - I previously tried the crm_mon -A option, but there was never a
>>>>   "pgsql-data-status" attribute. The other attributes were there,
>>>>   including HS:alone.
>>>> - In the corosync log the only relevant RA message I see is "Master is
>>>>   not exist." I never saw a message like "My data is out-of-date".
>>>>
>>>> Thank you!
>>>>
>>>> Attila
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Takatoshi MATSUO [mailto:matsuo....@gmail.com]
>>>> Sent: 2011. november 25. 8:56
>>>> To: The Pacemaker cluster resource manager
>>>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>>>>
>>>> Hi Attila
>>>>
>>>> 2011/11/24 Attila Megyeri <amegy...@minerva-soft.com>:
>>>> > Hi Takatoshi, All,
>>>> >
>>>> > Thanks for your reply.
>>>> > I see that you have invested significant effort in the development of
>>>> > the RA. I spent the last day trying to set up the RA, but without much
>>>> > success.
>>>> >
>>>> > My infrastructure is very similar to yours, except for the fact that
>>>> > currently I am testing with a single network adapter.
>>>> >
>>>> > Replication works nicely when I start the databases manually, not using
>>>> > corosync.
>>>> >
>>>> > When I try to start using corosync, I see that the ping resources start
>>>> > normally, but msPostgresql starts on both nodes in slave mode, and I
>>>> > see "HS:alone".
>>>>
>>>> Seeing "HS:alone" is normal.
>>>> The RA compares the xlog locations and promotes the PostgreSQL that has
>>>> the newest data.
>>>>
>>>> > In the Wiki you state that if I start on a single node only, PSQL
>>>> > should start in Master mode (PRI), but this is not the case.
>>>>
>>>> If the data is old, the node can't be master.
>>>> To become master, a node needs pgsql-data-status="LATEST" or
>>>> "STREAMING|SYNC". Please check it using "crm_mon -A".
>>>>
>>>> Also, becoming master from the stopped state takes a few minutes,
>>>> because the RA compares the xlog locations on monitor.
>>>>
>>>> > The recovery.conf file is created immediately, and from the logs I see
>>>> > no attempt at all to promote the node.
>>>> > In the postgres logs I see that node1, which is supposed to be a
>>>> > master, tries to connect to the vip-rep IP address, which is NOT
>>>> > brought up, because it depends on the Master role...
>>>> >
>>>> > Do you have any idea?
>>>>
>>>> Please check the HA log.
>>>> My RA outputs "My data is out-of-date. status=********" to the log if
>>>> the data is old.
>>>>
>>>> Regards,
>>>> Takatoshi MATSUO
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
>>>
>>>
>>> --
>>> Yoshiharu Mori <y-m...@sraoss.co.jp>
>>> SRA OSS, Inc Japan http://www.sraoss.co.jp
>>> TEL: 03-5979-2701
>>> FAX: 03-5979-2702
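The manual resync from the top of this message can be wrapped in a small script. That procedure is: back up psql2's data, restore it on psql1, remove /var/lib/pgsql/PGSQL.lock, then clear the failcount.

This is a sketch under stated assumptions:
- It runs as root on psql1 while PostgreSQL is stopped there.
- The address and paths default to the ones in the example commands.
- The function names and the DRY_RUN switch are hypothetical.

Once backup mode has started, the script always calls pg_stop_backup(). It removes the lock file only after a successful copy. Clearing the failcount is left to your usual cluster tools.

```shell
#!/bin/sh
# Sketch of the manual resync described in this thread. Run as root on the
# node that needs fresh data (psql1 here), with PostgreSQL stopped on it.
# Defaults are the address and paths from the example commands.
MASTER_IP=${MASTER_IP:-192.168.2.114}
PGDATA=${PGDATA:-/var/lib/pgsql/9.1/data}
LOCKFILE=${LOCKFILE:-/var/lib/pgsql/PGSQL.lock}

# Hypothetical dry-run switch: with DRY_RUN=1, print commands instead.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

resync_from_master() {
    # 1. Put the current master into backup mode.
    run psql -h "$MASTER_IP" -U postgres -c "SELECT pg_start_backup('label')" || return 1
    # 2. Copy its data directory, skipping the master's postmaster.pid.
    run rsync -avr --delete --exclude=postmaster.pid \
        "$MASTER_IP:$PGDATA/" "$PGDATA/"
    rsync_rc=$?
    # 3. Always leave backup mode, even if the copy failed.
    run psql -h "$MASTER_IP" -U postgres -c "SELECT pg_stop_backup()" || return 1
    [ "$rsync_rc" -eq 0 ] || return "$rsync_rc"
    # 4. Remove the RA's lock file so that it will start this node again.
    run rm -f "$LOCKFILE"
}
```

For example, `DRY_RUN=1 resync_from_master` just prints the four commands it would run.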