Re: [Pacemaker] Postgresql streaming replication failover - RA needed

Takatoshi MATSUO Sat, 26 Nov 2011 09:32:48 -0800

Hi Attila

1. Are there /var/lib/pgsql/xlog_note.0, xlog_note.1, xlog_note.2, ... files?
   These files are created when the monitor action checks the xlog location.


2. Do these files contain lines like the ones below?
---------
pgsql1  0000000019000000
pgsql2  0000000019000000
---------
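
For anyone checking these files by hand, here is a minimal Python sketch that parses lines of the form shown above (node name, whitespace, 16-digit hex location). The function name parse_xlog_note is made up for illustration and is not part of the RA, and the file layout is assumed from the two sample lines only.

```python
def parse_xlog_note(text):
    """Parse 'node  location' lines into {node: location_as_int}.

    Assumes the two-column layout from the sample above; separator lines,
    blank lines and anything that is not a hex number are skipped.
    """
    locations = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # not a 'node location' pair
        node, loc = fields
        try:
            locations[node] = int(loc, 16)
        except ValueError:
            continue  # second column is not hexadecimal
    return locations


sample = """\
pgsql1  0000000019000000
pgsql2  0000000019000000
"""
print(parse_xlog_note(sample))
```

If a node is missing from the result, or the file is empty, that would be worth reporting back to the list.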

Regards.
Takatoshi MATSUO


2011-11-26 22:44 Attila Megyeri <amegy...@minerva-soft.com>:
> Hi Yoshiharu, Takatoshi,
>
> Spent another day, without success. :(
>
> I started from scratch and synchronous replication works nicely when nodes
> are started outside pacemaker.
> My PostgreSQL version is 9.1.1.
>
> When I start from pacemaker, after a while it gets into the following state:
>
> Online: [ psql1 psql2 ]
>
>  Master/Slave Set: msPostgresql [postgresql]
>     Slaves: [ psql1 psql2 ]
>  Clone Set: clnPingCheck [pingCheck]
>     Started: [ psql1 psql2 ]
>
> Node Attributes:
> * Node psql1:
>    + default_ping_set                  : 100
>    + master-postgresql:0               : -INFINITY
>    + pgsql-status                      : HS:alone
>    + pgsql-xlog-loc                    : 0000000019000000
> * Node psql2:
>    + default_ping_set                  : 100
>    + master-postgresql:1               : -INFINITY
>    + pgsql-status                      : HS:alone
>    + pgsql-xlog-loc                    : 0000000019000000
>
>
> The psql status queries return the following:
>
> PSQL1
> ======
> postgres@psql1:/root$ psql  -c "select application_name,upper(state),upper(sync_state) from pg_stat_replication"
>  application_name | upper | upper
> ------------------+-------+-------
> (0 rows)
>
> postgres@psql1:/root$ psql  -Atc "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
> 0/19000020|0/19000000
>
> PSQL2
> ======
> postgres@psql2:~$  psql  -c "select application_name,upper(state),upper(sync_state) from pg_stat_replication"
>  application_name | upper | upper
> ------------------+-------+-------
> (0 rows)
>
> postgres@psql2:~$ psql  -Atc "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
> 0/19000000|0/19000000
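
The psql output above uses the "X/Y" form, while the pgsql-xlog-loc attribute and the RA log show a 16-digit string. A small Python sketch of one way to map between the two follows. It assumes each half is zero-padded to 8 hex digits, which matches 0/19000000 and 0000000019000000 in this thread; which of the two columns the RA actually uses is not shown here, and the function name is hypothetical.

```python
def normalize_xlog_location(loc):
    """Turn 'X/Y' (as printed by psql) into a 16-digit uppercase hex string."""
    high, low = loc.strip().split("/")
    return "{:08X}{:08X}".format(int(high, 16), int(low, 16))


# One output line of:
#   psql -Atc "select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"
replay, receive = "0/19000020|0/19000000".split("|")
print(normalize_xlog_location(replay))
print(normalize_xlog_location(receive))
```

The padded form can be compared as a plain string or as a hexadecimal number, which is easier than comparing the two halves of "X/Y" separately.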
>
>
> Neither server can connect (obviously) to the master, as the vip_repl is not
> brought up.
>
>
> Could you help me understand what action/state/event should promote one of
> the nodes? I see that pacemaker monitors the servers every X seconds, but
> nothing else happens.
>
> In the log (limited to pgsql) the following sequence is repeated forever:
>
> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: Master is not exist.
> Nov 26 13:36:19 psql1 pgsql[19829]: DEBUG: Checking right of master.
> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: My data status=.
> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql1 xlog location : 0000000019000000
> Nov 26 13:36:19 psql1 pgsql[19829]: INFO: psql2 xlog location : 0000000019000000
> Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: PostgreSQL is running as a hot standby.
> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: Master is not exist.
> Nov 26 13:36:26 psql1 pgsql[19993]: DEBUG: Checking right of master.
> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: My data status=.
> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql1 xlog location : 0000000019000000
> Nov 26 13:36:26 psql1 pgsql[19993]: INFO: psql2 xlog location : 0000000019000000
> Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: PostgreSQL is running as a hot standby.
> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: Master is not exist.
> Nov 26 13:36:33 psql1 pgsql[20176]: DEBUG: Checking right of master.
> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: My data status=.
> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql1 xlog location : 0000000019000000
> Nov 26 13:36:33 psql1 pgsql[20176]: INFO: psql2 xlog location : 0000000019000000
> Nov 26 13:36:41 psql1 pgsql[20343]: DEBUG: PostgreSQL is running as a hot standby.
>
>
> Any help is appreciated!
>
> Regards,
> Attila
>
>
>
>
> -----Original Message-----
> From: Yoshiharu Mori [mailto:y-m...@sraoss.co.jp]
> Sent: 2011. november 25. 14:17
> To: The Pacemaker cluster resource manager
> Cc: Attila Megyeri
> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>
> Hi Attila
>
>> A quick snippet from the corosync.log
>>
>> Nov 23 05:43:05 psql1 pgsql[2845]: DEBUG: Checking right of master.
>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: My data status=.
>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql1 xlog location : 000000000D000000
>> Nov 23 05:43:05 psql1 pgsql[2845]: INFO: psql2 xlog location : 0000000008000000
>>
>> As you see, the "my data status" returns an empty string.
>
> My log is the same, but it works.
>
> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Master is not exist.
> Nov 18 19:28:26 osspc24-1 pgsql[17350]: INFO: Checking right of master.
> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: My data status=.
> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm01 xlog location : 0000000005000020
> Nov 18 19:28:19 osspc24-1 pgsql[17138]: INFO: pm02 xlog location : 0000000005000000
>
> In my log, the following message is output, and the node starts as master,
> after the xlog location has been checked (3 times).
>
> Nov 18 19:29:39 osspc24-1 pgsql[18720]: INFO: I have a master right.
>
> Please show us more of your corosync.log.
>
>
>>
>>
>> -----Original Message-----
>> From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
>> Sent: 2011. november 25. 9:28
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Postgresql streaming replication failover -
>> RA needed
>>
>> Hi Takatoshi,
>>
>> I have restored the PSQL to run without corosync so I cannot send you the 
>> crm_mon output now.
>>
>> What I can tell for sure:
>> - The RA never promoted any of the nodes, no matter what the status was. It
>> also did not promote a node when it was the only one.
>> - I believe the issue is in the comparison of the xlogs. How could I
>> troubleshoot that? I see from the logs that crm NEVER tried to invoke pgsql
>> with "promote".
>> - I previously tried the crm_mon -A option, but there was never a
>> "pgsql-data-status" attribute. The other attributes were there, including
>> the HS:alone.
>> - In the corosync log the only relevant RA message I see is "Master is not
>> exist." I never saw a message like "My data is out-of-date".
>>
>> Thank you!
>>
>> Attila
>>
>>
>> -----Original Message-----
>> From: Takatoshi MATSUO [mailto:matsuo....@gmail.com]
>> Sent: 2011. november 25. 8:56
>> To: The Pacemaker cluster resource manager
>> Subject: Re: [Pacemaker] Postgresql streaming replication failover -
>> RA needed
>>
>> Hi Attila
>>
>> 2011/11/24 Attila Megyeri <amegy...@minerva-soft.com>:
>> > Hi Takatoshi, All,
>> >
>> > Thanks for your reply.
>> > I see that you have invested significant effort in the development of the 
>> > RA. I spent the last day trying to set up the RA, but without much success.
>> >
>> > My infrastructure is very similar to yours, except for the fact that 
>> > currently I am testing with a single network adapter.
>> >
>> > Replication works nicely when I start the databases manually, not using 
>> > corosync.
>> >
>> > When I try to start using corosync, I see that the ping resources start
>> > normally, but the msPostgresql starts on both nodes in slave mode, and I
>> > see "HS:alone".
>>
>> Seeing "HS:alone" is normal.
>> The RA compares the xlog locations and promotes the PostgreSQL instance that
>> has the newest data.
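
To illustrate the comparison only: the following Python sketch picks the node(s) with the highest xlog location from values like those in the RA log. It is not the RA's code. The function name is hypothetical, and how the RA breaks a tie between equal locations is not stated in this thread, so the sketch simply returns every node that shares the highest value.

```python
def nodes_with_newest_data(locations):
    """Return the sorted list of nodes whose xlog location is the highest.

    `locations` maps node name -> 16-digit hex string as seen in the RA log.
    """
    if not locations:
        return []
    newest = max(int(loc, 16) for loc in locations.values())
    return sorted(n for n, loc in locations.items() if int(loc, 16) == newest)


# Values taken from the log excerpts in this thread.
print(nodes_with_newest_data({"psql1": "000000000D000000", "psql2": "0000000008000000"}))
print(nodes_with_newest_data({"psql1": "0000000019000000", "psql2": "0000000019000000"}))
```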
>>
>> > In the Wiki you state that if I start on a single node only, PSQL should
>> > start in Master mode (PRI), but this is not the case.
>>
>> If the data is old, the node can't be master.
>> To become master, a node needs pgsql-data-status="LATEST" or "STREAMING|SYNC".
>> Please check it using "crm_mon -A".
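
As a rough aid for that check, this Python sketch parses the "Node Attributes" part of crm_mon -A output and tests the condition stated above. The line layout is assumed from the crm_mon excerpt quoted elsewhere in this thread, and the names parse_node_attributes and can_be_master are hypothetical helpers, not part of Pacemaker or the RA.

```python
MASTER_OK = {"LATEST", "STREAMING|SYNC"}  # values named in the message above


def parse_node_attributes(text):
    """Parse '* Node X:' / '+ name : value' lines into {node: {name: value}}.

    Leading mail-quote markers ('>') are stripped so that text pasted from
    this thread can be used directly.
    """
    attrs = {}
    node = None
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        if line.startswith("* Node ") and line.endswith(":"):
            node = line[len("* Node "):-1]
            attrs[node] = {}
        elif line.startswith("+ ") and node is not None:
            name, sep, value = line[2:].partition(" : ")
            if sep:
                attrs[node][name.strip()] = value.strip()
    return attrs


def can_be_master(attrs, node):
    """True if the node's pgsql-data-status allows promotion."""
    return attrs.get(node, {}).get("pgsql-data-status") in MASTER_OK


sample = """\
> * Node psql1:
>    + default_ping_set                  : 100
>    + master-postgresql:0               : -INFINITY
>    + pgsql-status                      : HS:alone
>    + pgsql-xlog-loc                    : 0000000019000000
"""
attrs = parse_node_attributes(sample)
print(attrs["psql1"])
print(can_be_master(attrs, "psql1"))
```

With the attributes reported in this thread there is no pgsql-data-status at all, so the check comes out false for psql1.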
>>
>> Also, going from stopped to master takes a few minutes, because the RA
>> compares xlog locations on monitor.
>>
>>
>> > The recovery.conf file is created immediately, and from the logs I see no 
>> > attempt at all to promote the node.
>> > In the postgres logs I see that node1, which is supposed to be a master, 
>> > tries to connect to the vip-rep IP address, which is NOT brought up, 
>> > because it depends on the Master role...
>> >
>> > Do you have any idea?
>>
>> Please check the HA log.
>> My RA writes "My data is out-of-date. status=********" to the log if the
>> data is old.
>>
>> Regards,
>> Takatoshi MATSUO
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>
> --
> Yoshiharu Mori <y-m...@sraoss.co.jp>
> SRA OSS, Inc Japan http://www.sraoss.co.jp
> TEL: 03-5979-2701
> FAX: 03-5979-2702

