On Fri, Nov 25, 2011 at 8:44 AM, Attila Megyeri <amegy...@minerva-soft.com> wrote:
> Hi Gents,
>
> I see from time to time that you are asking for "cibadmin -Ql" type outputs
> to help people troubleshoot their problems.
>
> Currently I have an issue promoting an MS resource (the PSQL issue in the
> previous mail), and I would like to start troubleshooting the problem, but
> did not find any how-tos or documentation on this topic.
> Could you provide any details on how to troubleshoot CIB states?
Start with crm_mon -o. Then check what crm_simulate -L says. Try adding
additional -V arguments and grepping for your resource name.

> My current issue is that I have an MS resource that is started in slave/slave
> mode, and the "promote" is never even called by the CIB. I'd like to start
> the research but have no idea how to do it.

Are you sure the promote doesn't happen? No mention of it in the logs?

>
> I have read the Pacemaker doc, as well as the Clusters from Scratch doc, but
> there are no troubleshooting hints.
>
> Thank you in advance,
>
> Attila
>
> -----Original Message-----
> From: Attila Megyeri [mailto:amegy...@minerva-soft.com]
> Sent: 2011. november 23. 16:53
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>
> Hi Takatoshi, All,
>
> Thanks for your reply.
> I see that you have invested significant effort in the development of the RA.
> I spent the last day trying to set up the RA, but without much success.
>
> My infrastructure is very similar to yours, except for the fact that
> currently I am testing with a single network adapter.
>
> Replication works nicely when I start the databases manually, not using
> corosync.
>
> When I try to start using corosync, I see that the ping resources start
> normally, but the msPostgresql resource starts on both nodes in slave mode,
> and I see "HS:alone".
>
> In the wiki you state that if I start on a single node only, PSQL should
> start in Master mode (PRI), but this is not the case.
>
> The recovery.conf file is created immediately, and from the logs I see no
> attempt at all to promote the node.
> In the postgres logs I see that node1, which is supposed to be a master,
> tries to connect to the vip-rep IP address, which is NOT brought up, because
> it depends on the Master role...
>
> Do you have any idea?
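For example, something along these lines can be a rough starting point (the
resource name is taken from the config quoted below; the --show-scores flag
and the syslog location are assumptions and may differ on your build):

  # one-shot cluster status, including the operation history
  crm_mon -1 -o

  # what the policy engine would do right now; each extra -V adds detail,
  # -s (--show-scores) prints the allocation/promotion scores
  crm_simulate -L -s -VVV 2>&1 | grep -i postgresql

  # was a promote ever scheduled or executed? (Debian normally sends
  # corosync/pacemaker messages to syslog)
  grep -i promote /var/log/syslog

  # the node attributes the pgsql RA maintains (PRI / HS:sync / HS:alone)
  cibadmin -Ql | grep -i pgsql-status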
>
> My environment:
> Debian Squeeze, with backported Pacemaker (version 1.1.5) - the official
> Pacemaker in Debian is rather old and buggy.
> Postgres 9.1, streaming replication, sync mode
> Node1: psql1, 10.12.1.21
> Node2: psql2, 10.12.1.22
>
> Crm config:
>
> node psql1 \
>     attributes standby="off"
> node psql2 \
>     attributes standby="off"
> primitive pingCheck ocf:pacemaker:ping \
>     params name="default_ping_set" host_list="10.12.1.1" multiplier="100" \
>     op start interval="0s" timeout="60s" on-fail="restart" \
>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>     op stop interval="0s" timeout="60s" on-fail="ignore"
> primitive postgresql ocf:heartbeat:pgsql \
>     params pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" psql="/usr/bin/psql" \
>         pgdata="/var/lib/postgresql/9.1/main" \
>         config="/etc/postgresql/9.1/main/postgresql.conf" \
>         pgctldata="/usr/lib/postgresql/9.1/bin/pg_controldata" \
>         rep_mode="sync" node_list="psql1 psql2" \
>         restore_command="cp /var/lib/postgresql/9.1/main/pg_archive/%f %p" \
>         master_ip="10.12.1.28" \
>     op start interval="0s" timeout="60s" on-fail="restart" \
>     op monitor interval="7s" timeout="60s" on-fail="restart" \
>     op monitor interval="2s" role="Master" timeout="60s" on-fail="restart" \
>     op promote interval="0s" timeout="60s" on-fail="restart" \
>     op demote interval="0s" timeout="60s" on-fail="block" \
>     op stop interval="0s" timeout="60s" on-fail="block" \
>     op notify interval="0s" timeout="60s"
> primitive vip-master ocf:heartbeat:IPaddr2 \
>     params ip="10.12.1.20" nic="eth0" cidr_netmask="24" \
>     op start interval="0s" timeout="60s" on-fail="restart" \
>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>     op stop interval="0s" timeout="60s" on-fail="block" \
>     meta target-role="Started"
> primitive vip-rep ocf:heartbeat:IPaddr2 \
>     params ip="10.12.1.28" nic="eth0" cidr_netmask="24" \
>     op start interval="0s" timeout="60s" on-fail="restart" \
>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>     op stop interval="0s" timeout="60s" on-fail="block" \
>     meta target-role="Started"
> primitive vip-slave ocf:heartbeat:IPaddr2 \
>     params ip="10.12.1.27" nic="eth0" cidr_netmask="24" \
>     meta resource-stickiness="1" \
>     op start interval="0s" timeout="60s" on-fail="restart" \
>     op monitor interval="10s" timeout="60s" on-fail="restart" \
>     op stop interval="0s" timeout="60s" on-fail="block"
> group master-group vip-master vip-rep
> ms msPostgresql postgresql \
>     meta master-max="1" master-node-max="1" clone-max="2" \
>         clone-node-max="1" notify="true" target-role="Master"
> clone clnPingCheck pingCheck
> location rsc_location-1 vip-slave \
>     rule $id="rsc_location-1-rule" 200: pgsql-status eq HS:sync \
>     rule $id="rsc_location-1-rule-0" 100: pgsql-status eq PRI \
>     rule $id="rsc_location-1-rule-1" -inf: not_defined pgsql-status \
>     rule $id="rsc_location-1-rule-2" -inf: pgsql-status ne HS:sync and pgsql-status ne PRI
> location rsc_location-2 msPostgresql \
>     rule $id="rsc_location-2-rule" $role="master" 200: #uname eq psql1 \
>     rule $id="rsc_location-2-rule-0" $role="master" 100: #uname eq psql2 \
>     rule $id="rsc_location-2-rule-1" $role="master" -inf: defined fail-count-vip-master \
>     rule $id="rsc_location-2-rule-2" $role="master" -inf: defined fail-count-vip-rep \
>     rule $id="rsc_location-2-rule-3" -inf: not_defined default_ping_set or default_ping_set lt 100
> colocation rsc_colocation-1 inf: msPostgresql clnPingCheck
> colocation rsc_colocation-2 inf: master-group msPostgresql:Master
> order rsc_order-1 0: clnPingCheck msPostgresql
> order rsc_order-2 0: msPostgresql:promote master-group:start symmetrical=false
> order rsc_order-3 0: msPostgresql:demote master-group:stop symmetrical=false
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
>     cluster-infrastructure="openais" \
>     expected-quorum-votes="2" \
>     stonith-enabled="false" \
>     no-quorum-policy="ignore"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="INFINITY" \
>     migration-threshold="1"
>
> Regards,
> Attila
>
> -----Original Message-----
> From: Takatoshi MATSUO [mailto:matsuo....@gmail.com]
> Sent: 2011. november 17. 8:04
> To: The Pacemaker cluster resource manager
> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>
> Hi All,
>
> I created an RA for PostgreSQL 9.1 streaming replication based on pgsql.
>
> RA
> https://github.com/t-matsuo/resource-agents/blob/pgsql91/heartbeat/pgsql
> Documents
> https://github.com/t-matsuo/resource-agents/wiki
>
> It is almost totally changed from the previous patch
> (http://lists.linux-ha.org/pipermail/linux-ha-dev/2011-February/018193.html).
> It creates recovery.conf and promotes PostgreSQL automatically.
> Additionally, it can switch between synchronous and asynchronous
> replication automatically.
>
> Please try it and comment.
>
> Regards,
> Takatoshi MATSUO
>
> 2011/11/17 Serge Dubrouski <serge...@gmail.com>:
>>
>> On Wed, Nov 16, 2011 at 12:55 PM, Attila Megyeri
>> <amegy...@minerva-soft.com> wrote:
>>>
>>> Hi Florian,
>>>
>>> -----Original Message-----
>>> From: Florian Haas [mailto:flor...@hastexo.com]
>>> Sent: 2011. november 16. 11:49
>>> To: The Pacemaker cluster resource manager
>>> Subject: Re: [Pacemaker] Postgresql streaming replication failover - RA needed
>>>
>>> Hi Attila,
>>>
>>> On 2011-11-16 10:27, Attila Megyeri wrote:
>>> > Hi All,
>>> >
>>> > We have a two-node PostgreSQL 9.1 system configured using streaming
>>> > replication (active/active with a read-only slave).
>>> >
>>> > We want to automate the failover process and I couldn't really find
>>> > a resource agent that could do the job.
>>>
>>> That is correct; the pgsql resource agent (unlike its mysql
>>> counterpart) does not support streaming replication. We've had a
>>> contributor submit a patch at one point, but it was somewhat
>>> ill-conceived and thus did not make it into the upstream repo. The
>>> relevant thread is here:
>>>
>>> http://lists.linux-ha.org/pipermail/linux-ha-dev/2011-February/018195.html
>>>
>>> Would you feel comfortable modifying the pgsql resource agent to
>>> support replication? If so, we could revisit this issue and
>>> potentially add streaming replication support to pgsql.
>>>
>>>
>>> Well, I'm not sure I would be able to do that change. Failover is
>>> relatively easy to do, but I really have no idea how to do the failback part.
>>
>> And that's exactly the reason why I haven't implemented it yet. With
>> the way replication is currently done in PostgreSQL there is no easy
>> way to switch roles, or at least I don't know of such a way.
>> Implementing just fail-over functionality, by creating a trigger file
>> on the slave server in the case of a failure on the master side, doesn't
>> amount to a full master-slave implementation in my opinion.
>>
>>>
>>> I will definitely have to sort this out somehow; I am just unsure
>>> whether I will try to use the repmgr mentioned in the video, or
>>> Pacemaker with some level of customization...
>>>
>>> Is the resource agent that you mentioned available somewhere?
>>>
>>> Thanks.
>>> Attila
>>
>> --
>> Serge Dubrouski.

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
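For reference on the trigger-file approach and the recovery.conf handling
discussed in the quoted thread: with plain PostgreSQL 9.1 streaming replication
(no resource agent involved), the standby runs from a recovery.conf roughly
like the sketch below, and a manual failover means creating the trigger file on
the standby. The host and restore_command values are taken from the
configuration quoted above; the port, user, application_name and trigger-file
path are placeholders, and the file the pgsql91 RA writes for itself may differ.

  # recovery.conf on the standby (PostgreSQL 9.1) - hand-written sketch
  standby_mode = 'on'
  # application_name is what synchronous_standby_names on the master
  # references when the replication runs in synchronous mode
  primary_conninfo = 'host=10.12.1.28 port=5432 user=postgres application_name=psql2'
  # fall back to archived WAL segments, as in the pgsql resource parameters
  restore_command = 'cp /var/lib/postgresql/9.1/main/pg_archive/%f %p'
  # creating this file (e.g. with touch) promotes the standby to a primary
  trigger_file = '/var/lib/postgresql/9.1/main/trigger'

The pgsql91 RA takes care of generating recovery.conf and driving the promotion
through Pacemaker, which is why the configuration above orders master-group
(and with it vip-rep) after msPostgresql:promote.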