On 17 Jan 2014, at 4:33 pm, Michael Monette <mmone...@2keys.ca> wrote:
> Hi,
>
> I have 2 servers setup with Postgres and /dev/drbd1 is mounted at
> /var/lib/pgsql. I also have pacemaker setup and it's setup to fail back and
> forth between the 2 nodes. It works really well for the most part.
>
> I am having this one problem and it is happening to all 4 of my clusters. If
> the "web_services" resource group is running on database-2.hehe.org and I do
> a hard reset on it, it fails over fine and within a few seconds the DB is
> running on database-1.hehe.org. I turn the system back on and everything is
> fine. It comes back online with no issue and everything continues to run
> normally on database-1. crm_mon shows no errors at all, the node simply goes
> into online status.
>
> HOWEVER, If I do a hard shutdown on database-1(or any of my primary nodes,
> ldap-1,idp-1,acc-1), it fails over to database-2 just fine. But, when it
> comes back into online status it seems like pacemaker tries to move the
> resources back to database-1, fails and then the services get restarted on
> database-2 because they are moving back.

Check out resource-stickiness. Set it to 100 (or so) and you should get the
behaviour you want. If not, you might find database-1 is starting pgsql or
drbd at boot time.

>
> Why is it that all of my 1st nodes are trying to take the resources back when
> they come back online but none of the 2nd nodes do this? Is there any way to
> prevent this? Can PaceMaker not check to see if the resources in the cluster
> are already running, and if so, just become an available node for the next
> time?
>
> I tried putting sticky resources to infinity. I have tried starting up the
> corosync/pacemaker service with the node in standby beforehand and it's
> always the same thing. Once node-1 is online, all the services on node-2 get
> interrupted trying to failback, which fails(probably just because drbd is
> already in use on the other end).
>
> Here is my config:
>
> node database-1.hehe.org \
>         attributes standby="off"
> node database-2.hehe.org \
>         attributes standby="off"
> primitive drbd_data ocf:linbit:drbd \
>         params drbd_resource="res1" \
>         op monitor interval="29s" role="Master" \
>         op monitor interval="31s" role="Slave"
> primitive fs_data ocf:heartbeat:Filesystem \
>         params device="/dev/drbd1" directory="/var/lib/pgsql" fstype="ext4"
> primitive httpd lsb:postgresql
> primitive ip_httpd ocf:heartbeat:IPaddr2 \
>         params ip="10.199.0.11"
> group web_services fs_data ip_httpd httpd
> ms ms_drbd_data drbd_data \
>         meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
> colocation web_services_on_drbd inf: httpd ms_drbd_data:Master
> order web_services_after_drbd inf: ms_drbd_data:promote web_services:start
> property $id="cib-bootstrap-options" \
>         dc-version="1.1.10-14.el6_5.1-368c726" \
>         cluster-infrastructure="classic openais (with plugin)" \
>         expected-quorum-votes="2" \
>         stonith-enabled="false" \
>         no-quorum-policy="ignore" \
>         last-lrm-refresh="1389926961"
>
> Thanks
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
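
As a rough sketch of the stickiness suggestion above, assuming the same crm
shell that produced the quoted config: a cluster-wide default would appear in
"crm configure show" roughly as below. The value 100 is only the starting
point mentioned above, and the $id is what crm typically generates rather
than anything taken from the poster's CIB.

```
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"
```

With crmsh this would normally be set with "crm configure rsc_defaults
resource-stickiness=100". It is not a guaranteed fix: if pgsql or drbd is
also being started by init at boot on database-1, stickiness alone may not
stop the restart on database-2.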