Re: [Pacemaker] When the ex-live server comes back online, it tries to failback causing a failure and restart in services

Andrew Beekhof Sun, 16 Feb 2014 16:22:25 -0800

On 17 Jan 2014, at 4:33 pm, Michael Monette <mmone...@2keys.ca> wrote:


> Hi,
> 
> I have 2 servers set up with Postgres, and /dev/drbd1 is mounted at 
> /var/lib/pgsql. I also have Pacemaker set up to fail back and 
> forth between the 2 nodes. It works really well for the most part.
> 
> I am having this one problem and it is happening to all 4 of my clusters. If 
> the "web_services" resource group is running on database-2.hehe.org and I do 
> a hard reset on it, it fails over fine and within a few seconds the DB is 
> running on database-1.hehe.org. I turn the system back on and everything is 
> fine. It comes back online with no issue and everything continues to run 
> normally on database-1. crm_mon shows no errors at all, the node simply goes 
> into online status.
> 
> HOWEVER, if I do a hard shutdown on database-1 (or any of my primary nodes, 
> ldap-1, idp-1, acc-1), it fails over to database-2 just fine. But when it 
> comes back into online status, it seems like Pacemaker tries to move the 
> resources back to database-1, fails, and then the services get restarted on 
> database-2 because they are moving back.

Check out resource-stickiness.  Set it to 100 (or so) and you should get the 
behaviour you want.
If not, you might find that database-1 is starting pgsql or drbd itself at 
boot time, outside Pacemaker's control.
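
For reference, a minimal sketch of how the stickiness default could be written 
in the same crm shell syntax as the configuration quoted below. It assumes the 
value is set cluster-wide through rsc_defaults rather than per resource, and 
the id "rsc-options" is only the name crmsh conventionally generates, so treat 
both as assumptions rather than part of the original setup:

```
rsc_defaults $id="rsc-options" \
      resource-stickiness="100"
```

The same default can be applied from the command line with 
"crm configure rsc_defaults resource-stickiness=100". With a positive 
stickiness, a resource prefers to stay where it is currently running, so a 
returning node should simply rejoin as a failover candidate as long as no 
location constraint scores higher than the stickiness.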

> 
> Why is it that all of my 1st nodes are trying to take the resources back when 
> they come back online but none of the 2nd nodes do this? Is there any way to 
> prevent this? Can Pacemaker not check to see if the resources in the cluster 
> are already running, and if so, just become an available node for the next 
> time? 
> 
> I tried setting resource stickiness to infinity. I have tried starting up the 
> corosync/pacemaker service with the node in standby beforehand and it's 
> always the same thing. Once node-1 is online, all the services on node-2 get 
> interrupted trying to fail back, which fails (probably just because drbd is 
> already in use on the other end).
> 
> Here is my config:
> 
> node database-1.hehe.org \
>       attributes standby="off"
> node database-2.hehe.org \
>       attributes standby="off"
> primitive drbd_data ocf:linbit:drbd \
>       params drbd_resource="res1" \
>       op monitor interval="29s" role="Master" \
>       op monitor interval="31s" role="Slave"
> primitive fs_data ocf:heartbeat:Filesystem \
>       params device="/dev/drbd1" directory="/var/lib/pgsql" fstype="ext4"
> primitive httpd lsb:postgresql
> primitive ip_httpd ocf:heartbeat:IPaddr2 \
>       params ip="10.199.0.11"
> group web_services fs_data ip_httpd httpd
> ms ms_drbd_data drbd_data \
>       meta master-max="1" master-node-max="1" clone-max="2" \
>       clone-node-max="1" notify="true"
> colocation web_services_on_drbd inf: httpd ms_drbd_data:Master
> order web_services_after_drbd inf: ms_drbd_data:promote web_services:start
> property $id="cib-bootstrap-options" \
>       dc-version="1.1.10-14.el6_5.1-368c726" \
>       cluster-infrastructure="classic openais (with plugin)" \
>       expected-quorum-votes="2" \
>       stonith-enabled="false" \
>       no-quorum-policy="ignore" \
>       last-lrm-refresh="1389926961"
> 
> Thanks
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


