On Tue, Dec 21, 2010 at 6:35 PM, Marc Wilmots <desj...@gmail.com> wrote:
> Hi,
>
> I have two nodes, rspa and rspa2 (both CentOS 5.3, 32-bit), with the
> following packages:
>
> drbd83-8.3.8-1.el5.centos
> heartbeat-3.0.3-2.3.el5
> pacemaker-1.0.10-1.4.el5
>
> rspa is stopped, and rspa2 has all the resources (IP, Filesystem, MySQL,
> Apache and DRBD Master).
> When I start heartbeat on rspa, for some reason (I don't have any
> resource_location specified) it tries to move all resources to that node,

I'm guessing that DRBD wants to be promoted there; because of the
colocation constraint, the group would then move too.
Perhaps the DRBD guys can comment on why that is, or why the partition
becomes unresponsive.

> but when trying to demote drbd on rspa2 (node2) and promote drbd on rspa
> (node1) something must go wrong, as my DRBD partition (being used by MySQL)
> becomes unresponsive.
>
> Next it stops Apache (works), and tries to stop MySQL, which fails because
> it uses the unresponsive partition.
> As a result, my high-availability cluster ends up in limbo; it migrates
> neither to node1 nor to node2.
>
> Any help is welcome here...
>
> [root@rspa2 ~]# crm status
> ============
> Last updated: Tue Dec 21 18:12:47 2010
> Stack: Heartbeat
> Current DC: rspa2.sadiel.es (2680c85b-7e6c-4610-88b2-510feb60c4b4) - partition with quorum
> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> 2 Nodes configured, 2 expected votes
> 2 Resources configured.
> ============
>
> Online: [ rspa2.domain rspa.domain ]
>
> Resource Group: mysql
>     fs_mysql (ocf::heartbeat:Filesystem): Started rspa2.domain
>     ip_mysql (ocf::heartbeat:IPaddr2): Started rspa2.domain
>     mysqld (lsb:mysqld): Started rspa2.domain (unmanaged) FAILED
>     apache (lsb:httpd): Stopped
> Master/Slave Set: ms_drbd_mysql
>     Masters: [ rspa2.domain ]
>     Slaves: [ rspa.domain ]
>
> Failed actions:
>     mysqld_stop_0 (node=rspa2.domain, call=18, rc=-2, status=Timed Out): unknown exec error
>
> Please see my Pacemaker config:
>
> node $id="2680c85b-7e6c-4610-88b2-510feb60c4b4" rspa2.domain \
>     attributes standby="off"
> node $id="f9be4a80-ec2a-42e3-8d86-62dd050b437b" rspa.domain \
>     attributes standby="off"
> primitive apache lsb:httpd
> primitive drbd_mysql ocf:linbit:drbd \
>     params drbd_resource="r0" \
>     op monitor interval="15s" \
>     op monitor interval="16s" role="Master"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/opt/drbd/" fstype="xfs"
> primitive ip_mysql ocf:heartbeat:IPaddr2 \
>     params ip="172.18.2.150" nic="eth0:1"
> primitive mysqld lsb:mysqld
> group mysql fs_mysql ip_mysql mysqld apache
> ms ms_drbd_mysql drbd_mysql \
>     meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started" is-managed="true"
> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> property $id="cib-bootstrap-options" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     expected-quorum-votes="2" \
>     dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
>     cluster-infrastructure="Heartbeat"
>
> This is what's printed in /var/log/messages: http://pastebin.com/W68jPQKJ
> And /var/log/ha.log: http://pastebin.com/SBQz1gU3
>
> My DRBD partition (/dev/drbd0) is mounted on /opt/drbd, and when I run "ls"
> there it just hangs.
> In case it's useful, here is the lsof output:
>
> [root@rspa2 ~]# lsof | grep drbd
> drbd0_wor  3422 root  cwd DIR   8,2     4096   2 /
> drbd0_wor  3422 root  rtd DIR   8,2     4096   2 /
> drbd0_wor  3422 root  txt unknown                /proc/3422/exe
> drbd0_rec  3425 root  cwd DIR   8,2     4096   2 /
> drbd0_rec  3425 root  rtd DIR   8,2     4096   2 /
> drbd0_rec  3425 root  txt unknown                /proc/3425/exe
> drbd0_ase  4876 root  cwd DIR   8,2     4096   2 /
> drbd0_ase  4876 root  rtd DIR   8,2     4096   2 /
> drbd0_ase  4876 root  txt unknown                /proc/4876/exe
> mysqld    12322 mysql cwd DIR 147,0       96 131 /opt/drbd/mysql
> mysqld    12322 mysql 3uW REG 147,0 10485760 135 /opt/drbd/mysql/ibdata1
> mysqld    12322 mysql 8uW REG 147,0  5242880 133 /opt/drbd/mysql/ib_logfile0
> mysqld    12322 mysql 9uW REG 147,0  5242880 134 /opt/drbd/mysql/ib_logfile1
> ls        12729 root  3r  DIR 147,0       51 128 /opt/drbd
> bash      12889 root  3r  DIR 147,0       51 128 /opt/drbd
> ls        13117 root  3r  DIR 147,0       51 128 /opt/drbd
>
> Heartbeat configuration file:
> [root@rspa2 ~]# cat /etc/ha.d/ha.cf
> use_logd no
> logfile /var/log/ha.log
> autojoin none
> warntime 5
> deadtime 15
> initdead 30
> ucast eth0 172.18.2.137
> node rspa.domain rspa2.domain
> crm yes
>
> And last but not least, my DRBD configuration on both nodes:
>
> global {
>     usage-count yes;
> }
> common {
>     protocol C;
>     syncer {
>         rate 10M;
>     }
> }
> resource r0 {
>     net {
>         data-integrity-alg md5;
>     }
>     on rspa.domain {
>         device /dev/drbd0;
>         disk /dev/sda4;
>         address IP:7789;
>         meta-disk internal;
>     }
>     on rspa2.domain {
>         device /dev/drbd0;
>         disk /dev/sda4;
>         address IP:7789;
>         meta-disk internal;
>     }
> }
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker