Hi,

I have two nodes, rspa and rspa2 (both CentOS 5.3, 32-bit), with the following packages:

drbd83-8.3.8-1.el5.centos
heartbeat-3.0.3-2.3.el5
pacemaker-1.0.10-1.4.el5

rspa is stopped, and rspa2 has all the resources (IP, Filesystem, MySQL, Apache and DRBD Master).

When I start Heartbeat on rspa, for some reason (I don't have any resource_location specified) it tries to move all resources to that node. While it is demoting DRBD on rspa2 (node2) and promoting DRBD on rspa (node1), something must go wrong, because my DRBD partition (used by MySQL) becomes unresponsive. Next it stops Apache (which works) and tries to stop MySQL, which fails because MySQL is using the unresponsive partition.

As a result, my high-availability cluster ends up in limbo: the resources migrate neither to node1 nor to node2.

Any help is welcome here...

[r...@rspa2 ~]# crm status
============
Last updated: Tue Dec 21 18:12:47 2010
Stack: Heartbeat
Current DC: rspa2.sadiel.es (2680c85b-7e6c-4610-88b2-510feb60c4b4) - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ rspa2.domain rspa.domain ]

 Resource Group: mysql
     fs_mysql   (ocf::heartbeat:Filesystem):    Started rspa2.domain
     ip_mysql   (ocf::heartbeat:IPaddr2):       Started rspa2.domain
     mysqld     (lsb:mysqld):   Started rspa2.domain (unmanaged) FAILED
     apache     (lsb:httpd):    Stopped
 Master/Slave Set: ms_drbd_mysql
     Masters: [ rspa2.domain ]
     Slaves: [ rspa.domain ]

Failed actions:
    mysqld_stop_0 (node=rspa2.domain, call=18, rc=-2, status=Timed Out): unknown exec error

Please see my Pacemaker config:

node $id="2680c85b-7e6c-4610-88b2-510feb60c4b4" rspa2.domain \
        attributes standby="off"
node $id="f9be4a80-ec2a-42e3-8d86-62dd050b437b" rspa.domain \
        attributes standby="off"
primitive apache lsb:httpd
primitive drbd_mysql ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15s" \
        op monitor interval="16s" role="Master"
primitive fs_mysql ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/opt/drbd/" fstype="xfs"
primitive ip_mysql ocf:heartbeat:IPaddr2 \
        params ip="172.18.2.150" nic="eth0:1"
primitive mysqld lsb:mysqld
group mysql fs_mysql ip_mysql mysqld apache
ms ms_drbd_mysql drbd_mysql \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started" is-managed="true"
colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
property $id="cib-bootstrap-options" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        expected-quorum-votes="2" \
        dc-version="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3" \
        cluster-infrastructure="Heartbeat"

This is what's printed in /var/log/messages:
http://pastebin.com/W68jPQKJ

And /var/log/ha.log:
http://pastebin.com/SBQz1gU3

My DRBD partition (/dev/drbd0) is mounted on /opt/drbd, and when I run "ls" there it just hangs.
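
Side note for anyone who wants to check a mount like this without getting their own shell stuck: a minimal sketch, assuming Python 3 is available on the node (the stock Python on CentOS 5 is older, so this is an assumption). The function name path_responds and the 5-second default are made up for illustration. It runs "ls" as a child process and gives up after a timeout instead of blocking.

```python
import subprocess
import time


def path_responds(path, timeout=5.0):
    """Return True if `ls <path>` exits successfully within `timeout` seconds.

    Hypothetical helper, not part of any cluster tool. A False result means
    either that `ls` failed or that it did not finish in time.
    """
    proc = subprocess.Popen(
        ["ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return proc.returncode == 0
        time.sleep(0.1)
    # Deliberately no wait() here: a child blocked in uninterruptible I/O
    # may never exit, and waiting on it would hang this script as well.
    proc.kill()
    return False


if __name__ == "__main__":
    # /opt/drbd is the mount point from this post.
    print(path_responds("/opt/drbd"))
```

The child is polled rather than waited on because a process stuck on a hung block device may ignore the kill signal until the I/O completes.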

In case it's useful, here is the lsof output:

[r...@rspa2 ~]# lsof | grep drbd
drbd0_wor  3422  root   cwd   DIR       8,2       4096    2  /
drbd0_wor  3422  root   rtd   DIR       8,2       4096    2  /
drbd0_wor  3422  root   txt   unknown                        /proc/3422/exe
drbd0_rec  3425  root   cwd   DIR       8,2       4096    2  /
drbd0_rec  3425  root   rtd   DIR       8,2       4096    2  /
drbd0_rec  3425  root   txt   unknown                        /proc/3425/exe
drbd0_ase  4876  root   cwd   DIR       8,2       4096    2  /
drbd0_ase  4876  root   rtd   DIR       8,2       4096    2  /
drbd0_ase  4876  root   txt   unknown                        /proc/4876/exe
mysqld    12322  mysql  cwd   DIR     147,0         96  131  /opt/drbd/mysql
mysqld    12322  mysql  3uW   REG     147,0   10485760  135  /opt/drbd/mysql/ibdata1
mysqld    12322  mysql  8uW   REG     147,0    5242880  133  /opt/drbd/mysql/ib_logfile0
mysqld    12322  mysql  9uW   REG     147,0    5242880  134  /opt/drbd/mysql/ib_logfile1
ls        12729  root   3r    DIR     147,0         51  128  /opt/drbd
bash      12889  root   3r    DIR     147,0         51  128  /opt/drbd
ls        13117  root   3r    DIR     147,0         51  128  /opt/drbd

Heartbeat configuration file:

[r...@rspa2 ~]# cat /etc/ha.d/ha.cf
use_logd no
logfile /var/log/ha.log
autojoin none
warntime 5
deadtime 15
initdead 30
ucast eth0 172.18.2.137
node rspa.domain rspa2.domain
crm yes

And last but not least, my DRBD configuration on both nodes:

global {
    usage-count yes;
}
common {
    protocol C;
    syncer {
        rate 10M;
    }
}
resource r0 {
    net {
        data-integrity-alg md5;
    }
    on rspa.domain {
        device /dev/drbd0;
        disk /dev/sda4;
        address IP:7789;
        meta-disk internal;
    }
    on rspa2.domain {
        device /dev/drbd0;
        disk /dev/sda4;
        address IP:7789;
        meta-disk internal;
    }
}
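
For anyone who wants to script around the "Failed actions" entry in the crm status output earlier in this message, here is a rough sketch of how that one line can be split into fields. The regular expression and the name parse_failed_action are my own assumptions, derived only from the single line shown in this post; they are not part of crm or Pacemaker, and other versions may format the line differently.

```python
import re

# Pattern modelled on the one "Failed actions" entry in this post:
#   <resource>_<operation>_<interval> (node=..., call=..., rc=..., status=...): <reason>
FAILED_ACTION = re.compile(
    r"(?P<resource>\S+)_(?P<operation>[a-z]+)_(?P<interval>\d+)\s+"
    r"\(node=(?P<node>[^,]+),\s*call=(?P<call>-?\d+),\s*"
    r"rc=(?P<rc>-?\d+),\s*status=(?P<status>[^)]+)\):\s*(?P<reason>.+)"
)


def parse_failed_action(line):
    """Return a dict of fields for one failed-action line, or None if it does not match."""
    match = FAILED_ACTION.search(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Numeric fields are converted so they can be compared directly.
    for key in ("interval", "call", "rc"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    line = ("mysqld_stop_0 (node=rspa2.domain, call=18, rc=-2, "
            "status=Timed Out): unknown exec error")
    print(parse_failed_action(line))
```

Here the interesting fields are operation ("stop") and status ("Timed Out"): the stop of mysqld did not return within its timeout, which matches the hanging partition described above.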

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker