Hi, I have set up a two-node cluster using the following packages:
pacemaker 1.1.10+git20130802-1ubuntu2
corosync 2.3.3-1ubuntu1

My cluster config is as follows:

node $id="12303" ldb03
node $id="12304" ldb04
primitive p_fence_ldb03 stonith:external/vcenter \
        params VI_SERVER="10.17.248.10" VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" HOSTLIST="ldb03=ldb03" RESETPOWERON="0" pcmk_host_check="static-list" pcmk_host_list="ldb03" \
        op start interval="0" timeout="500s"
primitive p_fence_ldb04 stonith:external/vcenter \
        params VI_SERVER="10.17.248.10" VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" HOSTLIST="ldb04=ldb04" RESETPOWERON="0" pcmk_host_check="static-list" pcmk_host_list="ldb04" \
        op start interval="0" timeout="500s"
primitive p_fs_mysql ocf:heartbeat:Filesystem \
        params device="nfsserver:/LDB_Cluster1" directory="/var/lib/mysql" fstype="nfs" options="relatime,rw,hard,nointr,rsize=32768,wsize=32768,bg,vers=3,proto=tcp" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="60s" timeout="60s" \
        meta is-managed="true"
primitive p_ip_1 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.11" cidr_netmask="25" \
        op monitor interval="30s" \
        meta target-role="Started" is-managed="true"
primitive p_ip_2 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.12" cidr_netmask="25" \
        op monitor interval="30s" \
        meta target-role="Started" is-managed="true"
primitive p_ip_3 ocf:heartbeat:IPaddr2 \
        params ip="10.10.10.13" cidr_netmask="25" \
        op monitor interval="30s" \
        meta target-role="Started" is-managed="true"
primitive p_mysql ocf:heartbeat:mysql \
        params datadir="/var/lib/mysql" binary="/usr/bin/mysqld_safe" socket="/var/run/mysqld/mysqld.sock" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="30" \
        meta target-role="Started" is-managed="true"
group g_mysql p_fs_mysql p_mysql p_ip_1 p_ip_2 p_ip_3
location l_fence_ldb03 p_fence_ldb03 -inf: ldb03
location l_fence_ldb04 p_fence_ldb04 -inf: ldb04
property $id="cib-bootstrap-options" \
        dc-version="1.1.10-42f2063" \
        cluster-infrastructure="corosync" \
        no-quorum-policy="ignore" \
        stonith-enabled="true" \
        stop-all-resources="false" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1407325251"

This exact configuration worked during setup, but I have since run into a problem with my inactive node, ldb03.
Corosync shows this node as up:

root@ldb03:~# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.12303.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.12303.ip (str) = r(0) ip(10.10.10.8)
runtime.totem.pg.mrp.srp.members.12303.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.12303.status (str) = joined
runtime.totem.pg.mrp.srp.members.12304.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.12304.ip (str) = r(0) ip(10.10.10.9)
runtime.totem.pg.mrp.srp.members.12304.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.12304.status (str) = joined

and crm status and crm node status both show it as online:

Last updated: Wed Aug 6 14:16:24 2014
Last change: Wed Aug 6 14:02:00 2014 via crm_resource on ldb04
Stack: corosync
Current DC: ldb04 (12304) - partition with quorum
Version: 1.1.10-42f2063
2 Nodes configured
7 Resources configured

Online: [ ldb03 ldb04 ]

root@ldb03:~# crm node status
<nodes>
  <node id="12304" uname="ldb04"/>
  <node id="12303" uname="ldb03"/>
</nodes>

However, after seeing this entry in my logs:

Aug 6 13:26:23 ldb03 cibadmin[2140]: notice: crm_log_args: Invoked: cibadmin -M -c -o status --xml-text <node_state id="ldb03" uname="ldb03" ha="active" in_ccm="false" crmd="offline" join="member" expected="down" crm-debug-origin="manual_clear" shutdown="0"/>

I noticed that crm node show reports the node as normal(offline):

root@ldb03:~# crm node show
ldb04(12304): normal
ldb03(12303): normal(offline)

The offline state is not visible anywhere else: it is not in cib.xml, corosync-quorumtool shows both nodes, and a tcpdump shows multicast traffic from both hosts. I tried (hesitantly) to delete the offending node_state entry using cibadmin, but I couldn't quite get the syntax right.

Any tips on how to get this node to show as online, and subsequently be able to run resources again? Currently, running crm resource move has no effect: no errors, and nothing noticeable in the log files either.

Sorry for the long post; I can attach more logs/config if necessary.

Thanks,
Jamie.
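P.S. For reference, my best guess at the cibadmin delete syntax is something along these lines. I haven't dared run it against the live CIB yet, so treat it as a sketch rather than something I know works, and I'm not sure the match criteria are right:

    # My understanding is that -D deletes the first element matching
    # the supplied tag and id, scoped here to the status section.
    # The id is taken from the log entry above.
    cibadmin -D -o status --xml-text '<node_state id="ldb03"/>'

If that's the wrong way to clear a stale node_state entry, corrections welcome.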
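P.P.S. Failing a cleaner fix, my fallback plan is to restart the whole stack on ldb03 so it rejoins and repopulates its status section. Something like the following, assuming Ubuntu's init scripts (again, a plan rather than something I've actually run yet):

    # Stop pacemaker before corosync, then bring both back up
    service pacemaker stop
    service corosync stop
    service corosync start
    service pacemaker start

Is that a reasonable approach with stonith enabled, or is there a gentler way?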