Hi,

Sorry, I have managed to fix this now. I noticed this in the log line:
Aug 6 13:26:23 ldb03 cibadmin[2140]:  notice: crm_log_args: Invoked: cibadmin -M -c -o status --xml-text <node_state id="ldb03" uname="ldb03" ha="active" in_ccm="false" crmd="offline" join="member" expected="down" crm-debug-origin="manual_clear" shutdown="0"/>

The id attribute is "ldb03" (the uname), not the node's numeric ID, 12303. I removed the stale node entry using:

  crm_node -R "ldb03" --force

and rebooted. The nodes are now in sync. (A quick way to verify that the IDs line up, and the cibadmin syntax from the original post, are sketched after the quoted message below.)

Thanks,

Jamie.

On Wed, Aug 6, 2014 at 2:43 PM, Jamie <thisbodyd...@gmail.com> wrote:
> Hi,
>
> I have set up a 2-node cluster, using the following packages:
>
> pacemaker 1.1.10+git20130802-1ubuntu2
> corosync 2.3.3-1ubuntu1
>
> My cluster config is as follows:
>
> node $id="12303" ldb03
> node $id="12304" ldb04
> primitive p_fence_ldb03 stonith:external/vcenter \
>     params VI_SERVER="10.17.248.10" VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" HOSTLIST="ldb03=ldb03" RESETPOWERON="0" pcmk_host_check="static-list" pcmk_host_list="ldb03" \
>     op start interval="0" timeout="500s"
> primitive p_fence_ldb04 stonith:external/vcenter \
>     params VI_SERVER="10.17.248.10" VI_CREDSTORE="/root/.vmware/credstore/vicredentials.xml" HOSTLIST="ldb04=ldb04" RESETPOWERON="0" pcmk_host_check="static-list" pcmk_host_list="ldb04" \
>     op start interval="0" timeout="500s"
> primitive p_fs_mysql ocf:heartbeat:Filesystem \
>     params device="nfsserver:/LDB_Cluster1" directory="/var/lib/mysql" fstype="nfs" options="relatime,rw,hard,nointr,rsize=32768,wsize=32768,bg,vers=3,proto=tcp" \
>     op start interval="0" timeout="60s" \
>     op stop interval="0" timeout="120s" \
>     op monitor interval="60s" timeout="60s" \
>     meta is-managed="true"
> primitive p_ip_1 ocf:heartbeat:IPaddr2 \
>     params ip="10.10.10.11" cidr_netmask="25" \
>     op monitor interval="30s" \
>     meta target-role="Started" is-managed="true"
> primitive p_ip_2 ocf:heartbeat:IPaddr2 \
>     params ip="10.10.10.12" cidr_netmask="25" \
>     op monitor interval="30s" \
>     meta target-role="Started" is-managed="true"
> primitive p_ip_3 ocf:heartbeat:IPaddr2 \
>     params ip="10.10.10.13" cidr_netmask="25" \
>     op monitor interval="30s" \
>     meta target-role="Started" is-managed="true"
> primitive p_mysql ocf:heartbeat:mysql \
>     params datadir="/var/lib/mysql" binary="/usr/bin/mysqld_safe" socket="/var/run/mysqld/mysqld.sock" \
>     op start interval="0" timeout="120" \
>     op stop interval="0" timeout="120" \
>     op monitor interval="20" timeout="30" \
>     meta target-role="Started" is-managed="true"
> group g_mysql p_fs_mysql p_mysql p_ip_1 p_ip_2 p_ip_3 \
> location l_fence_ldb03 p_fence_ldb03 -inf: ldb03
> location l_fence_ldb04 p_fence_ldb04 -inf: ldb04
> property $id="cib-bootstrap-options" \
>     dc-version="1.1.10-42f2063" \
>     cluster-infrastructure="corosync" \
>     no-quorum-policy="ignore" \
>     stonith-enabled="true" \
>     stop-all-resources="false" \
>     expected-quorum-votes="2" \
>     last-lrm-refresh="1407325251"
>
> This exact configuration worked during setup, but I have encountered a problem with my inactive node ldb03.
> Corosync shows this node as up:
>
> root@ldb03:~# corosync-cmapctl | grep members
> runtime.totem.pg.mrp.srp.members.12303.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.12303.ip (str) = r(0) ip(10.10.10.8)
> runtime.totem.pg.mrp.srp.members.12303.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.12303.status (str) = joined
> runtime.totem.pg.mrp.srp.members.12304.config_version (u64) = 0
> runtime.totem.pg.mrp.srp.members.12304.ip (str) = r(0) ip(10.10.10.9)
> runtime.totem.pg.mrp.srp.members.12304.join_count (u32) = 1
> runtime.totem.pg.mrp.srp.members.12304.status (str) = joined
>
> and crm status and crm node status show it as online:
>
> Last updated: Wed Aug 6 14:16:24 2014
> Last change: Wed Aug 6 14:02:00 2014 via crm_resource on ldb04
> Stack: corosync
> Current DC: ldb04 (12304) - partition with quorum
> Version: 1.1.10-42f2063
> 2 Nodes configured
> 7 Resources configured
>
> Online: [ ldb03 ldb04 ]
>
> root@ldb03:~# crm node status
> <nodes>
>   <node id="12304" uname="ldb04"/>
>   <node id="12303" uname="ldb03"/>
> </nodes>
>
> but... after seeing this entry in my logs:
>
> Aug 6 13:26:23 ldb03 cibadmin[2140]:  notice: crm_log_args: Invoked: cibadmin -M -c -o status --xml-text <node_state id="ldb03" uname="ldb03" ha="active" in_ccm="false" crmd="offline" join="member" expected="down" crm-debug-origin="manual_clear" shutdown="0"/>
>
> I noticed that cibadmin shows it as normal(offline):
>
> root@ldb03:~# crm node show
> ldb04(12304): normal
> ldb03(12303): normal(offline)
>
> The offline state is not present in anything but cibadmin: not in the cib.xml, not in corosync-quorumtool, and a tcpdump shows multicast traffic from both hosts.
>
> I tried (hesitantly) to delete the line using cibadmin, but I couldn't quite get the syntax right. Any tips on how to get this node to show as online and subsequently be able to run resources? Currently, running crm resource move has no effect: no errors, and nothing noticeable in the log files either.
>
> Sorry for the long thread... I can attach more logs/config if necessary.
>
> Thanks,
>
> Jamie.
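For anyone who hits the same mismatch, here is a minimal way to check that the corosync member IDs and the CIB's node_state entries agree. This is only a sketch against the versions above (pacemaker 1.1.10 / corosync 2.x); the exact output layout of crm_node may differ slightly on other releases:

  # Members as corosync sees them (numeric IDs 12303 / 12304)
  corosync-cmapctl | grep members

  # Nodes as pacemaker knows them (numeric id, uname, membership state)
  crm_node -l

  # node_state entries in the CIB status section; the id attribute
  # should be the numeric corosync ID, not the uname
  cibadmin -Q -o status | grep '<node_state'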
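For reference, the cibadmin syntax the original post was after is roughly the following (untested here; as I understand it, cibadmin -D removes the first element whose tag name and supplied attributes match), although removing the stale node record with crm_node turned out to be the cleaner fix:

  # Hedged sketch: delete the bogus node_state entry keyed by the
  # uname ("ldb03") instead of the numeric node id (12303)
  cibadmin --delete -o status --xml-text '<node_state id="ldb03"/>'

  # What actually resolved it in this case:
  crm_node -R "ldb03" --force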
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org