On Mon, Mar 3, 2014 at 9:29 PM, Digimer wrote:
> Two possible problems;
>
> 1. cman's cluster.conf needs the '<cman two_node="1" expected_votes="1" />'.
>
> 2. You don't have fencing setup. The 'fence_pcmk' script only works if
> pacemaker's stonith is enabled and configured properly. Likewise, you will
> need to configure DRBD to use the 'crm-fence-peer.sh' handler and have the
> 'fencing resource-and-stonith;' policy.
>
> digimer

Thanks for your answer, digimer.
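If I understand your two points correctly, the missing pieces would look more or less like this on my setup (just a sketch of what I plan to put in place, using my DRBD resource res0 and the handler paths as shipped on my EL6 install):

    /etc/cluster/cluster.conf (fragment):

        <cman two_node="1" expected_votes="1"/>

    DRBD resource res0 (fragment):

        resource res0 {
            disk {
                fencing resource-and-stonith;
            }
            handlers {
                fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
                after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
            }
            ...
        }

plus stonith-enabled=true on the pacemaker side once the fence_vmware stonith resource is in place (right now I still have stonith-enabled=false, as the cluster properties further down show).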
So the no-quorum-policy=ignore part applies only to resources, while cluster.conf has to be set up as in RHCS for cluster membership?

I think the problem is partly due to the missing stonith configuration, but the DRBD crm-fence-peer.sh script plays an important part too; see below.
As my test nodes are vSphere VMs, I have installed VMware-vSphere-CLI-5.5 so that in the next few days I can test the fence_vmware agent and stonith (I verified that the basic "status" and "off" commands work).

Some background on the configuration: my hostnames are node01.localdomain.local and node02.localdomain.local, with their IPs on the 192.168.33.x network.
I also have another network interface, for which I use the names iclnode01 and iclnode02, with IPs on the 192.168.230.x network, and I want to use it for DRBD and cluster communication.
As DRBD needs hostnames in its config (at least that is what I read), I have configured the DRBD resource this way:

    on node01.localdomain.local {
        address   192.168.230.221:7788;
        meta-disk internal;
    }

that is, using the hostname (node01.localdomain.local) but the IP on the other network (the one of iclnode01). Is this correct?
I also put the iclnode01 and iclnode02 names in cluster.conf, so pacemaker knows the nodes as iclnode01/02.

So in a normal situation crm_mon gives:

    Online: [ iclnode01 iclnode02 ]

     Master/Slave Set: ms_MyData [MyData]
         Masters: [ iclnode01 ]
         Slaves: [ iclnode02 ]

Now suppose I power off the slave host (node02): I still get a stop of the DRBD resource on the master node01, and so of the whole group, because when crm-fence-peer.sh runs it puts in place this kind of constraint:

Mar 5 10:42:39 node01 crm-fence-peer.sh[18113]: invoked for res0
Mar 5 10:42:39 node01 cibadmin[18144]: notice: crm_log_args: Invoked: cibadmin -C -o constraints -X <rsc_location rsc="ms_MyData" id="drbd-fence-by-handler-res0-ms_MyData">#012  <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-res0-rule-ms_MyData">#012    <expression attribute="#uname" operation="ne" value="node01.localdomain.local" id="drbd-fence-by-handler-res0-expr-ms_MyData"/>#012  </rule>#012</rsc_location>
Mar 5 10:42:39 node01 crmd[1972]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Mar 5 10:42:39 node01 stonith-ng[1968]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: Diff: --- 0.142.3
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: Diff: +++ 0.143.1 7a98665c4dd4697f6ed0be42e8c49de5
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: -- <cib admin_epoch="0" epoch="142" num_updates="3"/>
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++ <rsc_location rsc="ms_MyData" id="drbd-fence-by-handler-res0-ms_MyData">
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++   <rule role="Master" score="-INFINITY" id="drbd-fence-by-handler-res0-rule-ms_MyData">
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++     <expression attribute="#uname" operation="ne" value="node01.localdomain.local" id="drbd-fence-by-handler-res0-expr-ms_MyData"/>
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++   </rule>
Mar 5 10:42:39 node01 cib[1967]: notice: cib:diff: ++ </rsc_location>
Mar 5 10:42:39 node01 pengine[1971]: notice: unpack_config: On loss of CCM Quorum: Ignore
Mar 5 10:42:39 node01 pengine[1971]: notice: LogActions: Demote MyData:0#011(Master -> Slave iclnode01)

Note that it uses the hostname (node01.localdomain.local), not the intracluster node name (iclnode01). So, since it assigns -INFINITY to every node that differs from node01.localdomain.local, it demotes iclnode01 itself, which was the master...
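For what it's worth, while testing I am double-checking the mismatch and cleaning up by hand, more or less like this (the constraint id is the one from the log above; if I read the DRBD docs correctly, crm-unfence-peer.sh would normally remove that constraint after a successful resync):

    # name the cluster (and therefore #uname) uses for this node, vs. what the handler uses
    crm_node -n
    uname -n

    # list all node names known to pacemaker
    crm_node -l

    # drop the stale fencing constraint manually while experimenting
    cibadmin -D -X '<rsc_location id="drbd-fence-by-handler-res0-ms_MyData"/>'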
I hope I have made my point clearer.

I walked through the script and found that it reads the cluster properties, which in my case are:

    dc-version=1.1.10-14.el6_5.2-368c726
    cluster-infrastructure=cman
    stonith-enabled=false
    last-lrm-refresh=1393868222
    no-quorum-policy=ignore
    default-resource-stickiness=200

The constraint is created as:

    <rsc_location rsc=\"$master_id\" id=\"$id_prefix-$master_id\">
      <rule role=\"$role\" score=\"-INFINITY\" id=\"$id_prefix-rule-$master_id\">
        <expression attribute=\"$fencing_attribute\" operation=\"ne\" value=\"$fencing_value\" id=\"$id_prefix-expr-$master_id\"/>
      </rule>
    </rsc_location>

and $fencing_value is the key element here, but it is assigned in a way I don't completely understand (#uname in particular...):

    if [[ $fencing_attribute = "#uname" ]]; then
        fencing_value=$HOSTNAME
    elif ! fencing_value=$(crm_attribute -Q -t nodes -n $fencing_attribute 2>/dev/null); then
        fencing_attribute="#uname"
        fencing_value=$HOSTNAME
    fi

Inside the script, HOSTNAME=$(uname -n) is set statically, so there is not much room for customization, I suppose. Do I have to use the hostname and its network for my intracluster communication???

BTW, inside the script there are some uses of crm_attribute that I don't find documented (e.g. the -t option with the "status" and "nodes" values, which appear neither in the man page nor in the --help output).

BBTW: clustering is becoming more and more difficult and confusing: I'm still trying to move to cman instead of corosync with pacemaker, as recommended for 6.5, and I just found that in the RHEL 7 beta cman is gone completely and corosync is back again.... ;-)
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/7-Beta/html/High_Availability_Add-On_Reference/s1-configfileoverview-HAAR.html

Gianluca
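P.S. Given that the handler hard-codes HOSTNAME=$(uname -n), the workaround I am thinking of testing is to make the cluster.conf node names match the real hostnames and resolve those names to the intracluster network, so that pacemaker's #uname and the value written by crm-fence-peer.sh agree. Roughly like this (the 192.168.230.222 address for node02 is my assumption, only .221 appears in my notes above, and the nodeids are arbitrary):

    /etc/hosts (on both nodes):

        192.168.230.221   node01.localdomain.local
        192.168.230.222   node02.localdomain.local

    cluster.conf (fragment):

        <clusternodes>
          <clusternode name="node01.localdomain.local" nodeid="1">
            <!-- fence_vmware devices to be added here once tested -->
          </clusternode>
          <clusternode name="node02.localdomain.local" nodeid="2">
            <!-- fence_vmware devices to be added here once tested -->
          </clusternode>
        </clusternodes>

With the names aligned this way, the -INFINITY rule generated by the handler should ban only the peer and no longer demote the local master. Does this sound reasonable, or is there a cleaner way to keep the iclnode* names?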