Hello,

Thanks to this mailing list I made some changes to our cluster configuration, because it was having some trouble (declaring the other node unclean quite often).

I changed the mode of the bond0 interface between the nodes to mode "1" (active-backup). Also, previously I had two DRBD resources, r1 and r2, and they were used as PVs for LVM.

Now it is the other way round: I have a VG named "local" which hosts LVs that are used as backing devices for DRBD. And it works smoothly this way until... it joins the cluster.

It tries to promote DRBD and then quickly demotes it, starts it and stops it, and this goes on in a seemingly endless loop. But if I "help it" manually with drbdadm from the console by running

drbdadm up X
drbdadm primary X

it shows up in crm_mon as master and from then on it works fine. This is how I define the DRBD resource:

<master id="ms-DRBD-ingold-root">
  <meta_attributes id="ms-DRBD-ingold-root-meta_attributes">
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-resource-stickiness" name="resource-stickiness" value="100"/>
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-notify" name="notify" value="true"/>
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-master-max" name="master-max" value="2"/>
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-clone-max" name="clone-max" value="2"/>
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-clone-node-max" name="clone-node-max" value="1"/>
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-interleave" name="interleave" value="true"/>
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-globally-unique" name="globally-unique" value="False"/>
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-target-role" name="target-role" value="Stopped"/>
  </meta_attributes>
  <primitive class="ocf" id="primitive-DRBD-ingold-root" provider="linbit" type="drbd">
    <instance_attributes id="primitive-DRBD-ingold-root-instance_attributes">
      <nvpair id="primitive-DRBD-ingold-root-instance_attributes-drbd_resource" name="drbd_resource" value="drbd26-ingold-root"/>
    </instance_attributes>
    <operations>
      <op id="primitive-DRBD-ingold-root-start-0" interval="0" name="start" timeout="240s"/>
      <op id="primitive-DRBD-ingold-root-stop-0" interval="0" name="stop" timeout="100s"/>
      <op id="primitive-DRBD-ingold-root-monitor-20" interval="20" name="monitor" role="Master" timeout="20s"/>
      <op id="primitive-DRBD-ingold-root-monitor-30" interval="30" name="monitor" role="Slave" timeout="20s">
        <instance_attributes id="primitive-DRBD-ingold-root-monitor-30-instance_attributes">
          <nvpair id="primitive-DRBD-ingold-root-monitor-30-instance_attributes-target-role" name="target-role" value="Master"/>
        </instance_attributes>
      </op>
    </operations>
  </primitive>
</master>
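
Since target-role appears in more than one place in a definition like this, here is a minimal sketch for listing every target-role setting in a CIB fragment. It is only an inspection aid, not a fix and not part of the cluster setup: the function name find_target_roles is hypothetical, the fragment is an abbreviated copy of the definition above, and it uses only Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the ms-DRBD-ingold-root definition (assumption: only
# the elements relevant to target-role are kept, ids are unchanged).
CIB_FRAGMENT = """
<master id="ms-DRBD-ingold-root">
  <meta_attributes id="ms-DRBD-ingold-root-meta_attributes">
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-master-max" name="master-max" value="2"/>
    <nvpair id="ms-DRBD-ingold-root-meta_attributes-target-role" name="target-role" value="Stopped"/>
  </meta_attributes>
  <primitive class="ocf" id="primitive-DRBD-ingold-root" provider="linbit" type="drbd">
    <operations>
      <op id="primitive-DRBD-ingold-root-monitor-30" interval="30" name="monitor" role="Slave" timeout="20s">
        <instance_attributes id="primitive-DRBD-ingold-root-monitor-30-instance_attributes">
          <nvpair id="primitive-DRBD-ingold-root-monitor-30-instance_attributes-target-role" name="target-role" value="Master"/>
        </instance_attributes>
      </op>
    </operations>
  </primitive>
</master>
"""


def find_target_roles(xml_text):
    """Return (attribute-set id, value) for every target-role nvpair."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() walks the whole tree in document order, root included.
    for attr_set in root.iter():
        if attr_set.tag not in ("meta_attributes", "instance_attributes"):
            continue
        # Only direct nvpair children belong to this attribute set.
        for nvpair in attr_set.findall("nvpair"):
            if nvpair.get("name") == "target-role":
                found.append((attr_set.get("id"), nvpair.get("value")))
    return found


if __name__ == "__main__":
    for set_id, value in find_target_roles(CIB_FRAGMENT):
        print(f"{set_id}: target-role={value}")
```

Run against a full dump (for example the output of cibadmin -Q saved to a file and read into a string), the same function would show at a glance which attribute sets carry a target-role and what each one is set to.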

I also have a few grouped DRBD resources, and in that case it happened that one member of a group became Master while another one was Stopped. This is how I define such a group:

<master id="ms-DRBD-bilbo">
  <meta_attributes id="ms-DRBD-bilbo-meta_attributes">
    <nvpair id="ms-DRBD-bilbo-meta_attributes-resource-stickiness" name="resource-stickiness" value="100"/>
    <nvpair id="ms-DRBD-bilbo-meta_attributes-notify" name="notify" value="true"/>
    <nvpair id="ms-DRBD-bilbo-meta_attributes-master-max" name="master-max" value="2"/>
    <nvpair id="ms-DRBD-bilbo-meta_attributes-clone-max" name="clone-max" value="2"/>
    <nvpair id="ms-DRBD-bilbo-meta_attributes-clone-node-max" name="clone-node-max" value="1"/>
    <nvpair id="ms-DRBD-bilbo-meta_attributes-interleave" name="interleave" value="true"/>
    <nvpair id="ms-DRBD-bilbo-meta_attributes-globally-unique" name="globally-unique" value="False"/>
    <nvpair id="ms-DRBD-bilbo-meta_attributes-target-role" name="target-role" value="Master"/>
    <nvpair id="ms-DRBD-bilbo-meta_attributes-is-managed" name="is-managed" value="true"/>
  </meta_attributes>
  <group id="group-DRBD-bilbo">
    <primitive class="ocf" id="primitive-DRBD-bilbo-root" provider="linbit" type="drbd">
      <instance_attributes id="primitive-DRBD-bilbo-root-instance_attributes">
        <nvpair id="primitive-DRBD-bilbo-root-instance_attributes-drbd_resource" name="drbd_resource" value="drbd19-bilbo-root"/>
      </instance_attributes>
      <operations>
        <op id="primitive-DRBD-bilbo-root-start-0" interval="0" name="start" timeout="240s"/>
        <op id="primitive-DRBD-bilbo-root-stop-0" interval="0" name="stop" timeout="100s"/>
        <op id="primitive-DRBD-bilbo-root-monitor-20" interval="20" name="monitor" role="Master" timeout="20s"/>
        <op id="primitive-DRBD-bilbo-root-monitor-30" interval="30" name="monitor" role="Slave" timeout="20s">
          <instance_attributes id="primitive-DRBD-bilbo-root-monitor-30-instance_attributes">
            <nvpair id="primitive-DRBD-bilbo-root-monitor-30-instance_attributes-target-role" name="target-role" value="Master"/>
          </instance_attributes>
        </op>
      </operations>
      <meta_attributes id="primitive-DRBD-bilbo-root-meta_attributes">
        <nvpair id="primitive-DRBD-bilbo-root-meta_attributes-target-role" name="target-role" value="Started"/>
        <nvpair id="primitive-DRBD-bilbo-root-meta_attributes-is-managed" name="is-managed" value="true"/>
      </meta_attributes>
    </primitive>
    <primitive class="ocf" id="primitive-DRBD-bilbo-squid" provider="linbit" type="drbd">
      <instance_attributes id="primitive-DRBD-bilbo-squid-instance_attributes">
        <nvpair id="primitive-DRBD-bilbo-squid-instance_attributes-drbd_resource" name="drbd_resource" value="drbd20-bilbo-squid"/>
      </instance_attributes>
      <operations>
        <op id="primitive-DRBD-bilbo-squid-start-0" interval="0" name="start" timeout="240s"/>
        <op id="primitive-DRBD-bilbo-squid-stop-0" interval="0" name="stop" timeout="100s"/>
        <op id="primitive-DRBD-bilbo-squid-monitor-20" interval="20" name="monitor" role="Master" timeout="20s"/>
        <op id="primitive-DRBD-bilbo-squid-monitor-30" interval="30" name="monitor" role="Slave" timeout="20s">
          <instance_attributes id="primitive-DRBD-bilbo-squid-monitor-30-instance_attributes">
            <nvpair id="primitive-DRBD-bilbo-squid-monitor-30-instance_attributes-target-role" name="target-role" value="Master"/>
          </instance_attributes>
        </op>
      </operations>
    </primitive>
  </group>
</master>

So what am I doing wrong? Right now our cluster is offline (corosync is stopped) because it kept promoting, demoting, starting and stopping the DRBD services, until finally the node was declared Unclean and the shooting started.

The funny thing is that if I start DRBD manually (with /etc/init.d/drbd start) it starts almost instantly and all resources come up Primary/Primary. Of course I don't use that init script when I run corosync.

Any help will be greatly appreciated.

Thank you!


--
Michał Margula, alche...@uznam.net.pl, http://alchemyx.uznam.net.pl/
"W życiu piękne są tylko chwile" [Ryszard Riedel]

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
