Hi there,

I have successfully configured a 2-node DRBD Pacemaker cluster using the instructions provided by LINBIT here: http://www.drbd.org/users-guide-emb/ch-pacemaker.html. The cluster works perfectly and I can migrate the resources back and forth between the two nodes without a problem. However, when simulating certain cluster communication failures, I am having problems preventing the DRBD cluster from entering a split-brain state. I have been led to believe that STONITH will help prevent split-brain situations, but the LINBIT instructions do not provide any guidance on how to configure STONITH in the Pacemaker cluster. The only thing I can find in LINBIT's documentation is the section on the resource fencing options within /etc/drbd.conf, which I have configured as follows:

  resource r0 {
    disk {
      fencing resource-only;
    }
    handlers {
      fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
  }

I'm still at a loss to understand what actually triggers DRBD to run the above fencing scripts, or how to tell whether it has run them.
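For what it's worth, from reading through crm-fence-peer.sh itself, my understanding (which may well be wrong) is that DRBD invokes the fence-peer handler when it loses the replication link while the resource is in use, and that the script then pins the master role to the surviving node by adding a location constraint to the CIB, something along these lines for my configuration:

  location drbd-fence-by-handler-ms_drbd_activemq ms_drbd_activemq \
          rule $role="Master" -inf: #uname ne mq001.back.live.cwwtf.local

So presumably I could check whether the handler has fired by looking for a drbd-fence-by-handler constraint in the output of crm configure show, with crm-unfence-peer.sh removing it again after resync completes. If anyone can confirm or correct that understanding I'd be grateful.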
I've searched the internet high and low for example Pacemaker configurations that show how to configure STONITH resources for DRBD, but I can't find anything useful. Whilst hunting around I did find this howto ( http://www.howtoforge.com/installation-and-setup-guide-for-drbd-openais-pacemaker-xen-on-opensuse-11.1 ), which spells out how to configure a DRBD Pacemaker cluster and even states the following: "STONITH is disabled in this [example] configuration though it is highly-recommended in any production environment to eliminate the risk of divergent data." Infuriatingly, it doesn't tell you how to configure STONITH!

Could someone please give me some pointers, or some helpful examples, on how to configure STONITH and/or modify my Pacemaker configuration in any other way needed to get it into a production-ready state?

My current configuration is listed below. The cluster is built on two Red Hat EL 5.3 servers running the following software versions:

  drbd-8.3.6-1
  pacemaker-1.0.5-4.1
  openais-0.80.5-15.1

r...@mq001:~# crm configure show
node mq001.back.live.cwwtf.local
node mq002.back.live.cwwtf.local
primitive activemq-emp lsb:bbc-activemq-emp
primitive activemq-forge-services lsb:bbc-activemq-forge-services
primitive activemq-social lsb:activemq-social
primitive drbd_activemq ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="15s"
primitive fs_activemq ocf:heartbeat:Filesystem \
        params device="/dev/drbd1" directory="/drbd" fstype="ext3"
primitive ip_activemq ocf:heartbeat:IPaddr2 \
        params ip="172.23.8.71" nic="eth0"
group activemq fs_activemq ip_activemq activemq-forge-services activemq-emp activemq-social
ms ms_drbd_activemq drbd_activemq \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation activemq_on_drbd inf: activemq ms_drbd_activemq:Master
order activemq_after_drbd inf: ms_drbd_activemq:promote activemq:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1260809203"
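To show that I've at least tried, here is my best guess at the sort of thing I think I need to add to the above, pieced together from fragments I've found online. It is only a sketch: external/ipmi is a guess at a suitable plugin for our hardware, and the IP addresses and credentials below are placeholders, so please don't take it as working config:

  primitive stonith-mq001 stonith:external/ipmi \
          params hostname="mq001.back.live.cwwtf.local" ipaddr="192.168.0.1" userid="admin" passwd="secret" \
          op monitor interval="60s"
  primitive stonith-mq002 stonith:external/ipmi \
          params hostname="mq002.back.live.cwwtf.local" ipaddr="192.168.0.2" userid="admin" passwd="secret" \
          op monitor interval="60s"
  location stonith-mq001-placement stonith-mq001 -inf: mq001.back.live.cwwtf.local
  location stonith-mq002-placement stonith-mq002 -inf: mq002.back.live.cwwtf.local
  property stonith-enabled="true"

The two location constraints are my attempt to make sure each fencing device never runs on the node it is supposed to shoot. Is this roughly the right shape, and should the DRBD fencing policy then change from resource-only to resource-and-stonith?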
sh"; after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh"; } syncer { rate 40M; } on mq001.back.live.cwwtf.local { device /dev/drbd1; disk /dev/cciss/c0d0p1; address 172.23.8.69:7789; meta-disk internal; } on mq002.back.live.cwwtf.local { device /dev/drbd1; disk /dev/cciss/c0d0p1; address 172.23.8.70:7789; meta-disk internal; } } r...@mq001:~# cat /etc/ais/openais.conf totem { version: 2 token: 3000 token_retransmits_before_loss_const: 10 join: 60 consensus: 1500 vsftype: none max_messages: 20 clear_node_high_bit: yes secauth: on threads: 0 rrp_mode: passive interface { ringnumber: 0 bindnetaddr: 172.59.60.0 mcastaddr: 239.94.1.1 mcastport: 5405 } interface { ringnumber: 1 bindnetaddr: 172.23.8.0 mcastaddr: 239.94.2.1 mcastport: 5405 } } logging { to_stderr: yes debug: on timestamp: on to_file: no to_syslog: yes syslog_facility: daemon } amf { mode: disabled } service { ver: 0 name: pacemaker use_mgmtd: yes } aisexec { user: root group: root } Many Thanks, Tom