Folks, I had a working cluster... for a few minutes. Then I restarted one of the nodes in EC2, so its IP address changed. Now the nodes come up, talk to each other, and DRBD syncs, but the filesystem won't start. I'm baffled.
Following is some config info. All I did was update the IP address of aztestc4 in /etc/hosts and in /etc/drbd.d/share.res and reboot to restart everything. /var/log/syslog is so full of stuff that I can't see the trees for the forest. Any help will be greatly appreciated.

-- Art Z.

root@aztestc3:~# drbdadm status
<drbd-status version="8.3.11" api="88">
<resources config_file="/etc/drbd.conf">
<resource minor="1" name="share" cs="Connected" ro1="Primary" ro2="Primary" ds1="UpToDate" ds2="UpToDate" />
</resources>
</drbd-status>

root@aztestc3:~# crm status
============
Last updated: Tue Nov 20 13:56:43 2012
Last change: Tue Nov 20 13:37:44 2012 via cibadmin on aztestc3
Stack: cman
Current DC: aztestc3 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, unknown expected votes
6 Resources configured.
============

Online: [ aztestc3 aztestc4 ]

 Master/Slave Set: ms_drbd_share [p_drbd_share]
     Masters: [ aztestc3 aztestc4 ]
 Clone Set: cl_o2cb [p_o2cb]
     Started: [ aztestc3 aztestc4 ]

Failed actions:
    p_fs_share:0_start_0 (node=aztestc3, call=10, rc=1, status=complete): unknown error
    p_drbd_share:0_promote_0 (node=aztestc3, call=34, rc=1, status=complete): unknown error
    p_fs_share:0_start_0 (node=aztestc4, call=10, rc=1, status=complete): unknown error

root@aztestc3:~# crm configure show
node aztestc3 \
        attributes standby="off"
node aztestc4 \
        attributes standby="off"
primitive p_drbd_share ocf:linbit:drbd \
        params drbd_resource="share" \
        op monitor interval="15s" role="Master" timeout="20s" \
        op monitor interval="20s" role="Slave" timeout="20s" \
        op start interval="0" timeout="240s" \
        op stop interval="0" timeout="100s"
primitive p_fs_share ocf:heartbeat:Filesystem \
        params device="/dev/drbd/by-res/share" directory="/share" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive p_o2cb ocf:pacemaker:o2cb \
        params stack="cman" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="100" \
        op monitor interval="10" timeout="20"
ms ms_drbd_share p_drbd_share \
        meta master-max="2" notify="true" interleave="true" clone-max="2" target-role="Started"
clone cl_fs_share p_fs_share \
        meta interleave="true" notify="true" globally-unique="false" target-role="Started"
clone cl_o2cb p_o2cb \
        meta interleave="true" globally-unique="false"
colocation colo_share inf: cl_fs_share ms_drbd_share:Master cl_o2cb
order o_o2cb inf: cl_o2cb cl_fs_share
order o_share inf: ms_drbd_share:promote cl_fs_share
property $id="cib-bootstrap-options" \
        dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
        cluster-infrastructure="cman" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"

--
Art Zemon, President
Hen's Teeth Network <http://www.hens-teeth.net/>
for reliable web hosting and programming
(866)HENS-NET / (636)447-3030 ext. 200 / www.hens-teeth.net

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org