----- Original Message -----
> From: "Art Zemon" <a...@hens-teeth.net>
> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> Sent: Monday, December 10, 2012 6:07:04 PM
> Subject: Re: [Pacemaker] Trouble Starting Filesystem
>
> Folks,
>
> I am still struggling with this problem. At the moment, I cannot get
> my OCFS2 filesystem to start at all. OCFS2 worked until I expanded
> my cluster from 2 nodes to 4 nodes.
>
> I see this in /var/log/syslog. In particular, note the "FATAL: Module
> scsi_hostadapter not found." on the last line.
Pretty sure this doesn't have anything to do with your problem. I had the same message when testing OCFS2 in the cluster and it never caused any issues. When I asked about it, someone mentioned it was something not cleaned up somewhere, but harmless.

> Dec 10 16:48:03 aztestc1 crmd: [2416]: info: do_lrm_rsc_op:
> Performing key=71:14:0:a766cb8e-4813-483e-a127-d67cf25979ea
> op=p_fs_share_plesk:0_start_0 )
> Dec 10 16:48:03 aztestc1 lrmd: [2413]: debug: on_msg_perform_op:2396:
> copying parameters for rsc p_fs_share_plesk:0
> Dec 10 16:48:03 aztestc1 lrmd: [2413]: debug: on_msg_perform_op: add
> an operation operation start[29] on p_fs_share_plesk:0 for client
> 2416, its parameters:
> CRM_meta_notify_start_resource=[p_fs_share_plesk:0
> p_fs_share_plesk:1 ] CRM_meta_notify_stop_resource=[ ]
> fstype=[ocfs2] CRM_meta_notify_demote_resource=[ ]
> CRM_meta_notify_master_uname=[ ] CRM_meta_notify_promote_uname=[ ]
> CRM_meta_timeout=[60000] options=[rw,noatime] CRM_meta_name=[start]
> CRM_meta_notify_inactive_resource=[p_fs_share_plesk:0
> p_fs_share_plesk:1 ] CRM_meta_notify_start_uname=[aztestc1 aztestc2
> ] crm_feature_set=[3.0 to the operation list.
> Dec 10 16:48:03 aztestc1 lrmd: [2413]: info: rsc:p_fs_share_plesk:0
> start[29] (pid 4528)
> Dec 10 16:48:03 aztestc1 crmd: [2416]: debug: get_xpath_object: No
> match for //cib_update_result//diff-added//crm_config in
> /notify/cib_update_result/diff
> Dec 10 16:48:03 aztestc1 crmd: [2416]: debug: get_xpath_object: No
> match for //cib_update_result//diff-added//crm_config in
> /notify/cib_update_result/diff
> Dec 10 16:48:03 aztestc1 lrmd: [2413]: debug:
> rsc:p_drbd_share_plesk:1 monitor[16] (pid 4530)
> Dec 10 16:48:03 aztestc1 crmd: [2416]: debug: get_xpath_object: No
> match for //cib_update_result//diff-added//crm_config in
> /notify/cib_update_result/diff
> Dec 10 16:48:03 aztestc1 Filesystem[4528]: INFO: Running start for
> /dev/drbd/by-res/shareplesk on /shareplesk
> Dec 10 16:48:03 aztestc1 drbd[4530]: DEBUG: shareplesk: Calling
> /usr/sbin/crm_master -Q -l reboot -v 10000
> Dec 10 16:48:03 aztestc1 lrmd: [2413]: info: RA output:
> (p_fs_share_plesk:0:start:stderr) FATAL: Module scsi_hostadapter not
> found.
>
> DRBD is running in dual-primary mode:
>
> root@aztestc1:~# service drbd status
> drbd driver loaded OK; device status:
> version: 8.3.11 (api:88/proto:86-96)
> srcversion: 71955441799F513ACA6DA60
> m:res         cs         ro               ds                 p  mounted  fstype
> 1:shareplesk  Connected  Primary/Primary  UpToDate/UpToDate  C
>
> Everything looks happy:
>
> root@aztestc1:~# crm_mon -1
> ============
> Last updated: Mon Dec 10 16:59:40 2012
> Last change: Mon Dec 10 16:48:02 2012 via crmd on aztestc3
> Stack: cman
> Current DC: aztestc3 - partition with quorum
> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> 4 Nodes configured, unknown expected votes
> 10 Resources configured.
> ============
>
> Online: [ aztestc3 aztestc4 aztestc1 aztestc2 ]
>
> Clone Set: cl_fencing [p_stonith]
>     Started: [ aztestc2 aztestc1 aztestc4 aztestc3 ]
> Clone Set: cl_o2cb [p_o2cb]
>     Started: [ aztestc1 aztestc2 ]
> Master/Slave Set: ms_drbd_share_plesk [p_drbd_share_plesk]
>     Masters: [ aztestc2 aztestc1 ]
>
> Failed actions:
>     p_fs_share_plesk:1_start_0 (node=aztestc2, call=31, rc=1,
>         status=complete): unknown error
>     p_fs_share_plesk:0_start_0 (node=aztestc1, call=29, rc=1,
>         status=complete): unknown error

It would help if you could pastebin the log output from a little before and after the failed filesystem actions on aztestc1/2. I don't know offhand what a return code of 1 from the Filesystem RA signifies, but that would be a good place to start.

> Here is my complete configuration, which does not work:
>
> node aztestc1 \
>     attributes standby="off"
> node aztestc2 \
>     attributes standby="off"
> node aztestc3 \
>     attributes standby="off"
> node aztestc4 \
>     attributes standby="off"
> primitive p_drbd_share_plesk ocf:linbit:drbd \
>     params drbd_resource="shareplesk" \
>     op monitor interval="15s" role="Master" timeout="20s" \
>     op monitor interval="20s" role="Slave" timeout="20s" \
>     op start interval="0" timeout="240s" \
>     op stop interval="0" timeout="100s"
> primitive p_fs_share_plesk ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/shareplesk" directory="/shareplesk" \
>         fstype="ocfs2" options="rw,noatime" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60" \
>     op monitor interval="20" timeout="40"
> primitive p_o2cb ocf:pacemaker:o2cb \
>     params stack="cman" \
>     op start interval="0" timeout="90" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="10" timeout="20"
> primitive p_stonith stonith:fence_ec2 \
>     params pcmk_host_check="static-list" \
>         pcmk_host_list="aztestc1 aztestc2 aztestc3 aztestc4" \
>     op monitor interval="600s" timeout="300s" \
>     op start start-delay="10s" interval="0"
> ms ms_drbd_share_plesk p_drbd_share_plesk \
>     meta
>         master-max="2" notify="true" interleave="true" clone-max="2" \
>         is-managed="true" target-role="Started"
> clone cl_fencing p_stonith \
>     meta target-role="Started"
> clone cl_fs_share_plesk p_fs_share_plesk \
>     meta clone-max="2" interleave="true" notify="true" \
>         globally-unique="false" target-role="Started"
> clone cl_o2cb p_o2cb \
>     meta clone-max="2" interleave="true" globally-unique="false" \
>         target-role="Started"
> location lo_drbd_plesk3 ms_drbd_share_plesk -inf: aztestc3
> location lo_drbd_plesk4 ms_drbd_share_plesk -inf: aztestc4
> location lo_fs_plesk3 cl_fs_share_plesk -inf: aztestc3
> location lo_fs_plesk4 cl_fs_share_plesk -inf: aztestc4
> location lo_o2cb3 cl_o2cb -inf: aztestc3
> location lo_o2cb4 cl_o2cb -inf: aztestc4
> order o_20plesk inf: ms_drbd_share_plesk:promote cl_o2cb:start
> order o_40fs_plesk inf: cl_o2cb cl_fs_share_plesk
> property $id="cib-bootstrap-options" \
>     stonith-enabled="true" \
>     stonith-timeout="180s" \
>     no-quorum-policy="freeze" \
>     dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>     cluster-infrastructure="cman" \
>     last-lrm-refresh="1355179514"
> rsc_defaults $id="rsc-options" \
>     resource-stickiness="100"

I'm not sure it will fix your issues completely, but I would add a couple of colocation constraints and remove a few location constraints. A colocation constraint basically says that resource A must run where resource B runs; if resource B isn't running on a particular node, then resource A cannot run there.
This way, once you've located DRBD, the other resources can only run on the nodes that DRBD is running on.

Remove these:

> location lo_fs_plesk3 cl_fs_share_plesk -inf: aztestc3
> location lo_fs_plesk4 cl_fs_share_plesk -inf: aztestc4
> location lo_o2cb3 cl_o2cb -inf: aztestc3
> location lo_o2cb4 cl_o2cb -inf: aztestc4

Add these:

colocation c_o2cb_on_drbd_master inf: cl_o2cb ms_drbd_share_plesk:Master
colocation c_fs_on_o2cb inf: cl_fs_share_plesk cl_o2cb

If you wanted to make it more concise, you could group a few resources, which would reduce the number of statements. This assumes you haven't made the changes listed above yet.

Remove these:

> clone cl_fs_share_plesk p_fs_share_plesk \
>     meta clone-max="2" interleave="true" notify="true" \
>         globally-unique="false" target-role="Started"
> clone cl_o2cb p_o2cb \
>     meta clone-max="2" interleave="true" globally-unique="false" \
>         target-role="Started"
> location lo_fs_plesk3 cl_fs_share_plesk -inf: aztestc3
> location lo_fs_plesk4 cl_fs_share_plesk -inf: aztestc4
> location lo_o2cb3 cl_o2cb -inf: aztestc3
> location lo_o2cb4 cl_o2cb -inf: aztestc4
> order o_20plesk inf: ms_drbd_share_plesk:promote cl_o2cb:start
> order o_40fs_plesk inf: cl_o2cb cl_fs_share_plesk

Add these (FYI, you can't put multi-state resources in a group):

group g_o2cb_fs p_o2cb p_fs_share_plesk
clone cl_o2cb_fs g_o2cb_fs \
    meta clone-max="2" interleave="true" notify="true" globally-unique="false"
order o_drbd_then_ocfs inf: ms_drbd_share_plesk:promote cl_o2cb_fs:start
colocation c_ocfs_on_drbd_master inf: cl_o2cb_fs ms_drbd_share_plesk:Master

> and here is my previous 2-node configuration, which worked "mostly."
> Sometimes I had to manually "crm resource cleanup cl_fs_share" to
> get the filesystem to mount, but otherwise everything was fine.

I'm surprised you don't have any colocation constraints in this "working" config - that may be why you had occasional issues.
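For what it's worth, the same pattern would apply to the 2-node config below, using its resource names. Something like this (a sketch, untested; the constraint IDs are ones I made up):

colocation c_o2cb_on_drbd inf: cl_o2cb ms_drbd_share:Master
colocation c_fs_on_o2cb inf: cl_fs_share cl_o2cb

That ties o2cb to the DRBD Primaries and the filesystem to o2cb, so the mount can never be attempted on a node where the layers below it aren't up.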
HTH,
Jake

> node aztestc1 \
>     attributes standby="off"
> node aztestc2 \
>     attributes standby="off"
> primitive p_drbd_share ocf:linbit:drbd \
>     params drbd_resource="share" \
>     op monitor interval="15s" role="Master" timeout="20s" \
>     op monitor interval="20s" role="Slave" timeout="20s" \
>     op start interval="0" timeout="240s" \
>     op stop interval="0" timeout="100s"
> primitive p_fs_share ocf:heartbeat:Filesystem \
>     params device="/dev/drbd/by-res/share" directory="/share" \
>         fstype="ocfs2" options="rw,noatime" \
>     op start interval="0" timeout="60" \
>     op stop interval="0" timeout="60" \
>     op monitor interval="20" timeout="40"
> primitive p_o2cb ocf:pacemaker:o2cb \
>     params stack="cman" \
>     op start interval="0" timeout="90" \
>     op stop interval="0" timeout="100" \
>     op monitor interval="10" timeout="20"
> primitive p_stonith stonith:fence_ec2 \
>     params pcmk_host_check="static-list" pcmk_host_list="aztestc1 aztestc2" \
>     op monitor interval="600s" timeout="300s" \
>     op start start-delay="10s" interval="0"
> ms ms_drbd_share p_drbd_share \
>     meta master-max="2" notify="true" interleave="true" clone-max="2" \
>         is-managed="true" target-role="Started"
> clone cl_fencing p_stonith \
>     meta target-role="Started"
> clone cl_fs_share p_fs_share \
>     meta interleave="true" notify="true" globally-unique="false" \
>         target-role="Started"
> clone cl_o2cb p_o2cb \
>     meta interleave="true" globally-unique="false"
> order o_ocfs2 inf: ms_drbd_share:promote cl_o2cb
> order o_share inf: cl_o2cb cl_fs_share
> property $id="cib-bootstrap-options" \
>     stonith-enabled="true" \
>     stonith-timeout="180s" \
>     dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
>     cluster-infrastructure="cman" \
>     last-lrm-refresh="1354808774"
>
> Thoughts? Ideas? Suggestions?
>
> Thank you,
>     -- Art Z.
>
> --
> Art Zemon, President
> Hen's Teeth Network for reliable web hosting and programming
> http://www.hens-teeth.net/
> (866)HENS-NET / (636)447-3030 ext.
> 200 / www.hens-teeth.net
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org