On Wed, Apr 13, 2011 at 7:19 PM, Bob Schatz <bsch...@yahoo.com> wrote:
> Andrew,
>
> Thanks for responding.  Comments inline with <Bob>
>
> ________________________________
> From: Andrew Beekhof <and...@beekhof.net>
> To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
> Cc: Bob Schatz <bsch...@yahoo.com>
> Sent: Tue, April 12, 2011 11:23:14 PM
> Subject: Re: [Pacemaker] Question regarding starting of master/slave
> resources and ELECTIONs
>
> On Wed, Apr 13, 2011 at 4:54 AM, Bob Schatz <bsch...@yahoo.com> wrote:
>> Hi,
>>
>> I am running Pacemaker 1.0.9 with Heartbeat 3.0.3.
>>
>> I create 5 master/slave resources in /etc/ha.d/resource.d/startstop
>> during post-start.
>
> I had no idea this was possible.  Why would you do this?
>
> <Bob> We, and a couple of other companies I know of, bundle
> Linux-HA/Pacemaker into an appliance.  For me, when the appliance boots,
> it creates HA resources based on the hardware it discovers.  I assumed
> that once POST-START was called in the startstop script and we have a DC,
> the cluster is up and running.  I then use "crm" commands to create the
> configuration, etc.  I further assumed that since we have one DC in the
> cluster, all "crm" commands which modify the configuration would be
> ordered, even if the DC fails over to a different node.  Is this
> incorrect?

It's correct.  It's just not a use case I had ever thought of :-)
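Something along these lines, presumably?  (A sketch only: the resource
name, the parameters and the assumption that the hook receives the phase
as its first argument are all illustrative, not taken from Bob's actual
script.)

    #!/bin/sh
    # Hypothetical post-start handler in /etc/ha.d/resource.d/startstop.
    # Creates one master/slave resource per device discovered at boot.
    case "$1" in
      post-start)
          crm configure primitive SS_EXAMPLE ocf:omneon:ss \
              params ss_resource="SS_EXAMPLE" \
                     ssconf="/var/omneon/config/config.EXAMPLE" \
              op monitor interval="3s" role="Master" timeout="7s" \
              op monitor interval="10s" role="Slave" timeout="7s" \
              op start interval="0" timeout="300" \
              op stop interval="0" timeout="20"
          crm configure ms ms-SS_EXAMPLE SS_EXAMPLE \
              meta clone-max="2" notify="true" globally-unique="false"
          ;;
    esac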
>> I noticed that 4 of the master/slave resources will start right away,
>> but the 5th master/slave resource seems to take a minute or so, and I am
>> only running with one node.
>>
>> Is this expected?
>
> Probably, if the other 4 take around a minute each to start.
> There is an lrmd config variable that controls how much parallelism it
> allows (but I forget the name).
>
> <Bob> It's max-children, and I set it to 40 for this test to see if it
> would change the behavior.  (/sbin/lrmadmin -p max-children 40)

That's surprising.  I'll have a look at the logs.

>> My configuration is below and I have also attached ha-debug.
>>
>> Also, what triggers a crmd election?
>
> Node up/down events and whenever someone replaces the cib (which the
> shell used to do a lot).
>
> <Bob> For my test, I only started one node so that I could avoid node
> up/down events.  I believe the log shows the cib being replaced.  Since I
> am using crm, I assume it must be due to crm.  Do the crm_resource, etc.
> commands also replace the cib?

No.  Also, I believe recent versions of the shell no longer use a replace
operation by default.

> Would that avoid elections as a result of cibs being replaced?

Only by avoiding the replace operation in the first place.  After a replace
we have to repopulate the status section to ensure it is accurate.
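If you want to stay away from replaces entirely, feed the cluster
incremental changes rather than a whole CIB.  Roughly (the file names are
just placeholders, and the exact shell syntax depends on the crm version):

    # crm shell: merge the definitions in a file into the existing
    # configuration instead of swapping the whole configuration out.
    crm configure load update /tmp/new-resources.crm

    # Low-level equivalents with cibadmin: create or modify only the
    # objects in the given fragment, leaving the status section alone.
    cibadmin -C -o resources -x /tmp/new-resource.xml   # add new objects
    cibadmin -M -o resources -x /tmp/new-resource.xml   # update existing ones

    # This is the operation that forces the status section to be
    # repopulated (and can trigger an election), so avoid it if you can:
    cibadmin -R -x /tmp/whole-cib.xml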
> Thanks,
> Bob
>
>> I seemed to have a lot of elections in the attached log.  I was assuming
>> that on a single node I would only run the election once in the beginning
>> and then there would not be another one until a new node joined.
>>
>> Thanks,
>> Bob
>>
>> My configuration is:
>>
>> node $id="856c1f72-7cd1-4906-8183-8be87eef96f2" mgraid-s000030311-1
>> primitive SSJ000030312 ocf:omneon:ss \
>>         params ss_resource="SSJ000030312" ssconf="/var/omneon/config/config.J000030312" \
>>         op monitor interval="3s" role="Master" timeout="7s" \
>>         op monitor interval="10s" role="Slave" timeout="7" \
>>         op stop interval="0" timeout="20" \
>>         op start interval="0" timeout="300"
>> primitive SSJ000030313 ocf:omneon:ss \
>>         params ss_resource="SSJ000030313" ssconf="/var/omneon/config/config.J000030313" \
>>         op monitor interval="3s" role="Master" timeout="7s" \
>>         op monitor interval="10s" role="Slave" timeout="7" \
>>         op stop interval="0" timeout="20" \
>>         op start interval="0" timeout="300"
>> primitive SSJ000030314 ocf:omneon:ss \
>>         params ss_resource="SSJ000030314" ssconf="/var/omneon/config/config.J000030314" \
>>         op monitor interval="3s" role="Master" timeout="7s" \
>>         op monitor interval="10s" role="Slave" timeout="7" \
>>         op stop interval="0" timeout="20" \
>>         op start interval="0" timeout="300"
>> primitive SSJ000030315 ocf:omneon:ss \
>>         params ss_resource="SSJ000030315" ssconf="/var/omneon/config/config.J000030315" \
>>         op monitor interval="3s" role="Master" timeout="7s" \
>>         op monitor interval="10s" role="Slave" timeout="7" \
>>         op stop interval="0" timeout="20" \
>>         op start interval="0" timeout="300"
>> primitive SSS000030311 ocf:omneon:ss \
>>         params ss_resource="SSS000030311" ssconf="/var/omneon/config/config.S000030311" \
>>         op monitor interval="3s" role="Master" timeout="7s" \
>>         op monitor interval="10s" role="Slave" timeout="7" \
>>         op stop interval="0" timeout="20" \
>>         op start interval="0" timeout="300"
>> primitive icms lsb:S53icms \
>>         op monitor interval="5s" timeout="7" \
>>         op start interval="0" timeout="5"
>> primitive mgraid-stonith stonith:external/mgpstonith \
>>         params hostlist="mgraid-canister" \
>>         op monitor interval="0" timeout="20s"
>> primitive omserver lsb:S49omserver \
>>         op monitor interval="5s" timeout="7" \
>>         op start interval="0" timeout="5"
>> ms ms-SSJ000030312 SSJ000030312 \
>>         meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-SSJ000030313 SSJ000030313 \
>>         meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-SSJ000030314 SSJ000030314 \
>>         meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-SSJ000030315 SSJ000030315 \
>>         meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> ms ms-SSS000030311 SSS000030311 \
>>         meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
>> clone Fencing mgraid-stonith
>> clone cloneIcms icms
>> clone cloneOmserver omserver
>> location ms-SSJ000030312-master-w1 ms-SSJ000030312 \
>>         rule $id="ms-SSJ000030312-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
>> location ms-SSJ000030313-master-w1 ms-SSJ000030313 \
>>         rule $id="ms-SSJ000030313-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
>> location ms-SSJ000030314-master-w1 ms-SSJ000030314 \
>>         rule $id="ms-SSJ000030314-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
>> location ms-SSJ000030315-master-w1 ms-SSJ000030315 \
>>         rule $id="ms-SSJ000030315-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
>> location ms-SSS000030311-master-w1 ms-SSS000030311 \
>>         rule $id="ms-SSS000030311-master-w1-rule" $role="master" 100: #uname eq mgraid-s000030311-0
>> order orderms-SSJ000030312 0: cloneIcms ms-SSJ000030312
>> order orderms-SSJ000030313 0: cloneIcms ms-SSJ000030313
>> order orderms-SSJ000030314 0: cloneIcms ms-SSJ000030314
>> order orderms-SSJ000030315 0: cloneIcms ms-SSJ000030315
>> order orderms-SSS000030311 0: cloneIcms ms-SSS000030311
>> property $id="cib-bootstrap-options" \
>>         dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \
>>         cluster-infrastructure="Heartbeat" \
>>         dc-deadtime="5s" \
>>         stonith-enabled="true"

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker