On Wed, Feb 29, 2012 at 6:32 PM, Junko IKEDA <tsukishima...@gmail.com> wrote:
> Hi,
>
> I'm running the following simple configuration with Pacemaker 1.1.6,
> and try the test case, "resource stop NG and shutdown Pacemaker".
>
> property \
>     no-quorum-policy="ignore" \
>     stonith-enabled="false" \
>     crmd-transition-delay="2s"
>
> rsc_defaults \
>     resource-stickiness="INFINITY" \
>     migration-threshold="1"
>
> primitive dummy01 ocf:heartbeat:Dummy-stop-NG \
>     op start timeout="60s" interval="0s" on-fail="restart" \
>     op monitor timeout="60s" interval="7s" on-fail="restart" \
>     op stop timeout="60s" interval="0s" on-fail="block"
>
>
> "Dummy-stop-NG" RA just sends "stop NG" to Pacemaker.
>
> # diff -urNp Dummy Dummy-stop-NG
> --- Dummy 2011-06-30 17:43:37.000000000 +0900
> +++ Dummy-stop-NG 2012-02-28 19:11:12.850207767 +0900
> @@ -108,6 +108,8 @@ dummy_start() {
>  }
>
>  dummy_stop() {
> +    exit $OCF_ERR_GENERIC
> +
>      dummy_monitor
>      if [ $? = $OCF_SUCCESS ]; then
>          rm ${OCF_RESKEY_state}
>
>
>
> Before the test, the resource is running on "bl460g6a".
>
> # crm_simulate -S -x pe-input-1.bz2
>
> Current cluster status:
> Online: [ bl460g6a bl460g6b ]
>
> dummy01 (ocf::heartbeat:Dummy-stop-NG): Stopped
>
> Transition Summary:
> crm_simulate[14195]: 2012/02/29_15:46:57 notice: LogActions: Start dummy01 (bl460g6a)
>
> Executing cluster transition:
> * Executing action 6: dummy01_monitor_0 on bl460g6b
> * Executing action 4: dummy01_monitor_0 on bl460g6a
> * Executing action 7: dummy01_start_0 on bl460g6a
> * Executing action 8: dummy01_monitor_7000 on bl460g6a
>
> Revised cluster status:
> Online: [ bl460g6a bl460g6b ]
>
> dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6a
>
>
>
> Stop Pacemaker on "bl460g6a".
> # service heartbeat stop
>
> Pacemaker tries to stop the resource and move it to "bl460g6b" at first,
> # crm_simulate -S -x pe-input-2.bz2
>
> Current cluster status:
> Online: [ bl460g6a bl460g6b ]
>
> dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6a
>
> Transition Summary:
> crm_simulate[12195]: 2012/02/29_15:35:02 notice: LogActions: Move dummy01 (Started bl460g6a -> bl460g6b)
>
> Executing cluster transition:
> * Executing action 6: dummy01_stop_0 on bl460g6a
> * Executing action 7: dummy01_start_0 on bl460g6b
> * Executing action 8: dummy01_monitor_7000 on bl460g6b
>
> Revised cluster status:
> Online: [ bl460g6a bl460g6b ]
>
> dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6b
>
>
>
> but this action fails, which means the resource goes into the unmanaged state.
> # crm_simulate -S -x pe-input-3.bz2
>
> Current cluster status:
> Online: [ bl460g6a bl460g6b ]
>
> dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6a (unmanaged) FAILED
>
> Transition Summary:
>
> Executing cluster transition:
>
> Revised cluster status:
> Online: [ bl460g6a bl460g6b ]
>
> dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6a (unmanaged) FAILED
>
>
>
> Pacemaker shutdown on "bl460g6a" succeeds,
> so it seems that the following patch works well.
> https://github.com/ClusterLabs/pacemaker/commit/07976fe5eb04c432f1d1c9aebb1b1587ba7f0bcf#pengine/graph.c
>
> At this time, the resource on "bl460g6a" (where Pacemaker has already shut down)
> might still be running because its stop failed.

This is because we ignore the status section of any offline nodes when
stonith-enabled=false.

> In fact, the resource didn't start on "bl460g6b" after its stop NG and
> "bl460g6a"'s shutdown, and this is the expected behavior,
> but I could start it on "bl460g6b" with the crm command.
> This holds the potential for an unexpected active/active status.
> Is it possible to prevent its start in this situation?

Only by disabling the logic in
https://github.com/ClusterLabs/pacemaker/commit/07976fe5eb04c432f1d1c9aebb1b1587ba7f0bcf#pengine/graph.c
when stonith is disabled.

> for example,
> (1) Dummy runs on node-a
> (2) Shutdown Pacemaker on node-a, and Dummy stop NG
> (3) Dummy can not run on other nodes
> (4) * clean up the unmanaged status of Dummy after manually checking it
>       on node-a
> (5) * start Dummy on other nodes
> This can be the safe way.
>
> See attached hb_report.
>
> Thanks,
> Junko IKEDA
>
> NTT DATA INTELLILINK CORPORATION
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
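
For illustration only, the five-step procedure proposed in the quoted message can be written down as a small guard. This is a toy model in Python, not Pacemaker code and not how the policy engine works: `ResourceState`, its fields, and `can_start_elsewhere` are hypothetical names invented for this sketch, and it assumes the operator records by hand that the resource was verified as stopped on the node where the stop failed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceState:
    """Hypothetical bookkeeping for one resource (not a Pacemaker structure)."""
    # Node on which the stop operation failed, or None if no stop has failed.
    stop_failed_on: Optional[str] = None
    # Set by the operator after checking by hand that the resource is stopped.
    manually_verified: bool = False


def can_start_elsewhere(state: ResourceState) -> bool:
    """Allow a start on another node only if it cannot cause active/active.

    Steps (1)-(3): after a failed stop, starts on other nodes are refused.
    Steps (4)-(5): once the operator has verified the resource by hand,
    the start is allowed again.
    """
    if state.stop_failed_on is None:
        return True
    return state.manually_verified


# Usage: stop NG on node-a, then manual verification by the operator.
dummy = ResourceState(stop_failed_on="node-a")
print(can_start_elsewhere(dummy))   # start on other nodes is refused
dummy.manually_verified = True
print(can_start_elsewhere(dummy))   # start on other nodes is allowed
```

The point of the sketch is only that the failed stop has to be remembered somewhere until a human clears it; with stonith-enabled=false that record is ignored once the node is offline.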