Hi, sorry again. I checked the latest code, and it says:

    } else if (wrapper->action->rsc
               && wrapper->action->rsc != action->rsc
               && is_set(wrapper->action->rsc->flags, pe_rsc_failed)
               && is_not_set(wrapper->action->rsc->flags, pe_rsc_managed)
               && strstr(wrapper->action->uuid, "_stop_0")
               && action->rsc && action->rsc->variant >= pe_clone) {
        crm_warn("Ignoring requirement that %s comeplete before %s:"
                 " unmanaged failed resources cannot prevent clone shutdown",
                 wrapper->action->uuid, action->uuid);
        return FALSE;

It seems that lf#1959 addresses the clone resource issue.
The behavior I posted is a different case.
In the current specification, does a "stop NG" action prevent Pacemaker
from shutting down?

Thanks,
Junko

2012/2/29 Junko IKEDA <tsukishima...@gmail.com>:
> Hi,
>
> additional information;
> (1) resource is running on DC
> (2) shutdown Pacemaker on DC, and resource goes into stop NG(unmanaged)
> (3) the other node becomes DC
> (4) resource starts on the new DC
> (this resource has unmanaged status on the old DC...)
>
> see attached the other hb_report.
>
> By the way, this patch means,
> if there are some unmanaged resources, the operation of "Pacemaker
> shutdown" becomes successful, right?
>
> High: PE: Bug lf#1959 - Fail unmanaged resources should not prevent
> other services from shutting down
> https://github.com/ClusterLabs/pacemaker/commit/07976fe5eb04c432f1d1c9aebb1b1587ba7f0bcf#pengine/graph.c
>
> I don't know the detail of lf#1959, and it would be better to setup
> STONITH to handle "stop" fail unmanaged resource,
> but stop NG action do not permit Pacemaker to shutdown itself just in case.
>
> Thanks,
> Junko
>
> 2012/2/29 Junko IKEDA <tsukishima...@gmail.com>:
>> Hi,
>>
>> I'm running the following simple configuration with Pacemaker 1.1.6,
>> and try the test case, "resource stop NG and shutdown Pacemaker".
>>
>> property \
>>     no-quorum-policy="ignore" \
>>     stonith-enabled="false" \
>>     crmd-transition-delay="2s"
>>
>> rsc_defaults \
>>     resource-stickiness="INFINITY" \
>>     migration-threshold="1"
>>
>> primitive dummy01 ocf:heartbeat:Dummy-stop-NG \
>>     op start timeout="60s" interval="0s" on-fail="restart" \
>>     op monitor timeout="60s" interval="7s" on-fail="restart" \
>>     op stop timeout="60s" interval="0s" on-fail="block"
>>
>>
>> "Dummy-stop-NG" RA just sends "stop NG" to Pacemaker.
>>
>> # diff -urNp Dummy Dummy-stop-NG
>> --- Dummy 2011-06-30 17:43:37.000000000 +0900
>> +++ Dummy-stop-NG 2012-02-28 19:11:12.850207767 +0900
>> @@ -108,6 +108,8 @@ dummy_start() {
>>  }
>>
>>  dummy_stop() {
>> +    exit $OCF_ERR_GENERIC
>> +
>>      dummy_monitor
>>      if [ $? = $OCF_SUCCESS ]; then
>>          rm ${OCF_RESKEY_state}
>>
>>
>>
>> Before the test, the resource is running on "bl460g6a".
>>
>> # crm_simulate -S -x pe-input-1.bz2
>>
>> Current cluster status:
>> Online: [ bl460g6a bl460g6b ]
>>
>>  dummy01 (ocf::heartbeat:Dummy-stop-NG): Stopped
>>
>> Transition Summary:
>> crm_simulate[14195]: 2012/02/29_15:46:57 notice: LogActions: Start
>> dummy01 (bl460g6a)
>>
>> Executing cluster transition:
>>  * Executing action 6: dummy01_monitor_0 on bl460g6b
>>  * Executing action 4: dummy01_monitor_0 on bl460g6a
>>  * Executing action 7: dummy01_start_0 on bl460g6a
>>  * Executing action 8: dummy01_monitor_7000 on bl460g6a
>>
>> Revised cluster status:
>> Online: [ bl460g6a bl460g6b ]
>>
>>  dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6a
>>
>>
>>
>> Stop Pacemaker on "bl460g6a".
>> # service heartbeat stop
>>
>> Pacemaker tries to stop the resource and move it to "bl460g6b" at first,
>> # crm_simulate -S -x pe-input-2.bz2
>>
>> Current cluster status:
>> Online: [ bl460g6a bl460g6b ]
>>
>>  dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6a
>>
>> Transition Summary:
>> crm_simulate[12195]: 2012/02/29_15:35:02 notice: LogActions: Move
>> dummy01 (Started bl460g6a -> bl460g6b)
>>
>> Executing cluster transition:
>>  * Executing action 6: dummy01_stop_0 on bl460g6a
>>  * Executing action 7: dummy01_start_0 on bl460g6b
>>  * Executing action 8: dummy01_monitor_7000 on bl460g6b
>>
>> Revised cluster status:
>> Online: [ bl460g6a bl460g6b ]
>>
>>  dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6b
>>
>>
>>
>> but this action fails, which means the resource goes into the
>> unmanaged state.
>> # crm_simulate -S -x pe-input-3.bz2
>>
>> Current cluster status:
>> Online: [ bl460g6a bl460g6b ]
>>
>>  dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6a
>> (unmanaged) FAILED
>>
>> Transition Summary:
>>
>> Executing cluster transition:
>>
>> Revised cluster status:
>> Online: [ bl460g6a bl460g6b ]
>>
>>  dummy01 (ocf::heartbeat:Dummy-stop-NG): Started bl460g6a
>> (unmanaged) FAILED
>>
>>
>>
>> Pacemaker shutdown on "bl460g6a" succeeds,
>> so it seems that the following patch works well.
>> https://github.com/ClusterLabs/pacemaker/commit/07976fe5eb04c432f1d1c9aebb1b1587ba7f0bcf#pengine/graph.c
>>
>> At this time, the resource on "bl460g6a" (where Pacemaker has already
>> shut down) might still be running because it failed to stop.
>> In fact, the resource didn't start on "bl460g6b" after its stop NG and
>> the shutdown of "bl460g6a", and this is the expected behavior,
>> but I could start it on "bl460g6b" with the crm command.
>> This holds the potential for an unexpected active/active status.
>> Is it possible to prevent its start in this situation?
>> for example,
>> (1) Dummy runs on node-a
>> (2) Shutdown Pacemaker on node-a, and Dummy stop NG
>> (3) Dummy can not run on other nodes
>> (4) * clean up the unmanaged status of Dummy after checking it
>>       manually on node-a
>> (5) * start Dummy on other nodes
>> This can be the safe way.
>>
>> See attached hb_report.
>>
>> Thanks,
>> Junko IKEDA
>>
>> NTT DATA INTELLILINK CORPORATION
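
For reference, the condition quoted at the top of this message can be modeled as a standalone C predicate. This is only a sketch under stated assumptions: the struct, enum, and flag names below are simplified stand-ins and not Pacemaker's real types, the variant ordering (native < group < clone < master) is assumed, and so is the idea that a node-level shutdown action has no resource attached.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for Pacemaker's resource variants (assumed order). */
enum variant { V_NATIVE = 0, V_GROUP = 1, V_CLONE = 2, V_MASTER = 3 };

/* Hypothetical flag bits standing in for pe_rsc_managed / pe_rsc_failed. */
#define RSC_MANAGED 0x1u
#define RSC_FAILED  0x2u

struct resource {
    enum variant variant;
    unsigned flags;
};

struct action {
    const char *uuid;            /* e.g. "dummy01_stop_0" */
    const struct resource *rsc;  /* NULL for actions without a resource */
};

/*
 * Mirrors the quoted branch: returns true when the requirement that
 * "first" completes before "then" would be ignored.
 */
bool ignores_ordering(const struct action *first, const struct action *then)
{
    return first->rsc != NULL
        && first->rsc != then->rsc
        && (first->rsc->flags & RSC_FAILED) != 0   /* failed ...          */
        && (first->rsc->flags & RSC_MANAGED) == 0  /* ... and unmanaged   */
        && strstr(first->uuid, "_stop_0") != NULL  /* it is a stop action */
        && then->rsc != NULL
        && then->rsc->variant >= V_CLONE;          /* dependent is a clone */
}
```

With these stand-ins, the failed, unmanaged stop of dummy01 is ignored only when the dependent action belongs to a clone (or higher) resource. For a primitive, or for an action with no resource, the predicate returns false, which is consistent with reading lf#1959 as covering the clone case only.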