Hi Andrew,

> -----Original Message-----
> From: Andrew Beekhof [mailto:and...@beekhof.net]
> Sent: Wednesday, April 17, 2013 2:28 PM
> To: The Pacemaker cluster resource manager
> Cc: shimaza...@intellilink.co.jp
> Subject: Re: [Pacemaker] Question about recovery policy after "Too many failures to fence"
>
> On 11/04/2013, at 7:23 PM, Kazunori INOUE <inouek...@intellilink.co.jp> wrote:
>
> > Hi Andrew,
> >
> > (13.04.08 12:01), Andrew Beekhof wrote:
> >>
> >> On 27/03/2013, at 7:45 PM, Kazunori INOUE <inouek...@intellilink.co.jp> wrote:
> >>
> >>> Hi,
> >>>
> >>> I'm using pacemaker-1.1 (c7910371a5, the latest devel).
> >>>
> >>> When fencing has failed 10 times, the cluster stays in the
> >>> S_TRANSITION_ENGINE state.
> >>> (related: https://github.com/ClusterLabs/pacemaker/commit/e29d2f9)
> >>>
> >>> How should I recover? What procedure will return the cluster to S_IDLE?
> >>
> >> The intention was that the node should proceed to S_IDLE when this occurs,
> >> so you shouldn't have to do anything and the cluster would try again once
> >> the recheck-interval expired or a config change was made.
> >>
> >> I assume you're saying this does not occur?
> >>
> >
> > I understand that the cluster-recheck-interval timer is not active while
> > in S_TRANSITION_ENGINE, so even after waiting a long time, the cluster
> > was still in S_TRANSITION_ENGINE.
> > * I attached a crm_report.
>
> I think
> https://github.com/beekhof/pacemaker/commit/ef8068e9
> should fix this part of the problem.
>
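(For anyone hitting this on an unpatched build: as noted above, besides the recheck-interval timer expiring, any configuration change also triggers a new transition, so a fencing retry can be forced by hand. A hedged sketch only; `crmadmin` and `crm_attribute` are standard Pacemaker tools, but the attribute value used here is just an example, and these commands require a running cluster:)

```shell
# Check the DC's crmd state; S_IDLE means the transition engine is quiescent,
# S_TRANSITION_ENGINE means a transition is still considered in progress.
crmadmin -S dev2

# Query the current cluster-recheck-interval, the timer that normally
# drives the periodic re-evaluation (and thus the fencing retry).
crm_attribute --type crm_config --name cluster-recheck-interval --query

# Any CIB change triggers a new transition, so touching a cluster option
# (value here is only an example) forces the cluster to re-attempt fencing.
crm_attribute --type crm_config --name cluster-recheck-interval --update 15min
```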
I confirmed that this problem was fixed. Thanks!!

> >
> > What do I have to do in order to make the cluster retry STONITH?
> > For example, do I need to run 'crmadmin -E' to change the config?
> >
> > ----
> > Best Regards,
> > Kazunori INOUE
> >
> >>>
> >>>
> >>> Mar 27 15:34:34 dev2 crmd[17937]: notice: tengine_stonith_callback:
> >>> Stonith operation 12/22:14:0:0927a8a0-8e09-494e-acf8-7fb273ca8c9e: Generic
> >>> Pacemaker error (-1001)
> >>> Mar 27 15:34:34 dev2 crmd[17937]: notice: tengine_stonith_callback:
> >>> Stonith operation 12 for dev2 failed (Generic Pacemaker error): aborting
> >>> transition.
> >>> Mar 27 15:34:34 dev2 crmd[17937]: info: abort_transition_graph:
> >>> tengine_stonith_callback:426 - Triggered transition abort (complete=0) :
> >>> Stonith failed
> >>> Mar 27 15:34:34 dev2 crmd[17937]: notice: tengine_stonith_notify: Peer
> >>> dev2 was not terminated (st_notify_fence) by dev1 for dev2: Generic
> >>> Pacemaker error (ref=05f75ab8-34ae-4aae-bbc6-aa20dbfdc845) by client
> >>> crmd.17937
> >>> Mar 27 15:34:34 dev2 crmd[17937]: notice: run_graph: Transition 14
> >>> (Complete=1, Pending=0, Fired=0, Skipped=8, Incomplete=0,
> >>> Source=/var/lib/pacemaker/pengine/pe-warn-2.bz2): Stopped
> >>> Mar 27 15:34:34 dev2 crmd[17937]: notice: too_many_st_failures: Too many
> >>> failures to fence dev2 (11), giving up
> >>>
> >>> $ crmadmin -S dev2
> >>> Status of crmd@dev2: S_TRANSITION_ENGINE (ok)
> >>>
> >>> $ crm_mon
> >>> Last updated: Wed Mar 27 15:35:12 2013
> >>> Last change: Wed Mar 27 15:33:16 2013 via cibadmin on dev1
> >>> Stack: corosync
> >>> Current DC: dev2 (3232261523) - partition with quorum
> >>> Version: 1.1.10-1.el6-c791037
> >>> 2 Nodes configured, unknown expected votes
> >>> 3 Resources configured.
> >>>
> >>>
> >>> Node dev2 (3232261523): UNCLEAN (online)
> >>> Online: [ dev1 ]
> >>>
> >>> prmDummy (ocf::pacemaker:Dummy): Started dev2 FAILED
> >>> Resource Group: grpStonith1
> >>>     prmStonith1 (stonith:external/stonith-helper): Started dev2
> >>> Resource Group: grpStonith2
> >>>     prmStonith2 (stonith:external/stonith-helper): Started dev1
> >>>
> >>> Failed actions:
> >>>     prmDummy_monitor_10000 (node=dev2, call=23, rc=7, status=complete): not running
> >>>
> >>> ----
> >>> Best Regards,
> >>> Kazunori INOUE
> >>>
> >>>
> >>> _______________________________________________
> >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>
> >>> Project Home: http://www.clusterlabs.org
> >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>> Bugs: http://bugs.clusterlabs.org
>
> > <too-many-failures-to-fence.tar.bz2>