On Wed, Jan 14, 2009 at 09:59, <renayama19661...@ybb.ne.jp> wrote:
> Hi,
>
>> > 1)I make it the state that a resource starts in a standby node.
>> > 2)I change it so that a stop error occurs in a dummy resource.
>> > 3)I generate the monitor error of the dummy resource in a standby
>> > node.
>> > 4)After a stop error, STONITH is carried out by a partner node.
>> > 5)Keep STONITH from a standby node waiting.
>> > 6)While STONITH is not completed, I reboot a standby node.
>>
>> Is this in a two-node cluster?
> Yes.
>
>> > Though STONITH from a DC node does not succeed, a resource is started.
>> > When STONITH did not succeed, the resource was not started at a
>> > non-DC node.
>>
>> I don't understand what you're saying here.
>> The first statement says a resource was started and the second says it
>> wasn't... they can't both be true.
>
> I'm sorry.
> It caused misunderstanding.
>
> It is time when STONITH is carried out in the environment of two nodes by a
> standby node.
>
> A resource is started without waiting for completion of STONITH from a DC
> node.
> While STONITH is not completed, this problem happens if an active node fell.

So let me see if I understand this correctly...

You start with two healthy nodes.
You cause a resource on A to fail, at which point B tries to shoot it.
The stonith op never completes and before it times out, you restart B.
Resources get started on B.

Questions:
Is the above accurate?
Is only the dummy resource started, or are other ones started too?
When B comes up again, does it form a two-node cluster with A?
Is A still up or has it become the DC and shot itself?

>
> I confirmed the same confirmation based on OpenAIS.
> However, in OpenAIS, the same problem did not occur.
> In OpenAIS, the start of the resource is evaded well.

Sorry, parsing error... I can't tell if you're saying the problem also
exists for clusters based on OpenAIS.
I think you're saying it does not happen if you use OpenAIS instead of
Heartbeat.

>
> --- Andrew Beekhof <beek...@gmail.com> wrote:
>
>>
>> On Jan 14, 2009, at 2:52 AM, <renayama19661...@ybb.ne.jp>
>> <renayama19661...@ybb.ne.jp> wrote:
>>
>> > Hi,
>> >
>> > About movement of STONITH, I tested it.
>> > (heartbeat 2.99.2 + Pacemaker-1-0-6fd0eebd186e.tar.gz on
>> > RHEL5.2(i386VM))
>> >
>> > When what I confirmed carries out STONITH from a DC node and a
>> > non-DC node.
>> >
>> > I confirmed it in the next flow.
>> >
>> > 1)I make it the state that a resource starts in a standby node.
>> > 2)I change it so that a stop error occurs in a dummy resource.
>> > 3)I generate the monitor error of the dummy resource in a standby
>> > node.
>> > 4)After a stop error, STONITH is carried out by a partner node.
>> > 5)Keep STONITH from a standby node waiting.
>> > 6)While STONITH is not completed, I reboot a standby node.
>>
>> Is this in a two-node cluster?
>>
>> > I watched log.
>>
>> >
>> > Though STONITH from a DC node does not succeed, a resource is started.
>> > When STONITH did not succeed, the resource was not started at a
>> > non-DC node.
>>
>> I don't understand what you're saying here.
>> The first statement says a resource was started and the second says it
>> wasn't... they can't both be true.
>>
>> >
>> >
>> > ---------------------------------------------------------------------------
>> > Jan 13 16:01:25 ais-1 crmd: [6003]: info: send_rsc_command: Initiating action 7: start prmDummy1_start_0 on ais-1
>> > ---------------------------------------------------------------------------
>> >
>> > When STONITH did not succeed, I thought that the resource did not
>> > start.
>> > Does not the behavior when STONITH failed from a DC node have a
>> > problem?
>> >
>> > I attach a result of hb_report.
>> > - stonith_exec_dc.tar.gz (A result when STONITH was carried out by a
>> > DC node(ais-1))
>> > - stonith_exec_nodc.tar.gz(A result when STONITH was carried out by
>> > a non-DC node(ais-1))

_______________________________________________
Pacemaker mailing list
Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker