Quoting Dejan Muhamedagic <deja...@fastmail.fm>:
Hi,
On Mon, Nov 29, 2010 at 02:42:42PM +0100, Uwe Grawert wrote:
Was: Re: [Pacemaker] crm resource restart doesn't restart the
correct resource
Quoting Dejan Muhamedagic <deja...@fastmail.fm>:
This happens because, when the clone is created, Pacemaker stops the
primitive but does not wait for the stop action to return; it just
starts the primitive again. That of course causes problems.
Hmm, I don't quite understand what is going on. Is that primitive
part of the group? Can you describe in more detail what is
happening?
I have a group (grp_fs) consisting of an LVM resource and several
Filesystem resources, in that order. That group is started and all
resources are running. Now I clone this group by issuing:
crm configure clone clo_fs grp_fs
That stops all resources and starts them again as a clone. But
Pacemaker does not seem to wait until the stop action has finished. I
have modified the LVM RA to log the action passed to the agent and
the value the agent returns:
14:24:11 [ 14495 ] Action: start
14:24:11 [ 14494 ] Action: stop
14:24:13 [ 14494 ] RC: 1
14:24:14 [ 14495 ] RC: 0
14:24:14 [ 14599 ] Action: monitor
14:24:14 [ 14599 ] RC: 0
The PID is shown in brackets. As can be seen, Pacemaker issues a
start and, within the same second, a stop, without waiting for the
first command to return. That leaves an orphaned resource, so the
state of the LVM resource (which is now cloned) is uncertain: the
start may succeed, but it may also fail.
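
To make the overlap in the log above easier to spot, here is a minimal
sketch of a filter that flags any action issued while another one has
not yet returned. It assumes the exact line layout shown above
("TIME [ PID ] Action: NAME" and "TIME [ PID ] RC: N"); the function
name find_overlaps is made up for this example and is not part of any
Pacemaker tool.

```sh
# find_overlaps: read RA trace lines on stdin and report every action
# that was issued while an earlier action (other PID) was still running.
# Assumed fields: $1=time  $3=PID  $5="Action:" or "RC:"  $6=name or rc
find_overlaps() {
    awk '
        $5 == "Action:" {
            # any PID still in "running" has not logged its RC yet
            for (pid in running)
                printf "%s: %s (pid %s) issued while %s (pid %s) still running\n",
                       $1, $6, $3, running[pid], pid
            running[$3] = $6
        }
        $5 == "RC:" { delete running[$3] }
    '
}
```

Fed the six log lines above, it reports a single overlap: the stop
(PID 14494) was issued while the start (PID 14495) had not yet
returned. The monitor is not flagged, because both earlier actions had
logged their return codes by then.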
I see. The problem here is that as far as the cluster's
concerned, the new resources and the old resources are
unrelated: they have different names (before it was, say, lvm1 and
now it's lvm1:0). I'm not sure if the crmd/pengine can tell if
the resources of the group which are running actually belong to
the cloned group as well. Andrew? If not, then we'll have to
forbid creating a clone of running resources in the shell.
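
Until the shell enforces such a check itself, the idea can be
approximated by hand. The following is only a sketch: safe_clone is a
hypothetical helper, and it assumes that "crm_resource --locate
--resource RSC" prints a line containing "is running on" for an active
resource. The LOCATE and CRM variables exist only so the sketch can be
dry-run without a cluster.

```sh
# Overridable commands (assumption: defaults match the installed tools).
LOCATE=${LOCATE:-crm_resource --locate --resource}
CRM=${CRM:-crm}

# safe_clone CLONE_ID RSC: refuse to clone RSC while it is still running.
safe_clone() {
    clone_id=$1
    rsc=$2
    # "is NOT running" does not match the pattern below
    if $LOCATE "$rsc" 2>/dev/null | grep -q "is running on"; then
        echo "refusing to clone $rsc: still running, stop it first" >&2
        return 1
    fi
    $CRM configure clone "$clone_id" "$rsc"
}
```

With the names from this thread, "safe_clone clo_fs grp_fs" would
return 1 while grp_fs is active and run the clone command only once it
is reported as not running.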
OK, but if cloning a running resource is going to be forbidden, there
is a problem with groups. A stopped primitive gets its target-role
property cleared when it is cloned. A group does not! If I stop a
group, clone it and try to start the clone, nothing happens until
target-role="stopped" is cleared manually from the CIB. Stopping a
primitive in that group (say the first one) has the same effect. As
long as some resource or group in the clone has the target-role
property set, nothing will happen.
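
For reference, the manual clean-up could look roughly like the sketch
below. It assumes the installed crm shell provides "resource meta RSC
delete ATTR"; clear_target_role is a hypothetical helper name, and CRM
is overridable only so the commands can be printed instead of run.

```sh
# Overridable command (set CRM="echo crm" to print instead of execute).
CRM=${CRM:-crm}

# clear_target_role RSC...: drop the target-role meta attribute from
# each listed resource, stopping at the first failure.
clear_target_role() {
    for rsc in "$@"; do
        $CRM resource meta "$rsc" delete target-role || return 1
    done
}
```

In the case described above this would be called for the group and for
any primitive in it that was stopped individually, for example
"clear_target_role grp_fs lvm1" (lvm1 being the example name used
earlier in the thread), before starting the clone.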