Re: [Pacemaker] Orphan problem when creating a clone of a group

Uwe Grawert Mon, 29 Nov 2010 09:11:37 -0800

Quoting Dejan Muhamedagic <deja...@fastmail.fm>:

Hi,

On Mon, Nov 29, 2010 at 02:42:42PM +0100, Uwe Grawert wrote:

Was: Re: [Pacemaker] crm resource restart doesn't restart the correct resource


Quoting Dejan Muhamedagic <deja...@fastmail.fm>:

This happens because, when the clone is created, Pacemaker stops
the primitive but does not wait for the stop action to return, and
simply starts the primitive again. And that, of course, causes
problems.


Hmm, I don't quite understand what is going on. Is that primitive
part of the group? Can you describe in more detail what is going
on?


I have a group (grp_fs) consisting of an LVM resource and several
Filesystem resources, in that order. The group is started and all
resources are running. Now I clone this group by issuing:

crm configure clone clo_fs grp_fs

That stops all resources and starts them again as a clone. But
Pacemaker does not seem to wait until the stop action has finished. I
have modified the LVM RA to log the action issued to the agent and
the value returned by the agent:

14:24:11 [ 14495 ] Action: start
14:24:11 [ 14494 ] Action: stop
14:24:13 [ 14494 ] RC: 1
14:24:14 [ 14495 ] RC: 0
14:24:14 [ 14599 ] Action: monitor
14:24:14 [ 14599 ] RC: 0

The PID is shown in brackets. As can be seen, Pacemaker first issues a
start command and then immediately a stop, without waiting for the
first command to return. That produces an orphan resource. As a
result, the state of the LVM resource (which is now cloned) is
uncertain: it may start, but it may also fail.
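
To make the overlap in the log above easier to see, here is a minimal
Python sketch that pairs each "Action" line with its "RC" line by PID
and reports actions whose run times overlap. The log format is the one
produced by my modified RA; the function names are made up for this
example and are not part of Pacemaker.

```python
import re
from datetime import datetime
from itertools import combinations

# The log excerpt from the modified LVM RA, copied from above.
LOG = """\
14:24:11 [ 14495 ] Action: start
14:24:11 [ 14494 ] Action: stop
14:24:13 [ 14494 ] RC: 1
14:24:14 [ 14495 ] RC: 0
14:24:14 [ 14599 ] Action: monitor
14:24:14 [ 14599 ] RC: 0
"""

LINE = re.compile(r"(\d\d:\d\d:\d\d) \[ (\d+) \] (Action|RC): (\S+)")


def parse_runs(text):
    """Return one dict per PID with action, begin, end and rc."""
    runs = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue
        stamp, pid, kind, value = m.groups()
        t = datetime.strptime(stamp, "%H:%M:%S")
        run = runs.setdefault(pid, {"pid": pid})
        if kind == "Action":
            run["action"], run["begin"] = value, t
        else:
            run["rc"], run["end"] = int(value), t
    # Keep only runs that have both an Action and an RC line.
    return [r for r in runs.values() if "begin" in r and "end" in r]


def overlapping(runs):
    """Return (action, action) pairs whose time intervals overlap."""
    pairs = []
    for a, b in combinations(runs, 2):
        if a["begin"] < b["end"] and b["begin"] < a["end"]:
            pairs.append((a["action"], b["action"]))
    return pairs


if __name__ == "__main__":
    for first, second in overlapping(parse_runs(LOG)):
        print(f"{first} and {second} were running at the same time")
```

With one-second timestamps this can only show that two actions were in
flight during the same interval. It cannot tell which one was issued
first within a second.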


I see. The problem here is that, as far as the cluster is
concerned, the new resources and the old resources are
unrelated: they have different names (before it was, say, lvm1 and
now it's lvm1:0). I'm not sure if the crmd/pengine can tell whether
the resources of the group which are running actually belong to
the cloned group as well. Andrew? If not, then we'll have to
forbid creating a clone of running resources in the shell.

Ok, if it is going to be forbidden to clone a running resource, there
is a problem with groups. A stopped primitive is getting its
target-role property cleared when cloned. A group does not! If I stop
a group, make a clone and try to start the clone, nothing happens
until the target-role="stopped" is cleared manually from the CIB.
Stopping a primitive in that group (say the first one) has the same
effect. As long as some resource or group in the clone has the
target-role property set, nothing will happen.
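
To find such leftovers before starting the clone, something like the
following Python sketch could be run against the clone's XML from the
CIB. The XML fragment is a hypothetical, hand-written example (the ids
lvm1 and fs1 and the nvpair ids are assumptions, not taken from my real
configuration), and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical CIB fragment: a cloned group that still carries the
# target-role meta attribute from when the group was stopped.
CIB_FRAGMENT = """
<clone id="clo_fs">
  <group id="grp_fs">
    <meta_attributes id="grp_fs-meta_attributes">
      <nvpair id="grp_fs-meta_attributes-target-role"
              name="target-role" value="Stopped"/>
    </meta_attributes>
    <primitive id="lvm1" class="ocf" provider="heartbeat" type="LVM"/>
    <primitive id="fs1" class="ocf" provider="heartbeat" type="Filesystem"/>
  </group>
</clone>
"""

RESOURCE_TAGS = {"clone", "group", "primitive"}


def leftover_target_roles(clone_xml):
    """List (resource id, value) for every target-role set in the clone."""
    root = ET.fromstring(clone_xml)
    found = []
    # iter() walks the clone element itself and everything below it.
    for rsc in root.iter():
        if rsc.tag not in RESOURCE_TAGS:
            continue
        # Only meta attributes set directly on this resource.
        for nv in rsc.findall("meta_attributes/nvpair"):
            if nv.get("name") == "target-role":
                found.append((rsc.get("id"), nv.get("value")))
    return found


if __name__ == "__main__":
    for rsc_id, value in leftover_target_roles(CIB_FRAGMENT):
        print(f"{rsc_id} still has target-role={value}")
```

This only reports which resources inside the clone still have
target-role set. Clearing the attribute is a separate, manual step.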





_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
