Re: [Pacemaker] master-slave set staggered restarts

Andrew Beekhof Thu, 06 Mar 2014 18:59:26 -0800

On 7 Mar 2014, at 2:39 am, Jay Janssen <jay.jans...@percona.com> wrote:


> 
> primitive p_service ... \
>        op monitor interval="2s" role="Master" \
>        op monitor interval="5s" role="Slave" \
>        op start timeout="10000s" interval="0"
> ms ms_service p_service \
>        meta master-max="3" clone-max="3" target-role="Started" \
>        is-managed="true" ordered="false" interleave="true" notify="false"
> 
> In my case I have all three nodes happily in the ‘master’ state and for a 
> test I simultaneously cause the underlying service to fail on all of them.  
> In all cases, the next monitor operation returns OCF_FAILED_MASTER.  
> Subsequent monitor checks will then return OCF_NOT_RUNNING, since the 
> node falls out of the Master state. 
> 
> I want all the resource clones to issue the start operation more or less at 
> the same time (hence ordered="false") so I can use the start operation to 
> coordinate amongst the nodes as they start (true start order is important 
> and depends on node state, so I’m overloading the start action for this 
> coordination in the all-down state).  For the most part, this is working fine.
> 
> However, I seem to be getting into a race condition where a node (seemingly 
> the one that happens to detect OCF_FAILED_MASTER last) ends up NOT starting 
> until AFTER the others start.   Two nodes get ‘start’ issued at the same time 
> like I want, but the 3rd gets stuck still monitoring (and returning errors) 
> until after the others start.  Once the others are up, then the 3rd finally 
> unblocks and starts.
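
For reference, the monitor sequence described above can be sketched as a small shell function. This is only an illustration under stated assumptions, not the actual resource agent: the function name `service_monitor` and its two arguments (the role the agent last recorded, and whether the service is up) are hypothetical. The numeric values are the standard OCF return codes, where 7 is OCF_NOT_RUNNING and 9 is OCF_FAILED_MASTER.

```shell
#!/bin/sh
# Standard OCF return codes used by this sketch.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7
OCF_RUNNING_MASTER=8
OCF_FAILED_MASTER=9

# Hypothetical monitor logic (not the real agent).
#   $1 = role the agent last recorded: "Master" or anything else
#   $2 = service state: "up" or "down"
service_monitor() {
    role="$1"
    state="$2"
    if [ "$state" = "up" ]; then
        # Healthy: report master or plain running.
        if [ "$role" = "Master" ]; then
            return $OCF_RUNNING_MASTER
        fi
        return $OCF_SUCCESS
    fi
    # Service is down: a failed master is reported once as
    # OCF_FAILED_MASTER; after the role is dropped, later monitors
    # see a cleanly stopped resource.
    if [ "$role" = "Master" ]; then
        return $OCF_FAILED_MASTER
    fi
    return $OCF_NOT_RUNNING
}
```

Calling `service_monitor Master down` returns 9, and once the recorded role is no longer Master, `service_monitor Slave down` returns 7, which matches the two-step failure reporting in the quoted description.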

I think I'd want to see a crm_report for this; it will have the logs and PE 
files we'd need to diagnose any issue.
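
A minimal sketch of assembling such a report command, assuming the failure test ran inside a known time window. The helper name `build_report_cmd`, the timestamps, and the destination path are placeholders, not values from this thread; `-f` and `-t` are crm_report's from/to time options, and the last argument is the destination name. The helper only prints the command so it can be checked before being run on a cluster node.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a crm_report command that
# collects logs and PE files for a given time window.
#   $1 = start of window, "YYYY-MM-DD HH:MM:SS"
#   $2 = end of window,   "YYYY-MM-DD HH:MM:SS"
#   $3 = destination name for the report
build_report_cmd() {
    from="$1"
    to="$2"
    dest="$3"
    printf 'crm_report -f "%s" -t "%s" %s\n' "$from" "$to" "$dest"
}

# Placeholder window and destination; substitute the real test times.
build_report_cmd "2014-03-06 12:00:00" "2014-03-06 13:00:00" /tmp/ms_service-report
```

Keeping the window tight around the simultaneous failure should make it easier to find the relevant transitions in the resulting archive.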

> 
> Is this expected behavior in Pacemaker?   Is there any way I can suppress 
> this behavior and get nodes to start irrespective of long start operations on 
> other nodes? 
> 
> 
> FWIW, if someone can explain to me how I can enforce start action order in a 
> group like this in a more Pacemaker-friendly way, I’m all ears.  
> 
> 
> 
> Jay Janssen
> http://about.me/jay.janssen
> 
> 
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> 
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org


