Re: [Pacemaker] Pacemaker resource migration behaviour

Andrew Beekhof Tue, 05 Mar 2013 23:44:35 -0800

Evidently this is something that has since been fixed.

In your logs pe-input-47 results in:


<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote conntrackd:1 (Master -> Slave nu)
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote condition:1 (Master -> Slave nu)
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote sub-ospfd:1 (Master -> Slave nu)
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote sub-ripd:1 (Master -> Slave nu)
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Demote sub-squid:0 (Master -> Stopped nu)
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: LogActions: Move eth1-0-192.168.1.10 (Started nu -> mu)
<1d>Feb  6 09:37:52 mu pengine[6257]:   notice: process_pe_message: Calculated Transition 107: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-47.bz2

Testing with the latest code shows:

Transition Summary:
 * Promote conntrackd:0 (Slave -> Master mu)
 * Demote  conntrackd:1 (Master -> Slave nu)
 * Promote condition:0  (Slave -> Master mu)
 * Demote  condition:1  (Master -> Slave nu)
 * Promote sub-ospfd:0  (Slave -> Master mu)
 * Demote  sub-ospfd:1  (Master -> Slave nu)
 * Promote sub-ripd:0   (Slave -> Master mu)
 * Demote  sub-ripd:1   (Master -> Slave nu)
 * Demote  sub-squid:0  (Master -> Slave nu)
 * Start   sub-squid:1  (mu)
 * Promote sub-squid:1  (Stopped -> Master mu)
 * Move    eth1-0-192.168.1.10  (Started nu -> mu)

Which looks more like what you're after.
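
For anyone who wants to check a summary like this mechanically, here is a
minimal Python sketch. It is not part of Pacemaker. It parses "Transition
Summary" lines in the format shown above and lists master/slave resources
that are demoted without any instance being promoted. The function names
and the regular expression are assumptions based only on the lines quoted
in this mail.

```python
import re

# Matches summary lines such as " * Demote  conntrackd:1 (Master -> Slave nu)".
# The pattern is an assumption based only on the lines quoted in this thread.
ACTION_RE = re.compile(r"^\s*\*\s+(\w+)\s+(\S+)\s+\((.*)\)\s*$")


def parse_summary(text):
    """Return a list of (action, resource, detail) tuples."""
    actions = []
    for line in text.splitlines():
        m = ACTION_RE.match(line)
        if m:
            actions.append(m.groups())
    return actions


def demoted_without_promote(actions):
    """Return base resource names that are demoted but never promoted."""
    def base(resource):
        # Strip the clone instance suffix, e.g. "sub-squid:0" -> "sub-squid".
        return resource.rsplit(":", 1)[0]

    promoted = {base(r) for a, r, _ in actions if a == "Promote"}
    demoted = {base(r) for a, r, _ in actions if a == "Demote"}
    return sorted(demoted - promoted)
```

Fed the summary above, demoted_without_promote() should return an empty
list, since every demoted resource also has a promoted instance.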

I'm still very confused about why you're using master/slave though.

On Wed, Feb 6, 2013 at 11:41 PM, James Guthrie <j...@open.ch> wrote:
> Hi David,
>
> Unfortunately crm_report doesn't work correctly on my hosts as we have 
> compiled from source with custom paths and apparently the crm_report and 
> associated tools are not built to use the paths that can be customised with 
> autoconf.
>
> Despite that, I have done some investigation and think I may have found an 
> inconsistency. I have attached the pacemaker-relevant syslog, including the 
> pe-input files. The logfile starts where pacemaker detects that sub-squid is 
> not running on mu. It then fails over to nu, where two further failures take 
> place. In order to recover from these failures, the pengine produces 
> transitions 106, 107, 108 and 109, with the corresponding pe-input files 46, 
> 47, 48 and 49.
>
> The way I understand it, pacemaker works through the transitions until 
> something happens from outside, at which point the transitions are 
> recalculated and pacemaker continues on.
>
> Using crm_simulate to observe the transitions that should happen tells me 
> that the transitions that were calculated from pe-input-49 ought to have 
> resulted in the resources conntrackd, condition, sub-ospfd, sub-ripd and 
> sub-squid being promoted to master. In fact, this never happens, but the crmd 
> reports the transition as being complete. It appears as though nowhere is it 
> acknowledged that the current state is not the desired outcome as calculated 
> by the pengine. Is it possible that this is a bug?

Not really, it means something* happened that we didn't expect.
Pacemaker stops the current transition** and automatically asks the
pengine for another set of calculations.


* sub-squid failing by the looks of it
<1c>Feb  6 09:37:52 mu crmd[6258]:  warning: update_failcount: Updating failcount for sub-squid on nu after failed monitor: rc=9 (update=value++, time=1360139872)

** That's what this line is, notice the Skipped=15:

<1d>Feb  6 09:37:52 mu crmd[6258]:   notice: run_graph: Transition 107 (Complete=21, Pending=0, Fired=0, Skipped=15, Incomplete=6, Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-47.bz2): Stopped
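
To spot interrupted transitions like this one in a log, a rough Python
sketch (my own, not a Pacemaker tool) could pull the counters out of
run_graph lines and flag any transition with Skipped or Incomplete
actions. The regular expressions assume the line format quoted above,
with each log entry rejoined onto a single line. The function names are
hypothetical.

```python
import re

# Assumed format, taken from the run_graph line quoted in this mail.
RUN_GRAPH_RE = re.compile(r"run_graph: Transition (\d+)\s*\((.*?)\):\s*(\w+)")
COUNTER_RE = re.compile(r"(\w+)=(\d+)")


def parse_run_graph(line):
    """Return a dict of transition number, status and counters, or None."""
    m = RUN_GRAPH_RE.search(line)
    if m is None:
        return None
    # Only numeric fields are picked up; Source=<path> is ignored.
    counters = {k: int(v) for k, v in COUNTER_RE.findall(m.group(2))}
    return {"transition": int(m.group(1)), "status": m.group(3), **counters}


def was_interrupted(info):
    """True if the transition ended with actions skipped or left incomplete."""
    return info.get("Skipped", 0) > 0 or info.get("Incomplete", 0) > 0
```

Running parse_run_graph() over each crmd log line and filtering with
was_interrupted() gives the transitions that were stopped before all
their actions ran.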

>
> Regards,
> James
>
>
>
> On Feb 5, 2013, at 7:41 PM, David Vossel <dvos...@redhat.com> wrote:
>
>>
>>
>> ----- Original Message -----
>>> From: "James Guthrie" <j...@open.ch>
>>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>>> Sent: Tuesday, February 5, 2013 8:12:57 AM
>>> Subject: Re: [Pacemaker] Pacemaker resource migration behaviour
>>>
>>> Hi all,
>>>
>>> as a follow-up to this, I realised that I needed to slightly change
>>> the way the resource constraints are put together, but I'm still
>>> seeing the same behaviour.
>>>
>>>
>>> Below are an excerpt from the logs on the host and the revised xml
>>> configuration. In this case, I caused two failures on the host mu,
>>> which forced the resources onto nu then I forced two failures on nu.
>>> What can be seen in the logs are the two detected failures on nu
>>> (the "warning: update_failcount:" lines). After the two failures on
>>> nu, the VIP is migrated back to mu, but none of the "support"
>>> resources are promoted with it.
>>
>> I can't tell much from this output.
>>
>> Run the steps you use to reproduce this and create a crm_report of the issue 
>> so we can see both the logs and pengine transition files that precede this.
>>
>> -- Vossel
>>
>>
>>> Regards,
>>> James
>>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
>
>

