Evidently this is something that has since been fixed. In your logs pe-input-47 results in:
<1d>Feb 6 09:37:52 mu pengine[6257]: notice: LogActions: Demote conntrackd:1 (Master -> Slave nu)\
<1d>Feb 6 09:37:52 mu pengine[6257]: notice: LogActions: Demote condition:1 (Master -> Slave nu)\
<1d>Feb 6 09:37:52 mu pengine[6257]: notice: LogActions: Demote sub-ospfd:1 (Master -> Slave nu)\
<1d>Feb 6 09:37:52 mu pengine[6257]: notice: LogActions: Demote sub-ripd:1 (Master -> Slave nu)\
<1d>Feb 6 09:37:52 mu pengine[6257]: notice: LogActions: Demote sub-squid:0 (Master -> Stopped nu)\
<1d>Feb 6 09:37:52 mu pengine[6257]: notice: LogActions: Move eth1-0-192.168.1.10 (Started nu -> mu)\
<1d>Feb 6 09:37:52 mu pengine[6257]: notice: process_pe_message: Calculated Transition 107: /opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-47.bz2\

Testing with the latest code shows:

Transition Summary:
 * Promote conntrackd:0 (Slave -> Master mu)
 * Demote conntrackd:1 (Master -> Slave nu)
 * Promote condition:0 (Slave -> Master mu)
 * Demote condition:1 (Master -> Slave nu)
 * Promote sub-ospfd:0 (Slave -> Master mu)
 * Demote sub-ospfd:1 (Master -> Slave nu)
 * Promote sub-ripd:0 (Slave -> Master mu)
 * Demote sub-ripd:1 (Master -> Slave nu)
 * Demote sub-squid:0 (Master -> Slave nu)
 * Start sub-squid:1 (mu)
 * Promote sub-squid:1 (Stopped -> Master mu)
 * Move eth1-0-192.168.1.10 (Started nu -> mu)

Which looks more like what you're after.

I'm still very confused about why you're using master/slave though.

On Wed, Feb 6, 2013 at 11:41 PM, James Guthrie <j...@open.ch> wrote:
> Hi David,
>
> Unfortunately crm_report doesn't work correctly on my hosts as we have
> compiled from source with custom paths and apparently the crm_report and
> associated tools are not built to use the paths that can be customised with
> autoconf.
>
> Despite that, I have done some investigation and think I may have found an
> inconsistency. I have attached the pacemaker-relevant syslog, including the
> pe-input files. The logfile starts where pacemaker detects that sub-squid is
> not running on mu.
> It then fails over to nu, where two further failures take
> place. In order to recover from these failures, the pengine produces
> transitions 106, 107, 108 and 109, with the corresponding pe-input files 46,
> 47, 48 and 49.
>
> The way I understand it, pacemaker works through the transitions until
> something happens from outside, at which point the transitions are
> recalculated and pacemaker continues on.
>
> Using crm_simulate to observe the transitions that should happen tells me
> that the transitions that were calculated from pe-input-49 ought to have
> resulted in the resources conntrackd, condition, sub-ospfd, sub-ripd and
> sub-squid being promoted to master. In fact, this never happens, but the crmd
> reports the transition as being complete. It appears as though nowhere is it
> acknowledged that the current state is not the desired outcome as calculated
> by the pengine. Is it possible that this is a bug?

Not really, it means something* happened that we didn't expect.
Pacemaker stops the current transition** and automatically asks the
pengine for another set of calculations.

* sub-squid failing by the looks of it

<1c>Feb 6 09:37:52 mu crmd[6258]: warning: update_failcount: Updating failcount for sub-squid on nu after failed monitor: rc=9 (update=value++, time=1360139872)\

** That's what this line is, notice the Skipped=15:

<1d>Feb 6 09:37:52 mu crmd[6258]: notice: run_graph: Transition 107 (Complete=21, Pending=0, Fired=0, Skipped=15, Incomplete=6, Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-47.bz2): Stopped\

>
> Regards,
> James
>
>
> On Feb 5, 2013, at 7:41 PM, David Vossel <dvos...@redhat.com> wrote:
>
>>
>> ----- Original Message -----
>>> From: "James Guthrie" <j...@open.ch>
>>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>>> Sent: Tuesday, February 5, 2013 8:12:57 AM
>>> Subject: Re: [Pacemaker] Pacemaker resource migration behaviour
>>>
>>> Hi all,
>>>
>>> as a follow-up to this, I realised that I needed to slightly change
>>> the way the resource constraints are put together, but I'm still
>>> seeing the same behaviour.
>>>
>>> Below are an excerpt from the logs on the host and the revised xml
>>> configuration. In this case, I caused two failures on the host mu,
>>> which forced the resources onto nu then I forced two failures on nu.
>>> What can be seen in the logs are the two detected failures on nu
>>> (the "warning: update_failcount:" lines). After the two failures on
>>> nu, the VIP is migrated back to mu, but none of the "support"
>>> resources are promoted with it.
>>
>> I can't tell much from this output.
>>
>> Run the steps you use to reproduce this and create a crm_report of the issue
>> so we can see both the logs and pengine transition files that precede this.
>>
>> -- Vossel
>>
>>
>>> Regards,
>>> James
>>>
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
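
The Skipped and Incomplete counters in the run_graph line quoted in this thread are what show that transition 107 was cut short rather than run to the end. Below is a minimal Python sketch for pulling those counters out of a crmd log line. The regular expression is inferred only from the single line quoted here, so other Pacemaker versions may format it differently, and parse_run_graph and the "interrupted" flag are names made up for illustration, not part of Pacemaker.

```python
import re

# Pattern inferred from the one run_graph line quoted in this thread
# (an assumption, not a documented log format).
RUN_GRAPH = re.compile(
    r"run_graph: Transition (?P<id>\d+) \("
    r"Complete=(?P<complete>\d+), Pending=(?P<pending>\d+), "
    r"Fired=(?P<fired>\d+), Skipped=(?P<skipped>\d+), "
    r"Incomplete=(?P<incomplete>\d+), Source=(?P<source>[^)]+)\)"
)


def parse_run_graph(line):
    """Return the transition counters from a run_graph line, or None."""
    match = RUN_GRAPH.search(line)
    if match is None:
        return None
    # Every captured field except the source path is a number.
    info = {k: int(v) for k, v in match.groupdict().items() if k != "source"}
    info["source"] = match.group("source")
    # Hypothetical convenience flag: actions that were skipped or left
    # incomplete mean the transition did not run to the end.
    info["interrupted"] = info["skipped"] > 0 or info["incomplete"] > 0
    return info


line = (
    "<1d>Feb 6 09:37:52 mu crmd[6258]: notice: run_graph: Transition 107 "
    "(Complete=21, Pending=0, Fired=0, Skipped=15, Incomplete=6, "
    "Source=/opt/OSAGpcmk/pcmk/var/lib/pacemaker/pengine/pe-input-47.bz2): Stopped\\"
)
result = parse_run_graph(line)
print(result["id"], result["skipped"], result["incomplete"], result["interrupted"])
```

Run over a whole syslog, a check like this would flag each transition that was stopped early, which is where to look for the event (here, the sub-squid monitor failure) that made the crmd ask the pengine for a new calculation.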