Hi David,

I think you might be looking at the wrong part of the logs. I assume the line you meant was the following:
<1c>Feb 6 09:52:54 mu attrd[6256]: warning: attrd_cib_callback: Update fail-count-sub-squid=(null) failed: No such device or address

Despite this failure, the recovery worked correctly and the resources were then started (as can be seen by examining the pe-input files 50-54).

What I meant was the entire portion of the logs between 09:37:52 and 09:37:53 (pe-input files 46-49). There, the state when the CRM returns to idle isn't the one that ought to have been reached given transitions 106-109.

I don't yet understand well enough how the CRM decides whether or not to perform an action. Additionally, I can't seem to get any debug logs from pacemaker that might help explain why the CRM/LRM does what it does.

Regards,
James

On Feb 6, 2013, at 8:14 PM, David Vossel <dvos...@redhat.com> wrote:

>
>
> ----- Original Message -----
>> From: "James Guthrie" <j...@open.ch>
>> To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>> Sent: Wednesday, February 6, 2013 6:52:07 AM
>> Subject: Re: [Pacemaker] Pacemaker resource migration behaviour
>>
>> A quick addendum to this message:
>>
>> The log files I provided actually continue until the resources do get
>> started on the host. The trigger for that is the 6-minute
>> failure-timeout timer that pops. As can be seen in pe-input-50, the
>> resources conntrackd, condition, sub-ospfd and sub-ripd are in slave
>> on both hosts and sub-squid is not started on either. This shows
>> that the desired end-state of the transitions produced with
>> pe-input-49 is never reached.
>>
>
> Yep, this looks like a bug in attrd. I see the command going out to delete
> the fail-count for squid, but it fails. Since the fail-count isn't properly
> expired that sub-squid device can't start.
>
> Can you open a bugs.clusterlabs.org issue for this please. Include the logs.
>
> Thanks,
> -- Vossel
>
>> James
>>
>> On Feb 6, 2013, at 1:41 PM, James Guthrie <j...@open.ch> wrote:
>>
>>> Hi David,
>>>
>>> Unfortunately crm_report doesn't work correctly on my hosts as we
>>> have compiled from source with custom paths and apparently the
>>> crm_report and associated tools are not built to use the paths
>>> that can be customised with autoconf.
>>>
>>> Despite that, I have done some investigation and think I may have
>>> found an inconsistency. I have attached the pacemaker-relevant
>>> syslog, including the pe-input files. The logfile starts where
>>> pacemaker detects that sub-squid is not running on mu. It then
>>> fails over to nu, where two further failures take place. In order
>>> to recover from these failures, the pengine produces transitions
>>> 106, 107, 108 and 109, with the corresponding pe-input files 46,
>>> 47, 48 and 49.
>>>
>>> The way I understand it, pacemaker works through the transitions
>>> until something happens from outside, at which point the
>>> transitions are recalculated and pacemaker continues on.
>>>
>>> Using crm_simulate to observe the transitions that should happen
>>> tells me that the transitions that were calculated from
>>> pe-input-49 ought to have resulted in the resources conntrackd,
>>> condition, sub-ospfd, sub-ripd and sub-squid being promoted to
>>> master. In fact, this never happens, but the crmd reports the
>>> transition as being complete. It appears as though nowhere is it
>>> acknowledged that the current state is not the desired outcome as
>>> calculated by the pengine. Is it possible that this is a bug?
>>>
>>> Regards,
>>> James
>>>
>>> <pacemaker-not-starting-resources.tar.gz>
>>> On Feb 5, 2013, at 7:41 PM, David Vossel <dvos...@redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>>> From: "James Guthrie" <j...@open.ch>
>>>>> To: "The Pacemaker cluster resource manager"
>>>>> <pacemaker@oss.clusterlabs.org>
>>>>> Sent: Tuesday, February 5, 2013 8:12:57 AM
>>>>> Subject: Re: [Pacemaker] Pacemaker resource migration behaviour
>>>>>
>>>>> Hi all,
>>>>>
>>>>> as a follow-up to this, I realised that I needed to slightly
>>>>> change
>>>>> the way the resource constraints are put together, but I'm still
>>>>> seeing the same behaviour.
>>>>>
>>>>> Below are an excerpt from the logs on the host and the revised
>>>>> xml
>>>>> configuration. In this case, I caused two failures on the host
>>>>> mu,
>>>>> which forced the resources onto nu then I forced two failures on
>>>>> nu.
>>>>> What can be seen in the logs are the two detected failures on nu
>>>>> (the "warning: update_failcount:" lines). After the two failures
>>>>> on
>>>>> nu, the VIP is migrated back to mu, but none of the "support"
>>>>> resources are promoted with it.
>>>>
>>>> I can't tell much from this output.
>>>>
>>>> Run the steps you use to reproduce this and create a crm_report of
>>>> the issue so we can see both the logs and pengine transition
>>>> files that precede this.
>>>>
>>>> -- Vossel
>>>>
>>>>
>>>>> Regards,
>>>>> James
>>>>>
>>>>
>>>> _______________________________________________
>>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>
>>>> Project Home: http://www.clusterlabs.org
>>>> Getting started:
>>>> http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>> Bugs: http://bugs.clusterlabs.org
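
For anyone checking their own syslog for the symptom discussed in this thread (attrd failing to clear a fail-count), here is a minimal sketch that pulls failed attribute updates out of log lines. The function name, the record type and the regular expression are illustrative assumptions: the pattern is modelled only on the single attrd warning quoted at the top of this message, and other Pacemaker versions may well format it differently.

```python
import re
from typing import NamedTuple, Optional


class FailedAttrUpdate(NamedTuple):
    """Fields extracted from one attrd 'Update ... failed' warning (hypothetical type)."""
    timestamp: str
    host: str
    attribute: str
    value: str
    error: str


# Assumption: the line looks like the one quoted in this thread, e.g.
# "<1c>Feb 6 09:52:54 mu attrd[6256]: warning: attrd_cib_callback:
#  Update fail-count-sub-squid=(null) failed: No such device or address"
_PATTERN = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+attrd\[\d+\]:\s+warning:\s+attrd_cib_callback:\s+"
    r"Update\s+(?P<attribute>[^=\s]+)=(?P<value>\S+)\s+failed:\s+(?P<error>.+)$"
)


def parse_failed_update(line: str) -> Optional[FailedAttrUpdate]:
    """Return the parsed warning, or None if the line is not a failed attrd update."""
    match = _PATTERN.search(line.strip())
    if match is None:
        return None
    return FailedAttrUpdate(**match.groupdict())


if __name__ == "__main__":
    sample = (
        "<1c>Feb 6 09:52:54 mu attrd[6256]: warning: attrd_cib_callback: "
        "Update fail-count-sub-squid=(null) failed: No such device or address"
    )
    print(parse_failed_update(sample))
```

Filtering the results for attributes that start with fail-count- would show which resources had a fail-count update rejected, which is the condition David describes as keeping sub-squid from starting.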