Thanks! But there is still a problem. I am now working from the master branch and building RPMs (well, I also have to rebuild from the SRPM to change the build number, since the RPMs built directly are always 1.1.10-1). The patch is in the git log, and things are indeed better... but I still see the spurious VM shutdowns. What is much improved is that the VMs do get restarted, and I basically end up in the state I want. I could almost live with this, and I was about to start changing my cluster config to be asymmetric when I noticed that, in the midst of the spurious transitions, crmd is dumping core.
So I'll append another crm_report to bug 5164, as well as a gdb traceback.

On Fri, Jul 5, 2013 at 5:06 PM, David Vossel <dvos...@redhat.com> wrote:

> ----- Original Message -----
> > From: "David Vossel" <dvos...@redhat.com>
> > To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> > Sent: Wednesday, July 3, 2013 4:20:37 PM
> > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> >
> > ----- Original Message -----
> > > From: "Lindsay Todd" <rltodd....@gmail.com>
> > > To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
> > > Sent: Wednesday, July 3, 2013 2:12:05 PM
> > > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
> > >
> > > Well, I'm not getting failures right now simply with attributes, but I can
> > > induce a failure by stopping the vm-db02 (it puts db02 into an unclean
> > > state, and attempts to migrate the unrelated vm-compute-test). I've
> > > collected the commands from my latest interactions, a crm_report, and a gdb
> > > traceback from the core file that crmd dumped, into bug 5164.
> >
> > Thanks, hopefully I can start investigating this Friday
> >
> > -- Vossel
>
> Yeah, this is a bad one. Adding the node attributes using crm_attribute
> for the remote-node did some unexpected things to the crmd component.
> Somehow the remote-node was getting entered into the cluster node cache...
> which made it look like we had both a cluster-node and remote-node named
> the same thing... not good.
>
> I think I got that part worked out. Try this patch.
>
> https://github.com/ClusterLabs/pacemaker/commit/67dfff76d632f1796c9ded8fd367aa49258c8c32
>
> Rather than trying to patch RCs, it might be worth trying out the master
> branch on github (which already has this patch). If you aren't already,
> use rpms to make your life easier. Running 'make rpm' in the source
> directory will generate them for you.
>
> There was another bug fixed recently in pacemaker_remote involving the
> directory created for resource agents to store their temporary data (stuff
> like pid files). I believe the fix was not introduced until 1.1.10rc6.
>
> -- Vossel
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
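To make the failure David describes concrete, here is a small, purely hypothetical Python model of it. It is not crmd's code: the class `NodeCaches` and the functions `set_attribute_buggy` and `set_attribute_guarded` are invented for illustration, and the guard is only one plausible way to avoid the collision, not necessarily what commit 67dfff7 does. The model assumes crmd keeps cluster nodes and remote nodes in separate name-keyed caches, and shows how inserting a remote node into the cluster cache makes a single name (db02) resolve as both kinds at once.

```python
"""Hypothetical model of the node-name collision described in the thread.

NOT crmd's real node cache: all names here are illustrative only.
"""


class NodeCaches:
    def __init__(self):
        # Assumed layout: separate name sets for full cluster nodes
        # and for pacemaker_remote nodes.
        self.cluster = set()
        self.remote = set()

    def kinds_for(self, name):
        """Return every node kind that currently claims this name."""
        kinds = []
        if name in self.cluster:
            kinds.append("cluster")
        if name in self.remote:
            kinds.append("remote")
        return kinds


def set_attribute_buggy(caches, node):
    # Hypothetical buggy path: any node named in an attribute update is
    # added to the cluster cache, even when it is a known remote node.
    caches.cluster.add(node)


def set_attribute_guarded(caches, node):
    # Hypothetical guard: keep known remote nodes out of the cluster cache.
    if node not in caches.remote:
        caches.cluster.add(node)


if __name__ == "__main__":
    caches = NodeCaches()
    caches.remote.add("db02")
    set_attribute_buggy(caches, "db02")
    print(caches.kinds_for("db02"))  # ['cluster', 'remote']: ambiguous

    guarded = NodeCaches()
    guarded.remote.add("db02")
    set_attribute_guarded(guarded, "db02")
    print(guarded.kinds_for("db02"))  # ['remote']
```

The two-set layout is chosen only because it makes the "both a cluster-node and remote-node named the same thing" state easy to see; the real component may organize its caches quite differently.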