Hmm, I'll still submit the bug report, but it seems like crmd is dumping core while attempting to fence a node. If I use fence_node to fence a real cluster node, that also causes crmd to dump core. But apart from that, I don't really see why pacemaker is trying to fence anything.
On Wed, Jul 10, 2013 at 12:42 PM, Lindsay Todd <rltodd....@gmail.com> wrote:

> Thanks! But there is still a problem.
>
> I am now working from the master branch and building RPMs (well, I have to
> also rebuild from the srpm to change the build number, since the RPMs built
> directly are always 1.1.10-1). The patch is in the git log, and indeed
> things are better ... But I still see the spurious VMs shutting down.
> What is much improved is that they do get restarted, and basically I end
> up in the state I want to be. Can almost live with this, and I was going
> to start changing my cluster config to be asymmetric when I noticed that in
> the midst of the spurious transitions, crmd is dumping core.
>
> So I'll append another crm_report to bug 5164, as well as a gdb traceback.
>
>
> On Fri, Jul 5, 2013 at 5:06 PM, David Vossel <dvos...@redhat.com> wrote:
>
>> ----- Original Message -----
>> > From: "David Vossel" <dvos...@redhat.com>
>> > To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>> > Sent: Wednesday, July 3, 2013 4:20:37 PM
>> > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
>> >
>> > ----- Original Message -----
>> > > From: "Lindsay Todd" <rltodd....@gmail.com>
>> > > To: "The Pacemaker cluster resource manager" <pacemaker@oss.clusterlabs.org>
>> > > Sent: Wednesday, July 3, 2013 2:12:05 PM
>> > > Subject: Re: [Pacemaker] Pacemaker remote nodes, naming, and attributes
>> > >
>> > > Well, I'm not getting failures right now simply with attributes, but I can
>> > > induce a failure by stopping the vm-db02 (it puts db02 into an unclean
>> > > state, and attempts to migrate the unrelated vm-compute-test). I've
>> > > collected the commands from my latest interactions, a crm_report, and a gdb
>> > > traceback from the core file that crmd dumped, into bug 5164.
>> >
>> >
>> > Thanks, hopefully I can start investigating this Friday
>> >
>> > -- Vossel
>>
>> Yeah, this is a bad one. Adding the node attributes using crm_attribute
>> for the remote-node did some unexpected things to the crmd component.
>> Somehow the remote-node was getting entered into the cluster node cache...
>> which made it look like we had both a cluster-node and remote-node named
>> the same thing... not good.
>>
>> I think I got that part worked out. Try this patch.
>>
>> https://github.com/ClusterLabs/pacemaker/commit/67dfff76d632f1796c9ded8fd367aa49258c8c32
>>
>> Rather than trying to patch RCs, it might be worth trying out the master
>> branch on github (which already has this patch). If you aren't already,
>> use rpms to make your life easier. Running 'make rpm' in the source
>> directory will generate them for you.
>>
>> There was another bug fixed recently in pacemaker_remote involving the
>> directory created for resource agents to store their temporary data (stuff
>> like pid files). I believe the fix was not introduced until 1.1.10rc6.
>>
>> -- Vossel
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org