I don't really know when the trouble started. I ended up restarting pacemaker on all nodes, and it cleared things up. I'm not sure why, though. If I have the same issue come up, I'll run the crm_report and open a bug.
Thanks,

Paul

On Wed, Nov 7, 2012 at 9:22 PM, Andrew Beekhof <and...@beekhof.net> wrote:
> On Thu, Nov 8, 2012 at 1:55 PM, Paul Archer <p...@paularcher.org> wrote:
>> I'm fairly new to pacemaker, and this is hurting my head.
>> I have a four-node cluster, and one of my nodes (for no reason that I
>> can discern) has gone offline, and I can't get it to come back online.
>>
>> Offline node:
>> root@vmhost2:/var/lib/heartbeat# crm_mon -1
>> ============
>> Last updated: Wed Nov 7 20:52:16 2012
>> Last change: Wed Nov 7 20:28:06 2012 via cibadmin on vmhost2
>> Stack: openais
>> Current DC: NONE
>> 4 Nodes configured, 4 expected votes
>> 7 Resources configured.
>> ============
>>
>> OFFLINE: [ vgs1 vgs2 vmhost1 vmhost2 ]
>>
>>
>>
>> One of the online nodes:
>> root@vmhost1:/var/lib/heartbeat/crm# crm_mon -1
>> ============
>> Last updated: Wed Nov 7 20:45:32 2012
>> Last change: Wed Nov 7 20:44:59 2012 via crm_attribute on vgs2
>> Stack: openais
>> Current DC: vgs1 - partition with quorum
>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>> 4 Nodes configured, 4 expected votes
>> 7 Resources configured.
>> ============
>>
>> Node vmhost2: standby
>> Online: [ vgs1 vgs2 vmhost1 ]
>>
>> focus (ocf::heartbeat:VirtualDomain): Started vmhost1
>> logger (ocf::heartbeat:VirtualDomain): Started vmhost1
>> mother (ocf::heartbeat:VirtualDomain): Started vmhost2
>> vgsIP (ocf::heartbeat:IPaddr2): Started vgs2
>> vgsWebServer (ocf::heartbeat:apache): Started vgs2
>>
>>
>> I don't know what's relevant as far as log files, so I will post as
>> people ask for specifics, rather than just dumping everything here to
>> start with.
>
> You should have crm_report and/or hb_report.
> Use it to gather everything from around about the time the node went offline.
> Probably best to open a bug at http://bugs.clusterlabs.org and attach
> the resulting tarball there.
>
> If the cluster is still in this state, it would also be useful to see
> the corosync-objctl -a output from vgs1 and vmhost2.
> As well as the output from cibadmin -Ql from vgs1.
>
>>
>>
>> Thanks for any help,
>>
>> Paul
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs: http://bugs.clusterlabs.org
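
For anyone who hits the same symptom, here is a minimal sketch (not from the original thread) for comparing the membership views that two nodes report in their crm_mon -1 output. It assumes the text layout quoted in this thread (the "Current DC:", "Online:", "OFFLINE:" and "Node <name>: standby" lines); other Pacemaker versions may format these differently. The function names parse_crm_mon and disagreements are hypothetical helpers, and the sample strings are trimmed copies of the output posted above.

```python
import re

# Trimmed copies of the two crm_mon -1 outputs quoted in this thread.
VMHOST2_VIEW = """\
Stack: openais
Current DC: NONE
4 Nodes configured, 4 expected votes
OFFLINE: [ vgs1 vgs2 vmhost1 vmhost2 ]
"""

VMHOST1_VIEW = """\
Stack: openais
Current DC: vgs1 - partition with quorum
4 Nodes configured, 4 expected votes
Node vmhost2: standby
Online: [ vgs1 vgs2 vmhost1 ]
"""


def parse_crm_mon(text):
    """Hypothetical helper: group node names by state from crm_mon -1 text.

    Assumes the line layout shown in this thread; not an official parser.
    """
    view = {"dc": None, "online": [], "offline": [], "standby": []}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Current DC:\s*(\S+)", line)
        if m:
            # crm_mon prints NONE when the node sees no designated controller.
            view["dc"] = None if m.group(1) == "NONE" else m.group(1)
            continue
        m = re.match(r"(Online|OFFLINE):\s*\[(.*)\]", line)
        if m:
            view[m.group(1).lower()] = m.group(2).split()
            continue
        m = re.match(r"Node\s+(\S+):\s*standby", line)
        if m:
            view["standby"].append(m.group(1))
    return view


def disagreements(view_a, view_b):
    """Hypothetical helper: nodes one view calls online and the other offline."""
    a_on, a_off = set(view_a["online"]), set(view_a["offline"])
    b_on, b_off = set(view_b["online"]), set(view_b["offline"])
    return sorted((a_on & b_off) | (a_off & b_on))


if __name__ == "__main__":
    a = parse_crm_mon(VMHOST2_VIEW)
    b = parse_crm_mon(VMHOST1_VIEW)
    print(disagreements(a, b))  # prints ['vgs1', 'vgs2', 'vmhost1']
```

A node that lists every peer as OFFLINE and has no DC, while its peers agree with one another, has most likely lost contact with the rest of the cluster. That is only a hint about where to look; the crm_report tarball and the corosync-objctl -a output requested above are still what a bug report needs.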