On Thu, Nov 8, 2012 at 2:56 PM, Paul Archer <p...@paularcher.org> wrote:
> I don't really know when the trouble started.
> I ended up restarting pacemaker on all nodes, and it cleared things
> up. I'm not sure why, though.

You /may/ have been experiencing a known membership issue in older
versions of pacemaker and corosync. But I can't say for sure based on
your email. I'd highly encourage an upgrade of at least corosync.

> If I have the same issue come up, I'll run the crm_report and open a bug.

Great.

>
> Thanks,
>
> Paul
>
> On Wed, Nov 7, 2012 at 9:22 PM, Andrew Beekhof <and...@beekhof.net> wrote:
>> On Thu, Nov 8, 2012 at 1:55 PM, Paul Archer <p...@paularcher.org> wrote:
>>> I'm fairly new to pacemaker, and this is hurting my head.
>>> I have a four-node cluster, and one of my nodes (for no reason that I
>>> can discern) has gone offline, and I can't get it to come back online.
>>>
>>> Offline node:
>>> root@vmhost2:/var/lib/heartbeat# crm_mon -1
>>> ============
>>> Last updated: Wed Nov 7 20:52:16 2012
>>> Last change: Wed Nov 7 20:28:06 2012 via cibadmin on vmhost2
>>> Stack: openais
>>> Current DC: NONE
>>> 4 Nodes configured, 4 expected votes
>>> 7 Resources configured.
>>> ============
>>>
>>> OFFLINE: [ vgs1 vgs2 vmhost1 vmhost2 ]
>>>
>>>
>>>
>>> One of the online nodes:
>>> root@vmhost1:/var/lib/heartbeat/crm# crm_mon -1
>>> ============
>>> Last updated: Wed Nov 7 20:45:32 2012
>>> Last change: Wed Nov 7 20:44:59 2012 via crm_attribute on vgs2
>>> Stack: openais
>>> Current DC: vgs1 - partition with quorum
>>> Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
>>> 4 Nodes configured, 4 expected votes
>>> 7 Resources configured.
>>> ============
>>>
>>> Node vmhost2: standby
>>> Online: [ vgs1 vgs2 vmhost1 ]
>>>
>>> focus (ocf::heartbeat:VirtualDomain): Started vmhost1
>>> logger (ocf::heartbeat:VirtualDomain): Started vmhost1
>>> mother (ocf::heartbeat:VirtualDomain): Started vmhost2
>>> vgsIP (ocf::heartbeat:IPaddr2): Started vgs2
>>> vgsWebServer (ocf::heartbeat:apache): Started vgs2
>>>
>>>
>>> I don't know what's relevant as far as log files, so I will post as
>>> people ask for specifics, rather than just dumping everything here to
>>> start with.
>>
>> You should have crm_report and/or hb_report.
>> Use it to gather everything from around about the time the node went offline.
>> Probably best to open a bug at http://bugs.clusterlabs.org and attach
>> the resulting tarball there.
>>
>> If the cluster is still in this state, it would also be useful to see
>> the corosync-objctl -a output from vgs1 and vmhost2.
>> As well as the output from cibadmin -Ql from vgs1.
>>
>>>
>>>
>>> Thanks for any help,
>>>
>>> Paul
>>>
>>> _______________________________________________
>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>
>>> Project Home: http://www.clusterlabs.org
>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>> Bugs: http://bugs.clusterlabs.org
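
For reference, the data-gathering steps suggested in the quoted reply could be sketched roughly as below. This is only a dry-run helper that prints the commands rather than running them. The time window, the report name /tmp/vmhost2-offline, and the output file names are assumptions made for illustration; the thread does not say exactly when the node went offline, so the window would need adjusting.

```shell
#!/bin/sh
# Dry-run sketch: prints the diagnostic commands mentioned in the thread.
# FROM, TO and DEST are assumed values, not taken from the original posts.
FROM="2012-11-07 20:00"        # assumed start of the window of interest
TO="2012-11-07 21:00"          # assumed end of the window of interest
DEST="/tmp/vmhost2-offline"    # hypothetical name for the report tarball

diag_commands() {
    # Collect logs and cluster state for the chosen window.
    printf '%s\n' "crm_report -f \"$FROM\" -t \"$TO\" $DEST"
    # Dump the corosync object database (suggested for vgs1 and vmhost2).
    printf '%s\n' 'corosync-objctl -a > /tmp/objctl-$(hostname).txt'
    # Query the local copy of the CIB (suggested for vgs1).
    printf '%s\n' 'cibadmin -Ql > /tmp/cib-$(hostname).xml'
}

diag_commands
```

The printed lines can be reviewed and then pasted on the relevant nodes; the resulting tarball and text files are what would be attached to a bug at http://bugs.clusterlabs.org.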