On Fri, 11 Apr 2014 17:17:57 +1000 Andrew Beekhof <and...@beekhof.net> wrote:
>
> On 8 Apr 2014, at 8:37 pm, ma...@nucleus.it wrote:
>
> > On Tue, 8 Apr 2014 10:49:16 +1000
> > Andrew Beekhof <and...@beekhof.net> wrote:
> >
> >>
> >> On 7 Apr 2014, at 8:46 pm, ma...@nucleus.it wrote:
> >>
> >>> Hi,
> >>> in a production environment with 2 nodes (nodeA, nodeB) we had
> >>> a hardware failure, so we restarted nodeB.
> >>> After nodeB came back up we restarted corosync/pacemaker on
> >>> it, but for 2 days now the corosync/pacemaker stuff has been
> >>> looping.
> >>>
> >>> crm_mon NodeA:
> >>>
> >>> Stack: openais
> >>> Current DC: nodeA - partition with quorum
> >>> Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
> >>> 2 Nodes configured, 2 expected votes
> >>> 17 Resources configured.
> >>> ============
> >>>
> >>> Online: [ nodeA ]
> >>> OFFLINE: [ nodeB ]
> >>>
> >>>
> >>> crm_mon NodeB:
> >>>
> >>> Stack: openais
> >>> Current DC: NONE
> >>> 2 Nodes configured, 2 expected votes
> >>> 17 Resources configured.
> >>> ============
> >>>
> >>> OFFLINE: [ nodeA nodeB ]
> >>>
> >>> The loop on nodeB reports:
> >>> crmd: [7149]: debug: do_election_count_vote: Election 3 (owner: nodeA) lost: vote from nodeA (Age)
> >>>
> >>> Investigating further, I found this message on nodeA:
> >>> cib: [28755]: ERROR: send_ais_message: Not connected to AIS
> >>>
> >>> Now this message repeats for every operation.
> >>> Is it a corosync problem or a cib/pacemaker one?
> >>> Any suggestion on what happened?
> >>
> >> For some reason the cib can't connect to corosync anymore.
> >> No software got upgraded recently?
> >>
> >> Are there any logs from corosync?
> >> Which distro is this?
> >>
> >>> And why did the start of a cluster node crash the DC stuff?
> >>> :(
> >>>
> >>>
> >>> Bye Marco
> >>>
> >>> _______________________________________________
> >>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>
> >>> Project Home: http://www.clusterlabs.org
> >>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>> Bugs: http://bugs.clusterlabs.org
> >>
> >
> > Hi,
> > the distro is openSUSE 11.1, and there are no updates, also because
> > the distro is out of maintenance.
>
> A good reason to be using SLES (or RHEL/CentOS).

Better Gentoo ;)

>
> > We are planning an upgrade, but the interesting thing is to figure
> > out the cause of the problem.
> > The log is attached; thanks for the support.
>
> There's nothing obvious in the logs. Just that as far as pacemaker
> could tell, corosync suddenly went away. Was the corosync process
> still running?
>

Yes, corosync was still running.