So, Tom ... how do you get the failed node online? I've re-installed with the same image that is running on three other nodes, but it still fails. This node was quite happy for the past 3 months. As I'm testing installs, this and other nodes have been installed a significant number of times without this sort of failure. I'd whack the whole HA cluster ... except that I don't want to run into this failure again without a better solution than "reinstall the system" ;-)
I'm looking at the information returned with corosync debug enabled. After startup, everything looks fine to me until hitting this apparent local ipc delivery failure:

Jan 13 10:09:10 corosync [TOTEM ] Delivering 2 to 3
Jan 13 10:09:10 corosync [TOTEM ] Delivering MCAST message with seq 3 to pending delivery queue
Jan 13 10:09:10 corosync [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)
Jan 13 10:09:10 corosync [pcmk ] Msg[6486] (dest=local:crmd, from=r1lead1:crmd.11229, remote=true, size=181): <create_request_adv origin="post_cache_update" t="crmd" version="3.0.2" subt="request" ref
Jan 13 10:09:10 corosync [TOTEM ] mcasted message added to pending queue

Guess that I'll have to renew my acquaintance with ipc.

Bob Haxo

On Thu, 2011-01-13 at 19:17 +0100, Tom Tux wrote:
> I don't know. I still have this issue (and it seems, that I'm not the
> only one...). I'll have a look, if there are pacemaker-updates through
> the zypper-update-channel available (sles11-sp1).
>
> Regards,
> Tom
>
>
> 2011/1/13 Bob Haxo <bh...@sgi.com>:
> > Tom, others,
> >
> > Please, what was the solution to this issue?
> >
> > Thanks,
> > Bob Haxo
> >
> > On Mon, 2010-09-06 at 09:50 +0200, Tom Tux wrote:
> >
> > Yes, corosync is running after the reboot. It comes up with the
> > regular init-procedure (runlevel 3 in my case).
> >
> > 2010/9/6 Andrew Beekhof <and...@beekhof.net>:
> >> On Mon, Sep 6, 2010 at 7:57 AM, Tom Tux <tomtu...@gmail.com> wrote:
> >>> No, I don't have such failed-messages. In my case, the "Connection to
> >>> our AIS plugin" was established.
> >>>
> >>> The /dev/shm is also not full.
> >>
> >> Is corosync running?
> >>
> >>> Kind regards,
> >>> Tom
> >>>
> >>> 2010/9/3 Michael Smith <msm...@cbnco.com>:
> >>>> Tom Tux wrote:
> >>>>
> >>>>> If I disjoin one clusternode (node01) for maintenance-purposes
> >>>>> (/etc/init.d/openais stop) and reboot this node, then it will not join
> >>>>> himself automatically into the cluster. After the reboot, I have the
> >>>>> following error- and warn-messages in the log:
> >>>>>
> >>>>> Sep 3 07:34:15 node01 mgmtd: [9202]: info: login to cib failed: live
> >>>>
> >>>> Do you have messages like this, too?
> >>>>
> >>>> Aug 30 15:48:10 xen-test1 corosync[5851]: [IPC ] Invalid IPC
> >>>> credentials.
> >>>> Aug 30 15:48:10 xen-test1 cib: [5858]: info: init_ais_connection:
> >>>> Connection to our AIS plugin (9) failed: unknown (100)
> >>>>
> >>>> Aug 30 15:48:10 xen-test1 cib: [5858]: CRIT: cib_init: Cannot sign in to
> >>>> the cluster... terminating
> >>>>
> >>>> http://news.gmane.org/find-root.php?message_id=%3c4C7C0EC7.2050708%40cbnco.com%3e
> >>>>
> >>>> Mike
> >>>>
> >>>> _______________________________________________
> >>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
> >>>>
> >>>> Project Home: http://www.clusterlabs.org
> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> >>>> Bugs:
> >>>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
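For anyone else chasing the same warning, here is a minimal sketch of how the "ipc delivery failed" lines could be pulled out of a corosync debug log for a quick look at which local daemon is not receiving messages. It only assumes the line format shown in the log excerpt earlier in this message; the function name `find_ipc_failures` and the regular expression are my own illustration, not part of corosync or pacemaker, and the sample lines are copied from that excerpt.

```python
import re

# Pattern written against the pcmk warning format quoted in this thread.
# It is an assumption that other versions log the same wording.
IPC_FAIL = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) corosync \[pcmk\s*\] "
    r"WARN: route_ais_message: Sending message to (?P<dest>\S+) failed: "
    r"ipc delivery failed \(rc=(?P<rc>-?\d+)\)"
)


def find_ipc_failures(lines):
    """Return (timestamp, destination, rc) for each ipc delivery failure."""
    hits = []
    for line in lines:
        match = IPC_FAIL.match(line.strip())
        if match:
            hits.append(
                (match.group("stamp"), match.group("dest"), int(match.group("rc")))
            )
    return hits


# Two lines taken from the excerpt above: one TOTEM line, one pcmk warning.
sample = [
    "Jan 13 10:09:10 corosync [TOTEM ] Delivering 2 to 3",
    "Jan 13 10:09:10 corosync [pcmk ] WARN: route_ais_message: "
    "Sending message to local.crmd failed: ipc delivery failed (rc=-2)",
]

for stamp, dest, rc in find_ipc_failures(sample):
    print(stamp, dest, rc)
```

In practice the lines would come from wherever corosync is configured to log on the node; that location depends on the local logging setup, so it is left out here.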