That will be tough, but I'll see if I can give it a try sometime soon. I've had no luck tracking down that error, so I'm running out of other options :/
Jake

-----Original Message-----
From: Andrew Beekhof [mailto:and...@beekhof.net]
Sent: Monday, February 23, 2015 7:43 PM
To: The Pacemaker cluster resource manager
Subject: Re: [Pacemaker] Pacemaker won't start after node was fenced

> On 27 Jan 2015, at 5:23 pm, Jake Smith <jsm...@argotec.com> wrote:
>
> Had a failover of my active/passive cluster and now the passive node will not rejoin the cluster.
>
> 2 nodes running Ubuntu 12.04
> coro 1.4.2-2, openais 1.1.4-4, pcmk 1.1.6-2ubuntu3
>
> Corosync ring membership is fine on both rings.
>
> Tried stopping coro/pace and clearing /var/lib/heartbeat/crm/ and then restarting on passive node without success.
> Tried rebooting passive node (again – it was successfully fenced)
> Tried updating pacemaker to latest in distro (1.1.6-2ubuntu3.3) then went back on passive node
> Tried putting active node in maintenance mode and stopping pacemaker and corosync on both nodes. Then restarting on both nodes. Corosync came back fine as before but now I have the same problem on both nodes with pacemaker not starting successfully. Both show exactly same now - attrd: [24883]: ERROR: main: HA Signon failed.
>
> Log:
> Jan 27 01:09:59 Condor crmd: [24885]: info: crmd_init: Starting crmd
> Jan 27 01:09:59 Condor cib: [24881]: info: validate_with_relaxng: Creating RNG parser context
> Jan 27 01:09:59 Condor lrmd: [24882]: info: enabling coredumps
> Jan 27 01:09:59 Condor lrmd: [24882]: info: Started.
> Jan 27 01:09:59 Condor corosync[24778]: [IPC ] Invalid IPC credentials.

This seems to be the root of the errors. Pacemaker looks a little old, could you consider updating?

> Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: HA Signon failed
> Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: Aborting startup
> Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process attrd exited (pid=24883, rc=100)
> Jan 27 01:09:59 Condor pacemakerd: [24877]: notice: pcmk_child_exit: Child process attrd no longer wishes to be respawned
> Jan 27 01:09:59 Condor pacemakerd: [24877]: info: update_node_processes: Node Condor now has process list: 00000000000000000000000000110312 (was 00000000000000000000000000111312)
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: init_ais_connection_classic: AIS connection established
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: get_ais_nodeid: Server details: id=167837962 uname=Condor cname=pcmk
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_new_peer: Node Condor now has id: 167837962
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_new_peer: Node 167837962 is now known as Condor
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: main: Starting stonith-ng mainloop
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_update_peer: Node Condor: id=167837962 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000110312 (new)
> Jan 27 01:09:59 Condor cib: [24881]: info: startCib: CIB Initialization completed successfully
> Jan 27 01:09:59 Condor cib: [24881]: info: get_cluster_type: Cluster type is: 'openais'
> Jan 27 01:09:59 Condor cib: [24881]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Jan 27 01:09:59 Condor cib: [24881]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
> Jan 27 01:09:59 Condor corosync[24778]: [IPC ] Invalid IPC credentials.
> Jan 27 01:09:59 Condor cib: [24881]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: unknown (100)
> Jan 27 01:09:59 Condor cib: [24881]: CRIT: cib_init: Cannot sign in to the cluster... terminating
> Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process cib exited (pid=24881, rc=100)
> Jan 27 01:09:59 Condor pacemakerd: [24877]: notice: pcmk_child_exit: Child process cib no longer wishes to be respawned
> Jan 27 01:09:59 Condor pacemakerd: [24877]: info: update_node_processes: Node Condor now has process list: 00000000000000000000000000110212 (was 00000000000000000000000000110312)
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_update_peer: Node Condor: id=167837962 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000110212 (new)
> Jan 27 01:10:00 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:00 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Jan 27 01:10:00 Condor crmd: [24885]: info: crmd_init: Starting crmd's mainloop
> Jan 27 01:10:01 Condor CRON[24888]: (root) CMD (/etc/init.d/watchdog -e >/dev/null 2>&1)
> Jan 27 01:10:02 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:03 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:03 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
> Jan 27 01:10:05 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:06 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:06 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 3 times... pause and retry
> Jan 27 01:10:08 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:09 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:09 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 4 times... pause and retry
> Jan 27 01:10:11 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:12 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:12 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 5 times... pause and retry
>
> Jacob A. Smith
> IT Manager
> Argotec, LLC
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org