> On 27 Jan 2015, at 5:23 pm, Jake Smith <jsm...@argotec.com> wrote:
>
> Had a failover of my active/passive cluster and now the passive node will not
> rejoin the cluster.
>
> 2 nodes running Ubuntu 12.04
> coro 1.4.2-2, openais 1.1.4-4, pcmk 1.1.6-2ubuntu3
>
> Corosync ring membership is fine on both rings.
>
> Tried stopping coro/pace and clearing /var/lib/heartbeat/crm/ and then
> restarting on passive node without success.
> Tried rebooting passive node (again – it was successfully fenced)
> Tried updating pacemaker to latest in distro (1.1.6-2ubuntu3.3) then went
> back on passive node
> Tried putting active node in maintenance mode and stopping pacemaker and
> corosync on both nodes. Then restarting on both nodes. Corosync came back
> fine as before but now I have the same problem on both nodes with pacemaker
> not starting successfully. Both show exactly same now - attrd: [24883]:
> ERROR: main: HA Signon failed.
>
> Log:
> Jan 27 01:09:59 Condor crmd: [24885]: info: crmd_init: Starting crmd
> Jan 27 01:09:59 Condor cib: [24881]: info: validate_with_relaxng: Creating RNG parser context
> Jan 27 01:09:59 Condor lrmd: [24882]: info: enabling coredumps
> Jan 27 01:09:59 Condor lrmd: [24882]: info: Started.
> Jan 27 01:09:59 Condor corosync[24778]: [IPC ] Invalid IPC credentials.

This seems to be the root of the errors. Pacemaker looks a little old; could you consider updating?

> Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: HA Signon failed
> Jan 27 01:09:59 Condor attrd: [24883]: ERROR: main: Aborting startup
> Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process attrd exited (pid=24883, rc=100)
> Jan 27 01:09:59 Condor pacemakerd: [24877]: notice: pcmk_child_exit: Child process attrd no longer wishes to be respawned
> Jan 27 01:09:59 Condor pacemakerd: [24877]: info: update_node_processes: Node Condor now has process list: 00000000000000000000000000110312 (was 00000000000000000000000000111312)
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: init_ais_connection_classic: AIS connection established
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: get_ais_nodeid: Server details: id=167837962 uname=Condor cname=pcmk
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: init_ais_connection_once: Connection to 'classic openais (with plugin)': established
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_new_peer: Node Condor now has id: 167837962
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_new_peer: Node 167837962 is now known as Condor
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: main: Starting stonith-ng mainloop
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_update_peer: Node Condor: id=167837962 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000110312 (new)
> Jan 27 01:09:59 Condor cib: [24881]: info: startCib: CIB Initialization completed successfully
> Jan 27 01:09:59 Condor cib: [24881]: info: get_cluster_type: Cluster type is: 'openais'
> Jan 27 01:09:59 Condor cib: [24881]: notice: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Jan 27 01:09:59 Condor cib: [24881]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
> Jan 27 01:09:59 Condor corosync[24778]: [IPC ] Invalid IPC credentials.
> Jan 27 01:09:59 Condor cib: [24881]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: unknown (100)
> Jan 27 01:09:59 Condor cib: [24881]: CRIT: cib_init: Cannot sign in to the cluster... terminating
> Jan 27 01:09:59 Condor pacemakerd: [24877]: ERROR: pcmk_child_exit: Child process cib exited (pid=24881, rc=100)
> Jan 27 01:09:59 Condor pacemakerd: [24877]: notice: pcmk_child_exit: Child process cib no longer wishes to be respawned
> Jan 27 01:09:59 Condor pacemakerd: [24877]: info: update_node_processes: Node Condor now has process list: 00000000000000000000000000110212 (was 00000000000000000000000000110312)
> Jan 27 01:09:59 Condor stonith-ng: [24880]: info: crm_update_peer: Node Condor: id=167837962 state=unknown addr=(null) votes=0 born=0 seen=0 proc=00000000000000000000000000110212 (new)
> Jan 27 01:10:00 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:00 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 1 times... pause and retry
> Jan 27 01:10:00 Condor crmd: [24885]: info: crmd_init: Starting crmd's mainloop
> Jan 27 01:10:01 Condor CRON[24888]: (root) CMD (/etc/init.d/watchdog -e >/dev/null 2>&1)
> Jan 27 01:10:02 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:03 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:03 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 2 times... pause and retry
> Jan 27 01:10:05 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:06 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:06 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 3 times... pause and retry
> Jan 27 01:10:08 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:09 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:09 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 4 times... pause and retry
> Jan 27 01:10:11 Condor crmd: [24885]: info: crm_timer_popped: Wait Timer (I_NULL) just popped (2000ms)
> Jan 27 01:10:12 Condor crmd: [24885]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jan 27 01:10:12 Condor crmd: [24885]: WARN: do_cib_control: Couldn't complete CIB registration 5 times... pause and retry
>
> Jacob A. Smith
> IT Manager
> Argotec, LLC
>
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org