On Tue, Jul 19, 2011 at 9:58 AM, Andrew Beekhof <and...@beekhof.net> wrote:
> On Tue, Jul 19, 2011 at 1:17 AM, Nikita Michalko
> <michalko.sys...@a-i-p.com> wrote:
>> Hi all!
>>
>> I have successfully configured and am running a 2-node cluster. While
>> testing different scenarios I got this error.
>> Situation:
>> The 1st node was running, the 2nd was rebooted, and heartbeat was started
>> only on the 1st node. That was OK; all resources were running on the 1st node.
>> Then on the 2nd node I removed all files in /var/lib/heartbeat/crm/ and in
>> /var/lib//pengine/.
>> After starting heartbeat/PM on the 2nd node, I'm facing the following
>> errors:
>> --- SNIP ---
>> Jul 18 15:54:25 pollux cib: [16884]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
>> Jul 18 15:54:25 pollux cib: [16884]: WARN: validate_cib_digest: No on-disk digest present
>> Jul 18 15:54:25 pollux cib: [16884]: info: validate_with_relaxng: Creating RNG parser context
>> Jul 18 15:54:25 pollux cib: [16884]: info: startCib: CIB Initialization completed successfully
>> Jul 18 15:54:25 pollux cib: [16884]: info: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
>> Jul 18 15:54:25 pollux cib: [16884]: ERROR: crm_abort: crm_cluster_connect: Triggered fatal assert at cluster.c:65 : hb_conn != NULL
>> Jul 18 15:54:25 pollux heartbeat: [16824]: WARN: Managed /usr/lib64/heartbeat/cib process 16884 killed by signal 6 [SIGABRT - Abort].
>> Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Managed /usr/lib64/heartbeat/cib process 16884 dumped core
>> Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Client /usr/lib64/heartbeat/cib "respawning too fast"
>> Jul 18 15:54:26 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! (2000ms)
>> Jul 18 15:54:27 pollux crmd: [16850]: info: do_cib_control: Could not connect to the CIB service: connection failed
>> Jul 18 15:54:27 pollux crmd: [16850]: WARN: do_cib_control: Couldn't complete CIB registration 5 times... pause and retry
>> Jul 18 15:54:29 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! (2000ms)
>> ...
>> crm_verify -V -x /var/lib/heartbeat/crm/cib.xml -> OK!
>> After stopping PM/HA on the 1st node and removing all relevant PM/HA
>> files, the same happens on the 1st node. Creating a new configuration with
>> crm configure shows these errors:
>> Signon to CIB failed: connection failed
>> Init failed, could not perform requested operations
>> ERROR: cannot parse xml: no element found: line 1, column 0
>>
>> Versions:
>>
>> pacemaker : 1.1.5 (Build: c86cb93c5a57c1f507a21be69d24fd28dee85397)
>
> Mercurial has no record of this changeset.
> Where did you get the packages from?

I ask specifically because it does not look like they support heartbeat,
which is what is triggering this error.

>> cluster-glue : 1.0.7 (Build: 6fa74ce2ed7ef6df41be2b634cd4aa89c318a8dc)
>> resource-agents: 1.0.4 (Build: 7a11934b142d1daf42a04fbaa0391a3ac47cee4c)
>> heartbeat: 3.0.5
>>
>> What am I doing wrong?
>> Configuration attached...
>>
>>
>> TIA!
>> Nikita Michalko
>>
>> _______________________________________________
>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>
>> Project Home: http://www.clusterlabs.org
>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> Bugs:
>> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker