On Tue, Jul 19, 2011 at 1:17 AM, Nikita Michalko <michalko.sys...@a-i-p.com> wrote:
> Hi all!
>
> I have successfully configured a 2-node cluster and have it running. While
> testing different scenarios I ran into the following error.
> Situation:
> The 1st node was running; the 2nd was rebooted and heartbeat was started only
> on the 1st node. That was OK: all resources were running on the 1st node.
> Then, on the 2nd node, I removed all files in /var/lib/heartbeat/crm/ and in
> /var/lib//pengine/.
> After starting heartbeat/PM on the 2nd node, I am facing the following
> errors:
> --- SNIP ---
> Jul 18 15:54:25 pollux cib: [16884]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
> Jul 18 15:54:25 pollux cib: [16884]: WARN: validate_cib_digest: No on-disk digest present
> Jul 18 15:54:25 pollux cib: [16884]: info: validate_with_relaxng: Creating RNG parser context
> Jul 18 15:54:25 pollux cib: [16884]: info: startCib: CIB Initialization completed successfully
> Jul 18 15:54:25 pollux cib: [16884]: info: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
> Jul 18 15:54:25 pollux cib: [16884]: ERROR: crm_abort: crm_cluster_connect: Triggered fatal assert at cluster.c:65 : hb_conn != NULL
> Jul 18 15:54:25 pollux heartbeat: [16824]: WARN: Managed /usr/lib64/heartbeat/cib process 16884 killed by signal 6 [SIGABRT - Abort].
> Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Managed /usr/lib64/heartbeat/cib process 16884 dumped core
> Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Client /usr/lib64/heartbeat/cib "respawning too fast"
> Jul 18 15:54:26 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! (2000ms)
> Jul 18 15:54:27 pollux crmd: [16850]: info: do_cib_control: Could not connect to the CIB service: connection failed
> Jul 18 15:54:27 pollux crmd: [16850]: WARN: do_cib_control: Couldn't complete CIB registration 5 times...
> pause and retry
> Jul 18 15:54:29 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! (2000ms)
> ...
> crm_verify -V -x /var/lib/heartbeat/crm/cib.xml - OK!
> After stopping PM/HA on the 1st node and removing all relevant PM/HA
> files, the 1st node behaves the same way. Creating a new configuration with
> crm configure shows these errors:
> Signon to CIB failed: connection failed
> Init failed, could not perform requested operations
> ERROR: cannot parse xml: no element found: line 1, column 0
>
> Versions:
>
> pacemaker : 1.1.5 (Build: c86cb93c5a57c1f507a21be69d24fd28dee85397)

Mercurial has no record of this changeset. Where did you get the packages from?

> cluster-glue : 1.0.7 (Build: 6fa74ce2ed7ef6df41be2b634cd4aa89c318a8dc)
> resource-agents: 1.0.4 (Build: 7a11934b142d1daf42a04fbaa0391a3ac47cee4c)
> heartbeat: 3.0.5
>
> What am I doing wrong?
> Configuration attached...
>
>
> TIA!
> Nikita Michalko
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs:
> http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker