Hi all!

I have successfully configured a 2-node cluster and it is running. While testing different scenarios, I got the error below.

Situation: the 1st node was running; the 2nd was rebooted, and heartbeat was started only on the 1st node. That was OK: all resources were running on the 1st node. Then, on the 2nd node, I removed all files in /var/lib/heartbeat/crm/ and in /var/lib//pengine/. After starting heartbeat/PM on the 2nd node, I get the following errors:

--- SNIP ---
Jul 18 15:54:25 pollux cib: [16884]: info: retrieveCib: Reading cluster configuration from: /var/lib/heartbeat/crm/cib.xml (digest: /var/lib/heartbeat/crm/cib.xml.sig)
Jul 18 15:54:25 pollux cib: [16884]: WARN: validate_cib_digest: No on-disk digest present
Jul 18 15:54:25 pollux cib: [16884]: info: validate_with_relaxng: Creating RNG parser context
Jul 18 15:54:25 pollux cib: [16884]: info: startCib: CIB Initialization completed successfully
Jul 18 15:54:25 pollux cib: [16884]: info: crm_cluster_connect: Connecting to cluster infrastructure: heartbeat
Jul 18 15:54:25 pollux cib: [16884]: ERROR: crm_abort: crm_cluster_connect: Triggered fatal assert at cluster.c:65 : hb_conn != NULL
Jul 18 15:54:25 pollux heartbeat: [16824]: WARN: Managed /usr/lib64/heartbeat/cib process 16884 killed by signal 6 [SIGABRT - Abort].
Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Managed /usr/lib64/heartbeat/cib process 16884 dumped core
Jul 18 15:54:25 pollux heartbeat: [16824]: ERROR: Client /usr/lib64/heartbeat/cib "respawning too fast"
Jul 18 15:54:26 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! (2000ms)
Jul 18 15:54:27 pollux crmd: [16850]: info: do_cib_control: Could not connect to the CIB service: connection failed
Jul 18 15:54:27 pollux crmd: [16850]: WARN: do_cib_control: Couldn't complete CIB registration 5 times... pause and retry
Jul 18 15:54:29 pollux crmd: [16850]: info: crm_timer_popped: Wait Timer (I_NULL) just popped! (2000ms)
...
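In case it helps anyone triaging a longer log of this kind, here is a minimal sketch that pulls only the WARN and ERROR entries out of such output. It assumes every line follows the layout seen in the excerpt above (timestamp, host, daemon, PID in brackets, level, message). The names `LINE_RE`, `Entry`, `parse_line` and `problems` are made up for this illustration and are not part of heartbeat or Pacemaker.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, taken from the log excerpt in this post:
#   "<Mon> <day> <hh:mm:ss> <host> <daemon>: [<pid>]: <LEVEL>: <message>"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<daemon>[^:\s]+):\s+"
    r"\[(?P<pid>\d+)\]:\s+(?P<level>[A-Za-z]+):\s+(?P<message>.*)$"
)


class Entry(NamedTuple):
    stamp: str
    host: str
    daemon: str
    pid: int
    level: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Return the parsed entry, or None if the line does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["stamp"], m["host"], m["daemon"],
                 int(m["pid"]), m["level"], m["message"])


def problems(lines: Iterable[str]) -> List[Entry]:
    """Keep only WARN and ERROR entries, in their original order."""
    parsed = (parse_line(line) for line in lines)
    return [e for e in parsed if e is not None and e.level in ("WARN", "ERROR")]


if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python filter_log.py < ha-log
    for entry in problems(sys.stdin):
        print(entry.stamp, entry.daemon, entry.level, entry.message)
```

Lines that do not match the layout (such as the "--- SNIP ---" marker) are skipped, not reported as errors.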
crm_verify -V -x /var/lib/heartbeat/crm/cib.xml -> OK!

After stopping PM/HA on the 1st node and removing all relevant PM/HA files there, the 1st node shows the same behaviour. Creating a new configuration with crm configure fails with these errors:

Signon to CIB failed: connection failed
Init failed, could not perform requested operations
ERROR: cannot parse xml: no element found: line 1, column 0
Versions:
pacemaker: 1.1.5 (Build: c86cb93c5a57c1f507a21be69d24fd28dee85397)
cluster-glue: 1.0.7 (Build: 6fa74ce2ed7ef6df41be2b634cd4aa89c318a8dc)
resource-agents: 1.0.4 (Build: 7a11934b142d1daf42a04fbaa0391a3ac47cee4c)
heartbeat: 3.0.5

What am I doing wrong? Configuration attached...

TIA!
Nikita Michalko
NM_cib.xml
Description: XML document
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker
Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker