Hello,

The configuration:
Red Hat 5.5, 64-bit
pacemaker-libs-1.0.10-1.4.el5.x86_64
pacemaker-1.0.10-1.4.el5.x86_64
corosync-1.2.7-1.1.el5.x86_64
corosynclib-1.2.7-1.1.el5.x86_64

When working:
[root@ujboss1 cluster]# crm_mon -1
============
Last updated: Tue Jan 17 16:27:33 2012
Stack: openais
Current DC: ujboss2 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ ujboss1 ujboss2 ]

 Resource Group: vifGroup
     clusterIP  (ocf::heartbeat:IPaddr2):       Started ujboss1
     routing-jboss      (lsb:routing-jboss):    Started ujboss1


Now, the problem: just after running crm_mode offline on ujboss1 (12:51:44), crmd seems to loop, logging the same messages over and over.
I have restarted corosync on both nodes, and now it is working again.
But can you help me avoid this "loop"?

on ujboss2:
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1
loop...
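
A rough way to confirm the loop from the syslog is to count the update_dc events per second and per host. This is only a minimal sketch, assuming the log format shown above; the function names and the threshold of 3 events per second are my own choices and are not part of Pacemaker.

```python
import re
from collections import Counter

# Matches crmd syslog lines such as:
# Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to ujboss1 (3.0.1)
UPDATE_DC = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+crmd:.*"
    r"update_dc:\s+(?P<action>Set DC to|Unset DC)\s+(?P<dc>\S+)"
)


def count_dc_flaps(lines):
    """Count update_dc events per (timestamp, host) pair."""
    counts = Counter()
    for line in lines:
        match = UPDATE_DC.match(line)
        if match:
            counts[(match.group("ts"), match.group("host"))] += 1
    return counts


def looping(counts, threshold=3):
    """Return the (timestamp, host) pairs with at least `threshold` events.

    The threshold is an arbitrary, hypothetical value for illustration.
    """
    return sorted(key for key, n in counts.items() if n >= threshold)


# The four lines logged on ujboss2 within the same second.
sample = [
    "Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to ujboss1 (3.0.1)",
    "Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1",
    "Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Set DC to ujboss1 (3.0.1)",
    "Jan 17 12:51:49 ujboss2 crmd: [18796]: info: update_dc: Unset DC ujboss1",
]

counts = count_dc_flaps(sample)
print(looping(counts))
```

On a real system the lines would come from the syslog file instead of the sample list.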

and on ujboss1:
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: unpack_graph: Unpacked transition 8776: 0 actions in 0 synapses
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_te_invoke: Processing graph 8776 (ref=pe_calc-dc-1326800326-8977) derived from /var/lib/pengine/pe-input-7829.bz2
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: run_graph: ====================================================
Jan 17 12:38:46 ujboss1 crmd: [28369]: notice: run_graph: Transition 8776 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-7829.bz2): Complete
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: te_graph_trigger: Transition 8776 is now complete
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: notify_crmd: Transition 8776 status: done - <null>
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jan 17 12:38:46 ujboss1 crmd: [28369]: info: do_state_transition: Starting PEngine Recheck Timer
Jan 17 12:38:46 ujboss1 pengine: [28368]: info: process_pe_message: Transition 8776: PEngine Input stored in: /var/lib/pengine/pe-input-7829.bz2
Jan 17 12:46:27 ujboss1 cib: [28365]: info: cib_stats: Processed 1 operations (0.00us average, 0% utilization) in the last 10min
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - <cib admin_epoch="0" epoch="233" num_updates="5" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - <configuration >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - <nodes >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - <node id="ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - <instance_attributes id="nodes-ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - <nvpair value="off" id="nodes-ujboss1-standby" />
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - </instance_attributes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - </node>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - </nodes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - </configuration>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: - </cib>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + <cib admin_epoch="0" epoch="234" num_updates="1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + <configuration >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + <nodes >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + <node id="ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + <instance_attributes id="nodes-ujboss1" >
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + <nvpair value="on" id="nodes-ujboss1-standby" />
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + </instance_attributes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + </node>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + </nodes>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + </configuration>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: log_data_element: cib:diff: + </cib>
Jan 17 12:51:44 ujboss1 cib: [28365]: info: cib_process_request: Operation complete: op cib_modify for section nodes (origin=local/crm_attribute/4, version=0.234.1): ok (rc=0)
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: need_abort: Aborting on change to admin_epoch
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
Jan 17 12:51:44 ujboss1 crmd: [28369]: info: do_pe_invoke: Query 8981: Requesting the current CIB: S_POLICY_ENGINE
Jan 17 12:51:46 ujboss1 cib: [28365]: ERROR: send_ais_text: Sending message 251: FAILED (rc=2): Library error: Connection timed out (110)
Jan 17 12:51:46 ujboss1 crmd: [28369]: info: do_pe_invoke_callback: Invoking the PE: query=8981, ref=pe_calc-dc-1326801106-8978, seq=560, quorate=1
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: unpack_status: Node ujboss1 is in standby-mode
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: determine_online_status: Node ujboss1 is standby
Jan 17 12:51:47 ujboss1 pengine: [28368]: info: determine_online_status: Node ujboss2 is online
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: group_print: Resource Group: vifGroup
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: native_print: clusterIP (ocf::heartbeat:IPaddr2): Started ujboss1
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: native_print: routing-jboss (lsb:routing-jboss): Started ujboss1
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: RecurringOp: Start recurring monitor (30s) for clusterIP on ujboss2
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: RecurringOp: Start recurring monitor (30s) for routing-jboss on ujboss2
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: LogActions: Move resource clusterIP (Started ujboss1 -> ujboss2)
Jan 17 12:51:47 ujboss1 pengine: [28368]: notice: LogActions: Move resource routing-jboss (Started ujboss1 -> ujboss2)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: unpack_graph: Unpacked transition 8777: 11 actions in 11 synapses
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_te_invoke: Processing graph 8777 (ref=pe_calc-dc-1326801106-8978) derived from /var/lib/pengine/pe-input-7830.bz2
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo action 15 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating action 10: stop routing-jboss_stop_0 on ujboss1 (local)
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: cancel_op: operation monitor[79] on lsb::routing-jboss::routing-jboss for client 28369, its parameters: CRM_meta_interval=[30000] CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] cancelled
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_lrm_rsc_op: Performing key=10:8777:0:39671e48-9519-4b61-b781-2efcd379df7a op=routing-jboss_stop_0 )
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: rsc:routing-jboss:80: stop
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM operation routing-jboss_monitor_30000 (call=79, status=1, cib-update=0, confirmed=true) Cancelled
Jan 17 12:51:48 ujboss1 lrmd: [5533]: WARN: For LSB init script, no additional parameters are needed.
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: (routing-jboss:stop:stdout) Disabling traffic redirection from 128.1.13.9 to 128.1.13.7
Jan 17 12:51:48 ujboss1 pengine: [28368]: info: process_pe_message: Transition 8777: PEngine Input stored in: /var/lib/pengine/pe-input-7830.bz2
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: (routing-jboss:stop:stdout) [
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: (routing-jboss:stop:stdout) OK
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: (routing-jboss:stop:stdout) ]
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: (routing-jboss:stop:stdout)
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: (routing-jboss:stop:stdout)

Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM operation routing-jboss_stop_0 (call=80, rc=0, cib-update=8982, confirmed=true) ok
Jan 17 12:51:48 ujboss1 cib: [28365]: ERROR: send_ais_message: Not connected to AIS
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: match_graph_event: Action routing-jboss_stop_0 (10) confirmed on ujboss1 (rc=0)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating action 7: stop clusterIP_stop_0 on ujboss1 (local)
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: cancel_op: operation monitor[77] on ocf::IPaddr2::clusterIP for client 28369, its parameters: CRM_meta_interval=[30000] ip=[128.1.13.9] cidr_netmask=[32] CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] CRM_meta_name=[monitor] iflabel=[jbossfailover] cancelled
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_lrm_rsc_op: Performing key=7:8777:0:39671e48-9519-4b61-b781-2efcd379df7a op=clusterIP_stop_0 )
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: rsc:clusterIP:81: stop
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM operation clusterIP_monitor_30000 (call=77, status=1, cib-update=0, confirmed=true) Cancelled
Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: (clusterIP:stop:stderr) logger: unknown facility name: none.

Jan 17 12:51:48 ujboss1 lrmd: [28366]: info: RA output: (clusterIP:stop:stderr) logger: unknown facility name: none.

Jan 17 12:51:48 ujboss1 crmd: [28369]: info: process_lrm_event: LRM operation clusterIP_stop_0 (call=81, rc=0, cib-update=8983, confirmed=true) ok
Jan 17 12:51:48 ujboss1 cib: [28365]: ERROR: send_ais_message: Not connected to AIS
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: match_graph_event: Action clusterIP_stop_0 (7) confirmed on ujboss1 (rc=0)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo action 16 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo action 3 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_pseudo_action: Pseudo action 13 fired and confirmed
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: te_rsc_command: Initiating action 8: start clusterIP_start_0 on ujboss2
Jan 17 12:51:48 corosync [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 568: memb=1, new=0, lost=1
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: memb: ujboss1 34406784
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: lost: ujboss2 51184000
Jan 17 12:51:48 corosync [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 568: memb=2, new=1, lost=0
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: NEW: ujboss2 51184000
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: MEMB: ujboss1 34406784
Jan 17 12:51:48 corosync [pcmk ] info: pcmk_peer_update: MEMB: ujboss2 51184000
Jan 17 12:51:48 ujboss1 crmd: [28369]: ERROR: crmd_ha_msg_filter: Another DC detected: ujboss2 (op=noop)
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff 0.233.5 -> 0.233.6 not applied to 0.234.3: current "epoch" is greater than required
Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff 0.233.6 -> 0.233.7 not applied to 0.234.3: current "epoch" is greater than required
Jan 17 12:51:48 ujboss1 cib: [28365]: WARN: cib_process_diff: Diff 0.233.7 -> 0.234.1 not applied to 0.234.3: current "epoch" is greater than required
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
Jan 17 12:51:48 ujboss1 crmd: [28369]: info: do_dc_takeover: Taking over DC status for this partition
Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_readwrite: We are now in R/O mode
Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_request: Operation complete: op cib_slave_all for section 'all' (origin=local/crmd/8984, version=0.234.3): ok (rc=0)
Jan 17 12:51:48 ujboss1 cib: [28365]: info: cib_process_readwrite: We are now in R/W mode
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: Operation complete: op cib_master for section 'all' (origin=local/crmd/8985, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: Operation complete: op cib_modify for section cib (origin=local/crmd/8986, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/8988, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/8990, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 corosync [MAIN ] Completed service synchronization, ready to provide service.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all: join-8: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: ais_dispatch: Membership 568: quorum retained
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: crm_ais_dispatch: Setting expected votes to 2
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/8993, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: config_query_callback: Checking for expired actions every 900000ms
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: config_query_callback: Sending expected-votes=2 to corosync
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: check_join_state: do_dc_join_filter_offer: Membership changed since join started: 560 -> 568
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: join_make_offer: Making join offers based on membership 568
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all: join-9: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: ais_dispatch: Membership 568: quorum retained
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: crm_ais_dispatch: Setting expected votes to 2
Jan 17 12:51:49 ujboss1 cib: [28365]: info: cib_process_request: Operation complete: op cib_modify for section crm_config (origin=local/crmd/8996, version=0.234.3): ok (rc=0)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2 cluster nodes responded to the join offer.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize: join-9: Syncing the CIB from ujboss1 to the rest of the cluster
Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not connected to AIS
Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/8998, version=0.234.3): not connected (rc=-3)
Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback: Sync from ujboss1 resulted in an error: not connected
Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input I_ELECTION_DC from finalize_sync_callback() received in state S_FINALIZE_JOIN
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all: join-10: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2 cluster nodes responded to the join offer.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize: join-10: Syncing the CIB from ujboss1 to the rest of the cluster
Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not connected to AIS
Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/9000, version=0.234.3): not connected (rc=-3)
Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback: Sync from ujboss1 resulted in an error: not connected
Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input I_ELECTION_DC from finalize_sync_callback() received in state S_FINALIZE_JOIN
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=finalize_sync_callback ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Unset DC ujboss1
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_offer_all: join-11: Waiting on 2 outstanding join acks
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: update_dc: Set DC to ujboss1 (3.0.1)
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_state_transition: All 2 cluster nodes responded to the join offer.
Jan 17 12:51:49 ujboss1 crmd: [28369]: info: do_dc_join_finalize: join-11: Syncing the CIB from ujboss1 to the rest of the cluster
Jan 17 12:51:49 ujboss1 cib: [28365]: ERROR: send_ais_message: Not connected to AIS
Jan 17 12:51:49 ujboss1 cib: [28365]: WARN: cib_process_request: Operation complete: op cib_sync for section 'all' (origin=local/crmd/9002, version=0.234.3): not connected (rc=-3)
Jan 17 12:51:49 ujboss1 crmd: [28369]: ERROR: finalize_sync_callback: Sync from ujboss1 resulted in an error: not connected
Jan 17 12:51:49 ujboss1 crmd: [28369]: WARN: do_log: FSA: Input I_ELECTION_DC from finalize_sync_callback() received in state S_FINALIZE_JOIN
... and these messages loop too.

After restarting corosync:

17/01/12 13H10 : crm_mon -1
============
Last updated: Tue Jan 17 13:10:39 2012
Stack: openais
Current DC: ujboss1 - partition with quorum
Version: 1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3
2 Nodes configured, 2 expected votes
1 Resources configured.
============

Online: [ ujboss1 ujboss2 ]

 Resource Group: vifGroup
     clusterIP  (ocf::heartbeat:IPaddr2):       Started ujboss2 FAILED
     routing-jboss      (lsb:routing-jboss):    Stopped

Failed actions:
    clusterIP_start_0 (node=ujboss2, call=-1, rc=1, status=Timed Out): unknown error



Both Linux servers were very busy, with crmd, cib and corosync using all the CPU.

Best regards,
Philippe

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org