Hello, we are facing a problem with a simple (I hope) cluster configuration of 2 nodes, ims0 and ims1, and 3 primitives (no shared storage or anything similar where data corruption would be a danger):
- ims: a master/slave Java application (normally running on both nodes as master/slave, managed by our own OCF script) with an embedded web server to be accessed by clients
- ims-ip and ims-ip-src: the shared IP address and outgoing source address, to run only on the node where ims is Master

Below are the software versions, the crm configuration, and portions of the corosync log.

The problem is that although the setup works most of the time (i.e. if the master ims application dies, the slave is promoted and the IP addresses are remapped), sometimes when the master ims application stops (fails or is killed) the failover does not occur: the slave ims application remains Slave and the shared IP address remains mapped on the node with the dead ims. I even created a testbed of two servers, killing the ims application from cron every 15 minutes on the supposed main server, to simulate the failure, observe the failover, and replicate the problem (sometimes it works properly for hours or days).

For example, today (July 4, 23:45 local time) the ims on ims0 was killed but remained Master - no failover of the IP addresses was performed, and ims on ims1 remained Slave:

============
Last updated: Fri Jul  5 02:07:18 2013
Last change: Thu Jul  4 23:33:46 2013
Stack: openais
Current DC: ims0 - partition with quorum
Version: 1.1.7-61a079313275f3e9d0e85671f62c721d32ce3563
2 Nodes configured, 2 expected votes
6 Resources configured.
============

Online: [ ims1 ims0 ]

 Master/Slave Set: ms-ims [ims]
     Masters: [ ims0 ]
     Slaves: [ ims1 ]
 Clone Set: clone-cluster-mon [cluster-mon]
     Started: [ ims0 ims1 ]
 Resource Group: on-ims-master
     ims-ip      (ocf::heartbeat:IPaddr2):       Started ims0
     ims-ip-src  (ocf::heartbeat:IPsrcaddr):     Started ims0

The command 'crm node standby' on ims0 did not fix it: ims0 remained Master (although in standby):

Node ims0: standby
Online: [ ims1 ]

 Master/Slave Set: ms-ims [ims]
     ims:0       (ocf::microstepmis:imsMS):      Slave ims0 FAILED
     Slaves: [ ims1 ]
 Clone Set: clone-cluster-mon [cluster-mon]
     Started: [ ims1 ]
     Stopped: [ cluster-mon:0 ]

Failed actions:
    ims:0_demote_0 (node=ims0, call=3179, rc=7, status=complete): not running

Stopping the openais service on ims0 completely did the trick.

Could someone give me a hint what to do?
- provide more information (logs, OCF script)?
- change something in the configuration?
- change the environment / versions?

Thanks a lot

Martin Gazak

Software versions:
------------------
libpacemaker3-1.1.7-42.1
pacemaker-1.1.7-42.1
corosync-1.4.3-21.1
libcorosync4-1.4.3-21.1
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 2

Configuration:
--------------
node ims0 \
        attributes standby="off"
node ims1 \
        attributes standby="off"
primitive cluster-mon ocf:pacemaker:ClusterMon \
        params htmlfile="/opt/ims/tomcat/webapps/ims/html/crm_status.html" \
        op monitor interval="10"
primitive ims ocf:microstepmis:imsMS \
        op monitor interval="1" role="Master" timeout="20" \
        op monitor interval="2" role="Slave" timeout="20" \
        op start interval="0" timeout="1800s" \
        op stop interval="0" timeout="120s" \
        op promote interval="0" timeout="180s" \
        meta failure-timeout="360s"
primitive ims-ip ocf:heartbeat:IPaddr2 \
        params ip="192.168.141.13" nic="bond1" iflabel="ims" cidr_netmask="24" \
        op monitor interval="15s" \
        meta failure-timeout="60s"
primitive ims-ip-src ocf:heartbeat:IPsrcaddr \
        params ipaddress="192.168.141.13" cidr_netmask="24" \
        op monitor \
                interval="15s" \
        meta failure-timeout="60s"
group on-ims-master ims-ip ims-ip-src
ms ms-ims ims \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Started" migration-threshold="1"
clone clone-cluster-mon cluster-mon
colocation ims_master inf: on-ims-master ms-ims:Master
order ms-ims-before inf: ms-ims:promote on-ims-master:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-61a079313275f3e9d0e85671f62c721d32ce3563" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        no-quorum-policy="ignore" \
        stonith-enabled="false" \
        cluster-recheck-interval="1m" \
        default-resource-stickiness="1000" \
        last-lrm-refresh="1372951736" \
        maintenance-mode="false"

corosync.log from ims0:
-----------------------
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims:0_monitor_1000 (call=3046, rc=7, cib-update=6229, confirmed=false) not running
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_graph_event: Detected action ims:0_monitor_1000 from a different transition: 4024 vs.
4035
Jul 04 23:45:02 ims0 crmd: [3935]: info: abort_transition_graph: process_graph_event:476 - Triggered transition abort (complete=1, tag=lrm_rsc_op, id=ims:0_last_failure_0, magic=0:7;7:4024:8:e3f096a7-4eb5-4810-9310-eb144f595e20, cib=0.717.6) : Old event
Jul 04 23:45:02 ims0 crmd: [3935]: WARN: update_failcount: Updating failcount for ims:0 on ims0 after failed monitor: rc=7 (update=value++, time=1372952702)
Jul 04 23:45:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul 04 23:45:02 ims0 attrd: [3932]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ims:0 (1)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 04 23:45:02 ims0 pengine: [3933]: WARN: unpack_rsc_op: Processing failed op ims:0_last_failure_0 on ims0: not running (7)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0 (Master ims0)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip (Started ims0)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip-src (Started ims0)
Jul 04 23:45:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 04 23:45:02 ims0 crmd: [3935]: info: do_te_invoke: Processing graph 4036 (ref=pe_calc-dc-1372952702-11907) derived from /var/lib/pengine/pe-input-2819.bz2
Jul 04 23:45:02 ims0 crmd: [3935]: info: te_rsc_command: Initiating action 51: stop ims-ip-src_stop_0 on ims0 (local)
Jul 04 23:45:02 ims0 attrd: [3932]: notice: attrd_perform_update: Sent update 4439: fail-count-ims:0=1
Jul 04 23:45:02 ims0 attrd: [3932]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ims:0 (1372952702)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: cancel_op: operation monitor[3049] on ims-ip-src for client 3935, its parameters: CRM_meta_name=[monitor] cidr_netmask=[24] crm_feature_set=[3.0.6] CRM_meta_timeout=[20000] CRM_meta_interval=[15000] ipaddress=[192.168.141.13] cancelled
Jul 04 23:45:02 ims0 attrd: [3932]: notice: attrd_perform_update: Sent update 4441: last-failure-ims:0=1372952702
Jul 04 23:45:02 ims0 lrmd: [3931]: info: rsc:ims-ip-src stop[3052] (pid 12111)
Jul 04 23:45:02 ims0 crmd: [3935]: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, id=status-ims0-fail-count-ims.0, name=fail-count-ims:0, value=1, magic=NA, cib=0.717.7) : Transient attribute: update
Jul 04 23:45:02 ims0 crmd: [3935]: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=0, tag=nvpair, id=status-ims0-last-failure-ims.0, name=last-failure-ims:0, value=1372952702, magic=NA, cib=0.717.8) : Transient attribute: update
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims-ip-src_monitor_15000 (call=3049, status=1, cib-update=0, confirmed=true) Cancelled
Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4036: PEngine Input stored in: /var/lib/pengine/pe-input-2819.bz2
Jul 04 23:45:02 ims0 lrmd: [3931]: info: operation stop[3052] on ims-ip-src for client 3935: pid 12111 exited with return code 0
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims-ip-src_stop_0 (call=3052, rc=0, cib-update=6231, confirmed=true) ok
Jul 04 23:45:02 ims0 crmd: [3935]: notice: run_graph: ==== Transition 4036 (Complete=3, Pending=0, Fired=0, Skipped=32, Incomplete=19, Source=/var/lib/pengine/pe-input-2819.bz2): Stopped
Jul 04 23:45:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 04 23:45:02 ims0 pengine: [3933]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 04 23:45:02 ims0 pengine: [3933]: notice: get_failcount: Failcount for ms-ims on ims0 has expired (limit was 360s)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: unpack_rsc_op: Clearing expired failcount for ims:0 on ims0
Jul 04 23:45:02 ims0 pengine: [3933]: notice: get_failcount: Failcount for ms-ims on ims0 has expired (limit was 360s)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: unpack_rsc_op: Clearing expired failcount for ims:0 on ims0
Jul 04 23:45:02 ims0 pengine: [3933]: WARN: unpack_rsc_op: Processing failed op ims:0_last_failure_0 on ims0: not running (7)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: get_failcount: Failcount for ms-ims on ims0 has expired (limit was 360s)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: get_failcount: Failcount for ms-ims on ims0 has expired (limit was 360s)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Recover ims:0 (Master ims0)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Restart ims-ip (Started ims0)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Start ims-ip-src (ims0)
Jul 04 23:45:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 04 23:45:02 ims0 crmd: [3935]: info: do_te_invoke: Processing graph 4037 (ref=pe_calc-dc-1372952702-11909) derived from /var/lib/pengine/pe-input-2820.bz2
Jul 04 23:45:02 ims0 crmd: [3935]: info: te_crm_command: Executing crm-event (3): clear_failcount on ims0
Jul 04 23:45:02 ims0 crmd: [3935]: info: te_rsc_command: Initiating action 49: stop ims-ip_stop_0 on ims0 (local)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: cancel_op: operation monitor[3047] on ims-ip for client 3935, its parameters: cidr_netmask=[24] nic=[bond1] crm_feature_set=[3.0.6] ip=[192.168.141.13] iflabel=[ims] CRM_meta_name=[monitor] CRM_meta_timeout=[20000] CRM_meta_interval=[15000] cancelled
Jul 04 23:45:02 ims0 lrmd: [3931]: info: rsc:ims-ip stop[3053] (pid 12154)
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims-ip_monitor_15000 (call=3047, status=1, cib-update=0, confirmed=true) Cancelled
Jul 04 23:45:02 ims0 crmd: [3935]: info: te_rsc_command: Initiating action 72: notify ims:0_pre_notify_demote_0 on ims0 (local)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: rsc:ims:0 notify[3054] (pid 12155)
Jul 04 23:45:02 ims0 crmd: [3935]: info: te_rsc_command: Initiating action 74: notify ims:1_pre_notify_demote_0 on ims1
Jul 04 23:45:02 ims0 lrmd: [3931]: info: operation notify[3054] on ims:0 for client 3935: pid 12155 exited with return code 0
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims:0_notify_0 (call=3054, rc=0, cib-update=0, confirmed=true) ok
Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4037: PEngine Input stored in: /var/lib/pengine/pe-input-2820.bz2
Jul 04 23:45:02 ims0 lrmd: [3931]: info: RA output: (ims-ip:stop:stderr) 2013/07/04_23:45:02 INFO: IP status = ok, IP_CIP=
Jul 04 23:45:02 ims0 lrmd: [3931]: info: operation stop[3053] on ims-ip for client 3935: pid 12154 exited with return code 0
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims-ip_stop_0 (call=3053, rc=0, cib-update=6233, confirmed=true) ok
Jul 04 23:45:02 ims0 crmd: [3935]: info: handle_failcount_op: Removing failcount for ims:0
Jul 04 23:45:02 ims0 attrd: [3932]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ims:0 (<null>)
Jul 04 23:45:02 ims0 cib: [3929]: info: cib_process_request: Operation complete: op cib_delete for section //node_state[@uname='ims0']//lrm_resource[@id='ims:0']/lrm_rsc_op[@id='ims:0_last_failure_0'] (origin=local/crmd/6234, version=0.717.11): ok (rc=0)
Jul 04 23:45:02 ims0 crmd: [3935]: info: abort_transition_graph: te_update_diff:321 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=ims:0_last_failure_0, magic=0:7;7:4024:8:e3f096a7-4eb5-4810-9310-eb144f595e20, cib=0.717.11) : Resource op removal
Jul 04 23:45:02 ims0 attrd: [3932]: notice: attrd_perform_update: Sent delete 4443: node=ims0, attr=fail-count-ims:0, id=<n/a>, set=(null), section=status
Jul 04 23:45:02 ims0 crmd: [3935]: info: abort_transition_graph: te_update_diff:194 - Triggered transition abort (complete=0, tag=transient_attributes, id=ims0, magic=NA, cib=0.717.12) : Transient attribute: removal
Jul 04 23:45:02 ims0 attrd: [3932]: notice: attrd_trigger_update: Sending flush op to all hosts for: last-failure-ims:0 (<null>)
Jul 04 23:45:02 ims0 attrd: [3932]: notice: attrd_perform_update: Sent delete 4445: node=ims0, attr=last-failure-ims:0, id=<n/a>, set=(null), section=status
Jul 04 23:45:02 ims0 crmd: [3935]: info: abort_transition_graph: te_update_diff:194 - Triggered transition abort (complete=0, tag=transient_attributes, id=ims0, magic=NA, cib=0.717.13) : Transient attribute: removal
Jul 04 23:45:02 ims0 crmd: [3935]: notice: run_graph: ==== Transition 4037 (Complete=7, Pending=0, Fired=0, Skipped=28, Incomplete=19, Source=/var/lib/pengine/pe-input-2820.bz2): Stopped
Jul 04 23:45:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 04 23:45:02 ims0 pengine: [3933]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Start ims-ip (ims0)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: LogActions: Start ims-ip-src (ims0)
Jul 04 23:45:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 04 23:45:02 ims0 crmd: [3935]: info: do_te_invoke: Processing graph 4038 (ref=pe_calc-dc-1372952702-11915) derived from /var/lib/pengine/pe-input-2821.bz2
Jul 04 23:45:02 ims0 crmd: [3935]: info: te_rsc_command: Initiating action 47: start ims-ip_start_0 on ims0 (local)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: rsc:ims-ip start[3055] (pid 12197)
Jul 04 23:45:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4038: PEngine Input stored in: /var/lib/pengine/pe-input-2821.bz2
Jul 04 23:45:02 ims0 lrmd: [3931]: info: RA output: (ims-ip:start:stderr) 2013/07/04_23:45:02 INFO: Adding IPv4 address 192.168.141.13/24 with broadcast address 192.168.141.255 to device bond1 (with label bond1:ims)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: RA output: (ims-ip:start:stderr) 2013/07/04_23:45:02 INFO: Bringing device bond1 up
Jul 04 23:45:02 ims0 lrmd: [3931]: info: RA output: (ims-ip:start:stderr) 2013/07/04_23:45:02 INFO: /usr/lib64/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-192.168.141.13 bond1 192.168.141.13 auto not_used not_used
Jul 04 23:45:02 ims0 lrmd: [3931]: info: operation start[3055] on ims-ip for client 3935: pid 12197 exited with return code 0
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims-ip_start_0 (call=3055, rc=0, cib-update=6236, confirmed=true) ok
Jul 04 23:45:02 ims0 crmd: [3935]: info: te_rsc_command: Initiating action 48: monitor ims-ip_monitor_15000 on ims0 (local)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: rsc:ims-ip monitor[3056] (pid 12255)
Jul 04 23:45:02 ims0 crmd: [3935]: info: te_rsc_command: Initiating action 49: start ims-ip-src_start_0 on ims0 (local)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: rsc:ims-ip-src start[3057] (pid 12256)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: operation monitor[3056] on ims-ip for client 3935: pid 12255 exited with return code 0
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims-ip_monitor_15000 (call=3056, rc=0, cib-update=6237, confirmed=false) ok
Jul 04 23:45:02 ims0 lrmd: [3931]: info: operation start[3057] on ims-ip-src for client 3935: pid 12256 exited with return code 0
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims-ip-src_start_0 (call=3057, rc=0, cib-update=6238, confirmed=true) ok
Jul 04 23:45:02 ims0 crmd: [3935]: info: te_rsc_command: Initiating action 50: monitor ims-ip-src_monitor_15000 on ims0 (local)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: rsc:ims-ip-src monitor[3058] (pid 12336)
Jul 04 23:45:02 ims0 lrmd: [3931]: info: operation monitor[3058] on ims-ip-src for client 3935: pid 12336 exited with return code 0
Jul 04 23:45:02 ims0 crmd: [3935]: info: process_lrm_event: LRM operation ims-ip-src_monitor_15000 (call=3058, rc=0, cib-update=6239, confirmed=false) ok
Jul 04 23:45:02 ims0 crmd: [3935]: notice: run_graph: ==== Transition 4038 (Complete=6, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-2821.bz2): Complete
Jul 04 23:45:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 04 23:46:02 ims0 crmd: [3935]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (60000ms)
Jul 04 23:46:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jul 04 23:46:02 ims0 crmd: [3935]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Jul 04 23:46:02 ims0 pengine: [3933]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 04 23:46:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 04 23:46:02 ims0 crmd: [3935]: info: do_te_invoke: Processing graph 4039 (ref=pe_calc-dc-1372952762-11920) derived from /var/lib/pengine/pe-input-2822.bz2
Jul 04 23:46:02 ims0 crmd: [3935]: notice: run_graph: ==== Transition 4039 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-2822.bz2): Complete
Jul 04 23:46:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 04 23:46:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4039: PEngine Input stored in: /var/lib/pengine/pe-input-2822.bz2
Jul 04 23:47:02 ims0 crmd: [3935]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (60000ms)
Jul 04 23:47:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
Jul 04 23:47:02 ims0 crmd: [3935]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Jul 04 23:47:02 ims0 pengine: [3933]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 04 23:47:02 ims0 pengine: [3933]: notice: process_pe_message: Transition 4040: PEngine Input stored in: /var/lib/pengine/pe-input-2822.bz2
Jul 04 23:47:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 04 23:47:02 ims0 crmd: [3935]: info: do_te_invoke: Processing graph 4040 (ref=pe_calc-dc-1372952822-11921) derived from /var/lib/pengine/pe-input-2822.bz2
Jul 04 23:47:02 ims0 crmd: [3935]: notice: run_graph: ==== Transition 4040 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-2822.bz2): Complete
Jul 04 23:47:02 ims0 crmd: [3935]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

corosync.log from ims1:
-----------------------
Jul 04 23:45:02 ims1 lrmd: [3913]: info: rsc:ims:1 notify[1424] (pid 25381)
Jul 04 23:45:02 ims1 lrmd: [3913]: info: operation notify[1424] on ims:1 for client 3917: pid 25381 exited with return code 0
Jul 04 23:45:02 ims1 crmd: [3917]: info: process_lrm_event: LRM operation ims:1_notify_0 (call=1424, rc=0, cib-update=0, confirmed=true) ok
Jul 04 23:49:35 ims1 cib: [3911]: info: cib_stats: Processed 324 operations (92.00us average, 0%
utilization) in the last 10min
Jul 04 23:59:35 ims1 cib: [3911]: info: cib_stats: Processed 295 operations (67.00us average, 0% utilization) in the last 10min
Jul 05 00:00:03 ims1 crmd: [3917]: info: process_lrm_event: LRM operation ims:1_monitor_2000 (call=1423, rc=7, cib-update=778, confirmed=false) not running
Jul 05 00:00:03 ims1 attrd: [3914]: notice: attrd_ais_dispatch: Update relayed from ims0
Jul 05 00:00:03 ims1 attrd: [3914]: notice: attrd_trigger_update: Sending flush op to all hosts for: fail-count-ims:1 (1)
Jul 05 00:00:03 ims1 attrd: [3914]: notice: attrd_perform_update: Sent update 2037: fail-count-ims:1=1
Jul 05 00:00:03 ims1 attrd: [3914]: notice: attrd_ais_dispatch: Update relayed from ims0

--
Regards,

Martin Gazak
MicroStep-MIS, spol. s r.o.
System Development Manager
Tel.: +421 2 602 00 128
Fax: +421 2 602 00 180
martin.ga...@microstep-mis.sk
http://www.microstep-mis.com

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
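[Editor's note for readers tracing the logs above: rc=7 is the OCF return code OCF_NOT_RUNNING, which is what the monitor and demote operations on the killed ims report. A minimal sketch of the return-code convention a master/slave monitor action follows; the state-file check is a hypothetical stand-in for however the real imsMS agent detects its process and role:]

```shell
#!/bin/sh
# OCF return codes relevant to a master/slave monitor action.
OCF_SUCCESS=0          # resource is running (as Slave)
OCF_NOT_RUNNING=7      # resource is cleanly stopped -- the rc=7 seen in the logs
OCF_RUNNING_MASTER=8   # resource is running as Master

# Hypothetical, simplified monitor: a state file stands in for the real
# agent's process/role detection (the actual imsMS logic is not shown here).
monitor() {
    state_file="$1"
    # No state file -> the daemon is gone: report OCF_NOT_RUNNING (7).
    [ -f "$state_file" ] || return "$OCF_NOT_RUNNING"
    # State file says "master" -> report OCF_RUNNING_MASTER (8).
    grep -q '^master$' "$state_file" && return "$OCF_RUNNING_MASTER"
    # Otherwise running as Slave -> OCF_SUCCESS (0).
    return "$OCF_SUCCESS"
}

# A killed master leaves no state behind, so monitor reports 7 --
# exactly the condition that should make Pacemaker demote and fail over.
monitor /tmp/ims-state-file-that-does-not-exist
echo "monitor rc=$?"
```

A Master-role monitor returning 7 is what should drive the demote/promote sequence visible (or, in the failing case, missing) in the logs above.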