On Sat, Oct 13, 2012 at 7:58 AM, David Parker <dpar...@utica.edu> wrote:
> Hello,
>
> I have two nodes set up in a cluster to provide a MySQL server (mysqld) in HA on a virtual IP address. This was working fine until I had to reboot the servers. All I did was change the interface each node uses for its primary IP address (changed from eth1 to eth0 on each node). Now I'm stuck in a cycle. Let's say node 1 has the virtual IP and is running mysqld, and node 2 is down. When node 2 boots up, it will STONITH node 1 for no apparent reason
At a guess... because it still can't reach node 1?

> and take over the resources, which shouldn't happen.

Why? This would be expected in a 2-node cluster where either side comes up while the network is hosed. What you have is a stonith death match: http://ourobengr.com/ha/
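Since the only change you made was moving the primary IP from eth1 to eth0, the first thing I'd check is heartbeat's own comms config. If /etc/ha.d/ha.cf on either node still points its bcast/ucast directive at eth1, the heartbeats never arrive, each node forms a membership of one, and both halves of the death match follow from that. A sketch of the relevant ha.cf lines after the NIC change (the peer addresses here are invented; substitute whatever ha1 and ha2 actually have on eth0, or use a single "bcast eth0" instead):

  # /etc/ha.d/ha.cf -- the comms directives must follow the interface rename
  ucast eth0 10.0.0.1    # ha1's eth0 address (example value)
  ucast eth0 10.0.0.2    # ha2's eth0 address (example value)
  node ha1
  node ha2
  crm respawn            # hand resource management to pacemaker

IIRC heartbeat skips ucast entries that point at a local address, so the same file can be shared between the nodes. Once both sides agree on the interface, confirm they can actually see each other (e.g. "cl_status nodestatus ha2" on ha1, and vice versa) before you let resources start.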
> When node 1 boots up again, it will STONITH node 2 and take over the resources, which again shouldn't happen.
>
> Obviously, the goal is to have both nodes running, and whichever one boots up second should see that the first one has the resources and then stand by. I'll include the relevant log entries and my crm config at the end of this message. In the logs, the only thing I see that seems unusual (other than the fact that a STONITH is executed against a running node) is this:
>
> Oct 12 16:27:22 ha1 crmd: [1176]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: get_uuid: Could not calculate UUID for ha2
> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: populate_cib_nodes_ha: Node ha2: no uuid found
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
>
> The exact opposite shows up on the node "ha2" (it says ha1 has no uuid). Did the UUID of each node change because the physical interface changed? Any other ideas? Thanks in advance.
>
> - Dave
>
> Here are the log lines:
>
> Oct 12 16:26:13 ha1 heartbeat: [1176]: info: Starting "/usr/lib/heartbeat/crmd" as uid 103 gid 105 (pid 1176)
> Oct 12 16:26:13 ha1 crmd: [1176]: info: crm_log_init: Changed active directory to /var/lib/heartbeat/cores/hacluster
> Oct 12 16:26:13 ha1 crmd: [1176]: info: Invoked: /usr/lib/heartbeat/crmd
> Oct 12 16:26:13 ha1 crmd: [1176]: info: main: CRM Hg Version: 74392a28b7f31d7ddc86689598bd23114f58978b
> Oct 12 16:26:13 ha1 crmd: [1176]: info: crmd_init: Starting crmd
> Oct 12 16:26:13 ha1 crmd: [1176]: info: G_main_add_SignalHandler: Added signal handler for signal 17
> Oct 12 16:26:14 ha1 crmd: [1176]: info: do_cib_control: CIB connection established
> Oct 12 16:26:14 ha1 crmd: [1176]: info: crm_cluster_connect: Connecting to Heartbeat
> Oct 12 16:26:15 ha1 crmd: [1176]: info: register_heartbeat_conn: Hostname: ha1
> Oct 12 16:26:15 ha1 crmd: [1176]: info: register_heartbeat_conn: UUID: 1b48f410-44d1-4e89-8b52-ff23b32db1bc
> Oct 12 16:26:15 ha1 crmd: [1176]: info: do_ha_control: Connected to the cluster
> Oct 12 16:26:15 ha1 crmd: [1176]: info: do_ccm_control: CCM connection established... waiting for first callback
> Oct 12 16:26:15 ha1 crmd: [1176]: info: do_started: Delaying start, CCM (0000000000100000) not connected
> Oct 12 16:26:15 ha1 crmd: [1176]: info: crmd_init: Starting crmd's mainloop
> Oct 12 16:26:15 ha1 crmd: [1176]: info: config_query_callback: Checking for expired actions every 900000ms
> Oct 12 16:26:15 ha1 crmd: [1176]: notice: crmd_client_status_callback: Status update: Client ha1/crmd now has status [online] (DC=false)
> Oct 12 16:26:15 ha1 crmd: [1176]: info: crm_new_peer: Node 0 is now known as ha1
> Oct 12 16:26:15 ha1 crmd: [1176]: info: crm_update_peer_proc: ha1.crmd is now online
> Oct 12 16:26:15 ha1 crmd: [1176]: info: crmd_client_status_callback: Not the DC
> Oct 12 16:26:15 ha1 crmd: [1176]: notice: crmd_client_status_callback: Status update: Client ha1/crmd now has status [online] (DC=false)
> Oct 12 16:26:16 ha1 crmd: [1176]: info: crmd_client_status_callback: Not the DC
> Oct 12 16:26:16 ha1 crmd: [1176]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
> Oct 12 16:26:16 ha1 crmd: [1176]: info: mem_handle_event: instance=1, nodes=1, new=1, lost=0, n_idx=0, new_idx=0, old_idx=3
> Oct 12 16:26:16 ha1 crmd: [1176]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=1)
> Oct 12 16:26:16 ha1 crmd: [1176]: info: ccm_event_detail: NEW MEMBERSHIP: trans=1, nodes=1, new=1, lost=0 n_idx=0, new_idx=0, old_idx=3
> Oct 12 16:26:16 ha1 crmd: [1176]: info: ccm_event_detail: #011CURRENT: ha1 [nodeid=0, born=1]
> Oct 12 16:26:16 ha1 crmd: [1176]: info: ccm_event_detail: #011NEW: ha1 [nodeid=0, born=1]
> Oct 12 16:26:16 ha1 crmd: [1176]: info: crm_update_peer: Node ha1: id=0 state=member (new) addr=(null) votes=-1 born=1 seen=1 proc=00000000000000000000000000000200
> Oct 12 16:26:16 ha1 crmd: [1176]: info: crm_update_peer_proc: ha1.ais is now online
> Oct 12 16:26:16 ha1 crmd: [1176]: info: do_started: The local CRM is operational
> Oct 12 16:26:16 ha1 crmd: [1176]: info: do_state_transition: State transition S_STARTING -> S_PENDING [ input=I_PENDING cause=C_FSA_INTERNAL origin=do_started ]
> Oct 12 16:27:17 ha1 crmd: [1176]: info: crm_timer_popped: Election Trigger (I_DC_TIMEOUT) just popped!
> Oct 12 16:27:17 ha1 crmd: [1176]: WARN: do_log: FSA: Input I_DC_TIMEOUT from crm_timer_popped() received in state S_PENDING
> Oct 12 16:27:17 ha1 crmd: [1176]: info: do_state_transition: State transition S_PENDING -> S_ELECTION [ input=I_DC_TIMEOUT cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Oct 12 16:27:17 ha1 crmd: [1176]: info: do_state_transition: State transition S_ELECTION -> S_INTEGRATION [ input=I_ELECTION_DC cause=C_FSA_INTERNAL origin=do_election_check ]
> Oct 12 16:27:17 ha1 crmd: [1176]: info: do_te_control: Registering TE UUID: 4f352feb-6be6-4af1-bb9e-af8b7ca28f58
> Oct 12 16:27:17 ha1 crmd: [1176]: WARN: cib_client_add_notify_callback: Callback already present
> Oct 12 16:27:17 ha1 crmd: [1176]: info: set_graph_functions: Setting custom graph functions
> Oct 12 16:27:17 ha1 crmd: [1176]: info: unpack_graph: Unpacked transition -1: 0 actions in 0 synapses
> Oct 12 16:27:17 ha1 crmd: [1176]: info: start_subsystem: Starting sub-system "pengine"
> Oct 12 16:27:20 ha1 crmd: [1176]: info: do_dc_takeover: Taking over DC status for this partition
> Oct 12 16:27:20 ha1 crmd: [1176]: info: join_make_offer: Making join offers based on membership 1
> Oct 12 16:27:20 ha1 crmd: [1176]: info: do_dc_join_offer_all: join-1: Waiting on 1 outstanding join acks
> Oct 12 16:27:20 ha1 crmd: [1176]: info: te_connect_stonith: Attempting connection to fencing daemon...
> Oct 12 16:27:21 ha1 crmd: [1176]: info: te_connect_stonith: Connected
> Oct 12 16:27:21 ha1 crmd: [1176]: info: config_query_callback: Checking for expired actions every 900000ms
> Oct 12 16:27:21 ha1 crmd: [1176]: info: update_dc: Set DC to ha1 (3.0.1)
> Oct 12 16:27:21 ha1 crmd: [1176]: info: do_state_transition: State transition S_INTEGRATION -> S_FINALIZE_JOIN [ input=I_INTEGRATED cause=C_FSA_INTERNAL origin=check_join_state ]
> Oct 12 16:27:21 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster nodes responded to the join offer.
> Oct 12 16:27:21 ha1 crmd: [1176]: info: do_dc_join_finalize: join-1: Syncing the CIB from ha1 to the rest of the cluster
> Oct 12 16:27:22 ha1 crmd: [1176]: info: update_attrd: Connecting to attrd...
> Oct 12 16:27:22 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion of "//node_state[@uname='ha1']/transient_attributes": ok (rc=0)
> Oct 12 16:27:22 ha1 crmd: [1176]: info: do_dc_join_ack: join-1: Updating node state to member for ha1
> Oct 12 16:27:22 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion of "//node_state[@uname='ha1']/lrm": ok (rc=0)
> Oct 12 16:27:22 ha1 crmd: [1176]: info: do_state_transition: State transition S_FINALIZE_JOIN -> S_POLICY_ENGINE [ input=I_FINALIZED cause=C_FSA_INTERNAL origin=check_join_state ]
> Oct 12 16:27:22 ha1 crmd: [1176]: info: populate_cib_nodes_ha: Requesting the list of configured nodes
> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: get_uuid: Could not calculate UUID for ha2
> Oct 12 16:27:23 ha1 crmd: [1176]: WARN: populate_cib_nodes_ha: Node ha2: no uuid found
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
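To answer the UUID question from above: no, the UUID is not derived from the interface. Heartbeat generates each node's UUID once and learns the peer's over the cluster link, caching it (on a stock install) in /var/lib/heartbeat/hostcache. "Could not calculate UUID for ha2" is just another symptom of ha1 never having heard from ha2 since the change. If you want to rule out a mangled cache, something like this on each node (paths assume the usual heartbeat layout; treat it as a sketch):

  # with the cluster stopped
  /etc/init.d/heartbeat stop
  cat /var/lib/heartbeat/hostcache   # should list ha1 and ha2, one UUID each
  # a wrong or missing entry is re-learned once the nodes can talk again

I'd expect the cache to sort itself out on its own as soon as comms are fixed. And "All 1 cluster nodes are eligible to run resources" is the giveaway in these logs: the DC only knows about itself.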
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_dc_join_final: Ensuring DC, quorum and node attributes are up-to-date
> Oct 12 16:27:23 ha1 crmd: [1176]: info: crm_update_quorum: Updating quorum status to true (call=21)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: abort_transition_graph: do_te_invoke:191 - Triggered transition abort (complete=1) : Peer Cancelled
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_pe_invoke: Query 22: Requesting the current CIB: S_POLICY_ENGINE
> Oct 12 16:27:23 ha1 crmd: [1176]: info: abort_transition_graph: need_abort:59 - Triggered transition abort (complete=1) : Non-status change
> Oct 12 16:27:23 ha1 crmd: [1176]: info: need_abort: Aborting on change to admin_epoch
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_pe_invoke: Query 23: Requesting the current CIB: S_POLICY_ENGINE
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_pe_invoke_callback: Invoking the PE: query=23, ref=pe_calc-dc-1350073643-7, seq=1, quorate=1
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 12 16:27:23 ha1 crmd: [1176]: info: unpack_graph: Unpacked transition 0: 20 actions in 20 synapses
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_te_invoke: Processing graph 0 (ref=pe_calc-dc-1350073643-7) derived from /var/lib/pengine/pe-warn-1370.bz2
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 4: monitor stonith-ha1_monitor_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing key=4:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=stonith-ha1_monitor_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 5: monitor stonith-ha2_monitor_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing key=5:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=stonith-ha2_monitor_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 6: monitor MysqlIP_monitor_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing key=6:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=MysqlIP_monitor_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 7: monitor mysqld_monitor_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing key=7:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=mysqld_monitor_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation stonith-ha1_monitor_0 (call=2, rc=7, cib-update=24, confirmed=true) not running
> Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation stonith-ha2_monitor_0 (call=3, rc=7, cib-update=25, confirmed=true) not running
> Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action stonith-ha1_monitor_0 (4) confirmed on ha1 (rc=0)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action stonith-ha2_monitor_0 (5) confirmed on ha1 (rc=0)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation mysqld_monitor_0 (call=5, rc=7, cib-update=26, confirmed=true) not running
> Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action mysqld_monitor_0 (7) confirmed on ha1 (rc=0)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: process_lrm_event: LRM operation MysqlIP_monitor_0 (call=4, rc=7, cib-update=27, confirmed=true) not running
> Oct 12 16:27:23 ha1 crmd: [1176]: info: match_graph_event: Action MysqlIP_monitor_0 (6) confirmed on ha1 (rc=0)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on ha1 (local) - no waiting
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 2 fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 8: start stonith-ha2_start_0 on ha1 (local)
> Oct 12 16:27:23 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing key=8:0:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=stonith-ha2_start_0 )
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 14 fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 20 fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 19 fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 15 fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 12 fired and confirmed
> Oct 12 16:27:23 ha1 crmd: [1176]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=0, tag=transient_attributes, id=1b48f410-44d1-4e89-8b52-ff23b32db1bc, magic=NA, cib=0.61.6) : Transient attribute: update
> Oct 12 16:27:23 ha1 crmd: [1176]: info: update_abort_priority: Abort priority upgraded from 0 to 1000000
> Oct 12 16:27:23 ha1 crmd: [1176]: info: update_abort_priority: Abort action done superceeded by restart
> Oct 12 16:27:34 ha1 crmd: [1176]: info: process_lrm_event: LRM operation stonith-ha2_start_0 (call=6, rc=0, cib-update=28, confirmed=true) ok
> Oct 12 16:27:34 ha1 crmd: [1176]: info: match_graph_event: Action stonith-ha2_start_0 (8) confirmed on ha1 (rc=0)
> Oct 12 16:27:34 ha1 crmd: [1176]: info: run_graph: ====================================================
> Oct 12 16:27:34 ha1 crmd: [1176]: notice: run_graph: Transition 0 (Complete=12, Pending=0, Fired=0, Skipped=8, Incomplete=0, Source=/var/lib/pengine/pe-warn-1370.bz2): Stopped
> Oct 12 16:27:34 ha1 crmd: [1176]: info: te_graph_trigger: Transition 0 is now complete
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
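Note what happens in the very next transition below: ha1, as DC of a membership of one, schedules a reboot of the node it has never seen. That is normal startup fencing: a node the cluster cannot account for is in an unknown state, so pacemaker shoots it before taking over its resources. There is a cluster property for this behaviour (startup-fencing, true by default); shown only to illustrate which knob is involved, in crm shell syntax it would be:

  # illustration only -- disabling startup fencing masks the symptom and
  # risks running mysqld and the VIP on both nodes at once
  crm configure property startup-fencing=false

Please don't treat that as the fix; get the two heartbeats talking again instead.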
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_pe_invoke: Query 29: Requesting the current CIB: S_POLICY_ENGINE
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_pe_invoke_callback: Invoking the PE: query=29, ref=pe_calc-dc-1350073654-14, seq=1, quorate=1
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 12 16:27:34 ha1 crmd: [1176]: info: unpack_graph: Unpacked transition 1: 9 actions in 9 synapses
> Oct 12 16:27:34 ha1 crmd: [1176]: info: do_te_invoke: Processing graph 1 (ref=pe_calc-dc-1350073654-14) derived from /var/lib/pengine/pe-warn-1371.bz2
> Oct 12 16:27:34 ha1 stonithd: [1174]: info: client tengine [pid: 1176] requests a STONITH operation RESET on node ha2
> Oct 12 16:27:34 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 9 fired and confirmed
> Oct 12 16:27:34 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 13 fired and confirmed
> Oct 12 16:27:34 ha1 crmd: [1176]: info: te_fence_node: Executing reboot fencing operation (15) on ha2 (timeout=60000)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: tengine_stonith_callback: call=1266, optype=1, node_name=ha2, result=0, node_list=ha1, action=15:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 1 fired and confirmed
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 14 fired and confirmed
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 6: start MysqlIP_start_0 on ha1 (local)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing key=6:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=MysqlIP_start_0 )
> Oct 12 16:28:08 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion of "//node_state[@uname='ha2']/lrm": ok (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: erase_xpath_callback: Deletion of "//node_state[@uname='ha2']/transient_attributes": ok (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: process_lrm_event: LRM operation MysqlIP_start_0 (call=7, rc=0, cib-update=33, confirmed=true) ok
> Oct 12 16:28:08 ha1 crmd: [1176]: info: match_graph_event: Action MysqlIP_start_0 (6) confirmed on ha1 (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 7: monitor MysqlIP_monitor_10000 on ha1 (local)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing key=7:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=MysqlIP_monitor_10000 )
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_rsc_command: Initiating action 8: start mysqld_start_0 on ha1 (local)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_lrm_rsc_op: Performing key=8:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58 op=mysqld_start_0 )
> Oct 12 16:28:08 ha1 crmd: [1176]: info: process_lrm_event: LRM operation mysqld_start_0 (call=9, rc=0, cib-update=34, confirmed=true) ok
> Oct 12 16:28:08 ha1 crmd: [1176]: info: match_graph_event: Action mysqld_start_0 (8) confirmed on ha1 (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_pseudo_action: Pseudo action 10 fired and confirmed
> Oct 12 16:28:08 ha1 crmd: [1176]: info: process_lrm_event: LRM operation MysqlIP_monitor_10000 (call=8, rc=0, cib-update=35, confirmed=false) ok
> Oct 12 16:28:08 ha1 crmd: [1176]: info: match_graph_event: Action MysqlIP_monitor_10000 (7) confirmed on ha1 (rc=0)
> Oct 12 16:28:08 ha1 crmd: [1176]: info: run_graph: ====================================================
> Oct 12 16:28:08 ha1 crmd: [1176]: notice: run_graph: Transition 1 (Complete=9, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-warn-1371.bz2): Complete
> Oct 12 16:28:08 ha1 crmd: [1176]: info: te_graph_trigger: Transition 1 is now complete
> Oct 12 16:28:08 ha1 crmd: [1176]: info: notify_crmd: Transition 1 status: done - <null>
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 12 16:28:08 ha1 crmd: [1176]: info: do_state_transition: Starting PEngine Recheck Timer
> Oct 12 16:43:08 ha1 crmd: [1176]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped!
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped ]
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: All 1 cluster nodes are eligible to run resources.
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_pe_invoke: Query 36: Requesting the current CIB: S_POLICY_ENGINE
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_pe_invoke_callback: Invoking the PE: query=36, ref=pe_calc-dc-1350074588-18, seq=1, quorate=1
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> Oct 12 16:43:08 ha1 crmd: [1176]: info: unpack_graph: Unpacked transition 2: 0 actions in 0 synapses
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_te_invoke: Processing graph 2 (ref=pe_calc-dc-1350074588-18) derived from /var/lib/pengine/pe-input-4140.bz2
> Oct 12 16:43:08 ha1 crmd: [1176]: info: run_graph: ====================================================
> Oct 12 16:43:08 ha1 crmd: [1176]: notice: run_graph: Transition 2 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-4140.bz2): Complete
> Oct 12 16:43:08 ha1 crmd: [1176]: info: te_graph_trigger: Transition 2 is now complete
> Oct 12 16:43:08 ha1 crmd: [1176]: info: notify_crmd: Transition 2 status: done - <null>
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
> Oct 12 16:43:08 ha1 crmd: [1176]: info: do_state_transition: Starting PEngine Recheck Timer
>
> Here's my config:
>
> root@ha1:~# cibadmin --query
> <cib validate-with="pacemaker-1.0" crm_feature_set="3.0.1" have-quorum="1" admin_epoch="0" epoch="61" num_updates="11" cib-last-written="Fri Oct 12 16:07:33 2012" dc-uuid="1b48f410-44d1-4e89-8b52-ff23b32db1bc">
> <configuration>
> <crm_config>
> <cluster_property_set id="cib-bootstrap-options">
> <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.9-74392a28b7f31d7ddc86689598bd23114f58978b"/>
> <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
> <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="true"/>
> </cluster_property_set>
> </crm_config>
> <nodes>
> <node id="1b48f410-44d1-4e89-8b52-ff23b32db1bc" uname="ha1" type="normal"/>
> <node id="9790fe6e-67b2-4817-abf4-966b5aa6948c" uname="ha2" type="normal"/>
> </nodes>
> <resources>
> <primitive class="stonith" id="stonith-ha1" type="external/riloe">
class="stonith" id="stonith-ha1" type="external/riloe"> > <instance_attributes id="stonith-ha1-instance_attributes"> > <nvpair id="stonith-ha1-instance_attributes-hostlist" name="hostlist" > value="ha1"/> > <nvpair id="stonith-ha1-instance_attributes-ilo_hostname" > name="ilo_hostname" value="10.0.1.111"/> > <nvpair id="stonith-ha1-instance_attributes-ilo_user" name="ilo_user" > value="Administrator"/> > <nvpair id="stonith-ha1-instance_attributes-ilo_password" > name="ilo_password" value="xxx"/> > <nvpair id="stonith-ha1-instance_attributes-ilo_can_reset" > name="ilo_can_reset" value="1"/> > <nvpair id="stonith-ha1-instance_attributes-ilo_protocol" > name="ilo_protocol" value="2.0"/> > <nvpair id="stonith-ha1-instance_attributes-ilo_powerdown_method" > name="ilo_powerdown_method" value="button"/> > </instance_attributes> > </primitive> > <primitive class="stonith" id="stonith-ha2" type="external/riloe"> > <instance_attributes id="stonith-ha2-instance_attributes"> > <nvpair id="stonith-ha2-instance_attributes-hostlist" name="hostlist" > value="ha2"/> > <nvpair id="stonith-ha2-instance_attributes-ilo_hostname" > name="ilo_hostname" value="10.0.1.112"/> > <nvpair id="stonith-ha2-instance_attributes-ilo_user" name="ilo_user" > value="Administrator"/> > <nvpair id="stonith-ha2-instance_attributes-ilo_password" > name="ilo_password" value="xxx"/> > <nvpair id="stonith-ha2-instance_attributes-ilo_can_reset" > name="ilo_can_reset" value="1"/> > <nvpair id="stonith-ha2-instance_attributes-ilo_protocol" > name="ilo_protocol" value="2.0"/> > <nvpair id="stonith-ha2-instance_attributes-ilo_powerdown_method" > name="ilo_powerdown_method" value="button"/> > </instance_attributes> > </primitive> > <group id="mysql-resources"> > <primitive class="ocf" id="MysqlIP" provider="heartbeat" type="IPaddr2"> > <instance_attributes id="MysqlIP-instance_attributes"> > <nvpair id="MysqlIP-instance_attributes-ip" name="ip" value="192.168.25.9"/> > <nvpair id="MysqlIP-instance_attributes-cidr_netmask" name="cidr_netmask" > value="32"/> > </instance_attributes> > <operations> > <op id="MysqlIP-monitor-10s" interval="10s" name="monitor"/> > </operations> > </primitive> > <primitive id="mysqld" class="lsb" type="mysqld"/> > </group> > </resources> > <constraints> > <rsc_location id="loc-3" rsc="stonith-ha1" node="ha1" score="-INFINITY"/> > <rsc_location id="loc-4" rsc="stonith-ha2" node="ha2" score="-INFINITY"/> > <rsc_location id="loc-5" rsc="mysql-resources" node="ha1" score="100"/> > </constraints> > <rsc_defaults/> > <op_defaults/> > </configuration> > <status> > <node_state id="1b48f410-44d1-4e89-8b52-ff23b32db1bc" uname="ha1" > ha="active" in_ccm="true" crmd="online" join="member" expected="member" > crm-debug-origin="do_update_resource" shutdown="0"> > <lrm id="1b48f410-44d1-4e89-8b52-ff23b32db1bc"> > <lrm_resources> > <lrm_resource id="stonith-ha1" type="external/riloe" class="stonith"> > <lrm_rsc_op id="stonith-ha1_monitor_0" operation="monitor" > crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" > transition-key="4:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" > transition-magic="0:7;4:0:7:4f352feb-6be6-4 > af1-bb9e-af8b7ca28f58" call-id="2" rc-code="7" op-status="0" interval="0" > last-run="1350073642" last-rc-change="1350073642" exec-time="10" > queue-time="0" op-digest="f7c04bf4ee3eb60cf4f9e841449a0fe6"/> > </lrm_resource> > <lrm_resource id="stonith-ha2" type="external/riloe" class="stonith"> > <lrm_rsc_op id="stonith-ha2_monitor_0" operation="monitor" > crm-debug-origin="do_update_resource" 
> <lrm_rsc_op id="stonith-ha2_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="8:0:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" transition-magic="0:0;8:0:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" call-id="6" rc-code="0" op-status="0" interval="0" last-run="1350073643" last-rc-change="1350073643" exec-time="10920" queue-time="0" op-digest="c8773e5caa7ae2a75f71a79adc582d54"/>
> </lrm_resource>
> <lrm_resource id="mysqld" type="mysqld" class="lsb">
> <lrm_rsc_op id="mysqld_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="7:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" transition-magic="0:7;7:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" call-id="5" rc-code="7" op-status="0" interval="0" last-run="1350073642" last-rc-change="1350073642" exec-time="40" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
> <lrm_rsc_op id="mysqld_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="8:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" transition-magic="0:0;8:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" call-id="9" rc-code="0" op-status="0" interval="0" last-run="1350073688" last-rc-change="1350073688" exec-time="0" queue-time="0" op-digest="f2317cad3d54cec5d7d7aa7d0bf35cf8"/>
> </lrm_resource>
> <lrm_resource id="MysqlIP" type="IPaddr2" class="ocf" provider="heartbeat">
> <lrm_rsc_op id="MysqlIP_monitor_0" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="6:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" transition-magic="0:7;6:0:7:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" call-id="4" rc-code="7" op-status="0" interval="0" last-run="1350073642" last-rc-change="1350073642" exec-time="150" queue-time="0" op-digest="9611b7026c2dc135fbd13d3537b42d16"/>
> <lrm_rsc_op id="MysqlIP_start_0" operation="start" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="6:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" transition-magic="0:0;6:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" call-id="7" rc-code="0" op-status="0" interval="0" last-run="1350073687" last-rc-change="1350073687" exec-time="30" queue-time="0" op-digest="9611b7026c2dc135fbd13d3537b42d16"/>
> <lrm_rsc_op id="MysqlIP_monitor_10000" operation="monitor" crm-debug-origin="do_update_resource" crm_feature_set="3.0.1" transition-key="7:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" transition-magic="0:0;7:1:0:4f352feb-6be6-4af1-bb9e-af8b7ca28f58" call-id="8" rc-code="0" op-status="0" interval="10000" last-run="1350073687" last-rc-change="1350073687" exec-time="10" queue-time="0" op-digest="caf3454e2233540200e7a6a5319c5362"/>
> </lrm_resource>
> </lrm_resources>
> </lrm>
> <transient_attributes id="1b48f410-44d1-4e89-8b52-ff23b32db1bc">
> <instance_attributes id="status-1b48f410-44d1-4e89-8b52-ff23b32db1bc">
> <nvpair id="status-1b48f410-44d1-4e89-8b52-ff23b32db1bc-probe_complete" name="probe_complete" value="true"/>
> </instance_attributes>
> </transient_attributes>
> </node_state>
uname="ha2" ha="dead" > in_ccm="false" crmd="offline" join="down" expected="down" > crm-debug-origin="send_stonith_update"/> > </status> > </cib> > > -- > > Dave Parker > Systems Administrator > Utica College > Integrated Information Technology Services > (315) 792-3229 > Registered Linux User #408177 > > > _______________________________________________ > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org > http://oss.clusterlabs.org/mailman/listinfo/pacemaker > > Project Home: http://www.clusterlabs.org > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf > Bugs: http://bugs.clusterlabs.org _______________________________________________ Pacemaker mailing list: Pacemaker@oss.clusterlabs.org http://oss.clusterlabs.org/mailman/listinfo/pacemaker Project Home: http://www.clusterlabs.org Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf Bugs: http://bugs.clusterlabs.org