Andrea <a.bacchi@...> writes:

> Sorry, I used the wrong device id.
> Now, with the correct device id, I see 2 registered keys:
>
> [ONE] sg_persist -n --read-keys \
>       --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8
>   PR generation=0x4, 2 registered reservation keys follow:
>     0x4d5a0001
>     0x4d5a0002
>
> Tomorrow I will do some tests with fencing...
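As a side note before the fencing tests: besides --read-keys, it can be useful to check which of the registered keys currently holds the reservation. A quick sketch, using the same device path as above (the output shown here is only illustrative):

    [ONE] sg_persist -n --in --read-reservation \
          --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8
      PR generation=0x4, Reservation follows:
        Key=0x4d5a0001
        scope: LU_SCOPE,  type: Write Exclusive, registrants only

fence_scsi normally creates a "Write Exclusive, registrants only" reservation, so a node whose key is later preempted loses write access to the disk.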
Some news. If I try to fence serverHA2 with this command:

[ONE] pcs stonith fence serverHA2

everything seems to be OK, but serverHA2 freezes. Below are the logs from each node (serverHA2 freezes right after logging these lines). The servers are two VMware virtual machines (I have asked for an account on the ESX server so I can test fence_vmware; I'm waiting for a response).
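If I understand fence_scsi correctly, its 'off' action never powers anything off: it only preempts the victim's key on the shared disk, so the fenced node keeps running but loses access to the storage, which would explain why serverHA2 freezes instead of rebooting. Roughly, the equivalent sg_persist calls would be something like this (a sketch only, using the keys and device path from above; the agent's exact invocation may differ):

    # 'off': serverHA1 (key 0x4d5a0001) preempts serverHA2's registration
    # (key 0x4d5a0002) and aborts its outstanding I/O on the shared disk
    sg_persist -n --out --preempt-abort --param-rk=0x4d5a0001 --param-sark=0x4d5a0002 \
        --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8

    # 'on' (the unfencing step visible in the logs below): the node
    # simply re-registers its own key
    sg_persist -n --out --register --param-sark=0x4d5a0002 \
        --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8

Since the VM itself is never reset, something like fence_vmware would still be needed to get a real reboot.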
log serverHA1

Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: handle_request: Client stonith_admin.1907.b13e0290 wants to fence (reboot) 'serverHA2' with device '(any)'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: initiate_remote_stonith_op: Initiating remote operation reboot for serverHA2: 70b75107-8919-4510-9c6c-7cc65e6a00a6 (0)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (reboot) serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: process_remote_stonith_query: Query result 1 of 2 from serverHA1 for serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: call_remote_stonith: Total remote op timeout set to 120 for fencing of node serverHA2 for stonith_admin.1907.70b75107
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: call_remote_stonith: Requesting that serverHA1 perform op reboot serverHA2 for stonith_admin.1907 (144s)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (reboot) serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: stonith_fence_get_devices_cb: Found 1 matching devices for 'serverHA2'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: warning: stonith_device_execute: Agent 'fence_scsi' does not advertise support for 'reboot', performing 'off' action instead
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: process_remote_stonith_query: Query result 2 of 2 from serverHA2 for serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: log_operation: Operation 'reboot' [1908] (call 2 from stonith_admin.1907) for host 'serverHA2' with device 'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: warning: get_xpath_object: No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done: Operation reboot of serverHA2 by serverHA1 for stonith_admin.1907@serverHA1.70b75107: OK
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify: Peer serverHA2 was terminated (reboot) by serverHA1 for serverHA1: OK (ref=70b75107-8919-4510-9c6c-7cc65e6a00a6) by client stonith_admin.1907
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify: Notified CMAN that 'serverHA2' is now fenced
Jan 30 12:13:03 [2514] serverHA1 crmd: info: crm_update_peer_join: crmd_peer_down: Node serverHA2[2] - join-2 phase 4 -> 0
Jan 30 12:13:03 [2514] serverHA1 crmd: info: crm_update_peer_expected: crmd_peer_down: Node serverHA2[2] - expected state is now down (was member)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='serverHA2']/lrm
Jan 30 12:13:03 [2514] serverHA1 crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='serverHA2']/transient_attributes
Jan 30 12:13:03 [2514] serverHA1 crmd: info: tengine_stonith_notify: External fencing operation from stonith_admin.1907 fenced serverHA2
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph: Transition aborted: External Fencing Operation (source=tengine_stonith_notify:248, 1)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 30 12:13:03 [2514] serverHA1 crmd: warning: do_state_transition: Only 1 of 2 cluster nodes are eligible to run resources - continue 0
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/333)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='serverHA2']/lrm to master (origin=local/crmd/334)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='serverHA2']/transient_attributes to master (origin=local/crmd/335)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: --- 0.51.86 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: +++ 0.51.87 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: + /cib: @num_updates=87
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: + /cib/status/node_state[@id='serverHA2']: @crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: --- 0.51.87 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: +++ 0.51.88 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: -- /cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: + /cib: @num_updates=88
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334, version=0.51.88)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: --- 0.51.88 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: +++ 0.51.89 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: -- /cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: + /cib: @num_updates=89
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='serverHA2']/transient_attributes: OK (rc=0, origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: cib_fencing_updated: Fencing update 333 for serverHA2: complete
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph: Transition aborted by deletion of lrm[@id='serverHA2']: Resource state removal (cib=0.51.88, source=te_update_diff:429, path=/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2'], 1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph: Transition aborted by deletion of transient_attributes[@id='serverHA2']: Transient attribute change (cib=0.51.89, source=te_update_diff:391, path=/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2'], 1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: process_pe_message: Input has not changed since last time, not saving to disk
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status_fencing: Node serverHA2 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status: Node serverHA2 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status_fencing: Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status: Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_print: iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave ping:0 (Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave ping:1 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave clusterfs:0 (Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave clusterfs:1 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave iscsi-stonith-device (Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: handle_response: pe_calc calculation pe_calc-dc-1422616383-286 is obsolete
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: process_pe_message: Calculated Transition 189: /var/lib/pacemaker/pengine/pe-input-145.bz2
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status_fencing: - Node serverHA2 is not ready to run resources
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status: Node serverHA2 is pending
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status_fencing: Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status: Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_print: iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_color: Resource ping:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_color: Resource clusterfs:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1 pengine: info: probe_resources: Action probe_complete-serverHA2 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action ping:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action clusterfs:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action iscsi-stonith-device_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: trigger_unfencing: Unfencing serverHA2: node discovery
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave ping:0 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave ping:1 (Stopped)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave clusterfs:0 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave clusterfs:1 (Stopped)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave iscsi-stonith-device (Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_te_invoke: Processing graph 190 (ref=pe_calc-dc-1422616383-287) derived from /var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: process_pe_message: Calculated Transition 190: /var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: te_fence_node: Executing on fencing operation (5) on serverHA2 (timeout=60000)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: handle_request: Client crmd.2514.b5961dc1 wants to fence (on) 'serverHA2' with device '(any)'
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: initiate_remote_stonith_op: Initiating remote operation on for serverHA2: e19629dc-bec3-4e63-baf6-a7ecd5ed44bb (0)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: process_remote_stonith_query: Query result 2 of 2 from serverHA2 for serverHA2/on (1 devices) e19629dc-bec3-4e63-baf6-a7ecd5ed44bb
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: process_remote_stonith_query: All queries have arrived, continuing (2, 2, 2)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: call_remote_stonith: Total remote op timeout set to 60 for fencing of node serverHA2 for crmd.2514.e19629dc
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: call_remote_stonith: Requesting that serverHA2 perform op on serverHA2 for crmd.2514 (72s)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: warning: get_xpath_object: No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done: Operation on of serverHA2 by serverHA2 for crmd.2514@serverHA1.e19629dc: OK
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_callback: Stonith operation 9/5:190:0:4e500b84-bb92-4406-8f9c-f4140dd40ec7: OK (0)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify: serverHA2 was successfully unfenced by serverHA2 (at the request of serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: run_graph: Transition 190 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-146.bz2): Complete
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

log serverHA2

Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (reboot) serverHA2: static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: remote_op_done: Operation reboot of serverHA2 by serverHA1 for stonith_admin.1907@serverHA1.70b75107: OK
Jan 30 12:13:11 [2631] serverHA2 crmd: crit: tengine_stonith_notify: We were allegedly just fenced by serverHA1 for serverHA1!
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: --- 0.51.86 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: +++ 0.51.87 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: + /cib: @num_updates=87
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: + /cib/status/node_state[@id='serverHA2']: @crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: --- 0.51.87 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: +++ 0.51.88 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: -- /cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: + /cib: @num_updates=88
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334, version=0.51.88)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: --- 0.51.88 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: +++ 0.51.89 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: -- /cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: + /cib: @num_updates=89
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='serverHA2']/transient_attributes: OK (rc=0, origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (on) serverHA2: static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (on) serverHA2: static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: info: stonith_fence_get_devices_cb: Found 1 matching devices for 'serverHA2'
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: log_operation: Operation 'on' [3037] (call 9 from crmd.2514) for host 'serverHA2' with device 'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: remote_op_done: Operation on of serverHA2 by serverHA2 for crmd.2514@serverHA1.e19629dc: OK

I will continue testing....

Andrea

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org