Andrea <a.bacchi@...> writes:

> Sorry, I used the wrong device id.
> Now, with the correct device id, I see 2 registered keys:
>
> [ONE] sg_persist -n --read-keys \
>       --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8
>   PR generation=0x4, 2 registered reservation keys follow:
>     0x4d5a0001
>     0x4d5a0002
>
> Tomorrow I will do some tests with fencing...
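As a side note before the fencing tests: besides --read-keys, it can be useful to check which of the registered keys currently holds the reservation. A quick sketch, using the same device path as above (the output shown here is only illustrative):

    [ONE] sg_persist -n --in --read-reservation \
          --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8
      PR generation=0x4, Reservation follows:
        Key=0x4d5a0001
        scope: LU_SCOPE,  type: Write Exclusive, registrants only

fence_scsi normally creates a "Write Exclusive, registrants only" reservation, so a node whose key is later preempted loses write access to the disk.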
Some news. If I try to fence serverHA2 with this command:

[ONE] pcs stonith fence serverHA2

everything seems to be OK, but serverHA2 freezes. Below are the logs from each node (serverHA2 freezes right after logging these lines). The servers are two VMware virtual machines (I have asked for an account on the ESX server so I can test fence_vmware; I'm waiting for a response).
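If I understand fence_scsi correctly, its 'off' action never powers anything off: it only preempts the victim's key on the shared disk, so the fenced node keeps running but loses access to the storage, which would explain why serverHA2 freezes instead of rebooting. Roughly, the equivalent sg_persist calls would be something like this (a sketch only, using the keys and device path from above; the agent's exact invocation may differ):

    # 'off': serverHA1 (key 0x4d5a0001) preempts serverHA2's registration
    # (key 0x4d5a0002) and aborts its outstanding I/O on the shared disk
    sg_persist -n --out --preempt-abort --param-rk=0x4d5a0001 --param-sark=0x4d5a0002 \
        --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8

    # 'on' (the unfencing step visible in the logs below): the node
    # simply re-registers its own key
    sg_persist -n --out --register --param-sark=0x4d5a0002 \
        --device=/dev/disk/by-id/scsi-36e843b60f3d0cc6d1a11d4ff0da95cd8

Since the VM itself is never reset, something like fence_vmware would still be needed to get a real reboot.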
log serverHA1

Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: handle_request: Client stonith_admin.1907.b13e0290 wants to fence (reboot) 'serverHA2' with device '(any)'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: initiate_remote_stonith_op: Initiating remote operation reboot for serverHA2: 70b75107-8919-4510-9c6c-7cc65e6a00a6 (0)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (reboot) serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: process_remote_stonith_query: Query result 1 of 2 from serverHA1 for serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: call_remote_stonith: Total remote op timeout set to 120 for fencing of node serverHA2 for stonith_admin.1907.70b75107
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: call_remote_stonith: Requesting that serverHA1 perform op reboot serverHA2 for stonith_admin.1907 (144s)
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (reboot) serverHA2: static-list
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: stonith_fence_get_devices_cb: Found 1 matching devices for 'serverHA2'
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: warning: stonith_device_execute: Agent 'fence_scsi' does not advertise support for 'reboot', performing 'off' action instead
Jan 30 12:13:02 [2510] serverHA1 stonith-ng: info: process_remote_stonith_query: Query result 2 of 2 from serverHA2 for serverHA2/reboot (1 devices) 70b75107-8919-4510-9c6c-7cc65e6a00a6
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: log_operation: Operation 'reboot' [1908] (call 2 from stonith_admin.1907) for host 'serverHA2' with device 'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: warning: get_xpath_object: No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done: Operation reboot of serverHA2 by serverHA1 for stonith_admin.1907@serverHA1.70b75107: OK
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify: Peer serverHA2 was terminated (reboot) by serverHA1 for serverHA1: OK (ref=70b75107-8919-4510-9c6c-7cc65e6a00a6) by client stonith_admin.1907
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify: Notified CMAN that 'serverHA2' is now fenced
Jan 30 12:13:03 [2514] serverHA1 crmd: info: crm_update_peer_join: crmd_peer_down: Node serverHA2[2] - join-2 phase 4 -> 0
Jan 30 12:13:03 [2514] serverHA1 crmd: info: crm_update_peer_expected: crmd_peer_down: Node serverHA2[2] - expected state is now down (was member)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='serverHA2']/lrm
Jan 30 12:13:03 [2514] serverHA1 crmd: info: erase_status_tag: Deleting xpath: //node_state[@uname='serverHA2']/transient_attributes
Jan 30 12:13:03 [2514] serverHA1 crmd: info: tengine_stonith_notify: External fencing operation from stonith_admin.1907 fenced serverHA2
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph: Transition aborted: External Fencing Operation (source=tengine_stonith_notify:248, 1)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jan 30 12:13:03 [2514] serverHA1 crmd: warning: do_state_transition: Only 1 of 2 cluster nodes are eligible to run resources - continue 0
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Forwarding cib_modify operation for section status to master (origin=local/crmd/333)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='serverHA2']/lrm to master (origin=local/crmd/334)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Forwarding cib_delete operation for section //node_state[@uname='serverHA2']/transient_attributes to master (origin=local/crmd/335)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: --- 0.51.86 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: +++ 0.51.87 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: + /cib: @num_updates=87
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: + /cib/status/node_state[@id='serverHA2']: @crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: --- 0.51.87 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: +++ 0.51.88 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: -- /cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: + /cib: @num_updates=88
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334, version=0.51.88)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: --- 0.51.88 2
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: Diff: +++ 0.51.89 (null)
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: -- /cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_perform_op: + /cib: @num_updates=89
Jan 30 12:13:03 [2509] serverHA1 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='serverHA2']/transient_attributes: OK (rc=0, origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: cib_fencing_updated: Fencing update 333 for serverHA2: complete
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph: Transition aborted by deletion of lrm[@id='serverHA2']: Resource state removal (cib=0.51.88, source=te_update_diff:429, path=/cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2'], 1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: abort_transition_graph: Transition aborted by deletion of transient_attributes[@id='serverHA2']: Transient attribute change (cib=0.51.89, source=te_update_diff:391, path=/cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2'], 1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: process_pe_message: Input has not changed since last time, not saving to disk
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status_fencing: Node serverHA2 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status: Node serverHA2 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status_fencing: Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status: Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Started: [ serverHA1 serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_print: iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave ping:0 (Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave ping:1 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave clusterfs:0 (Started serverHA2)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave clusterfs:1 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave iscsi-stonith-device (Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: handle_response: pe_calc calculation pe_calc-dc-1422616383-286 is obsolete
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: process_pe_message: Calculated Transition 189: /var/lib/pacemaker/pengine/pe-input-145.bz2
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: unpack_config: On loss of CCM Quorum: Ignore
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status_fencing: - Node serverHA2 is not ready to run resources
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status: Node serverHA2 is pending
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status_fencing: Node serverHA1 is active
Jan 30 12:13:03 [2513] serverHA1 pengine: info: determine_online_status: Node serverHA1 is online
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone Set: ping-clone [ping]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: clone_print: Clone Set: clusterfs-clone [clusterfs]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Started: [ serverHA1 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: short_print: Stopped: [ serverHA2 ]
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_print: iscsi-stonith-device (stonith:fence_scsi): Started serverHA1
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_color: Resource ping:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1 pengine: info: native_color: Resource clusterfs:1 cannot run anywhere
Jan 30 12:13:03 [2513] serverHA1 pengine: info: probe_resources: Action probe_complete-serverHA2 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action ping:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action clusterfs:0_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: warning: custom_action: Action iscsi-stonith-device_monitor_0 on serverHA2 is unrunnable (pending)
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: trigger_unfencing: Unfencing serverHA2: node discovery
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave ping:0 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave ping:1 (Stopped)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave clusterfs:0 (Started serverHA1)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave clusterfs:1 (Stopped)
Jan 30 12:13:03 [2513] serverHA1 pengine: info: LogActions: Leave iscsi-stonith-device (Started serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_te_invoke: Processing graph 190 (ref=pe_calc-dc-1422616383-287) derived from /var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2513] serverHA1 pengine: notice: process_pe_message: Calculated Transition 190: /var/lib/pacemaker/pengine/pe-input-146.bz2
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: te_fence_node: Executing on fencing operation (5) on serverHA2 (timeout=60000)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: handle_request: Client crmd.2514.b5961dc1 wants to fence (on) 'serverHA2' with device '(any)'
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: initiate_remote_stonith_op: Initiating remote operation on for serverHA2: e19629dc-bec3-4e63-baf6-a7ecd5ed44bb (0)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: process_remote_stonith_query: Query result 2 of 2 from serverHA2 for serverHA2/on (1 devices) e19629dc-bec3-4e63-baf6-a7ecd5ed44bb
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: process_remote_stonith_query: All queries have arrived, continuing (2, 2, 2)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: call_remote_stonith: Total remote op timeout set to 60 for fencing of node serverHA2 for crmd.2514.e19629dc
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: info: call_remote_stonith: Requesting that serverHA2 perform op on serverHA2 for crmd.2514 (72s)
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: warning: get_xpath_object: No match for //@st_delegate in /st-reply
Jan 30 12:13:03 [2510] serverHA1 stonith-ng: notice: remote_op_done: Operation on of serverHA2 by serverHA2 for crmd.2514@serverHA1.e19629dc: OK
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_callback: Stonith operation 9/5:190:0:4e500b84-bb92-4406-8f9c-f4140dd40ec7: OK (0)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: tengine_stonith_notify: serverHA2 was successfully unfenced by serverHA2 (at the request of serverHA1)
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: run_graph: Transition 190 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-146.bz2): Complete
Jan 30 12:13:03 [2514] serverHA1 crmd: info: do_log: FSA: Input I_TE_SUCCESS from notify_crmd() received in state S_TRANSITION_ENGINE
Jan 30 12:13:03 [2514] serverHA1 crmd: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]

log serverHA2

Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (reboot) serverHA2: static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: remote_op_done: Operation reboot of serverHA2 by serverHA1 for stonith_admin.1907@serverHA1.70b75107: OK
Jan 30 12:13:11 [2631] serverHA2 crmd: crit: tengine_stonith_notify: We were allegedly just fenced by serverHA1 for serverHA1!
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: --- 0.51.86 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: +++ 0.51.87 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: + /cib: @num_updates=87
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: + /cib/status/node_state[@id='serverHA2']: @crm-debug-origin=send_stonith_update, @join=down, @expected=down
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=serverHA1/crmd/333, version=0.51.87)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: --- 0.51.87 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: +++ 0.51.88 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: -- /cib/status/node_state[@id='serverHA2']/lrm[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: + /cib: @num_updates=88
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='serverHA2']/lrm: OK (rc=0, origin=serverHA1/crmd/334, version=0.51.88)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: --- 0.51.88 2
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: Diff: +++ 0.51.89 (null)
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: -- /cib/status/node_state[@id='serverHA2']/transient_attributes[@id='serverHA2']
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_perform_op: + /cib: @num_updates=89
Jan 30 12:13:11 [2626] serverHA2 cib: info: cib_process_request: Completed cib_delete operation for section //node_state[@uname='serverHA2']/transient_attributes: OK (rc=0, origin=serverHA1/crmd/335, version=0.51.89)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (on) serverHA2: static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: can_fence_host_with_device: iscsi-stonith-device can fence (on) serverHA2: static-list
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: info: stonith_fence_get_devices_cb: Found 1 matching devices for 'serverHA2'
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: log_operation: Operation 'on' [3037] (call 9 from crmd.2514) for host 'serverHA2' with device 'iscsi-stonith-device' returned: 0 (OK)
Jan 30 12:13:11 [2627] serverHA2 stonith-ng: notice: remote_op_done: Operation on of serverHA2 by serverHA2 for crmd.2514@serverHA1.e19629dc: OK

I will continue testing....

Andrea

_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org