Hm, I might have misinterpreted something here. That's an Etch 64-bit
system, and Heartbeat comes from the backports repository.
But you are right, it also says version 2.1.3:

aptitude show -v heartbeat
Package: heartbeat
State: installed
Automatically installed: no
Version: 2.1.3-6~bpo40+2
Priority: optional
Section: admin
Maintainer: Simon Horman <[email protected]>
Filename: pool/main/h/heartbeat/heartbeat_2.1.3-6~bpo40+2_amd64.deb
Archive: etch-backports, now
Provided by: heartbeat-2 (2.0.7-2)

It might also be due to the constant updates in these backports
repositories: perhaps my working installation differs from what is
in the repository right now.
Hm, right:
crmadmin --version
HA Version 2.1.3, CRM Version 2.0 (CIB feature set 2.0)
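For completeness, the installed package can be compared against what the backports repository currently ships; a minimal sketch, assuming a Debian/apt system with the etch-backports source configured as above:

```shell
# Show installed vs. candidate version and which archive each comes from
# (backports packages carry the ~bpo40 suffix seen above):
apt-cache policy heartbeat

# List every version apt knows about, broken down per archive:
aptitude versions heartbeat
```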

Thanks,
Thomas

Andrew Beekhof wrote:
> On Tue, Jan 13, 2009 at 15:46, Thomas Roth <[email protected]> wrote:
>> Hi all,
>>
>> we are using external/ipmi on a drbd8 / heartbeat-2.0.7-2 as stonith
> 
> According to the cib you sent through, you actually have something
> based on 2.1.3
> However I can't find the changeset it lists
> (552305612591183b1628baa5bc6e903e0f1e26a3) anywhere and some of the PE
> logs look significantly newer than 2.1.3.
> 
> Where did you get the heartbeat package from?  It helps to know what
> I'm looking at before commenting.
> 
>> device. While the stonith itself works well, there are some errors which
>> recur and eventually lead to unwanted reboots. And maybe there are no
>> real causes but just my lack of understanding...
>> Anyhow, in the ha-debug log of server1 I see
>>> tengine[4459]: 2009/01/13_03:40:38 info: process_graph_event: Detected
>> action server1-fencing_monitor_120000 from a different transition: 29 vs. 31
>>> tengine[4459]: 2009/01/13_03:40:38 info: update_abort_priority: Abort
>> priority upgraded to 1000000
>>> tengine[4459]: 2009/01/13_03:40:38 WARN: update_failcount: Updating
>> failcount for server1-fencing on ebffe771-505c-4e40-b0b3-d70903ed37bc
>> after failed monitor: rc=14
>>
>> Then pengine goes on about that problem:
>>> pengine[4460]: 2009/01/13_03:40:38 info: determine_online_status: Node
>> server2 is online
>>> pengine[4460]: 2009/01/13_03:40:38 ERROR: unpack_rsc_op: Remapping
>> server1-fencing_monitor_120000 (rc=14) on server2 to an ERROR
>>> pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Processing
>> failed op server1-fencing_monitor_120000 on server2: Error
>>> pengine[4460]: 2009/01/13_03:40:38 info: determine_online_status: Node
>> server1 is online
>>> pengine[4460]: 2009/01/13_03:40:38 ERROR: unpack_rsc_op: Remapping
>> server2-fencing_start_0 (rc=1) on server1 to an ERROR
>>> pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Processing
>> failed op server2-fencing_start_0 on server1: Error
>>> pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Compatability
>> handling for failed op server2-fencing_start_0 on server1
>>
>> This was already the second instance of this problem last night; the
>> first occurred 5 hours earlier, with no further consequences.
>> Now, however, after a walk through all the resources, pengine says
>>> pengine[4460]: 2009/01/13_03:40:38 WARN: stage6: Scheduling Node
>> server2 for STONITH
>>
>> There are some repetitions of this, until finally stonithd gives up:
>>> stonithd[3621]: 2009/01/13_03:42:19 ERROR: Failed to STONITH the node
>> server2: optype=RESET, op_result=TIMEOUT
>>
>>
>> The other machine, server2, did not show any problems whatsoever that
>> should have triggered the action.
>> The intended victim this time was the slave. However, during the last
>> weekend both machines showed very strange behavior, among other things
>> rebooting each other (with a day's break in between, so there was no
>> stonith war going on, but still...)
>>
>>
>> Now in the cib.xml, we have the following two primitives
>>
>>       <primitive class="stonith" type="external/ipmi"
>> provider="heartbeat" id="server1-fencing">
>>         <operations>
>>           <op id="server1-fencing-monitor" name="monitor"
>> interval="120s" timeout="70s" prereq="nothing" start_delay="0"
>> disabled="false" role="Started" on_fail="fence"/>
>>           <op id="server1-fencing-start" name="start" timeout="40s"
>> prereq="nothing" start_delay="0" disabled="false" role="Started"/>
>>         </operations>
>>         <instance_attributes id="server1-fencing-ia">
>>           <attributes>
>>             <nvpair id="server1-fencing-hostname" name="hostname"
>> value="server1"/>
>>             <nvpair id="server1-fencing-ipaddr" name="ipaddr"
>> value="1.1.1.1"/>
>>             <nvpair id="server1-fencing-userid" name="userid"
>> value="USER"/>
>>             <nvpair id="server1-fencing-passwd" name="passwd"
>> value="PASSWD"/>
>>           </attributes>
>>         </instance_attributes>
>>       </primitive>
>>       <primitive id="server2-fencing" class="stonith"
>> type="external/ipmi" provider="heartbeat">
>>         <operations>
>>           <op id="server2-fencing-monitor" name="monitor"
>> interval="120s" timeout="70s" prereq="nothing"/>
>>           <op id="server2-fencing-start" name="start" timeout="40s"
>> prereq="nothing"/>
>>         </operations>
>>         <instance_attributes id="server2-fencing-ia">
>>           <attributes>
>>             <nvpair id="server2-fencing-hostname" name="hostname"
>> value="server2"/>
>>             <nvpair id="server2-fencing-ipaddr" name="ipaddr"
>> value="1.1.1.2"/>
>>             <nvpair id="server2-fencing-userid" name="userid"
>> value="USER"/>
>>             <nvpair id="server2-fencing-passwd" name="passwd"
>> value="PASSWD"/>
>>           </attributes>
>>         </instance_attributes>
>>       </primitive>
>>
>>
>> The intent of this is to reboot one machine when a failover cannot be
>> completed because the failing Heartbeat node cannot release one of its
>> resources (Lustre mounts of drbd disks).
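[Given the STONITH TIMEOUT above, the fencing path can also be exercised by hand; a minimal sketch, assuming the external/ipmi plugin drives ipmitool over the LAN interface, and using the placeholder address and credentials from the CIB above:]

```shell
# Verify the BMC answers at all and that the credentials work
# (1.1.1.2 / USER / PASSWD are the placeholders from the CIB above):
ipmitool -I lan -H 1.1.1.2 -U USER -P PASSWD chassis power status

# Issue a reset by hand, as the RESET optype would (use with care,
# this reboots the target node):
ipmitool -I lan -H 1.1.1.2 -U USER -P PASSWD chassis power reset
```

[If the status query itself times out here, the stonithd TIMEOUT is a network/BMC problem rather than a cluster configuration problem.]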
>>
>> So my question is of course whether there is something fundamentally
>> wrong with this configuration (I'm attaching the entire cib.xml and
>> last night's ha-debug, btw), or whether the error really lies
>> somewhere else (hardware).
>>
>> Thanks a lot,
>> Thomas
>>
>>
>>
>> tengine[4459]: 2009/01/12_22:34:28 info: process_graph_event: Action 
>> server2-fencing_monitor_120000 arrived after a completed transition
>> tengine[4459]: 2009/01/12_22:34:28 info: update_abort_priority: Abort 
>> priority upgraded to 1000000
>> tengine[4459]: 2009/01/12_22:34:28 WARN: update_failcount: Updating 
>> failcount for server2-fencing on ebffe771-505c-4e40-b0b3-d70903ed37bc after 
>> failed monitor: rc=14
>> crmd[3623]: 2009/01/12_22:34:28 info: do_state_transition: State transition 
>> S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE 
>> origin=route_message ]
>> crmd[3623]: 2009/01/12_22:34:28 info: do_state_transition: All 2 cluster 
>> nodes are eligible to run resources.
>> tengine[4459]: 2009/01/12_22:34:28 info: extract_event: Aborting on 
>> transient_attributes changes for ebffe771-505c-4e40-b0b3-d70903ed37bc
>> pengine[4460]: 2009/01/12_22:34:28 info: determine_online_status: Node 
>> server2 is online
>> pengine[4460]: 2009/01/12_22:34:28 ERROR: unpack_rsc_op: Remapping 
>> server2-fencing_monitor_120000 (rc=14) on server2 to an ERROR
>> pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Processing failed op 
>> server2-fencing_monitor_120000 on server2: Error
>> pengine[4460]: 2009/01/12_22:34:28 info: determine_online_status: Node 
>> server1 is online
>> pengine[4460]: 2009/01/12_22:34:28 ERROR: unpack_rsc_op: Remapping 
>> server2-fencing_start_0 (rc=1) on server1 to an ERROR
>> pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Processing failed op 
>> server2-fencing_start_0 on server1: Error
>> pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Compatability 
>> handling for failed op server2-fencing_start_0 on server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: group_print: Resource Group: 
>> group_1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     commonIP   
>> (heartbeat::ocf:IPaddr):        Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     drbddisk_2 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     Filesystem_3    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     drbddisk_4 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     Filesystem_5    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     MailTo_6   
>> (heartbeat::ocf:MailTo):        Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print: server1-fencing     
>>    (stonith:external/ipmi):        Started server2
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print: server2-fencing     
>>    (stonith:external/ipmi):        Started server2 FAILED
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> commonIP        (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> drbddisk_2      (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> Filesystem_3    (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> drbddisk_4      (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> Filesystem_5    (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> MailTo_6        (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> server1-fencing (server2)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Recover resource 
>> server2-fencing       (server2)
>> pengine[4460]: 2009/01/12_22:34:28 notice: StopRsc:   server2   Stop 
>> server2-fencing
>> pengine[4460]: 2009/01/12_22:34:28 notice: StartRsc:  server2   Start 
>> server2-fencing
>> pengine[4460]: 2009/01/12_22:34:28 notice: RecurringOp: server2    
>> server2-fencing_monitor_120000
>> pengine[4460]: 2009/01/12_22:34:28 info: process_pe_message: Transition 30: 
>> PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-406.bz2
>> pengine[4460]: 2009/01/12_22:34:28 info: determine_online_status: Node 
>> server2 is online
>> pengine[4460]: 2009/01/12_22:34:28 ERROR: unpack_rsc_op: Remapping 
>> server2-fencing_monitor_120000 (rc=14) on server2 to an ERROR
>> pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Processing failed op 
>> server2-fencing_monitor_120000 on server2: Error
>> pengine[4460]: 2009/01/12_22:34:28 info: determine_online_status: Node 
>> server1 is online
>> pengine[4460]: 2009/01/12_22:34:28 ERROR: unpack_rsc_op: Remapping 
>> server2-fencing_start_0 (rc=1) on server1 to an ERROR
>> pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Processing failed op 
>> server2-fencing_start_0 on server1: Error
>> pengine[4460]: 2009/01/12_22:34:28 WARN: unpack_rsc_op: Compatability 
>> handling for failed op server2-fencing_start_0 on server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: group_print: Resource Group: 
>> group_1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     commonIP   
>> (heartbeat::ocf:IPaddr):        Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     drbddisk_2 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     Filesystem_3    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     drbddisk_4 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     Filesystem_5    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print:     MailTo_6   
>> (heartbeat::ocf:MailTo):        Started server1
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print: server1-fencing     
>>    (stonith:external/ipmi):        Started server2
>> pengine[4460]: 2009/01/12_22:34:28 notice: native_print: server2-fencing     
>>    (stonith:external/ipmi):        Started server2 FAILED
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> commonIP        (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> drbddisk_2      (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> Filesystem_3    (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> drbddisk_4      (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> Filesystem_5    (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> MailTo_6        (server1)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Leave resource 
>> server1-fencing (server2)
>> pengine[4460]: 2009/01/12_22:34:28 notice: NoRoleChange: Recover resource 
>> server2-fencing       (server2)
>> pengine[4460]: 2009/01/12_22:34:28 notice: StopRsc:   server2   Stop 
>> server2-fencing
>> pengine[4460]: 2009/01/12_22:34:28 notice: StartRsc:  server2   Start 
>> server2-fencing
>> pengine[4460]: 2009/01/12_22:34:28 notice: RecurringOp: server2    
>> server2-fencing_monitor_120000
>> crmd[3623]: 2009/01/12_22:34:28 info: do_state_transition: State transition 
>> S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
>> cause=C_IPC_MESSAGE origin=route_message ]
>> tengine[4459]: 2009/01/12_22:34:28 info: unpack_graph: Unpacked transition 
>> 31: 4 actions in 4 synapses
>> tengine[4459]: 2009/01/12_22:34:28 info: te_pseudo_action: Pseudo action 13 
>> fired and confirmed
>> tengine[4459]: 2009/01/12_22:34:28 info: send_rsc_command: Initiating action 
>> 3: server2-fencing_stop_0 on server2
>> pengine[4460]: 2009/01/12_22:34:28 info: process_pe_message: Transition 31: 
>> PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-407.bz2
>> tengine[4459]: 2009/01/12_22:34:29 info: match_graph_event: Action 
>> server2-fencing_stop_0 (3) confirmed on server2 (rc=0)
>> tengine[4459]: 2009/01/12_22:34:29 info: send_rsc_command: Initiating action 
>> 32: server2-fencing_start_0 on server2
>> tengine[4459]: 2009/01/12_22:34:31 info: match_graph_event: Action 
>> server2-fencing_start_0 (32) confirmed on server2 (rc=0)
>> tengine[4459]: 2009/01/12_22:34:31 info: send_rsc_command: Initiating action 
>> 2: server2-fencing_monitor_120000 on server2
>> tengine[4459]: 2009/01/12_22:34:33 info: match_graph_event: Action 
>> server2-fencing_monitor_120000 (2) confirmed on server2 (rc=0)
>> tengine[4459]: 2009/01/12_22:34:33 info: run_graph: Transition 31: 
>> (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0)
>> tengine[4459]: 2009/01/12_22:34:33 info: notify_crmd: Transition 31 status: 
>> te_complete - <null>
>> crmd[3623]: 2009/01/12_22:34:33 info: do_state_transition: State transition 
>> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE 
>> origin=route_message ]
>> cib[3619]: 2009/01/12_22:42:19 info: cib_stats: Processed 49 operations 
>> (4285.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/12_22:52:19 info: cib_stats: Processed 39 operations 
>> (4102.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/12_23:02:19 info: cib_stats: Processed 40 operations 
>> (6000.00us average, 0% utilization) in the last 10min
>> lrmd[3620]: 2009/01/12_23:08:37 WARN: G_SIG_dispatch: Dispatch function for 
>> SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 
>> 0x514c58)
>> lrmd[3620]: 2009/01/12_23:08:37 info: G_SIG_dispatch: started at 1731484574 
>> should have started at 1731484474
>> cib[3619]: 2009/01/12_23:12:19 info: cib_stats: Processed 40 operations 
>> (4000.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/12_23:22:19 info: cib_stats: Processed 40 operations 
>> (5000.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/12_23:32:19 info: cib_stats: Processed 39 operations 
>> (7435.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/12_23:42:19 info: cib_stats: Processed 40 operations 
>> (5250.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/12_23:52:19 info: cib_stats: Processed 40 operations 
>> (6500.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_00:02:19 info: cib_stats: Processed 39 operations 
>> (5641.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_00:12:19 info: cib_stats: Processed 40 operations 
>> (6000.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_00:22:19 info: cib_stats: Processed 40 operations 
>> (4750.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_00:32:19 info: cib_stats: Processed 40 operations 
>> (4500.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_00:42:19 info: cib_stats: Processed 39 operations 
>> (5128.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_00:52:19 info: cib_stats: Processed 40 operations 
>> (4000.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_01:02:19 info: cib_stats: Processed 40 operations 
>> (5500.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_01:12:19 info: cib_stats: Processed 40 operations 
>> (4500.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_01:22:19 info: cib_stats: Processed 39 operations 
>> (4358.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_01:32:19 info: cib_stats: Processed 40 operations 
>> (4750.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_01:42:19 info: cib_stats: Processed 40 operations 
>> (7000.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_01:52:19 info: cib_stats: Processed 40 operations 
>> (5500.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_02:02:19 info: cib_stats: Processed 39 operations 
>> (5384.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_02:12:19 info: cib_stats: Processed 40 operations 
>> (5000.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_02:22:19 info: cib_stats: Processed 40 operations 
>> (6750.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_02:32:19 info: cib_stats: Processed 39 operations 
>> (5897.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_02:42:19 info: cib_stats: Processed 40 operations 
>> (3750.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_02:52:19 info: cib_stats: Processed 40 operations 
>> (5000.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_03:02:19 info: cib_stats: Processed 40 operations 
>> (5500.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_03:12:19 info: cib_stats: Processed 39 operations 
>> (6153.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_03:22:19 info: cib_stats: Processed 40 operations 
>> (4750.00us average, 0% utilization) in the last 10min
>> cib[3619]: 2009/01/13_03:32:19 info: cib_stats: Processed 40 operations 
>> (4750.00us average, 0% utilization) in the last 10min
>> tengine[4459]: 2009/01/13_03:40:38 info: process_graph_event: Detected 
>> action server1-fencing_monitor_120000 from a different transition: 29 vs. 31
>> tengine[4459]: 2009/01/13_03:40:38 info: update_abort_priority: Abort 
>> priority upgraded to 1000000
>> tengine[4459]: 2009/01/13_03:40:38 WARN: update_failcount: Updating 
>> failcount for server1-fencing on ebffe771-505c-4e40-b0b3-d70903ed37bc after 
>> failed monitor: rc=14
>> crmd[3623]: 2009/01/13_03:40:38 info: do_state_transition: State transition 
>> S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE 
>> origin=route_message ]
>> crmd[3623]: 2009/01/13_03:40:38 info: do_state_transition: All 2 cluster 
>> nodes are eligible to run resources.
>> pengine[4460]: 2009/01/13_03:40:38 info: determine_online_status: Node 
>> server2 is online
>> pengine[4460]: 2009/01/13_03:40:38 ERROR: unpack_rsc_op: Remapping 
>> server1-fencing_monitor_120000 (rc=14) on server2 to an ERROR
>> pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Processing failed op 
>> server1-fencing_monitor_120000 on server2: Error
>> pengine[4460]: 2009/01/13_03:40:38 info: determine_online_status: Node 
>> server1 is online
>> pengine[4460]: 2009/01/13_03:40:38 ERROR: unpack_rsc_op: Remapping 
>> server2-fencing_start_0 (rc=1) on server1 to an ERROR
>> pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Processing failed op 
>> server2-fencing_start_0 on server1: Error
>> pengine[4460]: 2009/01/13_03:40:38 WARN: unpack_rsc_op: Compatability 
>> handling for failed op server2-fencing_start_0 on server1
>> pengine[4460]: 2009/01/13_03:40:38 notice: group_print: Resource Group: 
>> group_1
>> pengine[4460]: 2009/01/13_03:40:38 notice: native_print:     commonIP   
>> (heartbeat::ocf:IPaddr):        Started server1
>> pengine[4460]: 2009/01/13_03:40:38 notice: native_print:     drbddisk_2 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/13_03:40:38 notice: native_print:     Filesystem_3    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/13_03:40:38 notice: native_print:     drbddisk_4 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/13_03:40:38 notice: native_print:     Filesystem_5    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/13_03:40:38 notice: native_print:     MailTo_6   
>> (heartbeat::ocf:MailTo):        Started server1
>> pengine[4460]: 2009/01/13_03:40:38 notice: native_print: server1-fencing     
>>    (stonith:external/ipmi):        Started server2 FAILED
>> pengine[4460]: 2009/01/13_03:40:38 notice: native_print: server2-fencing     
>>    (stonith:external/ipmi):        Started server2
>> pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource 
>> commonIP        (server1)
>> pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource 
>> drbddisk_2      (server1)
>> pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource 
>> Filesystem_3    (server1)
>> pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource 
>> drbddisk_4      (server1)
>> pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource 
>> Filesystem_5    (server1)
>> pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Leave resource 
>> MailTo_6        (server1)
>> pengine[4460]: 2009/01/13_03:40:38 notice: NoRoleChange: Recover resource 
>> server1-fencing       (server1)
>> pengine[4460]: 2009/01/13_03:40:38 notice: StopRsc:   server2   Stop 
>> server1-fencing
>> pengine[4460]: 2009/01/13_03:40:38 notice: StartRsc:  server1   Start 
>> server1-fencing
>> pengine[4460]: 2009/01/13_03:40:38 notice: RecurringOp: server1    
>> server1-fencing_monitor_120000
>> pengine[4460]: 2009/01/13_03:40:38 WARN: native_color: Resource 
>> server2-fencing cannot run anywhere
>> pengine[4460]: 2009/01/13_03:40:38 notice: StopRsc:   server2   Stop 
>> server2-fencing
>> pengine[4460]: 2009/01/13_03:40:38 WARN: stage6: Scheduling Node server2 for 
>> STONITH
>> pengine[4460]: 2009/01/13_03:40:38 WARN: native_stop_constraints: Stop of 
>> failed resource server1-fencing is implicit after server2 is fenced
>> pengine[4460]: 2009/01/13_03:40:38 info: native_stop_constraints: 
>> server2-fencing_stop_0 is implicit after server2 is fenced
>> crmd[3623]: 2009/01/13_03:40:38 info: do_state_transition: State transition 
>> S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
>> cause=C_IPC_MESSAGE origin=route_message ]
>> tengine[4459]: 2009/01/13_03:40:38 info: unpack_graph: Unpacked transition 
>> 32: 7 actions in 7 synapses
>> tengine[4459]: 2009/01/13_03:40:38 info: te_pseudo_action: Pseudo action 2 
>> fired and confirmed
>> tengine[4459]: 2009/01/13_03:40:38 info: send_rsc_command: Initiating action 
>> 29: server1-fencing_start_0 on server1
>> tengine[4459]: 2009/01/13_03:40:38 info: te_pseudo_action: Pseudo action 31 
>> fired and confirmed
>> crmd[3623]: 2009/01/13_03:40:38 info: do_lrm_rsc_op: Performing 
>> op=server1-fencing_start_0 key=29:32:8686f9af-9ced-43ab-bf20-be6e8437abc0)
>> lrmd[3620]: 2009/01/13_03:40:38 info: rsc:server1-fencing: start
>> lrmd[27785]: 2009/01/13_03:40:38 info: Try to start STONITH resource 
>> <rsc_id=server1-fencing> : Device=external/ipmi
>> pengine[4460]: 2009/01/13_03:40:38 WARN: process_pe_message: Transition 32: 
>> WARNINGs found during PE processing. PEngine Input stored in: 
>> /var/lib/heartbeat/pengine/pe-warn-6.bz2
>> pengine[4460]: 2009/01/13_03:40:38 info: process_pe_message: Configuration 
>> WARNINGs found during PE processing.  Please run "crm_verify -L" to identify 
>> issues.
>> crmd[3623]: 2009/01/13_03:40:39 info: process_lrm_event: LRM operation 
>> server1-fencing_start_0 (call=51, rc=0) complete
>> tengine[4459]: 2009/01/13_03:40:39 info: match_graph_event: Action 
>> server1-fencing_start_0 (29) confirmed on server1 (rc=0)
>> tengine[4459]: 2009/01/13_03:40:39 info: send_rsc_command: Initiating action 
>> 30: server1-fencing_monitor_120000 on server1
>> tengine[4459]: 2009/01/13_03:40:39 info: te_pseudo_action: Pseudo action 32 
>> fired and confirmed
>> tengine[4459]: 2009/01/13_03:40:39 info: te_fence_node: Executing reboot 
>> fencing operation (33) on server2 (timeout=100000)
>> crmd[3623]: 2009/01/13_03:40:39 info: do_lrm_rsc_op: Performing 
>> op=server1-fencing_monitor_120000 
>> key=30:32:8686f9af-9ced-43ab-bf20-be6e8437abc0)
>> stonithd[3621]: 2009/01/13_03:40:39 info: client tengine [pid: 4459] want a 
>> STONITH operation RESET to node server2.
>> stonithd[3621]: 2009/01/13_03:40:39 info: Broadcasting the message 
>> succeeded: require others to stonith node server2.
>> crmd[3623]: 2009/01/13_03:40:39 info: process_lrm_event: LRM operation 
>> server1-fencing_monitor_120000 (call=52, rc=0) complete
>> tengine[4459]: 2009/01/13_03:40:39 info: match_graph_event: Action 
>> server1-fencing_monitor_120000 (30) confirmed on server1 (rc=0)
>> stonithd[3621]: 2009/01/13_03:42:19 ERROR: Failed to STONITH the node 
>> server2: optype=RESET, op_result=TIMEOUT
>> tengine[4459]: 2009/01/13_03:42:19 info: tengine_stonith_callback: call=-2, 
>> optype=1, node_name=server2, result=2, node_list=, 
>> action=33:32:8686f9af-9ced-43ab-bf20-be6e8437abc0
>> tengine[4459]: 2009/01/13_03:42:19 ERROR: tengine_stonith_callback: Stonith 
>> of server2 failed (2)... aborting transition.
>> tengine[4459]: 2009/01/13_03:42:19 info: update_abort_priority: Abort 
>> priority upgraded to 1000000
>> tengine[4459]: 2009/01/13_03:42:19 info: update_abort_priority: Abort action 
>> 0 superceeded by 2
>> tengine[4459]: 2009/01/13_03:42:19 info: run_graph: 
>> ====================================================
>> tengine[4459]: 2009/01/13_03:42:19 notice: run_graph: Transition 32: 
>> (Complete=6, Pending=0, Fired=0, Skipped=1, Incomplete=0)
>> crmd[3623]: 2009/01/13_03:42:19 info: do_state_transition: State transition 
>> S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE 
>> origin=route_message ]
>> crmd[3623]: 2009/01/13_03:42:19 info: do_state_transition: All 2 cluster 
>> nodes are eligible to run resources.
>> pengine[4460]: 2009/01/13_03:42:19 info: determine_online_status: Node 
>> server2 is online
>> pengine[4460]: 2009/01/13_03:42:19 ERROR: unpack_rsc_op: Remapping 
>> server1-fencing_monitor_120000 (rc=14) on server2 to an ERROR
>> pengine[4460]: 2009/01/13_03:42:19 WARN: unpack_rsc_op: Processing failed op 
>> server1-fencing_monitor_120000 on server2: Error
>> pengine[4460]: 2009/01/13_03:42:19 info: determine_online_status: Node 
>> server1 is online
>> pengine[4460]: 2009/01/13_03:42:19 ERROR: native_add_running: Resource 
>> stonith::external/ipmi:server1-fencing appears to be active on 2 nodes.
>> pengine[4460]: 2009/01/13_03:42:19 ERROR: See 
>> http://linux-ha.org/v2/faq/resource_too_active for more information.
>> pengine[4460]: 2009/01/13_03:42:19 ERROR: unpack_rsc_op: Remapping 
>> server2-fencing_start_0 (rc=1) on server1 to an ERROR
>> pengine[4460]: 2009/01/13_03:42:19 WARN: unpack_rsc_op: Processing failed op 
>> server2-fencing_start_0 on server1: Error
>> pengine[4460]: 2009/01/13_03:42:19 WARN: unpack_rsc_op: Compatability 
>> handling for failed op server2-fencing_start_0 on server1
>> pengine[4460]: 2009/01/13_03:42:19 notice: group_print: Resource Group: 
>> group_1
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print:     commonIP   
>> (heartbeat::ocf:IPaddr):        Started server1
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print:     drbddisk_2 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print:     Filesystem_3    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print:     drbddisk_4 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print:     Filesystem_5    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print:     MailTo_6   
>> (heartbeat::ocf:MailTo):        Started server1
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print: server1-fencing     
>>    (stonith:external/ipmi)
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print:        0 : server2
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print:        1 : server1
>> pengine[4460]: 2009/01/13_03:42:19 notice: native_print: server2-fencing     
>>    (stonith:external/ipmi):        Started server2
>> pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource 
>> commonIP        (server1)
>> pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource 
>> drbddisk_2      (server1)
>> pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource 
>> Filesystem_3    (server1)
>> pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource 
>> drbddisk_4      (server1)
>> pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource 
>> Filesystem_5    (server1)
>> pengine[4460]: 2009/01/13_03:42:19 notice: NoRoleChange: Leave resource 
>> MailTo_6        (server1)
>> pengine[4460]: 2009/01/13_03:42:19 ERROR: native_create_actions: Attempting 
>> recovery of resource server1-fencing
>> pengine[4460]: 2009/01/13_03:42:19 notice: StopRsc:   server2   Stop 
>> server1-fencing
>> pengine[4460]: 2009/01/13_03:42:19 notice: StopRsc:   server1   Stop 
>> server1-fencing
>> pengine[4460]: 2009/01/13_03:42:19 notice: StartRsc:  server1   Start 
>> server1-fencing
>> pengine[4460]: 2009/01/13_03:42:19 notice: RecurringOp: server1    
>> server1-fencing_monitor_120000
>> pengine[4460]: 2009/01/13_03:42:19 WARN: native_color: Resource 
>> server2-fencing cannot run anywhere
>> pengine[4460]: 2009/01/13_03:42:19 notice: StopRsc:   server2   Stop 
>> server2-fencing
>> pengine[4460]: 2009/01/13_03:42:19 WARN: stage6: Scheduling Node server2 for 
>> STONITH
>> pengine[4460]: 2009/01/13_03:42:19 WARN: native_stop_constraints: Stop of 
>> failed resource server1-fencing is implicit after server2 is fenced
>> pengine[4460]: 2009/01/13_03:42:19 info: native_stop_constraints: 
>> server2-fencing_stop_0 is implicit after server2 is fenced
>> crmd[3623]: 2009/01/13_03:42:19 info: do_state_transition: State transition 
>> S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
>> cause=C_IPC_MESSAGE origin=route_message ]
>> tengine[4459]: 2009/01/13_03:42:19 info: unpack_graph: Unpacked transition 
>> 33: 8 actions in 8 synapses
>> tengine[4459]: 2009/01/13_03:42:19 info: te_pseudo_action: Pseudo action 2 
>> fired and confirmed
>> tengine[4459]: 2009/01/13_03:42:19 info: te_pseudo_action: Pseudo action 32 
>> fired and confirmed
>> tengine[4459]: 2009/01/13_03:42:19 notice: run_graph: 
>> ====================================================
>> tengine[4459]: 2009/01/13_03:42:19 WARN: run_graph: Transition 33: 
>> (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=6)
>> tengine[4459]: 2009/01/13_03:42:19 ERROR: te_graph_trigger: Transition 
>> failed: terminated
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Graph 33 (8 actions in 
>> 8 synapses): batch-limit=30 jobs, network-delay=60000ms
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 0 was 
>> confirmed (priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 1 is pending 
>> (priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:     [Action 8]: Pending 
>> (id: server1-fencing_monitor_120000, loc: server1, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:      * [Input 31]: 
>> Pending (id: server1-fencing_start_0, loc: server1, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 2 is pending 
>> (priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:     [Action 30]: 
>> Pending (id: server1-fencing_stop_0, loc: server1, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:      * [Input 13]: 
>> Pending (id: all_stopped, type: pseduo, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 3 is pending 
>> (priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:     [Action 31]: 
>> Pending (id: server1-fencing_start_0, loc: server1, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:      * [Input 2]: 
>> Completed (id: server1-fencing_stop_0, type: pseduo, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:      * [Input 30]: 
>> Pending (id: server1-fencing_stop_0, loc: server1, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 4 was 
>> confirmed (priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 5 is pending 
>> (priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:     [Action 13]: 
>> Pending (id: all_stopped, type: pseduo, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:      * [Input 34]: 
>> Pending (id: stonith, loc: server2, type: crm, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 6 is pending 
>> (priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:     [Action 33]: 
>> Pending (id: stonith_up, type: pseduo, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:      * [Input 31]: 
>> Pending (id: server1-fencing_start_0, loc: server1, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_graph: Synapse 7 is pending 
>> (priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:     [Action 34]: 
>> Pending (id: stonith, loc: server2, type: crm, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 WARN: print_elem:      * [Input 33]: 
>> Pending (id: stonith_up, type: pseduo, priority: 0)
>> tengine[4459]: 2009/01/13_03:42:19 info: notify_crmd: Transition 33 status: 
>> te_complete - <null>
>> crmd[3623]: 2009/01/13_03:42:19 info: do_state_transition: State transition 
>> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE 
>> origin=route_message ]
>> pengine[4460]: 2009/01/13_03:42:19 ERROR: process_pe_message: Transition 33: 
>> ERRORs found during PE processing. PEngine Input stored in: 
>> /var/lib/heartbeat/pengine/pe-error-22.bz2
>> pengine[4460]: 2009/01/13_03:42:19 info: process_pe_message: Configuration 
>> WARNINGs found during PE processing.  Please run "crm_verify -L" to identify 
>> issues.
>> cib[3619]: 2009/01/13_03:42:19 info: cib_stats: Processed 47 operations 
>> (4680.00us average, 0% utilization) in the last 10min
>> tengine[4459]: 2009/01/13_03:42:38 info: process_graph_event: Detected 
>> action server1-fencing_monitor_120000 from a different transition: 29 vs. 33
>> tengine[4459]: 2009/01/13_03:42:38 info: update_abort_priority: Abort 
>> priority upgraded to 1000000
>> crmd[3623]: 2009/01/13_03:42:38 info: do_state_transition: State transition 
>> S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE 
>> origin=route_message ]
>> crmd[3623]: 2009/01/13_03:42:38 info: do_state_transition: All 2 cluster 
>> nodes are eligible to run resources.
>> pengine[4460]: 2009/01/13_03:42:38 info: determine_online_status: Node 
>> server2 is online
>> pengine[4460]: 2009/01/13_03:42:38 info: determine_online_status: Node 
>> server1 is online
>> pengine[4460]: 2009/01/13_03:42:38 ERROR: native_add_running: Resource 
>> stonith::external/ipmi:server1-fencing appears to be active on 2 nodes.
>> pengine[4460]: 2009/01/13_03:42:38 ERROR: See 
>> http://linux-ha.org/v2/faq/resource_too_active for more information.
>> pengine[4460]: 2009/01/13_03:42:38 ERROR: unpack_rsc_op: Remapping 
>> server2-fencing_start_0 (rc=1) on server1 to an ERROR
>> pengine[4460]: 2009/01/13_03:42:38 WARN: unpack_rsc_op: Processing failed op 
>> server2-fencing_start_0 on server1: Error
>> pengine[4460]: 2009/01/13_03:42:38 WARN: unpack_rsc_op: Compatability 
>> handling for failed op server2-fencing_start_0 on server1
>> pengine[4460]: 2009/01/13_03:42:38 notice: group_print: Resource Group: 
>> group_1
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print:     commonIP   
>> (heartbeat::ocf:IPaddr):        Started server1
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print:     drbddisk_2 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print:     Filesystem_3    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print:     drbddisk_4 
>> (heartbeat:drbddisk):   Started server1
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print:     Filesystem_5    
>>    (heartbeat::ocf:Filesystem):    Started server1
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print:     MailTo_6   
>> (heartbeat::ocf:MailTo):        Started server1
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print: server1-fencing     
>>    (stonith:external/ipmi)
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print:        0 : server2
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print:        1 : server1
>> pengine[4460]: 2009/01/13_03:42:38 notice: native_print: server2-fencing     
>>    (stonith:external/ipmi):        Started server2
>> pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource 
>> commonIP        (server1)
>> pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource 
>> drbddisk_2      (server1)
>> pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource 
>> Filesystem_3    (server1)
>> pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource 
>> drbddisk_4      (server1)
>> pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource 
>> Filesystem_5    (server1)
>> pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource 
>> MailTo_6        (server1)
>> pengine[4460]: 2009/01/13_03:42:38 ERROR: native_create_actions: Attempting 
>> recovery of resource server1-fencing
>> pengine[4460]: 2009/01/13_03:42:38 notice: StopRsc:   server2   Stop 
>> server1-fencing
>> pengine[4460]: 2009/01/13_03:42:38 notice: StopRsc:   server1   Stop 
>> server1-fencing
>> pengine[4460]: 2009/01/13_03:42:38 notice: StartRsc:  server2   Start 
>> server1-fencing
>> pengine[4460]: 2009/01/13_03:42:38 notice: RecurringOp: server2    
>> server1-fencing_monitor_120000
>> pengine[4460]: 2009/01/13_03:42:38 notice: NoRoleChange: Leave resource 
>> server2-fencing (server2)
>> crmd[3623]: 2009/01/13_03:42:38 info: do_state_transition: State transition 
>> S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS 
>> cause=C_IPC_MESSAGE origin=route_message ]
>> tengine[4459]: 2009/01/13_03:42:38 info: unpack_graph: Unpacked transition 
>> 34: 5 actions in 5 synapses
>> tengine[4459]: 2009/01/13_03:42:38 info: te_pseudo_action: Pseudo action 13 
>> fired and confirmed
>> tengine[4459]: 2009/01/13_03:42:38 info: send_rsc_command: Initiating action 
>> 30: server1-fencing_stop_0 on server2
>> tengine[4459]: 2009/01/13_03:42:38 info: send_rsc_command: Initiating action 
>> 31: server1-fencing_stop_0 on server1
>> crmd[3623]: 2009/01/13_03:42:38 info: do_lrm_rsc_op: Performing 
>> op=server1-fencing_stop_0 key=31:34:8686f9af-9ced-43ab-bf20-be6e8437abc0)
>> lrmd[3620]: 2009/01/13_03:42:38 info: rsc:server1-fencing: stop
>> crmd[3623]: 2009/01/13_03:42:38 info: process_lrm_event: LRM operation 
>> server1-fencing_monitor_120000 (call=52, rc=-2) Cancelled
>> pengine[4460]: 2009/01/13_03:42:38 ERROR: process_pe_message: Transition 34: 
>> ERRORs found during PE processing. PEngine Input stored in: 
>> /var/lib/heartbeat/pengine/pe-error-23.bz2
>> lrmd[28014]: 2009/01/13_03:42:38 info: Try to stop STONITH resource 
>> <rsc_id=server1-fencing> : Device=external/ipmi
>> crmd[3623]: 2009/01/13_03:42:38 info: process_lrm_event: LRM operation 
>> server1-fencing_stop_0 (call=53, rc=0) complete
>> tengine[4459]: 2009/01/13_03:42:38 info: match_graph_event: Action 
>> server1-fencing_stop_0 (31) confirmed on server1 (rc=0)
>> tengine[4459]: 2009/01/13_03:42:39 info: match_graph_event: Action 
>> server1-fencing_stop_0 (30) confirmed on server2 (rc=0)
>> tengine[4459]: 2009/01/13_03:42:39 info: send_rsc_command: Initiating action 
>> 32: server1-fencing_start_0 on server2
>> tengine[4459]: 2009/01/13_03:42:41 info: match_graph_event: Action 
>> server1-fencing_start_0 (32) confirmed on server2 (rc=0)
>> tengine[4459]: 2009/01/13_03:42:41 info: send_rsc_command: Initiating action 
>> 1: server1-fencing_monitor_120000 on server2
>> tengine[4459]: 2009/01/13_03:42:42 info: match_graph_event: Action 
>> server1-fencing_monitor_120000 (1) confirmed on server2 (rc=0)
>> tengine[4459]: 2009/01/13_03:42:42 info: run_graph: Transition 34: 
>> (Complete=5, Pending=0, Fired=0, Skipped=0, Incomplete=0)
>> tengine[4459]: 2009/01/13_03:42:42 info: notify_crmd: Transition 34 status: 
>> te_complete - <null>
>> crmd[3623]: 2009/01/13_03:42:42 info: do_state_transition: State transition 
>> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE 
>> origin=route_message ]
>> cib[3619]: 2009/01/13_03:52:19 info: cib_stats: Processed 47 operations 
>> (6382.00us average, 0% utilization) in the last 10min
>> lrmd[3620]: 2009/01/13_03:59:32 WARN: G_SIG_dispatch: Dispatch function for 
>> SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 
>> 0x514c58)
>> lrmd[3620]: 2009/01/13_03:59:32 info: G_SIG_dispatch: started at 1733230017 
>> should have started at 1733229917
>>
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>
-- 
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1
D-64291 Darmstadt
www.gsi.de

Limited liability company (GmbH)
Registered office: Darmstadt
Commercial register: Amtsgericht Darmstadt, HRB 1528

Managing Director: Professor Dr. Horst Stöcker

Chair of the Supervisory Board: Dr. Beatrix Vierkorn-Rudolph,
Deputy: Ministerialdirigent Dr. Rolf Bernhardt