> On 2012-03-21T09:42:26, "Janec, Jozef" <jozef.ja...@hp.com> wrote:
>
> > Node b300ple0: UNCLEAN (offline)
> >     rs_nw_dbjj7 (ocf::heartbeat:IPaddr) Started
> >     rs_nw_cijj7 (ocf::heartbeat:IPaddr) Started
> > Node b400ple0: online
> >     sbd_fense_SHARED2 (stonith:external/sbd) Started
> >
> > Inactive resources:
> >
> > rs_nw_cijj7 (ocf::heartbeat:IPaddr): Started b300ple0
> > rs_nw_dbjj7 (ocf::heartbeat:IPaddr): Started b300ple0
> >
> > b400ple0:(/root/home/root)(root)#crm resource show
> > rs_nw_cijj7 (ocf::heartbeat:IPaddr) Started
> > sbd_fense_SHARED2 (stonith:external/sbd) Started
> > rs_nw_dbjj7 (ocf::heartbeat:IPaddr) Started
> > b400ple0:(/root/home/root)(root)#
> >
> > b400ple0:(/root/home/root)(root)#/usr/sbin/crm_resource -W -r rs_nw_cijj7
> > resource rs_nw_cijj7 is running on: b300ple0
> > b400ple0:(/root/home/root)(root)#
> >
> > but b300ple0 is down
>
> Resources are still considered owned because the node wasn't fenced yet.
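
[Note] Concretely, that "owned" state lives in the status section of the CIB: each peer has a <node_state> element there, and the resources last seen active on the peer are recorded in that element's <lrm> section. One way to look at it from the surviving node is a plain status query -- the command is standard cibadmin, but the XML below is an illustrative sketch, not this cluster's actual CIB:

b400ple0:(/root/home/root)(root)#cibadmin -Q -o status
<status>
  <node_state id="b300ple0" uname="b300ple0" in_ccm="false" crmd="offline"
              join="member" expected="member">
    <lrm id="b300ple0">
      <lrm_resources>
        <lrm_resource id="rs_nw_cijj7" type="IPaddr" class="ocf" provider="heartbeat">
          <lrm_rsc_op id="rs_nw_cijj7_start_0" operation="start" rc-code="0" ... />
        </lrm_resource>
        ...
      </lrm_resources>
    </lrm>
  </node_state>
  ...
</status>

Until fencing confirms that b300ple0 is really down, the policy engine has to assume the operations recorded under its <lrm> are still in effect -- expected="member" with crmd="offline" and no successful fence is exactly the combination reported as UNCLEAN (offline).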
[Jozef Janec] Yes, I can see in the logs:

Mar 21 06:18:00 b400ple0 stonith-ng: [8603]: ERROR: log_operation: Operation 'reboot' [3159] for host 'b300ple0' with device 'sbd_fense_SHARED2' returned: 1 (call 0 from (null))
Mar 21 06:18:00 b400ple0 stonith-ng: [8603]: info: process_remote_stonith_exec: ExecResult <st-reply st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_notify" st_remote_op="5cb46419-bfdb-4115-85d9-6ec447b38823" st_callid="0" st_callopt="0" st_rc="1" st_output="Performing: stonith -t external/sbd -T reset b300ple0 failed: b300ple0 0.05859375" src="b400ple0" seq="172" />
Mar 21 06:18:06 b400ple0 stonith-ng: [8603]: ERROR: remote_op_timeout: Action reboot (5cb46419-bfdb-4115-85d9-6ec447b38823) for b300ple0 timed out
Mar 21 06:18:06 b400ple0 stonith-ng: [8603]: info: remote_op_done: Notifing clients of 5cb46419-bfdb-4115-85d9-6ec447b38823 (reboot of b300ple0 from a8125881-30df-4bd4-a5b1-666020a29eba by (null)): 1, rc=-7
Mar 21 06:18:06 b400ple0 crmd: [8608]: info: tengine_stonith_callback: StonithOp <remote-op state="1" st_target="b300ple0" st_op="reboot" />
Mar 21 06:18:06 b400ple0 stonith-ng: [8603]: info: stonith_notify_client: Sending st_fence-notification to client 8608/bc1b0c7d-2cec-4e96-9523-5f6c51b52508
Mar 21 06:18:06 b400ple0 crmd: [8608]: info: tengine_stonith_callback: Stonith operation 44/15:49:0:44f2b175-7292-473a-a4e8-f9abda5b3ef6: Operation timed out (-7)
Mar 21 06:18:06 b400ple0 crmd: [8608]: ERROR: tengine_stonith_callback: Stonith of b300ple0 failed (-7)... aborting transition.
Mar 21 06:18:06 b400ple0 crmd: [8608]: info: abort_transition_graph: tengine_stonith_callback:401 - Triggered transition abort (complete=0) : Stonith failed

Because I rebooted the node manually to simulate an outage, and I haven't started rcopenais yet, the sbd daemon isn't running on it either:

b400ple0:(/var/log/ha)(root)#/usr/sbin/sbd -d /dev/mapper/SHARED1_part1 list
0       b400ple0        clear
1       b300ple0        reset   b400ple0
b400ple0:(/var/log/ha)(root)#/usr/sbin/sbd -d /dev/mapper/SHARED2_part1 list
0       b300ple0        reset   b400ple0
1       b400ple0        clear

It is waiting until sbd picks up the command and resets the node.

The question is: where is the information that the resources are still up stored? Is it in the lrm part? I have found that I can use "crm node clearstate", which should set the node's state to offline and probably release the resources, but I want to find out where exactly this information is hidden. All of it is, or should be, in the CIB, and I would like to know exactly which part is responsible for this behavior, to understand it better.

Best regards

Jozef
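
[Note] On the closing question: it is indeed that part. The <lrm> block inside the node's <node_state> holds the last recorded operation results (the lrm_rsc_op entries), and the node_state attributes (in_ccm, crmd, join, expected) are what the policy engine turns into "unclean" versus "cleanly down". As far as I can tell, "crm node clearstate" works by overwriting that node_state entry in the status section so that the recorded state matches a clean shutdown; a sketch of the idea, with attribute values that are illustrative rather than captured from this cluster:

b400ple0:(/root/home/root)(root)#crm node clearstate b300ple0

After that the status section should carry, roughly:

<node_state uname="b300ple0" in_ccm="false" crmd="offline"
            join="member" expected="down" ... />

so the node is now *expected* to be down, its absence no longer makes it unclean, and the resources can be recovered elsewhere. Note that the pending reset in the sbd slot is separate from the CIB; if it ever has to be cleared by hand (for example once the node has been verified down some other way), sbd accepts a message subcommand -- shown here as a sketch against one of the devices listed above:

b400ple0:(/var/log/ha)(root)#/usr/sbin/sbd -d /dev/mapper/SHARED1_part1 message b300ple0 clear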