> On 2012-03-21T09:42:26, "Janec, Jozef" <jozef.ja...@hp.com> wrote:
>
> > Node b300ple0: UNCLEAN (offline)
> >     rs_nw_dbjj7 (ocf::heartbeat:IPaddr) Started
> >     rs_nw_cijj7 (ocf::heartbeat:IPaddr) Started
> > Node b400ple0: online
> >     sbd_fense_SHARED2 (stonith:external/sbd) Started
> >
> > Inactive resources:
> >
> > rs_nw_cijj7 (ocf::heartbeat:IPaddr): Started b300ple0
> > rs_nw_dbjj7 (ocf::heartbeat:IPaddr): Started b300ple0
> >
> > b400ple0:(/root/home/root)(root)#crm resource show
> > rs_nw_cijj7 (ocf::heartbeat:IPaddr) Started
> > sbd_fense_SHARED2 (stonith:external/sbd) Started
> > rs_nw_dbjj7 (ocf::heartbeat:IPaddr) Started
> > b400ple0:(/root/home/root)(root)#
> >
> > b400ple0:(/root/home/root)(root)#/usr/sbin/crm_resource -W -r rs_nw_cijj7
> > resource rs_nw_cijj7 is running on: b300ple0
> > b400ple0:(/root/home/root)(root)#
> >
> > but b300ple0 is down
>
> Resources are still considered owned because the node wasn't fenced yet.
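
[Note] Concretely, that "owned" state lives in the status section of the CIB: each peer has a <node_state> element there, and the resources last seen active on the peer are recorded in that element's <lrm> section. One way to look at it from the surviving node is a plain status query -- the command is standard cibadmin, but the XML below is an illustrative sketch, not this cluster's actual CIB:

b400ple0:(/root/home/root)(root)#cibadmin -Q -o status
<status>
  <node_state id="b300ple0" uname="b300ple0" in_ccm="false" crmd="offline"
              join="member" expected="member">
    <lrm id="b300ple0">
      <lrm_resources>
        <lrm_resource id="rs_nw_cijj7" type="IPaddr" class="ocf" provider="heartbeat">
          <lrm_rsc_op id="rs_nw_cijj7_start_0" operation="start" rc-code="0" ... />
        </lrm_resource>
        ...
      </lrm_resources>
    </lrm>
  </node_state>
  ...
</status>

Until fencing confirms that b300ple0 is really down, the policy engine has to assume the operations recorded under its <lrm> are still in effect -- expected="member" with crmd="offline" and no successful fence is exactly the combination reported as UNCLEAN (offline).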
[Jozef Janec] Yes, I can see in the logs:

Mar 21 06:18:00 b400ple0 stonith-ng: [8603]: ERROR: log_operation: Operation 'reboot' [3159] for host 'b300ple0' with device 'sbd_fense_SHARED2' returned: 1 (call 0 from (null))
Mar 21 06:18:00 b400ple0 stonith-ng: [8603]: info: process_remote_stonith_exec: ExecResult <st-reply st_origin="stonith_construct_async_reply" t="stonith-ng" st_op="st_notify" st_remote_op="5cb46419-bfdb-4115-85d9-6ec447b38823" st_callid="0" st_callopt="0" st_rc="1" st_output="Performing: stonith -t external/sbd -T reset b300ple0 failed: b300ple0 0.05859375" src="b400ple0" seq="172" />
Mar 21 06:18:06 b400ple0 stonith-ng: [8603]: ERROR: remote_op_timeout: Action reboot (5cb46419-bfdb-4115-85d9-6ec447b38823) for b300ple0 timed out
Mar 21 06:18:06 b400ple0 stonith-ng: [8603]: info: remote_op_done: Notifing clients of 5cb46419-bfdb-4115-85d9-6ec447b38823 (reboot of b300ple0 from a8125881-30df-4bd4-a5b1-666020a29eba by (null)): 1, rc=-7
Mar 21 06:18:06 b400ple0 crmd: [8608]: info: tengine_stonith_callback: StonithOp <remote-op state="1" st_target="b300ple0" st_op="reboot" />
Mar 21 06:18:06 b400ple0 stonith-ng: [8603]: info: stonith_notify_client: Sending st_fence-notification to client 8608/bc1b0c7d-2cec-4e96-9523-5f6c51b52508
Mar 21 06:18:06 b400ple0 crmd: [8608]: info: tengine_stonith_callback: Stonith operation 44/15:49:0:44f2b175-7292-473a-a4e8-f9abda5b3ef6: Operation timed out (-7)
Mar 21 06:18:06 b400ple0 crmd: [8608]: ERROR: tengine_stonith_callback: Stonith of b300ple0 failed (-7)... aborting transition.
Mar 21 06:18:06 b400ple0 crmd: [8608]: info: abort_transition_graph: tengine_stonith_callback:401 - Triggered transition abort (complete=0) : Stonith failed

Because I rebooted the node manually to simulate an outage, and I haven't started rcopenais yet, the sbd daemon isn't running on it either:

b400ple0:(/var/log/ha)(root)#/usr/sbin/sbd -d /dev/mapper/SHARED1_part1 list
0       b400ple0        clear
1       b300ple0        reset   b400ple0
b400ple0:(/var/log/ha)(root)#/usr/sbin/sbd -d /dev/mapper/SHARED2_part1 list
0       b300ple0        reset   b400ple0
1       b400ple0        clear

It is waiting until sbd picks up the command and resets the node.

The question is: where is the information that the resources are still up stored? Is it in the lrm part? I have found that I can use "crm node clearstate", which should set the node's state to offline and probably release the resources, but I want to find out where exactly this information is hidden. All of it is, or should be, in the CIB, and I would like to know exactly which part is responsible for this behavior, to understand it better.

Best regards

Jozef
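
[Note] On the closing question: it is indeed that part. The <lrm> block inside the node's <node_state> holds the last recorded operation results (the lrm_rsc_op entries), and the node_state attributes (in_ccm, crmd, join, expected) are what the policy engine turns into "unclean" versus "cleanly down". As far as I can tell, "crm node clearstate" works by overwriting that node_state entry in the status section so that the recorded state matches a clean shutdown; a sketch of the idea, with attribute values that are illustrative rather than captured from this cluster:

b400ple0:(/root/home/root)(root)#crm node clearstate b300ple0

After that the status section should carry, roughly:

<node_state uname="b300ple0" in_ccm="false" crmd="offline"
            join="member" expected="down" ... />

so the node is now *expected* to be down, its absence no longer makes it unclean, and the resources can be recovered elsewhere. Note that the pending reset in the sbd slot is separate from the CIB; if it ever has to be cleared by hand (for example once the node has been verified down some other way), sbd accepts a message subcommand -- shown here as a sketch against one of the devices listed above:

b400ple0:(/var/log/ha)(root)#/usr/sbin/sbd -d /dev/mapper/SHARED1_part1 message b300ple0 clear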