Ouch. Yes, definitely a bug. I seem to recall David looking into something like this recently... did you file a bug that he has been working on?
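For anyone trying to reproduce this, the settings described in the quoted report correspond roughly to the following crm shell configuration. This is an illustrative sketch, not the poster's actual config: the resource name, hostlist, and hypervisor_uri are taken from the logs below, and everything else is assumed.

```shell
# Sketch of a configuration matching the reported settings
# (f-dev2, hostlist=dev2 and hypervisor_uri are from the quoted logs;
# this is a reconstruction, not the original poster's exact config).
crm configure primitive f-dev2 stonith:external/libvirt \
    params hostlist="dev2" hypervisor_uri="qemu+ssh://n7/system" \
    op monitor interval="10s" timeout="60s"

# stonith-timeout (600s) deliberately exceeds monitor interval + timeout,
# which is the condition under which the monitor times out during fencing.
crm configure property stonith-enabled="true" stonith-timeout="600s"
```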
-- Andrew

On Fri, Aug 10, 2012 at 8:08 PM, Kazunori INOUE <inouek...@intellilink.co.jp> wrote:
> Hi,
> I am using Pacemaker-1.1.
> While a stonith resource is busy fencing, a monitor of that same stonith
> resource times out.
>
> [test case]
> 1. Add a sleep to reset() of the stonith plugin.
>    I am using libvirt.
>
> # diff -u libvirt.ORG libvirt
> --- libvirt.ORG 2012-07-17 13:10:01.000000000 +0900
> +++ libvirt     2012-08-10 16:52:31.214366776 +0900
> @@ -204,6 +204,7 @@
>  reset)
>      # libvirt has no reset so we do a power cycle
> +    sleep 3600
>      libvirt_check_config
>      libvirt_set_domain_id $2
>
> 2. Start a stonith resource (libvirt).
>    - stonith-timeout is 600s
>    - monitor interval is 10s
>    - monitor timeout is 60s
>    * stonith-timeout > monitor interval + monitor timeout
>
> # crm_mon
> :
> Stack: corosync
> Current DC: dev1 (-1788499776) - partition with quorum
> Version: 1.1.7-916198b
> 2 Nodes configured, unknown expected votes
> 1 Resources configured.
> ============
> Online: [ dev1 dev2 ]
>
> Full list of resources:
>  f-dev2 (stonith:external/libvirt): Started dev1
>
> 3. Request a 'fence' and wait.
> # crm node fence dev2
>
> After waiting, the monitor of the libvirt resource is executed, but
> because the resource is still busy fencing (due to the added sleep),
> the monitor times out.
>
> # crm_mon
> :
> ============
> Node dev2 (-1822054208): UNCLEAN (online)
> Online: [ dev1 ]
>
> Full list of resources:
>  f-dev2 (stonith:external/libvirt): Started dev1 FAILED
>
> Failed actions:
>     f-dev2_monitor_10000 (node=dev1, call=10, rc=1, status=Timed Out):
>     unknown error
>
> ----
> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: call_remote_stonith:
>   Requesting that dev1 perform op reboot dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_command:
>   Processing st_fence from dev1 ( 0)
> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: can_fence_host_with_device:
>   f-dev2 can fence dev2: dynamic-list
> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: stonith_fence:
>   Found 1 matching devices for 'dev2'
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: schedule_stonith_command:
>   Scheduling reboot on f-dev2 for 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5
>   (timeout=601ms)
> Aug 10 15:39:59 [6559] dev1 stonith-ng: info: stonith_command:
>   Processed st_fence from dev1: Operation now in progress (-115)
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:
>   Appending: plugin=external/libvirt
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:
>   Appending: hostlist=dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:
>   Appending: hypervisor_uri=qemu+ssh://n7/system
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:
>   Appending: action=reboot
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:
>   Appending: nodename=dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: make_args:
>   Performing reboot action for node 'dev2' as 'port=dev2'
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: append_arg:
>   Appending: port=dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: make_args: Calculated:
>   plugin=external/libvirt
>   hostlist=dev2
>   hypervisor_uri=qemu+ssh://n7/system
>   action=reboot
>   nodename=dev2
>   port=dev2
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent: forking
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: run_stonith_agent: sending args
> Aug 10 15:39:59 [6559] dev1 stonith-ng: trace: run_stonith_agent:
>   Op: reboot on fence_legacy, pid: 6725, timeout: 601s
> Aug 10 15:39:59 [6559] dev1 stonith-ng: debug: stonith_device_execute:
>   Operation reboot for node dev2 on f-dev2 now running with pid=6725,
>   timeout=601ms
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_create_op:
>   Sending call options: 00000000, 0
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command:
>   Sending st_execute message to STONITH service, Timeout: 64s
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command:
>   Reply: No data to dump as XML
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_send_command:
>   Async call 8, returning
> Aug 10 15:40:06 [6560] dev1 lrmd: trace: stonith_api_add_callback:
>   Added callback to lrmd_stonith_callback for call 8
> Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: stonith_command:
>   Processing st_execute from lrmd ( 0)
> Aug 10 15:40:06 [6559] dev1 stonith-ng: debug: schedule_stonith_command:
>   Scheduling monitor on f-dev2 for 15cf9708-0b22-4e8e-a724-7e58f1014b99
>   (timeout=64ms)
> Aug 10 15:40:06 [6559] dev1 stonith-ng: info: stonith_command:
>   Processed st_execute from lrmd: Operation now in progress (-115)
> Aug 10 15:40:59 [6559] dev1 stonith-ng: debug: remote_op_query_timeout:
>   Operation 15fb4a4b-4d91-4b8e-9e3d-8ca9394ce7e5 for dev2 already in progress
> Aug 10 15:42:10 [6560] dev1 lrmd: error: stonith_async_timeout_handler:
>   Async call 8 timed out after 124000ms
> Aug 10 15:42:10 [6560] dev1 lrmd: trace: stonith_perform_callback:
>   Invoking callback lrmd_stonith_callback for call 8
> Aug 10 15:42:10 [6560] dev1 lrmd: trace: stonith_perform_callback:
>   OP callback activated.
> Aug 10 15:42:10 [6560] dev1 lrmd: debug: process_lrmd_message:
>   Processed lrmd_rsc_info operation from eadb7520-fdff-4231-a488-9fde0b45073c:
>   rc=0, reply=0, notify=0, exit=0
> Aug 10 15:42:10 [6563] dev1 crmd: error: process_lrm_event:
>   LRM operation f-dev2_monitor_10000 (10) Timed Out (timeout=64000ms)
>
> Best Regards,
> Kazunori INOUE
>
> _______________________________________________
> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org