Hello, we have set up a cluster of 10 nodes to serve a Lustre filesystem to a computational cluster, with Pacemaker+Corosync handling failover between hosts. Each host is connected to an Ethernet network and an InfiniBand network, and we set up a `ping` resource to check that storage nodes can see compute nodes over the InfiniBand network. The intention is that, if a storage node cannot communicate with the compute nodes over IB, it hands its resources over to another storage node.
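For context, here is roughly what we believe the `ping` agent's monitor action does on each node with our parameters (a simplified shell sketch of our understanding, not the actual RA code; the two host names stand in for the full host_list, and the attribute update at the end is our assumption about how the value reaches the cluster)::

#!/bin/sh
# Simplified sketch (our understanding, not the real ocf:pacemaker:ping code)
# of the monitor action with name=ping, multiplier=10, dampen=5s.
host_list="lustre-mds1 ibr01c01b01n01"   # stand-ins for the full host_list
active=0
for host in $host_list; do
    # Same ping invocation that shows up in the logs below.
    if ping -n -q -W 5 -c 3 "$host" >/dev/null 2>&1; then
        active=$((active + 1))
    fi
done
# Publish the weighted count as the transient node attribute "ping",
# which the location rule below then tests.
attrd_updater -n ping -U $((active * 10)) -d 5s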
Here's the relevant section from `crm configure show`::

primitive ping ocf:pacemaker:ping \
        params name=ping dampen=5s multiplier=10 host_list="lustre-mds1 ibr01c01b01n01 ...(24 hosts omitted)..." \
        op start timeout=120 interval=0 \
        op monitor timeout=60 interval=10 \
        op stop timeout=20 interval=0
clone ping_clone ping \
        meta globally-unique=false clone-node-max=1 is-managed=true target-role=Started
# Bind OST locations to hosts that can actually support them.
location mdt-location mdt \
        [...]
        rule $id="mdt_only_if_ping_works" -INFINITY: not_defined ping or ping number:lte 0

In our understanding of the `ping` RA, this would add a score from 0 to 520, depending on how many compute nodes a storage node can ping. Since the resource stickiness is 2000, resources would only move if the `ping` RA failed completely and the host was totally cut off from the IB network.
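(For what it's worth, this is how we have been inspecting the live values and scores; the commands below are our best guess at the relevant tooling, and the exact options/output may differ between Pacemaker versions)::

# Show cluster status once, including the transient node attributes,
# so we can see which "ping" value each storage node currently advertises.
crm_mon -1 -A

# Ask the policy engine to score placements against the live CIB,
# to see how the ping-based rule and the stickiness of 2000 interact.
crm_simulate -L -s | grep -E 'mdt|mgt|ost'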
However, last night we had a case of resources moving back and forth between two storage nodes; the only trace left in the logs is that `ping` failed everywhere, plus some trouble reports from Corosync (which we cannot explain and which could be the real cause)::

May 28 00:29:19 lustre-mds1 ping(ping)[8147]: ERROR: Unexpected result for 'ping -n -q -W 5 -c 3 iblustre-mds1' 2: ping: unknown host iblustre-mds1
May 28 00:29:22 lustre-mds1 corosync[23879]: [TOTEM ] Incrementing problem counter for seqid 11125389 iface 10.129.93.10 to [9 of 10]
May 28 00:29:25 lustre-mds1 corosync[23879]: [TOTEM ] Incrementing problem counter for seqid 11126239 iface 10.129.93.10 to [10 of 10]
May 28 00:29:25 lustre-mds1 corosync[23879]: [TOTEM ] Marking seqid 11126239 ringid 0 interface 10.129.93.10 FAULTY
May 28 00:29:26 lustre-mds1 corosync[23879]: [TOTEM ] Automatically recovered ring 0
May 28 00:29:27 lustre-mds1 lrmd[23906]: warning: child_timeout_callback: ping_monitor_10000 process (PID 8147) timed out
May 28 00:29:27 lustre-mds1 lrmd[23906]: warning: operation_finished: ping_monitor_10000:8147 - timed out after 60000ms
May 28 00:29:27 lustre-mds1 crmd[23909]: error: process_lrm_event: Operation ping_monitor_10000: Timed Out (node=lustre-mds1.ften.es.hpcn.uzh.ch, call=267, timeout=60000ms)
May 28 00:29:27 lustre-mds1 corosync[23879]: [TOTEM ] Incrementing problem counter for seqid 11126319 iface 10.129.93.10 to [1 of 10]
May 28 00:29:27 lustre-mds1 crmd[23909]: warning: update_failcount: Updating failcount for ping on lustre-mds1.ften.es.hpcn.uzh.ch after failed monitor: rc=1 (update=value++, time=1401229767)
[...]
May 28 00:30:03 lustre-mds1 crmd[23909]: warning: update_failcount: Updating failcount for ping on lustre-oss1.ften.es.hpcn.uzh.ch after failed monitor: rc=1 (update=value++, time=1401229803)
May 28 00:30:03 lustre-mds1 crmd[23909]: notice: run_graph: Transition 472 (Complete=7, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-2770.bz2): Stopped
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:0 on lustre-oss4.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:1 on lustre-oss5.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:2 on lustre-oss6.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:3 on lustre-oss7.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:4 on lustre-oss8.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:5 on lustre-mds1.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:6 on lustre-mds2.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:7 on lustre-oss1.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:8 on lustre-oss2.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: warning: unpack_rsc_op_failure: Processing failed op monitor for ping:9 on lustre-oss3.ften.es.hpcn.uzh.ch: unknown error (1)
May 28 00:30:03 lustre-mds1 pengine[23908]: notice: LogActions: Restart mdt#011(Started lustre-mds1.ften.es.hpcn.uzh.ch)
May 28 00:30:03 lustre-mds1 pengine[23908]: notice: LogActions: Move mgt#011(Started lustre-mds2.ften.es.hpcn.uzh.ch -> lustre-mds1.ften.es.hpcn.uzh.ch)
May 28 00:30:03 lustre-mds1 pengine[23908]: notice: LogActions: Restart ost00#011(Started lustre-oss1.ften.es.hpcn.uzh.ch)
May 28 00:30:03 lustre-mds1 pengine[23908]: notice: LogActions: Restart ost01#011(Started lustre-oss3.ften.es.hpcn.uzh.ch)
[...]

So, our questions:

- Is this the way one is supposed to use the `ping` RA, i.e., to compute a score based on the number of reachable test nodes?
- Or does the `ping` RA instead trigger failure events as soon as even one of the test nodes cannot be pinged?
- Could the ping failure have triggered the resource restarts above?
- Any hints on how to debug this further?

Thank you for any help!

Kind regards,
Riccardo

--
Riccardo Murri
http://www.gc3.uzh.ch/people/rm

Grid Computing Competence Centre
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222    Fax: +41 44 635 6888