Also, in my constraints section, the rsc_location definitions for ping connectivity don't specify a node attribute. What value does node take in that case?
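For context on the question above: an rsc_location constraint either names a single node (node plus score) or carries one or more rule elements. A rule-based constraint has no node attribute at all, because the cluster evaluates the rule against each node's attributes. Below is a minimal sketch of the rule form. It assumes the resource is address01 and that the ping clone publishes its default attribute name, pingd. The constraint, rule and expression ids are hypothetical.

```xml
<!-- Hypothetical ids; assumes resource address01 and the default "pingd" node attribute -->
<rsc_location id="address01-connectivity" rsc="address01">
  <!-- No node attribute: the rule is evaluated for every node in the cluster -->
  <rule id="address01-connectivity-rule" score-attribute="pingd">
    <!-- Matches only on nodes where the ping agent has set pingd; that value becomes the score -->
    <expression id="address01-connectivity-expr" attribute="pingd" operation="defined"/>
  </rule>
</rsc_location>
```

Because score-attribute replaces a fixed score, the preference each node receives follows the current value of its pingd attribute.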
Anlu

On Fri, Feb 17, 2012 at 10:57 AM, Anlu Wang <a...@mixpanel.com> wrote:
> I'm running 1.0.8. In accordance with the bug in the post you linked, I
> changed the config so that interval is greater than dampen. Here is the
> relevant section now:
>
> <clone id="connectivity_resource">
>   <primitive class="ocf" id="ping" provider="pacemaker" type="ping">
>     <instance_attributes id="ping-attrs">
>       <nvpair id="pingd-dampen" name="dampen" value="5s"/>
>       <nvpair id="pingd-multiplier" name="multiplier" value="100"/>
>       <nvpair id="pingd-hosts" name="host_list" value="10.54.130.6 10.54.130.8 10.54.130.7 50.97.196.101 50.97.196.103 50.97.196.102"/>
>     </instance_attributes>
>     <operations>
>       <op id="ping-monitor-10s" interval="10s" name="monitor" timeout="60s"/>
>     </operations>
>   </primitive>
>   <meta_attributes id="connectivity_resource-meta_attributes">
>     <nvpair id="connectivity_resource-meta_attributes-target-role" name="target-role" value="Started"/>
>   </meta_attributes>
> </clone>
>
> The scores are still not what I expect, however, and when I disable the
> internal interface on a node, nothing happens with failover.
>
> Also, I've noticed this in my syslog:
>
> Feb 17 06:26:11 anlutest2 lrmd: [1137]: WARN: ping:1:monitor process (PID 9380) timed out (try 1).  Killing with signal SIGTERM (15).
> Feb 17 06:26:11 anlutest2 lrmd: [1137]: info: RA output: (ping:1:monitor:stderr) Terminated
> Feb 17 06:26:11 anlutest2 ping[9380]: [15745]: INFO: They use TERM to bring us down. No such luck.
> Feb 17 06:26:11 anlutest2 ping[9380]: [15747]: ERROR: Unexpected result for 'ping -n -q -W 3 -c 5 50.97.196.103' 143:
>
> So it looks like the ping command is failing for some reason, but when I
> run it manually, it succeeds...
>
> Really at a loss here, any help is appreciated!
>
> Anlu
>
> On Fri, Feb 17, 2012 at 3:26 AM, Dejan Muhamedagic <deja...@fastmail.fm> wrote:
>
>> Hi,
>>
>> On Thu, Feb 16, 2012 at 07:57:14PM -0800, Anlu Wang wrote:
>> > I have three machines named anlutest1, anlutest2, and anlutest3 that I'm
>> > trying to get IP failover working on. I'm using heartbeat for the
>> > messaging layer, and everything works great when a machine goes down.
>> > But I would also like to fail over an IP when EITHER the eth0 or the
>> > eth1 network interface fails. From reading
>> >
>> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/ch09s03s03.html
>> >
>> > it seems the right way to do this is to add a ping resource.
>> >
>> > Here is my XML configuration:
>> >
>> > http://pastebin.com/05z7eB2s
>>
>> The configuration seems OK, though obviously monitors are
>> scheduled back-to-back (the postponed operations messages below).
>> I guess that you should increase the intervals or reduce the
>> dampen period. Which version of Pacemaker do you run? Perhaps
>> also take a look at this thread:
>>
>> http://oss.clusterlabs.org/pipermail/pacemaker/2011-April/009942.html
>>
>> Thanks,
>>
>> Dejan
>>
>> > This config doesn't work for me.
>> > Using the showscores.sh script found at:
>> >
>> > http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg00410.html
>> >
>> > I see that my scores are:
>> >
>> > Resource Score Node Stickiness #Fail Migration-Threshold
>> > address01 0 anlutest3 0 0
>> > address01 1006 anlutest1 0 5
>> > address01 50 anlutest2 0 157
>> > address02 0 anlutest3 0 0
>> > address02 1050 anlutest2 0 2
>> > address02 6 anlutest1 0 0
>> > address03 1000 anlutest3 0 7
>> > address03 50 anlutest2 0
>> > address03 6 anlutest1 0 0
>> > ping:0 0 anlutest1 0 6
>> > ping:0 0 anlutest2 0 14
>> > ping:0 0 anlutest3 0 0
>> > ping:1 0 anlutest2 0
>> > ping:1 0 anlutest3 0 28
>> > ping:1 -1000000 anlutest1 0 0
>> > ping:2 0 anlutest3 0 13
>> > ping:2 -1000000 anlutest1 0 0
>> > ping:2 -1000000 anlutest2 0
>> >
>> > which make no sense at all. I don't see how I could be getting these
>> > scores of 50 and 1006. When I take down an interface on anlutest3, I see
>> > scores of 4 and 1004, which sort of make sense, except that the
>> > multiplier of 100 isn't working. I was experimenting with changing
>> > values, so maybe it's caching old values. If so, how do I enforce the
>> > new values?
>> >
>> > Furthermore, shouldn't there be no scores of 0? If all 6 IPs I am
>> > pinging return successfully, shouldn't my scores be either 600 or 1600?
>> >
>> > In my syslog I also see a ton of messages like
>> >
>> > Feb 17 03:54:47 anlutest2 lrmd: [1137]: info: perform_op:2877: operations on resource address01 already delayed
>> > Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2873: operation monitor[419] on ocf::ping::ping:1 for client 1140, its parameters: CRM_meta_clone=[1] host_list=[10.54.130.6 10.54.130.8 10.54.130.7 50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s] crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[10000] CRM_meta_name=[monitor] CRM_meta_timeout=[60000] CRM_meta_interval=[5000]  for rsc is already running.
>> > Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2883: postponing all ops on resource ping:1 by 1000 ms
>> > Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2873: operation monitor[171] on ocf::ping::ping:2 for client 1140, its parameters: CRM_meta_clone=[2] host_list=[10.54.130.6 10.54.130.8 10.54.130.7 50.97.196.101 50.97.196.103 50.9CRM_meta_clone_max=[3] dampen=[60s] crm_feature_set=[3.0.1] CRM_meta_globally_unique=[false] multiplier=[1] CRM_meta_name=[monitor] CRM_meta_timeout=[30000] CRM_meta_interval=[5000]  for rsc is already running.
>> > Feb 17 03:54:48 anlutest2 lrmd: [1137]: info: perform_op:2883: postponing all ops on resource ping:2 by 1000 ms
>> >
>> > and occasionally
>> >
>> > Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_trigger_update: Sending flush op to all hosts for: pingd (4000)
>> > Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_ha_callback: flush message from anlutest2
>> > Feb 17 03:54:33 anlutest2 attrd: [1139]: WARN: find_nvpair_attr: Multiple attributes match name=pingd
>> > Feb 17 03:54:33 anlutest2 attrd: [1139]: info: find_nvpair_attr: Value: 50 #011(id=status-d619a94e-ebba-4ed0-8e0f-89837dd7506b-pingd)
>> > Feb 17 03:54:33 anlutest2 attrd: [1139]: info: find_nvpair_attr: Value: 3 #011(id=status-ab3c1a25-9471-48f7-9c0b-c76238abd402-pingd)
>> > Feb 17 03:54:33 anlutest2 attrd: [1139]: info: attrd_perform_update: Sent update -40: pingd=4000
>> > Feb 17 03:54:33 anlutest2 attrd: [1139]: ERROR: attrd_cib_callback: Update -40 for pingd=4000 failed: Required data for this CIB API call not found
>> >
>> > Could someone just take a look at my config and let me know what I'm
>> > doing wrong? Or if there's a better way to do what I want to do...
>> >
>> > Thanks in advance,
>> > Anlu
>>
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>>
>
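On the 600-versus-1600 question in the quoted thread: the ping agent sets pingd to multiplier × the number of reachable hosts. With multiplier="100" and all six hosts in host_list answering, each node's pingd should therefore be 6 × 100 = 600. A score-attribute rule contributes that value, and the cluster adds it to any other location scores for the same resource and node. Preferring the better-connected node is not the same as refusing to run on a node with no connectivity at all. For the latter, Pacemaker Explained describes a -INFINITY rule. The sketch below follows that approach and combines a not_defined test and an lte 0 test with boolean-op="or". It assumes resource address01 and the default pingd attribute name, and the ids are hypothetical.

```xml
<!-- Hypothetical ids; assumes resource address01 and the default "pingd" node attribute -->
<rsc_location id="address01-no-connectivity" rsc="address01">
  <!-- If either expression matches on a node, address01 is banned from that node -->
  <rule id="address01-no-connectivity-rule" score="-INFINITY" boolean-op="or">
    <!-- The ping agent has not set pingd on this node -->
    <expression id="address01-pingd-undefined" attribute="pingd" operation="not_defined"/>
    <!-- The ping agent reports that no host in host_list is reachable -->
    <expression id="address01-pingd-zero" attribute="pingd" operation="lte" value="0"/>
  </rule>
</rsc_location>
```

This rule only bans nodes that can reach none of the hosts. Losing a single interface, while some hosts remain reachable, lowers pingd without triggering it.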