Re: [Pacemaker] PINGD or IPfail

2011-01-30 Thread Andrew Beekhof
On Sun, Jan 30, 2011 at 11:33 AM, paul harford wrote: > Hi Guys > Can anyone help? I have ipfail configured in my ha.cf, and when I check the > ha-log it can see the ping IP address is gone, but the resources do not > fail over > > I have tried pingd also, but I don't think I had configured it properly

[Pacemaker] PINGD or IPfail

2011-01-30 Thread paul harford
Hi Guys, can anyone help? I have ipfail configured in my ha.cf, and when I check the ha-log it can see the ping IP address is gone, but the resources do not fail over. I have tried pingd also, but I don't think I had configured it properly (which is why I went to ipfail). Does anyone have a working pingd

Re: [Pacemaker] pingd process dies for no reason

2011-01-11 Thread Lars Ellenberg
if you want to. > > monitor operation timeout=60s > > > > BTW, someone should really implement the fping-based ping RA ... > > Thank you for volunteering :-) :-P Date: Fri, 3 Sep 2010 12:12:58 +0200 From: Bernd Schubert

Re: [Pacemaker] pingd process dies for no reason

2011-01-11 Thread Andrew Beekhof
On Tue, Jan 11, 2011 at 2:45 PM, Lars Ellenberg wrote: > On Tue, Jan 11, 2011 at 11:24:35AM +0100, patrik.rappo...@knapp.com wrote: >> we already made changes to the interval and timeout (<op id="pingd-op-monitor-30s" interval="30s" name="monitor" timeout="10s"/>). >> >> how big should dampen be set

[Pacemaker] pingd process dies for no reason

2011-01-11 Thread Patrik . Rapposch
cemaker cluster resource manager To: pacemaker@oss.clusterlabs.org Cc: Subject: Re: [Pacemaker] pingd process dies for no reason On Tue, Jan 11, 2011 at 11:24:35AM +0100, patrik.rappo...@knapp.com wrote: > we already made changes to the interval and timeout (<op id="pingd-op-monitor-30s"

Re: [Pacemaker] pingd process dies for no reason

2011-01-11 Thread Lars Ellenberg
On Tue, Jan 11, 2011 at 11:24:35AM +0100, patrik.rappo...@knapp.com wrote: > we already made changes to the interval and timeout (<op id="pingd-op-monitor-30s" interval="30s" name="monitor" timeout="10s"/>). > > how big should dampen be set? > > please correct me if I am wrong, as I calculate it as

[Pacemaker] pingd process dies for no reason

2011-01-11 Thread Patrik . Rapposch
We already made changes to the interval and timeout (<op id="pingd-op-monitor-30s" interval="30s" name="monitor" timeout="10s"/>). How big should dampen be set? Please correct me if I am wrong, as I calculate it as follows: assuming the last check was OK and the failure takes place in the next second, there would be 29s until the next check starts, and

Re: [Pacemaker] pingd process dies for no reason

2011-01-07 Thread Michael Schwartzkopff
On Friday 07 January 2011 14:56:03 patrik.rappo...@knapp.com wrote: > Greetings, > > we have a problem: the ping daemon dies for no reason and we can't > find out why this happened. > > we use the following versions on SLES 11.1: > > libpacemaker3-1.1.2-0.6.1 > pacemaker-mgmt-2.0.0-0.3.10 > pacemak

[Pacemaker] pingd process dies for no reason

2011-01-07 Thread Patrik . Rapposch
Greetings, we have a problem: the ping daemon dies for no reason and we can't find out why this happened. We use the following versions on SLES 11.1: libpacemaker3-1.1.2-0.6.1 pacemaker-mgmt-2.0.0-0.3.10 pacemaker-mgmt-client-2.0.0-0.3.10 drbd-pacemaker-8.3.8.1-0.2.9 libpacemaker-devel-1.1.2-0.6.

Re: [Pacemaker] pingd location constraints

2010-12-14 Thread Simon Jansen
Hi, I just want to let you know that the problem is solved, thanks to "crm_mon -f". ;) The error was the use of the pingd RA. This RA worked very unreliably and is marked as deprecated anyway ( http://comments.gmane.org/gmane.linux.highavailability.user/32290). So after changing the RA to ocf:pa
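As a rough illustration of the switch Simon describes, the sketch below configures a cloned ocf:pacemaker:ping resource and a location rule that keeps resIP0 (the IP resource from the original post) off any node whose connectivity attribute is missing or zero. The names p_ping, cl_ping and loc_resIP0_connected, the host_list addresses and the multiplier, dampen and timing values are hypothetical placeholders, not taken from the thread; the sketch assumes the crm shell is available and the RA's default attribute name "pingd" is kept.

```shell
#!/bin/sh
# Sketch only: p_ping, cl_ping, loc_resIP0_connected and the host_list
# addresses are hypothetical; resIP0 is the IP resource from the thread.

# Ping the gateways from every node; each node gets a "pingd" attribute
# (the RA's default attribute name) reflecting how many hosts answered.
crm configure primitive p_ping ocf:pacemaker:ping \
    params host_list="192.168.0.1 192.168.0.2" multiplier=1000 dampen=5s \
    op monitor interval=15s timeout=30s

# Run the ping resource on all nodes so every node reports its own score.
crm configure clone cl_ping p_ping

# Forbid resIP0 on any node where the attribute is missing or zero.
crm configure location loc_resIP0_connected resIP0 \
    rule -inf: not_defined pingd or pingd lte 0
```

Cloning matters here: the "pingd=0" on node2 in the "crm_mon -f" output further down is what a node reports when its ping checks are not succeeding, so each node needs its own running instance for the rule to compare nodes fairly.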

Re: [Pacemaker] pingd location constraints

2010-12-14 Thread Simon Jansen
Hi Mike, thank you for the advice. To my current knowledge, resource stickiness just defines that a resource should remain on the node it is running on. The failover and failback actions are performed correctly with the location constraints that bind the resources to a specific node wh

Re: [Pacemaker] pingd location constraints

2010-12-13 Thread Mike Diehn
Simon, I'm new to this, so if this doesn't help, don't despair - the more experienced members will be along shortly. :-) Could it be that you need "stickiness"? I think that's the term for the concept you are describing. Also, if that's a two-node cluster, have you defined cluster property no-q

Re: [Pacemaker] pingd location constraints

2010-12-13 Thread Simon Jansen
With the help of "crm_mon -f" I found out that just one node has the right pingd score. Migration summary: * Node node1: pingd=3 * Node node2: pingd=0 When I initiate a failover by setting node1 to standby, the pingd score on node2 becomes 3. A ping that is run manually reaches the destina

[Pacemaker] pingd location constraints

2010-12-13 Thread Simon Jansen
Hi, I'm trying to set up location constraints for my cluster, but I can't get them to work the way that I want. The constraints should implement the following behaviour: - Normal operation: msDRBD0 and resIP0 start on node1, msDRBD1 and resIP1 start on node2 - Loss of network connection: When the

Re: [Pacemaker] pingd problem about clone

2010-09-21 Thread Andrew Beekhof
On Fri, Sep 17, 2010 at 2:38 AM, jiaju liu wrote: > Clone Set: pingd_data_net > Started: [ oss3 oss2 oss1 ] > > I use the command : > > crm_resource -g host_list -r pingd_data_net > to check the param host_list > what does the resource definition look like? > > the result

[Pacemaker] pingd problem about clone

2010-09-16 Thread jiaju liu
Clone Set: pingd_data_net Started: [ oss3 oss2 oss1 ] I use the command: crm_resource -g host_list -r pingd_data_net to check the param host_list. The result is: pingd_data_net is active on more than one node, returning the default value for Er

Re: [Pacemaker] pingd

2010-09-03 Thread Lars Ellenberg
On Fri, Sep 03, 2010 at 12:12:58PM +0200, Bernd Schubert wrote: > On Friday, September 03, 2010, Lars Ellenberg wrote: > > > > how about an fping RA ? > > > > active=$(fping -a -i 5 -t 250 -B1 -r1 $host_list 2>/dev/null | wc -l) > > > > > > > > terminates in about 3 seconds for a hostlist of 100 (
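The fping one-liner quoted above can be wrapped into a small helper, sketched below under stated assumptions. count_alive and ping_score are hypothetical names, not part of any shipped resource agent, and the scoring step assumes the same "alive hosts times multiplier" scheme that ocf:pacemaker:ping uses for its attribute value. Publishing the value into the CIB (with dampening) is deliberately left out.

```shell
#!/bin/sh
# Sketch of the fping-based check discussed in this thread; count_alive and
# ping_score are hypothetical helper names, not an existing RA's functions.

# Print how many of the given hosts answered:
#   -a     print only targets that are alive
#   -i 5   wait 5 ms between sending packets to successive targets
#   -t 250 initial per-target timeout of 250 ms
#   -B1    backoff factor 1 (the timeout does not grow on retries)
#   -r1    retry each target once
count_alive() {
    fping -a -i 5 -t 250 -B1 -r1 "$@" 2>/dev/null | wc -l
}

# Turn the alive count ($1) into an attribute value using a multiplier ($2).
# $1 is substituted textually, so leading blanks from some wc versions are harmless.
ping_score() {
    echo $(( $1 * $2 ))
}
```

Usage, with hypothetical addresses: `ping_score "$(count_alive 10.0.0.1 10.0.0.2 10.0.0.3)" 1000`. Because fping probes all targets in parallel with a short timeout, the whole check stays bounded even for long host lists, which is the point Lars makes about it finishing in about 3 seconds for 100 hosts.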

Re: [Pacemaker] pingd

2010-09-03 Thread Andrew Beekhof
On Fri, Sep 3, 2010 at 12:12 PM, Bernd Schubert wrote: > On Friday, September 03, 2010, Lars Ellenberg wrote: >> > > how about an fping RA ? >> > > active=$(fping -a -i 5 -t 250 -B1 -r1 $host_list 2>/dev/null | wc -l) >> > > >> > > terminates in about 3 seconds for a hostlist of 100 (on the LAN, 2

Re: [Pacemaker] pingd

2010-09-03 Thread Andrew Beekhof
On Fri, Sep 3, 2010 at 9:38 AM, Lars Ellenberg wrote: > On Thu, Sep 02, 2010 at 09:33:59PM +0200, Andrew Beekhof wrote: >> On Thu, Sep 2, 2010 at 4:05 PM, Lars Ellenberg >> wrote: >> > On Thu, Sep 02, 2010 at 11:00:12AM +0200, Bernd Schubert wrote: >> >> On Thursday, September 02, 2010, Andrew Be

Re: [Pacemaker] pingd

2010-09-03 Thread Lars Ellenberg
On Fri, Sep 03, 2010 at 12:12:58PM +0200, Bernd Schubert wrote: > > > >> PS: (*) As you insist ;) on quorum with n/2 + 1 nodes, we use ping as > > > >> replacement. We simply cannot fulfill n/2 + 1, as controller failure > > > >> takes down 50% of the systems (virtual machines) and the systems > >

Re: [Pacemaker] pingd

2010-09-03 Thread Bernd Schubert
On Friday, September 03, 2010, Lars Ellenberg wrote: > > > how about an fping RA ? > > > active=$(fping -a -i 5 -t 250 -B1 -r1 $host_list 2>/dev/null | wc -l) > > > > > > terminates in about 3 seconds for a hostlist of 100 (on the LAN, 29 of > > > which are alive). > > > > Happy to add if someone

Re: [Pacemaker] pingd

2010-09-03 Thread Lars Ellenberg
On Thu, Sep 02, 2010 at 09:33:59PM +0200, Andrew Beekhof wrote: > On Thu, Sep 2, 2010 at 4:05 PM, Lars Ellenberg > wrote: > > On Thu, Sep 02, 2010 at 11:00:12AM +0200, Bernd Schubert wrote: > >> On Thursday, September 02, 2010, Andrew Beekhof wrote: > >> > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Sc

Re: [Pacemaker] pingd

2010-09-02 Thread Andrew Beekhof
On Thu, Sep 2, 2010 at 4:05 PM, Lars Ellenberg wrote: > On Thu, Sep 02, 2010 at 11:00:12AM +0200, Bernd Schubert wrote: >> On Thursday, September 02, 2010, Andrew Beekhof wrote: >> > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert >> > > My proposal is to rip all network code out of pingd and

Re: [Pacemaker] pingd

2010-09-02 Thread Bernd Schubert
On Thursday, September 02, 2010, Lars Ellenberg wrote: > On Thu, Sep 02, 2010 at 11:00:12AM +0200, Bernd Schubert wrote: > > On Thursday, September 02, 2010, Andrew Beekhof wrote: > > > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert > > > > > > > My proposal is to rip all network code out of

Re: [Pacemaker] pingd

2010-09-02 Thread Lars Ellenberg
On Thu, Sep 02, 2010 at 11:00:12AM +0200, Bernd Schubert wrote: > On Thursday, September 02, 2010, Andrew Beekhof wrote: > > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert > > > My proposal is to rip all network code out of pingd and to add > > > slightly modified files from 'iputils'. > > >

Re: [Pacemaker] pingd

2010-09-02 Thread Bernd Schubert
On Thursday, September 02, 2010, Andrew Beekhof wrote: > On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert > > My proposal is to rip all network code out of pingd and to add > > slightly modified files from 'iputils'. > > Close, but that's not portable. > Instead use ocf:pacemaker:ping which goes

Re: [Pacemaker] pingd

2010-09-01 Thread Andrew Beekhof
On Wed, Sep 1, 2010 at 11:59 AM, Bernd Schubert wrote: > Andrew, > > I think pingd is rather broken: > > strace -f /usr/lib64/heartbeat/pingd -a pingdnet2 -d 5s -i 1 -n 2 -h > 10.0.1.16 > > (which is in fact localhost) > > In an endless loop: > > sendmsg(4, {msg_name(16)={sa_family=AF_INET, sin_

[Pacemaker] pingd

2010-09-01 Thread Bernd Schubert
Andrew, I think pingd is rather broken: strace -f /usr/lib64/heartbeat/pingd -a pingdnet2 -d 5s -i 1 -n 2 -h 10.0.1.16 (which is in fact localhost) In an endless loop: sendmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.0.1.16")}, msg_iov(1)=[{"\10\0\324\

Re: [Pacemaker] pingd problems

2010-06-08 Thread Dalibor Dukic
On Tue, 2010-06-08 at 19:08 +0200, Dejan Muhamedagic wrote: > Not sure, but I think that the default for the attribute name is > "pingd". Try changing L3_ping to pingd in the constraints. Dejan, thanks a lot for pointing out the error, I really appreciate it. I've changed the attribute name to 'pin
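Dejan's hint comes down to one rule: the attribute tested in a location constraint must be the one the ping RA actually writes, and that name defaults to "pingd". The alternative to renaming the constraints is to tell the RA to publish under the custom name, roughly as sketched below. L3_ping is the attribute name from this thread; p_ping_l3, cl_ping_l3, loc_vip_l3, resVIP and the addresses are hypothetical placeholders, and the timing values are illustrative only.

```shell
#!/bin/sh
# Sketch only: p_ping_l3, cl_ping_l3, loc_vip_l3, resVIP and the addresses
# are hypothetical; L3_ping is the attribute name used in the thread.

# name= sets the node attribute the RA writes; without it the RA uses "pingd".
crm configure primitive p_ping_l3 ocf:pacemaker:ping \
    params host_list="10.0.0.1 10.0.0.2" name=L3_ping \
    op monitor interval=10s timeout=30s
crm configure clone cl_ping_l3 p_ping_l3

# The rule must test the same attribute name the RA publishes.
crm configure location loc_vip_l3 resVIP \
    rule -inf: not_defined L3_ping or L3_ping lte 0
```

With a mismatch, a rule like this sees the attribute as "not defined" on every node and bans the IP everywhere, which fits the symptom of IPaddr2 resources that never start.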

Re: [Pacemaker] pingd problems

2010-06-08 Thread Dejan Muhamedagic
Hi, On Tue, Jun 08, 2010 at 06:43:11PM +0200, Dalibor Dukic wrote: > On Sat, 2010-06-05 at 15:36 +0200, Dalibor Dukic wrote: > > I have a problem with the ping RA not correctly updating the CIB with the appropriate > > attributes when doing a fresh start. So afterwards the IPaddr2 resources won't > > start. > > Have

Re: [Pacemaker] pingd problems

2010-06-08 Thread Dalibor Dukic
On Sat, 2010-06-05 at 15:36 +0200, Dalibor Dukic wrote: > I have a problem with the ping RA not correctly updating the CIB with the appropriate > attributes when doing a fresh start. So afterwards the IPaddr2 resources won't > start. Has anyone had a chance to take a peek at this? My setup consists of two nodes doi

[Pacemaker] pingd problems

2010-06-05 Thread Dalibor Dukic
Hi, I have a problem with the ping RA not correctly updating the CIB with the appropriate attributes when doing a fresh start. So afterwards the IPaddr2 resources won't start. I'm running Ubuntu 10.04 with pacemaker 1.0.8+hg15494-2ubuntu2. I have seen people on the list having the same problem. I've corrected integer based

Re: [Pacemaker] pingd fails to update CIB

2010-03-25 Thread Marc Villacorta
I have the same problem as Quentin Smith (sticky pingd value="0"). [12:49:44 ha1] ~> rpm -qa | egrep -i "pacemaker|corosync|heartbeat|resource" heartbeat-libs-3.0.2-2.el5 heartbeat-3.0.2-2.el5 pacemaker-libs-1.0.8-1.el5 resource-agents-1.0.1-1.el5 corosync-1.2.0-1.el5 pacemaker-1.0.8-1.el5 coros

Re: [Pacemaker] pingd fails to update CIB

2010-03-15 Thread Andrew Beekhof
On Sat, Mar 13, 2010 at 1:26 AM, Quentin Smith wrote: > I don't know a lot about hg, but doesn't the "r15404" in the version of > pacemaker that I'm running now mean that I already have this bugfix > (r15295)? Yes. I'd suggest you use the ping RA instead of pingd. The ping RA uses your system's

Re: [Pacemaker] pingd fails to update CIB

2010-03-12 Thread Quentin Smith
I don't know a lot about hg, but doesn't the "r15404" in the version of pacemaker that I'm running now mean that I already have this bugfix (r15295)? --Quentin On Fri, 12 Mar 2010, hj lee wrote: Hi, This seems the same problem I reported a while ago. It was fixed in http://hg.clusterlabs.o

Re: [Pacemaker] pingd fails to update CIB

2010-03-12 Thread hj lee
Hi, This seems the same problem I reported a while ago. It was fixed in http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/214f0fc258f2. Thanks On Fri, Mar 12, 2010 at 2:36 PM, Quentin Smith wrote: > Hi- > > I just took the latest updates to pacemaker and heartbeat from > http://people.debian.

[Pacemaker] pingd fails to update CIB

2010-03-12 Thread Quentin Smith
Hi- I just took the latest updates to pacemaker and heartbeat from http://people.debian.org/~madkiss/ha. In particular, I upgraded heartbeat 1:3.0.2-1~bpo50+1 to 1:3.0.2+hg12547-2~bpo50+1 pacemaker 1.0.7+hg20100203-1~bpo50+1 to 1.0.7+hg20100303r15404-3~bpo50+1 cluster-agents 1:1.0.2-1~bpo50+1

Re: [Pacemaker] pingd comments and metadata

2009-06-04 Thread Florian Haas
On 06/04/2009 09:33 AM, Andrew Beekhof wrote: >> Do we even still have "configured ping nodes" in the original, ha.cf sense? > > For openais based clusters, no. > For heartbeat based ones, yes. I see. >> >> >> The name of the attributes to set. This is the name to be used in the >> constraint

Re: [Pacemaker] pingd comments and metadata

2009-06-04 Thread Andrew Beekhof
On Thu, Jun 4, 2009 at 8:47 AM, Florian Haas wrote: > Andrew, Dejan, Dominik, > > I am by no means a pingd expert, but the current incarnation in > stable-1.0 seems to have some outdated and misleading comments and meta > data. Examples: > > > > The list of ping nodes to count.  Defaults to all

[Pacemaker] pingd comments and metadata

2009-06-03 Thread Florian Haas
Andrew, Dejan, Dominik, I am by no means a pingd expert, but the current incarnation in stable-1.0 seems to have some outdated and misleading comments and meta data. Examples: The list of ping nodes to count. Defaults to all configured ping nodes. Rarely needs to be specified. Host list D

Re: [Pacemaker] PingD Failure-Timeout

2009-05-27 Thread Andrew Beekhof
On Tue, May 26, 2009 at 3:22 PM, Eliot Gable wrote: > I am using 1.0.3, but the failure-timeout thing does not seem to work for > pingd. > You'll have to show us the rest of your configuration

Re: [Pacemaker] PingD Failure-Timeout

2009-05-26 Thread Eliot Gable
...@beekhof.net] Sent: Monday, May 25, 2009 11:49 AM To: pacemaker@oss.clusterlabs.org Subject: Re: [Pacemaker] PingD Failure-Timeout On Thu, May 21, 2009 at 10:20 PM, Eliot Gable wrote: > Is there a way to time-out the failure of PingD? Yes, but you need version >= 1.0.0 I assume you're not runn

Re: [Pacemaker] PingD Failure-Timeout

2009-05-25 Thread Andrew Beekhof
On Thu, May 21, 2009 at 10:20 PM, Eliot Gable wrote: > Is there a way to time-out the failure of PingD? Yes, but you need version >= 1.0.0 I assume you're not running it as a clone right? > > > > In my configuration, I cannot run PingD all the time on every node. Only one > node (the master) has
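The failure-timeout Andrew refers to is a per-resource meta attribute. The sketch below shows where it goes; it uses ocf:pacemaker:ping, which later posts in this archive recommend over pingd, but the meta attribute is set the same way on any primitive. p_pingd, the target address and the timing values are hypothetical.

```shell
#!/bin/sh
# Sketch only: p_pingd and the target address are hypothetical.
# failure-timeout tells Pacemaker to disregard this resource's failures
# once they are older than the given period (needs Pacemaker >= 1.0.0,
# per this thread).
crm configure primitive p_pingd ocf:pacemaker:ping \
    params host_list="203.0.113.1" \
    op monitor interval=10s timeout=30s \
    meta failure-timeout=120s
```

In practice, an expired failure is usually acted on only at the cluster's next re-evaluation, so the cluster-recheck-interval property can delay the effect beyond the configured timeout.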

Re: [Pacemaker] PingD Failure-Timeout

2009-05-21 Thread Eliot Gable
From: Eliot Gable [mailto:ega...@broadvox.net] Sent: Thursday, May 21, 2009 4:20 PM To: pacemaker@oss.clusterlabs.org Subject: [Pacemaker] PingD Failure-Timeout Is there a way to time-out the fail

[Pacemaker] PingD Failure-Timeout

2009-05-21 Thread Eliot Gable
Is there a way to time-out the failure of PingD? In my configuration, I cannot run PingD all the time on every node. Only one node (the master) has public Internet access. I use PingD to cause the master to fail-over to one of the slaves. When a slave becomes master, it then gains public Intern

Re: [Pacemaker] pingd CPU usage increases slowly

2009-05-12 Thread Andrew Beekhof
Excellent news. The slowdown was probably related to the memory leak I fixed for 1.0.3. Let me know if you have any further problems. On Tue, May 12, 2009 at 7:42 PM, Stelio Plautz wrote: > On Tue, 12 May 2009 11:19:46 +0200 > Andrew Beekhof wrote: > > Hi Andrew, > I've upgraded both nodes to 1

Re: [Pacemaker] pingd CPU usage increases slowly

2009-05-12 Thread Stelio Plautz
On Tue, 12 May 2009 11:19:46 +0200, Andrew Beekhof wrote: Hi Andrew, I've upgraded both nodes to 1.0.3 and it looks OK now. Thanks, stelio > On Mon, May 11, 2009 at 12:01 PM, Stelio Plautz > wrote: > > Hi all, > > > > I've set up a 2 node cluster on debian etch amd64, pacemaker 1.0.2-1 > > an

Re: [Pacemaker] pingd CPU usage increases slowly

2009-05-12 Thread Andrew Beekhof
On Mon, May 11, 2009 at 12:01 PM, Stelio Plautz wrote: > Hi all, > > I've set up a 2 node cluster on debian etch amd64, pacemaker 1.0.2-1 > and heartbeat 2.99.1-1 from suse repository. > everything works fine, but pingd increases CPU usage slowly. I've two > pingd processes running and both use ab

[Pacemaker] pingd CPU usage increases slowly

2009-05-11 Thread Stelio Plautz
Hi all, I've set up a 2-node cluster on Debian etch amd64, with pacemaker 1.0.2-1 and heartbeat 2.99.1-1 from the SUSE repository. Everything works fine, but pingd's CPU usage increases slowly. I have two pingd processes running, and both use about 100% CPU after 3 weeks. 14091 root 16 0 1299M 1263M