Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-25 Thread Juha Heinanen
Steven Dake writes: > Self-healing is not as obvious or easy as it sounds. Totem (the > protocol) has no way to determine when the admin has replaced the faulty > switch in the network. why can't it keep on pinging the interface/ip address even if there is no response? how is it with pingd,

Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-25 Thread Steven Dake
On Mon, 2009-05-25 at 18:32 +0300, Juha Heinanen wrote: > Florian Haas writes: > > > Agree that they're hacks, but disagree with your alternative. Why should > > Pacemaker be concerned with low-level OpenAIS recovery procedures? > > then have the variable in OpenAIS configuration. > Self-heal

Re: [Pacemaker] crm command line tool problem

2009-05-25 Thread Dejan Muhamedagic
On Mon, May 25, 2009 at 06:03:20PM +0200, Andrew Beekhof wrote: > On Mon, May 25, 2009 at 5:51 PM, Dejan Muhamedagic > wrote: > > Hi, > > > > On Fri, May 22, 2009 at 08:51:45AM -0700, Joe Armstrong wrote: > >> Hi All, > >> > >> I am playing around with the crm command line tool to create an > >>

Re: [Pacemaker] Redesigned Debian HA packages

2009-05-25 Thread Andrew Beekhof
On Mon, May 25, 2009 at 6:05 PM, Juha Heinanen wrote: > i replaced my older packages with the new debian packages (heartbeat and > pacemaker-heartbeat) and my cluster came up automatically without a need > to change anything. > > regarding crm_mon, i would like to start it automatically to monitor

Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-25 Thread Andrew Beekhof
On Mon, May 25, 2009 at 6:10 PM, Florian Haas wrote: > On 2009-05-25 17:45, Andrew Beekhof wrote: >> SUSE is currently recommending NIC bonding. >> We've not been able to get satisfactory behavior from clusters using RRP. > > I've repeatedly told customers that NIC bonding is not a valid > substit

Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-25 Thread Florian Haas
On 2009-05-25 17:45, Andrew Beekhof wrote: > SUSE is currently recommending NIC bonding. > We've not been able to get satisfactory behavior from clusters using RRP. I've repeatedly told customers that NIC bonding is not a valid substitute for redundant Heartbeat links, I will stubbornly insist it

[Pacemaker] Redesigned Debian HA packages

2009-05-25 Thread Juha Heinanen
i replaced my older packages with the new debian packages (heartbeat and pacemaker-heartbeat) and my cluster came up automatically without a need to change anything. regarding crm_mon, i would like to start it automatically to monitor the cluster and send alert emails if something happens, but no

Re: [Pacemaker] crm command line tool problem

2009-05-25 Thread Andrew Beekhof
On Mon, May 25, 2009 at 5:51 PM, Dejan Muhamedagic wrote: > Hi, > > On Fri, May 22, 2009 at 08:51:45AM -0700, Joe Armstrong wrote: >> Hi All, >> >> I am playing around with the crm command line tool to create an >> HA config for pacemaker and am bumping into a problem. >> >> If I have a configurat

Re: [Pacemaker] crm command line tool problem

2009-05-25 Thread Dejan Muhamedagic
Hi, On Fri, May 22, 2009 at 08:51:45AM -0700, Joe Armstrong wrote: > Hi All, > > I am playing around with the crm command line tool to create an > HA config for pacemaker and am bumping into a problem. > > If I have a configuration running already, 3-node with ip & > httpd (pretty simple) and I

Re: [Pacemaker] PingD Failure-Timeout

2009-05-25 Thread Andrew Beekhof
On Thu, May 21, 2009 at 10:20 PM, Eliot Gable wrote: > Is there a way to time-out the failure of PingD? Yes, but you need version >= 1.0.0. I assume you're not running it as a clone, right? > > > > In my configuration, I cannot run PingD all the time on every node. Only one > node (the master) has

Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-25 Thread Andrew Beekhof
On Mon, May 25, 2009 at 5:08 PM, Florian Haas wrote: > Hello everyone, > > I realize this is primarily an OpenAIS issue, but let's discuss it here > anyway to share some thoughts. > > In Heartbeat-based clusters, we've always advised customers to use > redundant network communication links. Given

Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-25 Thread Juha Heinanen
Florian Haas writes: > Agree that they're hacks, but disagree with your alternative. Why should > Pacemaker be concerned with low-level OpenAIS recovery procedures? then have the variable in OpenAIS configuration. -- juha

Re: [Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-25 Thread Florian Haas
On 2009-05-25 17:18, Juha Heinanen wrote: > Florian Haas writes: > > > 1. Set rrp_problem_count_timeout and/or rrp_problem_count_threshold > > ridiculously high so the ring status never goes to faulty. (It seems > > that RRP "problem counting" can't be disabled altogether). > > > > 2. Have p

[Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-25 Thread Juha Heinanen
Florian Haas writes: > 1. Set rrp_problem_count_timeout and/or rrp_problem_count_threshold > ridiculously high so the ring status never goes to faulty. (It seems > that RRP "problem counting" can't be disabled altogether). > > 2. Have package maintainers include some magic that does > "open
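
For reference, a minimal sketch of what workaround 1 might look like in the totem section of openais.conf. The two rrp_problem_count_* option names come from the message quoted above; rrp_mode, the interface blocks, all addresses, and the specific values are illustrative assumptions, not settings anyone in the thread posted.

```
# Sketch only: every value and address below is an assumption.
totem {
        version: 2
        # Redundant ring mode (assumed; the thread does not say which mode is used).
        rrp_mode: passive
        # Workaround 1: raise both settings so a ring is very unlikely
        # to be marked faulty. Values are arbitrary "ridiculously high" examples.
        rrp_problem_count_timeout: 3600000
        rrp_problem_count_threshold: 100000
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.1.0
                mcastaddr: 226.94.1.1
                mcastport: 5405
        }
        interface {
                ringnumber: 1
                bindnetaddr: 10.0.0.0
                mcastaddr: 226.94.2.1
                mcastport: 5405
        }
}
```

Check openais.conf(5) for how these two values interact with the other totem timeouts before trying anything like this on a real cluster.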

[Pacemaker] Pacemaker on OpenAIS, RRP, and link failure

2009-05-25 Thread Florian Haas
Hello everyone, I realize this is primarily an OpenAIS issue, but let's discuss it here anyway to share some thoughts. In Heartbeat-based clusters, we've always advised customers to use redundant network communication links. Given the fact that most of the clusters we build are DRBD based, we pra

Re: [Pacemaker] [Fwd: Re: [RfC] Redesigned Debian HA packages, try 2 (was: try 1)]]

2009-05-25 Thread Martin Gerhard Loschwitz
Florian Haas wrote: > Hello, > > for some reason, Martin's post isn't making it through, so he asked me > to forward. Let's hope it works out this time... > > Cheers, > Florian > > Original Message > Subject: [Fwd: Re: [Pacemaker] [RfC] Redesigned Debian HA packages, try > 2 (w

Re: [Pacemaker] [Fwd: Re: [RfC] Redesigned Debian HA packages, try 2 (was: try 1)]]

2009-05-25 Thread Andrew Beekhof
On Mon, May 25, 2009 at 11:26 AM, Florian Haas wrote: > Hello, > > for some reason, He's not subscribed. Spammers were abusing the "hey you're not subscribed" reply so I've had to disable it. > Martin's post isn't making it through, so he asked me > to forward. Let's hope it works out this time.

Re: [Pacemaker] [RfC] Redesigned Debian HA packages, try 2 (was: try 1)

2009-05-25 Thread Martin Gerhard Loschwitz
Simon Horman wrote: Hi, Has there been any progress with getting these packages into experimental? Hi folks, it's me again after some kind of longer outage; I reworked the packages and created new ones of the latest upstream versions. openais-legacy has lately been ACCEPTed into Experim

[Pacemaker] [Fwd: Re: [RfC] Redesigned Debian HA packages, try 2 (was: try 1)]]

2009-05-25 Thread Florian Haas
Hello, for some reason, Martin's post isn't making it through, so he asked me to forward. Let's hope it works out this time... Cheers, Florian Original Message Subject: [Fwd: Re: [Pacemaker] [RfC] Redesigned Debian HA packages, try 2 (was: try 1)] Date: Mon, 25 May 2009 07:14:3

Re: [Pacemaker] crm command line tool problem

2009-05-25 Thread Andrew Beekhof
Looks like a bug, can you post a hb_report archive of the scenario please? On Fri, May 22, 2009 at 5:51 PM, Joe Armstrong wrote: > Hi All, > > I am playing around with the crm command line tool to create an HA config for > pacemaker and am bumping into a problem. > > If I have a configuration ru

Re: [Pacemaker] cibadmin -E only temporarily erasing config

2009-05-25 Thread Andrew Beekhof
On Fri, May 22, 2009 at 11:19 PM, Jason Woodward wrote: > Hello, > > I am trying to get my config working, but have gone down the wrong path. > Unfortunately, I can't erase my config and start over. I try cibadmin -E > --force, which seems to work for about 10 seconds. cibadmin -Q shows a > min