Re: [Pacemaker] Master/Slave not failing over

2010-06-25 Thread Eliot Gable
I modified my resource to set "migration-threshold=1" and "failure-timeout=5s". Now the resource is finally switching to Master on the slave node when the original master fails. However, shortly after it switches to Master, it reports FAILED_MASTER and fails back over. Looking at the logs, I see
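
For context, these meta attributes are typically set on the master/slave resource via the crm shell. A minimal sketch (the ms_frs wrapper name and clone counts are assumptions; frs is the resource discussed in this thread):

    crm configure ms ms_frs frs \
            meta master-max="1" clone-max="2" \
            migration-threshold="1" failure-timeout="5s"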

Re: [Pacemaker] Trouble getting stonith with external/ipmi to work

2010-06-25 Thread Bart Willems
Hi Dejan, I'm working with the IP resource as an example to test and get stonith up and running. The real resource I want HA for will be external storage, and I need to be sure it doesn't get mounted simultaneously by 2 different nodes. Thanks, Bart -Original Message- From: Dejan Muhame

Re: [Pacemaker] Trouble getting stonith with external/ipmi to work

2010-06-25 Thread Dejan Muhamedagic
Hi, On Fri, Jun 25, 2010 at 11:58:01AM -0500, Bart Willems wrote: > I am setting up SLES11 SP1 HA on 2 nodes and would like to use external/ipmi > for stonith. I have set up a resource that successfully migrates an IP from > node1 to node2 when I turn off openais on node1, and migrates back when I > t

Re: [Pacemaker] Master/Slave not failing over

2010-06-25 Thread Eliot Gable
When I issue the 'ip addr flush eth1' command on the Master node (node-2), it detects the failure of network resources that my resource agent monitors. Then I get alerts from my RA for the following actions: frs ERROR on node-2 at Fri Jun 25 08:14:44 2010 EDT: 192.168.3.4/24 is in a failed stat
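
An illustrative fragment of what such an address check might look like inside the RA's monitor action (the actual frs agent is not posted in the thread; the interface and address are taken from the alert above):

    # monitor action fragment -- illustrative only
    if ! ip -o addr show dev eth1 | grep -q "192.168.3.4/24"; then
        ocf_log err "192.168.3.4/24 is in a failed state"
        return $OCF_ERR_GENERIC
    fi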

Re: [Pacemaker] Master/Slave not failing over

2010-06-25 Thread Eliot Gable
Ok; I'm still not having any luck with this. In my START action, right before I return $OCF_SUCCESS, I do: $CRM_MASTER -v 100, where CRM_MASTER is defined as in the drbd resource with '-l reboot'. In my STOP action, right at the beginning of it, I do: $CRM_MASTER -D. I copied the new RA to b
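
For reference, the pattern described here mirrors the drbd agent; a minimal sketch (the function names are illustrative):

    CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"   # as in the drbd RA

    frs_start() {
        # ... bring the service up ...
        $CRM_MASTER -v 100    # make this node eligible for promotion
        return $OCF_SUCCESS
    }

    frs_stop() {
        $CRM_MASTER -D        # drop this node's promotion score
        # ... shut the service down ...
        return $OCF_SUCCESS
    }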

[Pacemaker] Trouble getting stonith with external/ipmi to work

2010-06-25 Thread Bart Willems
I am setting up SLES11 SP1 HA on 2 nodes and would like to use external/ipmi for stonith. I have set up a resource that successfully migrates an IP from node1 to node2 when I turn off openais on node1, and migrates back when I turn openais back on on node1. Stonith is not rebooting or powering off node
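
A minimal sketch of an external/ipmi stonith resource for such a setup (the BMC address and credentials are placeholders; one resource per node, kept off the node it is meant to fence):

    crm configure primitive st-node1 stonith:external/ipmi \
            params hostname="node1" ipaddr="<bmc-address>" \
            userid="<bmc-user>" passwd="<bmc-password>" interface="lan" \
            op monitor interval="60s"
    crm configure location l-st-node1 st-node1 -inf: node1
    crm configure property stonith-enabled="true"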

Re: [Pacemaker] Master/Slave not failing over

2010-06-25 Thread Eliot Gable
After looking at the drbd master/slave RA, I think it is now clear. It looks like crm_master, being a wrapper for crm_attribute, actually specifies everything I need, and all I need to add to the command line are the few additional options like lifetime of the attribute modification, value to se
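
For the archives, the wrapper boils down to roughly the following equivalence (the master-* attribute name is the convention crm_master uses; the exact spelling may vary between versions):

    # inside an RA, these two are roughly equivalent:
    crm_master -l reboot -v 100
    crm_attribute -N $(uname -n) -l reboot \
            -n master-${OCF_RESOURCE_INSTANCE} -v 100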

Re: [Pacemaker] Master/Slave not failing over

2010-06-25 Thread Eliot Gable
Thanks. Should I update my RA to use crm_master when it detects the resource in FAILED_MASTER state, or should I put it in the demote action or something else? What's the command line needed to "reduce the promotion score"? I looked at the Pacemaker_Explained.pdf document, and while it mentions
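
In short, the score can be reduced, or removed outright, with crm_master itself; illustrative values:

    crm_master -l reboot -v 5    # lower the score so another node outbids this one
    crm_master -l reboot -D      # or delete this node's score entirely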

Re: [Pacemaker] small gfs question

2010-06-25 Thread Andrew Beekhof
On Thu, Jun 24, 2010 at 11:49 AM, Robert Lindgren wrote: > > > On Thu, Jun 24, 2010 at 11:45 AM, Andrew Beekhof wrote: >> >> On Thu, Jun 24, 2010 at 10:35 AM, Robert Lindgren >> wrote: >> > >> > >> > On Thu, Jun 24, 2010 at 9:58 AM, Andrew Beekhof >> > wrote: >> >> >> >> On Thu, Jun 24, 2010 at

Re: [Pacemaker] rsc_op: Hard error - res_Nagios_monitor_0 failed with rc=6: Preventing res_Nagios from re-starting anywhere in the cluster

2010-06-25 Thread Andrew Beekhof
On Thu, Jun 24, 2010 at 1:54 PM, Koch, Sebastian wrote: > Hi, > > thanks for your reply. It wasn't clear to me that pacemaker is issuing status > commands in the background even on the passive node. We run a single monitor op for each resource on each node when it joins the cluster. This is the
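
One way an RA can keep these probes from turning into hard errors on nodes that will never run the service is to report "not installed" (rc=5) rather than "not configured" (rc=6), since rc=6 is treated as fatal cluster-wide; an illustrative fragment (the binary path is an assumption):

    # early in the monitor/status action -- illustrative only
    if [ ! -x /usr/local/nagios/bin/nagios ]; then
        return $OCF_ERR_INSTALLED   # rc=5: this node just can't run it
    fi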

Re: [Pacemaker] RFC: stonith-enabled="error-recovery"

2010-06-25 Thread Andrew Beekhof
On Thu, Jun 24, 2010 at 5:46 PM, Lars Marowsky-Bree wrote: > Hi, > > this is about a new setting for stonith mode. > > Basically, a node failure would not cause a fence - the node would be > trusted to be truly down and have self-fenced. (Certain hardware > infrastructures can guarantee this, and

Re: [Pacemaker] Master/Slave not failing over

2010-06-25 Thread Michael Fung
Please let me know if I am wrong: this requirement can be satisfied by customizing the RA in use. Thanks, Michael > On Fri, Jun 25, 2010 at 12:43 AM, Eliot Gable wrote: >> I am still having issues with the master/slave resource. When I cause one of >> the monitoring actions to fail, > > as well

Re: [Pacemaker] RFC: stonith-enabled="error-recovery"

2010-06-25 Thread Maros Timko
> Date: Thu, 24 Jun 2010 17:46:39 +0200 > From: Lars Marowsky-Bree > To: pacemaker@oss.clusterlabs.org > Subject: [Pacemaker] RFC: stonith-enabled="error-recovery" > Message-ID: <20100624154639.gf5...@suse.de> > Content-Type: text/plain; charset=iso-8859-1 > > Hi, > > this is about a new setting f

Re: [Pacemaker] ClusterMon failing: call=220, rc=1, status=complete): unknown error

2010-06-25 Thread Koch, Sebastian
Hi, I found the error. It was failing on just one node, and it was always the passive node. I had a broken symlink from /var/www to my drbd device. After fixing it the ClusterMonitor runs just fine. Best Regards, Sebastian Koch -Ursp

Re: [Pacemaker] Master/Slave not failing over

2010-06-25 Thread Andrew Beekhof
On Fri, Jun 25, 2010 at 12:43 AM, Eliot Gable wrote: > Thanks for pointing that out. > > I am still having issues with the master/slave resource. When I cause one of > the monitoring actions to fail, as well as failing it should also use crm_master to reduce the promotion score > the master nod
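
Put concretely, the inline suggestion amounts to something like this in the monitor action (a sketch only; is_master and master_is_healthy are hypothetical helpers, CRM_MASTER as defined earlier in the thread):

    frs_monitor() {
        # ... normal health checks ...
        if is_master && ! master_is_healthy; then
            $CRM_MASTER -D             # give up the promotion score
            return $OCF_FAILED_MASTER  # rc=9: the master has failed
        fi
        return $OCF_SUCCESS
    }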

Re: [Pacemaker] ERROR: Couldn't mount filesystem /dev/mapper/XXX on /mnt

2010-06-25 Thread marc genou
It was that. Thank you. On Fri, Jun 25, 2010 at 9:44 AM, Marc Mertes wrote: > Hi, > I think the problem is that you are not mounting the drbd device, e.g. /dev/drbd0 > or /dev/drbd1 (as set in drbd.conf). > > You cannot mount the real partition; you always have to use the drbd device. > > Regards Marc

Re: [Pacemaker] ERROR: Couldn't mount filesystem /dev/mapper/XXX on /mnt

2010-06-25 Thread Marc Mertes
Hi, I think the problem is that you are not mounting the drbd device, e.g. /dev/drbd0 or /dev/drbd1 (as set in drbd.conf). You cannot mount the real partition; you always have to use the drbd device. Regards Marc On 25.06.2010 09:33, marc genou wrote: Hi again. This seems weird. I am se
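
In configuration terms, that means the Filesystem resource should point at the drbd device rather than the LVM path seen in the log; a sketch assuming the device is drbd0:

    crm configure primitive fs_vz ocf:heartbeat:Filesystem \
            params device="/dev/drbd0" directory="/mnt" fstype="ext3"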

[Pacemaker] ERROR: Couldn't mount filesystem /dev/mapper/XXX on /mnt

2010-06-25 Thread marc genou
Hi again. This seems weird. I am setting up a new cluster (a simpler one) and having some trouble mounting the ext3 filesystem on top of the drbd partition. I get these messages from the logs: /var/log/messages-Jun 24 17:24:43 openvz1 Filesystem[11092]: INFO: Running start for /dev/mapper/vg0-vzpart on /mnt /var/l