[Pacemaker] Constraint on location of resource start

2012-11-08 Thread Cserbák Márton
Hi, I have successfully set up a DRBD+Pacemaker+Xen cluster on two Debian servers. Unfortunately, I am facing the same issue as the one described in https://bugzilla.redhat.com/show_bug.cgi?id=694492, namely, that the CPU features differ on the two servers. When the domU resources, that were orig

Re: [Pacemaker] pacemaker service start failed.

2012-11-08 Thread Vladislav Bogdanov
09.11.2012 04:48, Andrew Beekhof wrote: ... > > A bit of an update > > The reverse lookup functionality has turned out to cause far more > problems and confusion than it was intended to solve. > So I am basically removing it. Anyone worried about that > bootstrapping case will be encouraged

Re: [Pacemaker] The CPU usage is almost 100 % when booth execute crm_ticket command.

2012-11-08 Thread Yuichi SEINO
Hi Jiaju, Could you comment on "comment 23" in Bugzilla? Sincerely, Yuichi 2012/10/10 Yuichi SEINO : > Hi Jiaju, > > I reported the bugzilla. > > http://bugs.clusterlabs.org/show_bug.cgi?id=5110 > > Sincerely, > Yuichi > > 2012/10/10 Jiaju Zhang : >> Hi Yuichi, >> >> On Wed, 2012-10-10 at 1

Re: [Pacemaker] killproc not found? o2cb shutdown via resource agent

2012-11-08 Thread Matthew O'Connor
On 11/08/2012 08:15 PM, Andrew Beekhof wrote: > You're not starting it as a pacemaker resource, are you? > CMAN should be doing that as part of the init script (which explains > why it's still there until after pacemaker is gone). I thought that was the dlm_controld, not ocfs2_controld? dlm_control

Re: [Pacemaker] pacemaker service start failed.

2012-11-08 Thread Takatoshi MATSUO
2012/11/9 Andrew Beekhof : > On Fri, Nov 9, 2012 at 12:25 PM, Takatoshi MATSUO > wrote: >> Hi Andrew >> >> Pgsql RA uses crm_attribute to change slave's master-score from master like >> # crm_attribute -l reboot -N host2 -n "master-pgsql:1" -v "1000" >> Because the Slave of PostgreSQL can't ge

Re: [Pacemaker] [Linux-HA] Apache-Pacemaker Configuration Error?

2012-11-08 Thread Andrew Beekhof
On Fri, Nov 9, 2012 at 1:20 PM, Andrew Beekhof wrote: > On Tue, Nov 6, 2012 at 3:33 AM, Viviana Cuellar Rivera > wrote: >> Hi all, >> I'm trying to mount the following settings: >> >> vipBalancer1|--| Backend Nodes >> | Backend Nodes >> vip2---bala

Re: [Pacemaker] [Linux-HA] Apache-Pacemaker Configuration Error?

2012-11-08 Thread Andrew Beekhof
On Tue, Nov 6, 2012 at 3:33 AM, Viviana Cuellar Rivera wrote: > Hi all, > I'm trying to mount the following settings: > > vipBalancer1|--| Backend Nodes > | Backend Nodes > vip2---balancer2|--| Backend Nodes > > For that I have installed a

Re: [Pacemaker] pacemaker service start failed.

2012-11-08 Thread Andrew Beekhof
On Fri, Nov 9, 2012 at 12:25 PM, Takatoshi MATSUO wrote: > Hi Andrew > > Pgsql RA uses crm_attribute to change slave's master-score from master like > # crm_attribute -l reboot -N host2 -n "master-pgsql:1" -v "1000" > Because the Slave of PostgreSQL can't get replication status. > > In addition

Re: [Pacemaker] pacemaker service start failed.

2012-11-08 Thread Andrew Beekhof
On Fri, Nov 2, 2012 at 4:57 PM, Yuusuke Iida wrote: > Hi, Andrew > > > (2012/10/30 13:51), Andrew Beekhof wrote: >> >> On Mon, Oct 29, 2012 at 7:10 PM, Yuusuke Iida >> wrote: >>> >>> Hi, Andrew >>> >>> >>> (2012/10/26 9:31), Andrew Beekhof wrote: > > > When I described the IP which I

Re: [Pacemaker] pacemaker service start failed.

2012-11-08 Thread Takatoshi MATSUO
Hi Andrew Pgsql RA uses crm_attribute to change slave's master-score from master like # crm_attribute -l reboot -N host2 -n "master-pgsql:1" -v "1000" Because the Slave of PostgreSQL can't get replication status. In addition the RA uses "uname -n" to compare hostname to $OCF_RESKEY_CRM_meta_no

Re: [Pacemaker] killproc not found? o2cb shutdown via resource agent

2012-11-08 Thread Andrew Beekhof
You're not starting it as a pacemaker resource, are you? CMAN should be doing that as part of the init script (which explains why it's still there until after pacemaker is gone). On Fri, Nov 9, 2012 at 11:14 AM, Matthew O'Connor wrote: > I'm honestly beginning to wonder what exactly that killproc d

Re: [Pacemaker] killproc not found? o2cb shutdown via resource agent

2012-11-08 Thread Matthew O'Connor
I'm honestly beginning to wonder what exactly that killproc does for the ocfs2_controld.cman process... For kicks, I created a script in /sbin and /usr/sbin for killproc, which simply sources the lsb include and calls the function with whatever was passed via the command-line. Perhaps an equivalen

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Andrew Beekhof
On Fri, Nov 9, 2012 at 10:13 AM, Andrew Beekhof wrote: > On Thu, Nov 8, 2012 at 7:18 PM, Lars Marowsky-Bree wrote: >> On 2012-11-08T14:24:40, "Gao,Yan" wrote: >> >>> > What do you propose the XML should look like? >>> Should be like: >>> ... >>> >> interval="30s" ignore-first-failures="true"> >>

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Andrew Beekhof
On Fri, Nov 9, 2012 at 10:03 AM, Andrew Beekhof wrote: > On Thu, Nov 8, 2012 at 5:24 PM, Gao,Yan wrote: >> Hi Andrew, >> >> On 11/08/12 13:09, Andrew Beekhof wrote: >>> On Tue, Nov 6, 2012 at 10:30 PM, Gao,Yan wrote: Hi, Currently, we can manage VMs via the VM agents. But the serv

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Andrew Beekhof
On Thu, Nov 8, 2012 at 7:18 PM, Lars Marowsky-Bree wrote: > On 2012-11-08T14:24:40, "Gao,Yan" wrote: > >> > What do you propose the XML should look like? >> Should be like: >> ... >> > interval="30s" ignore-first-failures="true"> > > If we make "ignore-first-failures" a time period, we can also l

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Andrew Beekhof
On Thu, Nov 8, 2012 at 7:15 PM, Lars Marowsky-Bree wrote: > On 2012-11-08T16:15:50, Andrew Beekhof wrote: > >> > And no, I'm not proposing that we allow overriding the >> > class/provider/type tuple for start/stop ;-) >> Did you consider having the VirtualDomain do the nagios redirect for >> moni

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Andrew Beekhof
On Thu, Nov 8, 2012 at 5:24 PM, Gao,Yan wrote: > Hi Andrew, > > On 11/08/12 13:09, Andrew Beekhof wrote: >> On Tue, Nov 6, 2012 at 10:30 PM, Gao,Yan wrote: >>> Hi, >>> >>> Currently, we can manage VMs via the VM agents. But the services running >>> within VMs are not very easy to be monitored. If

Re: [Pacemaker] [corosync] Corosync 2.1.0 dies on both nodes in cluster

2012-11-08 Thread Andrew Martin
Honza and Angus, Glad to hear about this possible breakthrough! Here's the output of df: root@storage1:~# df Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/vg00-lv_root 228424996 3376236 213445408 2% / udev 3041428 4 3041424 1% /dev tmpfs 1220808 340 1220468 1% /run none 5120 8

Re: [Pacemaker] OFFLINE node after cluster upgrade

2012-11-08 Thread Carlos Molina
ruslan usifov writes: > > > I solved this problem! On one node I found the following error message in the log: slv009 peer is not part of our cluster. So I stopped pacemaker on that host (I use v1 for pacemaker): /etc/pacemaker stop > /etc/corosync stop Then remove all cib info from /var/lib/heartbeat/crm a

Re: [Pacemaker] [corosync] Corosync 2.1.0 dies on both nodes in cluster

2012-11-08 Thread Jan Friesse
Andrew, good news. I believe that I've found a reproducer for the problem you are facing. Now, to be sure it's really the same, can you please run: df (the interesting part is /dev/shm) and send the output of ls -la /dev/shm? I believe /dev/shm is full. Now, as a quick workaround, just delete all qb-* from /dev/shm an

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Dejan Muhamedagic
On Thu, Nov 08, 2012 at 12:10:24PM +0100, Lars Marowsky-Bree wrote: > On 2012-11-08T11:54:04, Dejan Muhamedagic wrote: > > > > Downside, that I have just realized - resources which depend on this > > > one, and might want to depend on a service the container provides. > > > > > > (This could be

Re: [Pacemaker] [corosync] Corosync 2.1.0 dies on both nodes in cluster

2012-11-08 Thread Jan Friesse
Andrew, thanks for the valgrind report (even though it didn't show anything useful) and the blackbox. We believe that the problem is caused by access to invalid memory mapped by an mmap operation. There are basically 3 places where we are doing mmap. 1.) corosync cpg_zcb functions (I don't believe this is the case)

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Lars Marowsky-Bree
On 2012-11-08T11:54:04, Dejan Muhamedagic wrote: > > Downside, that I have just realized - resources which depend on this > > one, and might want to depend on a service the container provides. > > > > (This could be an argument to go back to the "configure them as > > primitives that are managed

Re: [Pacemaker] killproc not found? o2cb shutdown via resource agent

2012-11-08 Thread Dejan Muhamedagic
Hi, On Thu, Nov 08, 2012 at 08:23:53PM +1100, Tim Serong wrote: > On 11/08/2012 07:56 PM, Andrew Beekhof wrote: > > On Thu, Nov 8, 2012 at 5:16 PM, Tim Serong wrote: > >> On 11/08/2012 12:11 PM, Andrew Beekhof wrote: > >>> On Thu, Nov 8, 2012 at 9:59 AM, Matthew O'Connor wrote: > Follow-up

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Dejan Muhamedagic
Hi, On Thu, Nov 08, 2012 at 09:18:16AM +0100, Lars Marowsky-Bree wrote: > On 2012-11-08T14:24:40, "Gao,Yan" wrote: > > > > What do you propose the XML should look like? > > Should be like: > > ... > > > interval="30s" ignore-first-failures="true"> > > If we make "ignore-first-failures" a time

Re: [Pacemaker] Recommendations in reducing failover response time

2012-11-08 Thread Florian Crouzat
On 05/11/2012 17:05, Arturo Borrero Gonzalez wrote: Which version of pacemaker/RAs are you using? thanks, Regards. $ sudo rpm -qa | grep -e pacemaker -e corosync -e resource-agent corosynclib-1.4.1-4.el6_2.3.x86_64 pacemaker-libs-1.1.6-3.el6.x86_64 pacemaker-cli-1.1.6-3.el6.x86_64 resource

Re: [Pacemaker] killproc not found? o2cb shutdown via resource agent

2012-11-08 Thread Andrew Beekhof
On Thu, Nov 8, 2012 at 8:23 PM, Tim Serong wrote: > On 11/08/2012 07:56 PM, Andrew Beekhof wrote: >> On Thu, Nov 8, 2012 at 5:16 PM, Tim Serong wrote: >>> On 11/08/2012 12:11 PM, Andrew Beekhof wrote: On Thu, Nov 8, 2012 at 9:59 AM, Matthew O'Connor wrote: > Follow-up and additional inf

Re: [Pacemaker] killproc not found? o2cb shutdown via resource agent

2012-11-08 Thread Tim Serong
On 11/08/2012 07:56 PM, Andrew Beekhof wrote: > On Thu, Nov 8, 2012 at 5:16 PM, Tim Serong wrote: >> On 11/08/2012 12:11 PM, Andrew Beekhof wrote: >>> On Thu, Nov 8, 2012 at 9:59 AM, Matthew O'Connor wrote: Follow-up and additional info: System is Ubuntu 12.04. Not sure where kill

Re: [Pacemaker] killproc not found? o2cb shutdown via resource agent

2012-11-08 Thread Andrew Beekhof
On Thu, Nov 8, 2012 at 5:16 PM, Tim Serong wrote: > On 11/08/2012 12:11 PM, Andrew Beekhof wrote: >> On Thu, Nov 8, 2012 at 9:59 AM, Matthew O'Connor wrote: >>> Follow-up and additional info: >>> >>> System is Ubuntu 12.04. Not sure where killproc is supposed to be derived >>> from, or if there

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Lars Marowsky-Bree
On 2012-11-08T14:24:40, "Gao,Yan" wrote: > > What do you propose the XML should look like? > Should be like: > ... > interval="30s" ignore-first-failures="true"> If we make "ignore-first-failures" a time period, we can also limit the time the cluster ignores this. Downside, that I have just re

Re: [Pacemaker] Enable remote monitoring

2012-11-08 Thread Lars Marowsky-Bree
On 2012-11-08T16:15:50, Andrew Beekhof wrote: > > And no, I'm not proposing that we allow overriding the > > class/provider/type tuple for start/stop ;-) > Did you consider having the VirtualDomain do the nagios redirect for > monitor operations? > If so, what was the drawback? The Xen agent has