Re: [Pacemaker] Pacemaker Corosync Issue Exiting Automatically

2016-01-06 Thread Andrew Beekhof
"crmd:error: plugin_dispatch:" means you’re using the plugin on a RHEL 6 based install. Don’t do that. You’ll need to use pacemaker with cman instead. This happens automatically if you use 'pcs cluster create ...' > On 14 Dec 2015, at 6:46 PM, Sahil Aggarwal wrote: > > Hello Team Pacemaker/Corosyn

Re: [Pacemaker] Cluster node getting stopped from other node(resending mail)

2015-08-03 Thread Andrew Beekhof
We need a crm_report archive to be able to comment on this sort of thing. A handful of logs from one of the nodes isn’t anywhere near enough. > On 29 Jun 2015, at 4:42 pm, Arjun Pandey wrote: > > > Hi > > I am running a 2 node cluster with this config on centos 6.5/6.6 > > Master/Slave Set:

Re: [Pacemaker] How can I wait for a device to be ready?

2015-05-19 Thread Andrew Beekhof
> On 15 May 2015, at 3:55 am, Carlos Xavier wrote: > > Hi. > > I are doing some testes with OCFS2 running on a AOE shared disk. > The tests are going on a OpenSuse 12.3 with the following packages: > ocfs2-tools-o2cb-1.8.2-4.8.1.x86_64 > ocfs2-tools-1.8.2-4.8.1.x86_64 > corosync-1.4.3-4.1.1.x86

Re: [Pacemaker] [ClusterLabs] Unexpected behaviour of PEngine Recheck Timer while in maintenance mode

2015-05-17 Thread Andrew Beekhof
> On 29 Apr 2015, at 5:40 am, Rolf Weber wrote: > > Hi! > > On 07:27 Mon 27 Apr , Andrew Beekhof wrote: >> What exactly were you doing at this point? > > resizing a filesystem. > fs was unexported and unmounted. > as I understand maintenence mode this sho

Re: [Pacemaker] Centos 70->71 update fails with "Application of an update diff failed (rc=-206)"

2015-04-27 Thread Andrew Beekhof
> On 27 Apr 2015, at 6:35 pm, Patrick Zwahlen wrote: > >> Apart from those scary logs, does anything actually break? >> What you're seeing is probably just ignorable noise from the older version >> - I would expect the underlying cib to resolve things correctly. > > Thanks Andrew for the response

Re: [Pacemaker] [ClusterLabs] Unexpected behaviour of PEngine Recheck Timer while in maintenance mode

2015-04-26 Thread Andrew Beekhof
> On 21 Apr 2015, at 6:24 am, Rolf Weber wrote: > > Hi all! > > I encountered some strange behaviour of the PEngine Recheck Timer while in > maintenance mode ending in a reboot. > > Setup is a 2 node cluster, 1 resource group consisting of several drbds and > filesystems that are exported via

Re: [Pacemaker] stonith

2015-04-26 Thread Andrew Beekhof
> On 19 Apr 2015, at 11:37 pm, Andrei Borzenkov wrote: > > В Sun, 19 Apr 2015 14:23:27 +0200 > Andreas Kurz пишет: > >> On 2015-04-17 12:36, Thomas Manninger wrote: >>> Hi list, >>> >>> i have a pacemaker/corosync2 setup with 4 nodes, stonith configured over >>> ipmi interface. >>> >>> My pr

Re: [Pacemaker] Centos 70->71 update fails with "Application of an update diff failed (rc=-206)"

2015-04-26 Thread Andrew Beekhof
> On 26 Apr 2015, at 7:27 pm, Patrick Zwahlen wrote: > >> map your ip cluster to hostname using /etc/hosts and try to use an >> example like this >> http://clusterlabs.org/doc/fr/Pacemaker/1.1- >> pcs/html/Clusters_from_Scratch/_sample_corosync_configuration.html > > I've added "name: fqdn" in

Re: [Pacemaker] Add new node in the case of unicast with corosync v1.4.2

2015-04-26 Thread Andrew Beekhof
> On 26 Apr 2015, at 5:58 am, Kadlecsik József > wrote: > > Hi, > > We have an unicast setup with corosync v1.4.2 which is unable to reload > it's configuration file, and want to add a new node. > > Our plan is to enable maintenance mode in pacemaker and restart corosync > at the nodes one

Re: [Pacemaker] Unique clone instance is stopped too early on move

2015-04-26 Thread Andrew Beekhof
> On 17 Apr 2015, at 4:19 pm, Vladislav Bogdanov wrote: > > 17.04.2015 00:48, Andrew Beekhof wrote: >> >>> On 22 Jan 2015, at 12:04 am, Vladislav Bogdanov >>> wrote: >>> >>> 20.01.2015 02:44, Andrew Beekhof wrote: >>>> >

Re: [Pacemaker] coronosyc 1.2.1 with pacemaker and openais is suse11sp1

2015-04-19 Thread Andrew Beekhof
> On 15 Apr 2015, at 10:06 pm, Timi wrote: > > Hi guys, > > we have a cluster setup with: coronosyc 1.2.1 with pacemaker and openais is > suse11sp1 on two nodes connected via direct cable for heartbeat, we checked > the connection and its ok. > > we are having this on the logs: > > 12:23] <[1

Re: [Pacemaker] Unique clone instance is stopped too early on move

2015-04-16 Thread Andrew Beekhof
> On 22 Jan 2015, at 12:04 am, Vladislav Bogdanov wrote: > > 20.01.2015 02:44, Andrew Beekhof wrote: >> >>> On 16 Jan 2015, at 3:59 pm, Vladislav Bogdanov wrote: >>> >>> 16.01.2015 07:44, Andrew Beekhof wrote: >>>> >>>

Re: [Pacemaker] Unique clone instance is stopped too early on move

2015-04-12 Thread Andrew Beekhof
> On 22 Jan 2015, at 12:04 am, Vladislav Bogdanov wrote: > > 20.01.2015 02:44, Andrew Beekhof wrote: >> >>> On 16 Jan 2015, at 3:59 pm, Vladislav Bogdanov wrote: >>> >>> 16.01.2015 07:44, Andrew Beekhof wrote: >>>> >>>

Re: [Pacemaker] Pre/post-action notifications for master-slave and clone resources

2015-03-29 Thread Andrew Beekhof
> On 27 Jan 2015, at 9:57 pm, Vladislav Bogdanov wrote: > > Hi, > > Playing with two-week old git master on a two-node cluster I discovered that > only limited set of "notify" operations is performed for clone and > master-slave instances when all of them are being started/stopped. > > Clone

Re: [Pacemaker] Master-Slave role stickiness

2015-03-29 Thread Andrew Beekhof
> On 23 Jan 2015, at 9:13 am, brook davis wrote: > > < snip > >> It sounds like default-resource-stickiness does not kick in; and with >> default resource-stickiness=1 it is expected (10 > 6). Documentation >> says default-recource-stickiness is deprecated so may be it is ignored >> in your ver

Re: [Pacemaker] why sometimes pengine seems lazy

2015-03-29 Thread Andrew Beekhof
> On 10 Feb 2015, at 6:50 pm, d tbsky wrote: > > hi: > I was using pacemaker and drbd with sl linux 6.5/6.6. all are fine. > > now I am tesing sl linux 7.0 and I notice when I want to promote > the drbd resource with "pcs resource meta my-ms-drbd master-max=2". > >sometimes pengine fi

Re: [Pacemaker] Colocating with unmanaged resource

2015-03-29 Thread Andrew Beekhof
> On 28 Feb 2015, at 6:00 am, Покотиленко Костик wrote: > > В Чтв, 22/01/2015 в 14:59 +1100, Andrew Beekhof пишет: >>> On 15 Jan 2015, at 12:54 am, Покотиленко Костик wrote: >>> >>> В Вто, 06/01/2015 в 16:27 +1100, Andrew Beekhof пишет: >>>>

Re: [Pacemaker] One node thinks everyone is online, but the other node doesn't think so

2015-03-29 Thread Andrew Beekhof
> On 11 Mar 2015, at 2:21 am, Dmitry Koterov wrote: > > On Tue, Feb 24, 2015 at 2:07 AM, Andrew Beekhof wrote: > > > I have a 3-node cluster where node1 and node2 are running > > corosync+pacemaker and node3 is running corosync only (for quorum). > > Co

Re: [Pacemaker] pacemaker-1.1.12 - lots of Could not establish cib_ro connection: Resource temporarily unavailable (11) errors

2015-03-18 Thread Andrew Beekhof
> On 18 Mar 2015, at 7:01 pm, Nikola Ciprich wrote: > > Hello Andrew, > >> It certainly explains the log message. >> Do you have a lot of these resources querying the CIB? Perhaps its overloaded > > well, it keeps happening when I try to start many those resources in one > moment > (by many I

Re: [Pacemaker] pacemaker-1.1.12 - lots of Could not establish cib_ro connection: Resource temporarily unavailable (11) errors

2015-03-15 Thread Andrew Beekhof
> On 14 Mar 2015, at 5:53 pm, Nikola Ciprich wrote: > > Hello Andrew, > > I'm really sorry for replying this late.. >> >> The python command has nothing to do with the cluster and no reason to >> connect to the cib? > well, python script actually executes crm_mon to do some internal sanity >

Re: [Pacemaker] One more globally-unique clone question

2015-02-25 Thread Andrew Beekhof
> On 24 Feb 2015, at 4:35 pm, Vladislav Bogdanov wrote: > > 24.02.2015 01:58, Andrew Beekhof wrote: >> >>> On 21 Jan 2015, at 5:08 pm, Vladislav Bogdanov wrote: >>> >>> 21.01.2015 03:51, Andrew Beekhof wrote: >>>> >>>

Re: [Pacemaker] Suggestions for managing HA of containers from within a Pacemaker container?

2015-02-25 Thread Andrew Beekhof
> On 26 Feb 2015, at 8:51 am, Digimer wrote: > > -BEGIN PGP SIGNED MESSAGE- > Hash: SHA1 > > On 25/02/15 04:45 PM, David Vossel wrote: >> >> >> - Original Message - >>> Pacemaker as a scheduler in Mesos or Kubernates does sound like a >>> very interesting idea. Packaging coros

Re: [Pacemaker] Suggestions for managing HA of containers from within a Pacemaker container?

2015-02-25 Thread Andrew Beekhof
sense to me. What's the reason in isolating something and then giving it > all permissions on a host machine? Probably because someone realised that they wanted to container-ize the software for creating containers and nesting them was too horrible to contemplate. > > On Mon, Fe

Re: [Pacemaker] Running pacemaker as non-root user

2015-02-24 Thread Andrew Beekhof
> On 24 Feb 2015, at 10:36 pm, N, Ravikiran wrote: > > Hi all, > > I was trying to find out whether it is possible to START/STOP pacemaker, and > also run PCS commands as non-root user (in my case it is ‘admin’ user). > I did add the user(‘admin’) to haclient group, but it is of no help. I ge

Re: [Pacemaker] Pacemaker won't start after node was fenced

2015-02-23 Thread Andrew Beekhof
> On 27 Jan 2015, at 5:23 pm, Jake Smith wrote: > > Had a failover of my active/passive cluster and now the passive node will not > rejoin the cluster. > > 2 nodes running Ubuntu 12.04 > coro 1.4.2-2, openais 1.1.4-4, pcmk 1.1.6-2ubuntu3 > > Corosync ring membership is fine on both rings. >

Re: [Pacemaker] Suggestions for managing HA of containers from within a Pacemaker container?

2015-02-23 Thread Andrew Beekhof
> On 8 Feb 2015, at 7:09 am, Steven Dake (stdake) wrote: > > Hi, > > I am working on Containerizing OpenStack in the Kolla project > (http://launchpad.net/kolla). One of the key things we want to do over the > next few months is add H/A support to our container tech. David Vossel had > sug

Re: [Pacemaker] Suggestions for managing HA of containers from within a Pacemaker container?

2015-02-23 Thread Andrew Beekhof
> On 10 Feb 2015, at 1:45 pm, Serge Dubrouski wrote: > > Hello Steve, > > Are you sure that Pacemaker is the right product for your project? Have you > checked Mesos/Marathon or Kubernates? Those are frameworks being developed > for managing containers. And in a few years they'll work out th

Re: [Pacemaker] pacemaker/corosync: a resource is started on 2 nodes

2015-02-23 Thread Andrew Beekhof
> On 28 Jan 2015, at 9:20 pm, Sergey Arlashin > wrote: > > Hi! > > I have a small corosync/pacemaker based cluster which consists of 4 nodes. 2 > nodes are in standby mode, another 2 actually handle all the resources. > > corosync ver. 1.4.7-1. > pacemaker ver 1.1.11. > os: ubuntu 12.04

Re: [Pacemaker] Colocation constraint getting removed

2015-02-23 Thread Andrew Beekhof
> On 22 Jan 2015, at 4:39 pm, Arjun Pandey wrote: > > Any pointers on this would be helpful. Constraints don't get removed automatically unless someone asked for a resource that it references to be deleted. Other possibilities include, someone asked to delete the constraint and someone upload

Re: [Pacemaker] One node thinks everyone is online, but the other node doesn't think so

2015-02-23 Thread Andrew Beekhof
> On 26 Jan 2015, at 10:53 am, Dmitry Koterov wrote: > > Hello. > > I have a 3-node cluster where node1 and node2 are running corosync+pacemaker > and node3 is running corosync only (for quorum). Corosync 2.3.3, pacemaker > 1.1.10. Everything worked fine the first couple of days. > > Once up

Re: [Pacemaker] One more globally-unique clone question

2015-02-23 Thread Andrew Beekhof
> On 21 Jan 2015, at 5:08 pm, Vladislav Bogdanov wrote: > > 21.01.2015 03:51, Andrew Beekhof wrote: >> >>> On 20 Jan 2015, at 4:13 pm, Vladislav Bogdanov wrote: >>> >>> 20.01.2015 02:47, Andrew Beekhof wrote: >>>> >>>

Re: [Pacemaker] Issues Migrating from 12.04 to 14.04 with resource-stickiness

2015-02-23 Thread Andrew Beekhof
It looks like jiravip1 has failed in a lot of places. Is this the complete configuration? I would have expected some colocation constraints from the behaviour. Also, do you understand what symmetric-cluster="false" does? > On 13 Feb 2015, at 4:29 am, Krakowitzer, Merritt > wrote: > > Good Day/E

Re: [Pacemaker] 'stop' operation passes outdated set of instance attributes to RA

2015-02-22 Thread Andrew Beekhof
> On 14 Feb 2015, at 1:10 am, Vladislav Bogdanov wrote: > > Hi, > > I believe that is a bug that 'stop' operation uses set of instance attributes > from the original 'start' op, not what successful 'reload' had. > Corresponding pe-input has correct set of attributes, and pre-stop 'notify' > o

Re: [Pacemaker] Cannot fail-over Master/Slave resource collocated with ping resource at the HDD crash

2015-02-22 Thread Andrew Beekhof
> On 16 Feb 2015, at 8:15 pm, NAKAHIRA Kazutomo > wrote: > > Hi all, > > I encountered trouble that Master/Slave resource collocated > with ping resource can not fail-over at the HDD crash. > > After HDD crash, stop operation of the ping resource is looping > and notify operation of the Maste

Re: [Pacemaker] pacemaker-1.1.12 - lots of Could not establish cib_ro connection: Resource temporarily unavailable (11) errors

2015-02-22 Thread Andrew Beekhof
> On 9 Feb 2015, at 8:06 pm, Nikola Ciprich wrote: > > Hello, > > I'd like to ask about following problem that troubles me for some time > and I wan't able to find solution for: > > I've got cluster with quite a lot of resources, and when I try to do > multiple operations at time, I get a lot

Re: [Pacemaker] Querying resource status programatically

2015-02-22 Thread Andrew Beekhof
> On 7 Feb 2015, at 3:01 pm, Brian Campbell > wrote: > > Hi all! > > I'm writing a domain-specific frontend for Pacemaker, which can set up > a few different pre-configured "stacks" of resources, and provide > simplified monitoring and administration of those stacks. > > One thing I'm wonderi

Re: [Pacemaker] pacemaker does not start after cman config

2015-02-22 Thread Andrew Beekhof
> On 20 Feb 2015, at 9:24 pm, Lukas Kostyan wrote: > > > > 2015-02-19 22:49 GMT+01:00 Andrew Beekhof : > > > On 10 Feb 2015, at 11:53 pm, Lukas Kostyan wrote: > > > > Hi all, > > > > was following the guide from clusterlab but use debian

[Pacemaker] This mailing list will go away soon

2015-02-19 Thread Andrew Beekhof
One of the things that was agreed at the recent cluster summit was a consolidation of cluster-related IRC channels, mailing lists, and websites. In keeping with this, the pacemaker mailing list should now be considered deprecated and will no longer accept messages as of March 1st 2015. We still wan

Re: [Pacemaker] pacemaker does not start after cman config

2015-02-19 Thread Andrew Beekhof
> On 10 Feb 2015, at 11:53 pm, Lukas Kostyan wrote: > > Hi all, > > was following the guide from clusterlab but use debian wheezy. > corosync 1.4.2-3 > > pacemaker 1.1.7-1 > cman 3.0.12-3.2+deb7u2 > > configured the active/passive with no problems but a

Re: [Pacemaker] [Linux-HA] Announcing the Heartbeat 3.0.6 Release

2015-02-19 Thread Andrew Beekhof
> On 11 Feb 2015, at 8:24 am, Lars Ellenberg wrote: > > > TL;DR: > > If you intend to set up a new High Availability cluster > using the Pacemaker cluster manager, > you typically should not care for Heartbeat, > but use recent releases (2.3.x) of Corosync. > > If you don't care for Hear

Re: [Pacemaker] RHEL 6 to RHEL 7 upgrade

2015-02-19 Thread Andrew Beekhof
> On 11 Feb 2015, at 9:36 pm, Alex Samad - Yieldbroker > wrote: > > Hi > > I am doing some planning for a Centos 6 to Centos 7 upgrade. Just wondering > if there are any gotchas with pacemaker/cman? There is no cman in 7. You'll need to configure corosync2 directly (instead of via cluster.con

Re: [Pacemaker] Multiple live cib in one pacemaker

2015-02-19 Thread Andrew Beekhof
> On 20 Feb 2015, at 1:52 am, Kristoffer Grönlund wrote: > > Adam Błaszczykowski writes: > >> Hello, >> I am using Pacemaker 1.1.12 together with Corosync 2.4.3 in my cluster >> environment. I have two nodes in cluster that are in different LAN >> locations. It may be situation that nodes will

Re: [Pacemaker] Stop single resource on quorum loss but not others

2015-01-22 Thread Andrew Beekhof
being vague in my > previous email. > > On January 22, 2015 6:21:34 PM MST, Andrew Beekhof wrote: > > On 23 Jan 2015, at 8:50 am, Rahim Millious wrote: > > Hello, > > I am hoping someone can help me. I have a custom resource agent which > requires access (vi

Re: [Pacemaker] (no subject)

2015-01-22 Thread Andrew Beekhof
> On 23 Jan 2015, at 8:50 am, Rahim Millious wrote: > > Hello, > > I am hoping someone can help me. I have a custom resource agent which > requires access (via ssh) to the passive node in order to function correctly. > Is it possible to stop the resource when quorum is lost and restart it whe

Re: [Pacemaker] Colocating with unmanaged resource

2015-01-21 Thread Andrew Beekhof
> On 15 Jan 2015, at 12:54 am, Покотиленко Костик wrote: > > В Вто, 06/01/2015 в 16:27 +1100, Andrew Beekhof пишет: >>> On 20 Dec 2014, at 6:21 am, Покотиленко Костик wrote: >>> Here are behaviors of different versions of pacemaker: >>> >>> 1.1.1

Re: [Pacemaker] One more globally-unique clone question

2015-01-20 Thread Andrew Beekhof
> On 20 Jan 2015, at 4:13 pm, Vladislav Bogdanov wrote: > > 20.01.2015 02:47, Andrew Beekhof wrote: >> >>> On 17 Jan 2015, at 1:25 am, Vladislav Bogdanov >>> wrote: >>> >>> Hi all, >>> >>> Trying to reproduce problem

Re: [Pacemaker] breaking resource dependencies by replacing resource group by co-location constrains

2015-01-20 Thread Andrew Beekhof
group grp_application res_mount-application res_Service-IP res_mount-CIFSshareData res_mount-CIFSshareData2 res_mount-CIFSshareData3 res_MyApplication is just a shortcut for colocation res_Service-IP with res_mount-application colocation res_mount-CIFSshareData with res_Service-IP ... and c
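
The expansion described in this snippet can be sketched in a few lines of Python. This is only an illustration: `expand_group` is a hypothetical helper, the `colocation X with Y` lines copy the shorthand used in the message rather than literal crmsh syntax, and the `order X then Y` lines are an assumed notation for the ordering half of the shortcut, which the truncated snippet does not show.

```python
def expand_group(members):
    """Return the (colocations, orderings) implied by a group's member order.

    Hypothetical helper for illustration; the output strings are shorthand,
    not literal crmsh or pcs syntax.
    """
    colocations = []
    orderings = []
    # Each member is tied to the member listed immediately before it.
    for earlier, later in zip(members, members[1:]):
        colocations.append(f"colocation {later} with {earlier}")
        orderings.append(f"order {earlier} then {later}")  # assumed notation
    return colocations, orderings


# Member list taken from the group definition quoted in the message.
members = [
    "res_mount-application",
    "res_Service-IP",
    "res_mount-CIFSshareData",
    "res_mount-CIFSshareData2",
    "res_mount-CIFSshareData3",
    "res_MyApplication",
]

colocations, orderings = expand_group(members)
for line in colocations + orderings:
    print(line)
```

Seen this way, replacing the group with explicit constraints means choosing which of these pairwise links to keep, rather than inheriting the whole chain.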

Re: [Pacemaker] One more globally-unique clone question

2015-01-19 Thread Andrew Beekhof
> On 17 Jan 2015, at 1:25 am, Vladislav Bogdanov wrote: > > Hi all, > > Trying to reproduce problem with early stop of globally-unique clone > instances during move to another node I found one more "interesting" problem. > > Due to the different order of resources in the CIB and extensive use

Re: [Pacemaker] Unique clone instance is stopped too early on move

2015-01-19 Thread Andrew Beekhof
> On 16 Jan 2015, at 3:59 pm, Vladislav Bogdanov wrote: > > 16.01.2015 07:44, Andrew Beekhof wrote: >> >>> On 15 Jan 2015, at 3:11 pm, Vladislav Bogdanov wrote: >>> >>> 13.01.2015 11:32, Andrei Borzenkov wrote: >>>> On Tue, Jan 13, 2015

Re: [Pacemaker] Unique clone instance is stopped too early on move

2015-01-15 Thread Andrew Beekhof
> On 15 Jan 2015, at 3:11 pm, Vladislav Bogdanov wrote: > > 13.01.2015 11:32, Andrei Borzenkov wrote: >> On Tue, Jan 13, 2015 at 10:20 AM, Vladislav Bogdanov >> wrote: >>> Hi Andrew, David, all. >>> >>> I found a little bit strange operation ordering during transition execution. >>> >>> Could

Re: [Pacemaker] Help needed to configure MySQL Cluster using Pacemaker, Corosync, DRBD and PCS

2015-01-15 Thread Andrew Beekhof
(x86_64) using readline 5.1 > > > My aim is to create MySQL/MariaDB cluster using pacemaker DRBD on CentOS7. > Could you please guide me on this or provide any related arti

Re: [Pacemaker] Avoid one node from being a target for resources migration

2015-01-14 Thread Andrew Beekhof
'quorum.two_node:1' is only sane for a 2 node cluster > > On Wednesday, January 14, 2015, Andrew Beekhof wrote: > > > On 14 Jan 2015, at 12:06 am, Dmitry Koterov > > wrote: > > > > > > > Then I see that, although node2 clearly knows it'
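
A rough sketch of the arithmetic behind that remark, assuming plain majority quorum with one vote per node. The function `has_quorum` is hypothetical and deliberately simplified; it ignores other votequorum options (such as wait_for_all) and is not how corosync itself is implemented.

```python
def has_quorum(votes_present, expected_votes, two_node=False):
    """Simplified majority-quorum check, one vote per node (illustrative only)."""
    if two_node:
        # The two-node special case lets a single surviving node keep quorum,
        # so it is only meaningful when exactly two votes are expected.
        if expected_votes != 2:
            raise ValueError("two_node only makes sense with 2 expected votes")
        return votes_present >= 1
    # Ordinary rule: strictly more than half of the expected votes.
    return votes_present >= expected_votes // 2 + 1


# Three-node cluster: an isolated node sees one vote and must not claim quorum.
print(has_quorum(1, 3))
# Two-node cluster without the special case: losing the peer loses quorum.
print(has_quorum(1, 2))
# Two-node cluster with two_node set: the survivor keeps quorum.
print(has_quorum(1, 2, two_node=True))
```

If the two-node rule were applied to three nodes, an isolated node could consider itself quorate alongside the other two, which is the split the majority rule exists to prevent.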

Re: [Pacemaker] Avoid one node from being a target for resources migration

2015-01-13 Thread Andrew Beekhof
> On 14 Jan 2015, at 12:06 am, Dmitry Koterov wrote: > > > > Then I see that, although node2 clearly knows it's isolated (it doesn't see > > other 2 nodes and does not have quorum) > > we don't know that - there are several algorithms for calculating quorum and > the information isn't includ

Re: [Pacemaker] Setting cib-bootstrap-options parameters before DC election

2015-01-13 Thread Andrew Beekhof
> On 14 Jan 2015, at 12:19 am, AWeber - Ryan Steele wrote: > > Hi folks, > > For testing scenarios in which I’m only spinning up nodes on my laptop > (test-kitchen), I don’t really need the full 20 seconds for dc-deadtime. > However, I haven’t been successful in finding a way to set that opt

Re: [Pacemaker] Avoid one node from being a target for resources migration

2015-01-12 Thread Andrew Beekhof
> On 13 Jan 2015, at 7:56 am, Dmitry Koterov wrote: > > 1. install the resource related packages on node3 even though you never want > them to run there. This will allow the resource-agents to verify the resource > is in fact inactive. > > Thanks, your advise helped: I installed all the service

Re: [Pacemaker] Avoid one node from being a target for resources migration

2015-01-12 Thread Andrew Beekhof
> On 13 Jan 2015, at 4:25 am, David Vossel wrote: > > > > - Original Message - >> Hello. >> >> I have 3-node cluster managed by corosync+pacemaker+crm. Node1 and Node2 are >> DRBD master-slave, also they have a number of other services installed >> (postgresql, nginx, ...). Node3 is j

Re: [Pacemaker] Help needed to configure MySQL Cluster using Pacemaker, Corosync, DRBD and PCS

2015-01-11 Thread Andrew Beekhof
> On 11 Jan 2015, at 5:39 pm, Shameer Babu wrote: > > Hi, > > I have configured Apache cluster by referring you document > http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf and it was good and > working. Now I would like to configure a simple MySQL cluster using > pacemaker,corosync,DR

Re: [Pacemaker] Clustermon issue

2015-01-11 Thread Andrew Beekhof
> On 9 Jan 2015, at 6:23 pm, Marco Querci wrote: > > Sorry ... it was my error. > My CentOS is 6.6: same as I said last time, except s/6.6/6.7/ > > [root@langate1 ~]# cat /etc/redhat-release > CentOS release 6.6 (Final) > > full upgraded. > > > >

Re: [Pacemaker] BUG: crm_mon prints status of clone instances being started as 'Started'

2015-01-11 Thread Andrew Beekhof
I'll push this up soon:

diff --git a/lib/pengine/clone.c b/lib/pengine/clone.c
index 596f701..b83798a 100644
--- a/lib/pengine/clone.c
+++ b/lib/pengine/clone.c
@@ -438,6 +438,10 @@ clone_print(resource_t * rsc, const char *pre_text, long options, void *print_da
 /* Unique, unmanaged

Re: [Pacemaker] Clustermon issue

2015-01-08 Thread Andrew Beekhof
ix it until 6.6). > > Many Thanks. > > > Il 08/01/2015 03:39, Andrew Beekhof ha scritto: >>> On 8 Jan 2015, at 1:31 pm, Andrew Beekhof wrote: >>> >>> And there is no indication this is being called? >> Doh. I know this one... you're actuall

Re: [Pacemaker] Patches: RFC before pull request

2015-01-07 Thread Andrew Beekhof
They all look sane to me. Please proceed with a pull request :-) We should probably start thinking about .13 (or .14 for the superstitious); quite a few important patches have arrived since .12 was released. > On 10 Dec 2014, at 1:33 am, Lars Ellenberg wrote: > > > Andrew, > All,

Re: [Pacemaker] More Diagnosis help

2015-01-07 Thread Andrew Beekhof
> On 1 Nov 2014, at 7:07 am, Alex Samad - Yieldbroker > wrote: > > It looks to me, like VMWare took too long to give this vm a time slice and > corosync responded by killing one node That does sound reasonable from the logs you posted. (sorry, I'm only just catching up on old posts) _

Re: [Pacemaker] pgsql troubles.

2015-01-07 Thread Andrew Beekhof
> On 5 Dec 2014, at 4:16 am, steve wrote: > > Good Afternoon, > > > I am having loads of trouble with pacemaker/corosync/postgres. Defining the > symptoms is rather difficult. The primary being that postgres starts as > slave on both nodes. I have tested the pgsqlRA start/stop/status/mon

Re: [Pacemaker] OpenVZ live migration

2015-01-07 Thread Andrew Beekhof
> On 16 Dec 2014, at 7:33 am, mailing list wrote: [...] > When I shutdown the nodea that the resource get on the nodeb, bad when I > switch nodea again on, that the cluster kill the virtual and start it on the > nodea. It's here a Idea how to allow live migrate on the cluster? Could be a lim

Re: [Pacemaker] Long failover

2015-01-07 Thread Andrew Beekhof
cemaker] Long failover > > Hello, > > Debug logs from slave are attached. Hope it helps. > > > Kind regards, > Dmitriy Matveichev. > > -Original Message- > From: Andrew Beekhof [mailto:and...@beekhof.net] > Sent: Monday, Nove

Re: [Pacemaker] Split Brain on DRBD Dual Primary

2015-01-07 Thread Andrew Beekhof
> On 12 Nov 2014, at 5:16 pm, Ho, Alamsyah - ACE Life Indonesia > wrote: > > Hi All, > > On October archives, I saw the issue reported by Felix Zachlod on > http://oss.clusterlabs.org/pipermail/pacemaker/2014-October/022653.html and > the same is actually happens to me now on dual primary D

Re: [Pacemaker] qb_ipcs_disconnect message in corosync cluster

2015-01-07 Thread Andrew Beekhof
g from so few non-error logs. > > -- > Bharathiraja > > On Mon, Dec 15, 2014 at 9:19 AM, Andrew Beekhof wrote: > > > On 12 Dec 2014, at 9:57 pm, Bharathiraja P wrote: > > > > Hi, > > > > We run pacemaker+corosync cluster on OpenSuSE 13.1 QEMU gue

Re: [Pacemaker] Clustermon issue

2015-01-07 Thread Andrew Beekhof
> On 8 Jan 2015, at 1:31 pm, Andrew Beekhof wrote: > > And there is no indication this is being called? Doh. I know this one... you're actually using 1.1.12-rc3. You need this patch which landed after 1.1.12 shipped: https://github.com/beekhof/pacemaker/commit/3df6aff >

Re: [Pacemaker] Clustermon issue

2015-01-07 Thread Andrew Beekhof
> "Cluster Monitor" -a $monitorfile mquerc...@gmail.com > > > Thanks. > > > Il 06/01/2015 01:21, Andrew Beekhof ha scritto: >>> On 6 Jan 2015, at 3:37 am, Marco Querci wrote: >>> >>> Hi All. >>> Any news for my problem? >>

Re: [Pacemaker] Corosync 1.4.7: zombie (defunct)

2015-01-06 Thread Andrew Beekhof
dn't be a bad idea while you're at it > > -- > Best regards, > Sergey Arlashin > > > On Jan 6, 2015, at 11:04 AM, Sergey Arlashin > wrote: > >> Thank you! >> I'll try 1.1.12. >> >> -- >> Best regards, >> Sergey Ar

Re: [Pacemaker] Avoid monitoring of resources on nodes

2015-01-05 Thread Andrew Beekhof
> On 4 Dec 2014, at 7:52 pm, Daniel Dehennin > wrote: > > Andrew Beekhof writes: > >> What version of pacemaker is this? >> Some very old versions wanted the agent to be installed on all nodes. > > It's 1.1.10+git20130802-1ubuntu2.1 on Trusty Tahr. I&

Re: [Pacemaker] Colocating with unmanaged resource

2015-01-05 Thread Andrew Beekhof
> On 20 Dec 2014, at 6:21 am, Покотиленко Костик wrote: > > Hi, > > Simple scenario, several floating IPs should be living on "front" nodes > only if there is working Nginx. There are several reasons against Nginx > being controlled by Pacemaker. > > So, decided to colocate FIPs with unmanaged

Re: [Pacemaker] Clustermon issue

2015-01-05 Thread Andrew Beekhof
> On 6 Jan 2015, at 3:37 am, Marco Querci wrote: > > Hi All. > Any news for my problem? Maybe post your /home/administrator/clustermonitor_notification.sh script? > > Many thanks. > > > Il 19/12/2014 12:13, Marco Querci ha scritto: >> Many tahnk for your reply. >> Here is my configuration:

Re: [Pacemaker] Corosync 1.4.7: zombie (defunct)

2015-01-05 Thread Andrew Beekhof
4 22:06:36 > UTC 2014 x86_64 x86_64 x86_64 GNU/Linux > > -- > Best regards, > Sergey Arlashin > > > On Jan 5, 2015, at 7:59 AM, Andrew Beekhof wrote: > >> pacemaker version? it looks familiar but it depends on the version number. >> >>> On 29 Dec 2

Re: [Pacemaker] Corosync 1.4.7: zombie (defunct)

2015-01-04 Thread Andrew Beekhof
pacemaker version? it looks familiar but it depends on the version number. > On 29 Dec 2014, at 10:24 pm, Sergey Arlashin > wrote: > > Hi! > Recently I've noticed that one of my nodes had OFFLINE status in 'crm status' > output. But it actually was not. I could ssh on this node. I could get '

[Pacemaker] Where is Beekhof?

2014-12-14 Thread Andrew Beekhof
Just a courtesy email for anyone looking for me or waiting on a reply from me specifically... I have been sucked into OpenStack hell and probably won't re-emerge until early January at the earliest. Until then it's unlikely that I'll be able to deal with anything except the lowest of the low han

Re: [Pacemaker] qb_ipcs_disconnect message in corosync cluster

2014-12-14 Thread Andrew Beekhof
> On 12 Dec 2014, at 9:57 pm, Bharathiraja P wrote: > > Hi, > > We run pacemaker+corosync cluster on OpenSuSE 13.1 QEMU guests. > > Frequently, one node gets disconnected from cib. This is the message seen in > corosync logs, > > Nov 25 08:36:07 [3760] sysmon-secondarycib:debug:

Re: [Pacemaker] Avoid monitoring of resources on nodes

2014-12-03 Thread Andrew Beekhof
What version of pacemaker is this? Some very old versions wanted the agent to be installed on all nodes. > On 26 Nov 2014, at 10:21 pm, Daniel Dehennin > wrote: > > Daniel Dehennin writes: > >>> I'll try find how to make the change directly in XML. >> >> Ok, looking at git history this featu

Re: [Pacemaker] Suicide fencing and watchdog questions

2014-11-30 Thread Andrew Beekhof
> On 29 Nov 2014, at 5:36 pm, Andrei Borzenkov wrote: > > В Thu, 27 Nov 2014 08:24:56 +0300 > Vladislav Bogdanov пишет: > >> 27.11.2014 03:43, Andrew Beekhof wrote: >>> >>>> On 25 Nov 2014, at 10:37 pm, Vladislav Bogdanov >>>> wrote:

Re: [Pacemaker] Pacemaker fencing and DLM/cLVM

2014-11-27 Thread Andrew Beekhof
the node and notify >> its success. >> >> How the “returned: 0 (OK)” could became “receive 1”? >> >> A logic issue somewhere between stonith-ng and dlm_controld? >> > > it could be, I don't know enough about pacemaker to be able to comment o

Re: [Pacemaker] Suicide fencing and watchdog questions

2014-11-27 Thread Andrew Beekhof
> On 27 Nov 2014, at 4:24 pm, Vladislav Bogdanov wrote: > > 27.11.2014 03:43, Andrew Beekhof wrote: >> >>> On 25 Nov 2014, at 10:37 pm, Vladislav Bogdanov >>> wrote: >>> >>> Hi, >>> >>> Is there any information how watch

Re: [Pacemaker] Suicide fencing and watchdog questions

2014-11-26 Thread Andrew Beekhof
> On 25 Nov 2014, at 10:37 pm, Vladislav Bogdanov wrote: > > Hi, > > Is there any information how watchdog integration is intended to work? > What are currently-evaluated use-cases for that? > It seems to be forcibly disabled if SBD is not detected... Are you referring to no-quorum-policy=suic

Re: [Pacemaker] [ha-wg-technical] [ha-wg] [Cluster-devel] [Linux-HA] [RFC] Organizing HA Summit 2015

2014-11-26 Thread Andrew Beekhof
> On 27 Nov 2014, at 2:41 am, Lars Marowsky-Bree wrote: > > On 2014-11-25T16:46:01, David Vossel wrote: > > Okay, okay, apparently we have got enough topics to discuss. I'll > grumble a bit more about Brno, but let's get the organisation of that > thing on track ... Sigh. Always so much work!

Re: [Pacemaker] [ha-wg-technical] [ha-wg] [Linux-HA] [RFC] Organizing HA Summit 2015

2014-11-25 Thread Andrew Beekhof
> On 26 Nov 2014, at 4:51 pm, Fabio M. Di Nitto wrote: > > > > On 11/25/2014 10:54 AM, Lars Marowsky-Bree wrote: >> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: >> Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) >>> That would

Re: [Pacemaker] [ha-wg-technical] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-25 Thread Andrew Beekhof
> On 26 Nov 2014, at 10:06 am, Digimer wrote: > > On 25/11/14 04:31 PM, Andrew Beekhof wrote: >>> Yeah, but you're already bringing him for your personal conference. >>> That's a bit different. ;-) >>> >>> OK, let's switch tracks a

Re: [Pacemaker] [Cluster-devel] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-25 Thread Andrew Beekhof
> On 25 Nov 2014, at 8:54 pm, Lars Marowsky-Bree wrote: > > On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: > >>> Yeah, well, devconf.cz is not such an interesting event for those who do >>> not wear the fedora ;-) >> That would be the perfect opportunity for you to convert users to Suse ;)

Re: [Pacemaker] [Linux-HA] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-25 Thread Andrew Beekhof
> On 25 Nov 2014, at 9:16 pm, Michael Schwartzkopff wrote: > > On Tuesday, 25 November 2014, 10:54:01, Lars Marowsky-Bree wrote: >> On 2014-11-24T16:16:05, "Fabio M. Di Nitto" wrote: Yeah, well, devconf.cz is not such an interesting event for those who do not wear the fedora ;-) >>

Re: [Pacemaker] [ha-wg-technical] [ha-wg] [RFC] Organizing HA Summit 2015

2014-11-24 Thread Andrew Beekhof
> On 25 Nov 2014, at 2:12 am, Lars Marowsky-Bree wrote: > > On 2014-11-24T15:54:33, "Fabio M. Di Nitto" wrote: > >> dates and location were chosen to piggy-back with devconf.cz and allow >> people to travel for more than just HA Summit. > > Yeah, well, devconf.cz is not such an interesting ev

Re: [Pacemaker] Globally-unique clone cleanup on remote nodes

2014-11-20 Thread Andrew Beekhof
> On 20 Nov 2014, at 5:44 pm, Vladislav Bogdanov wrote: > > 20.11.2014 09:25, Andrew Beekhof wrote: >> >>> On 20 Nov 2014, at 5:12 pm, Vladislav Bogdanov wrote: >>> >>> 20.11.2014 01:57, Andrew Beekhof wrote: >>>> >>>

Re: [Pacemaker] Globally-unique clone cleanup on remote nodes

2014-11-19 Thread Andrew Beekhof
> On 20 Nov 2014, at 5:12 pm, Vladislav Bogdanov wrote: > > 20.11.2014 01:57, Andrew Beekhof wrote: >> >>> On 19 Nov 2014, at 4:27 pm, Vladislav Bogdanov wrote: >>> >>> Hi all, >>> >>> just an observation. >>> >>

Re: [Pacemaker] Globally-unique clone cleanup on remote nodes

2014-11-19 Thread Andrew Beekhof
> On 19 Nov 2014, at 4:27 pm, Vladislav Bogdanov wrote: > > Hi all, > > just an observation. > > I have a globally-unique clone with 50 instances in a cluster consisting > of one cluster node and 3 remote bare-metal nodes. > When I run "crm_resource -C -r ''" (crmsh does that for me), > it wri

Re: [Pacemaker] Long failover

2014-11-16 Thread Andrew Beekhof
> On 17 Nov 2014, at 6:17 pm, Andrei Borzenkov wrote: > > On Mon, Nov 17, 2014 at 9:34 AM, Andrew Beekhof wrote: >> >>> On 14 Nov 2014, at 10:57 pm, Dmitry Matveichev >>> wrote: >>> >>> Hello, >>> >>> We have a clu

Re: [Pacemaker] Intermittent Failovers: route_ais_message: Sending message to local.crmd failed: ipc delivery failed (rc=-2)

2014-11-16 Thread Andrew Beekhof
> On 11 Nov 2014, at 1:32 am, Zach Wolf wrote: > > Hey Team, > > I’m receiving some strange intermittent failovers on a two-node cluster > (happens once every week or two). When this happens, both nodes are > unavailable; one node will be marked offline and the other will be shown as > uncle

Re: [Pacemaker] Reset failcount for resources

2014-11-16 Thread Andrew Beekhof
> On 13 Nov 2014, at 10:08 pm, Arjun Pandey wrote: > > Hi > > I am running a 2 node cluster with this config > > Master/Slave Set: foo-master [foo] > Masters: [ bharat ] > Slaves: [ ram ] > AC_FLT (ocf::pw:IPaddr): Started bharat > CR_CP_FLT (ocf::pw:IPaddr): Started bharat > CR_UP_FLT (ocf::

Re: [Pacemaker] doubt when cloning a resource

2014-11-16 Thread Andrew Beekhof
> On 15 Nov 2014, at 3:17 am, david escartin wrote: > > Hello all > > we are trying to have in a 2 node cluster one resource TEST (LSB type) cloned That's your problem. LSB resources cannot be cloned with globally-unique="true". Why do you think you need globally-unique="true"? > but we wou

Re: [Pacemaker] Long failover

2014-11-16 Thread Andrew Beekhof
> On 14 Nov 2014, at 10:57 pm, Dmitry Matveichev > wrote: > > Hello, > > We have a cluster configured via pacemaker+corosync+crm. The configuration is: > > node master > node slave > primitive HA-VIP1 IPaddr2 \ > params ip=192.168.22.71 nic=bond0 \ > op monitor interval=1s

Re: [Pacemaker] resource-stickiness not working?

2014-11-16 Thread Andrew Beekhof
> On 14 Nov 2014, at 5:52 am, Scott Donoho wrote: > > Here is a simple Active/Passive configuration with a single Dummy resource > (see end of message). The resource-stickiness default is set to 100. I was > assuming that this would be enough to keep the Dummy resource on the active > node as

Re: [Pacemaker] Operation attribute change leads to resource restart

2014-11-16 Thread Andrew Beekhof
> On 15 Nov 2014, at 8:46 am, Vladislav Bogdanov wrote: > > 14.11.2014 17:36, David Vossel wrote: >> >> >> - Original Message - >>> Hi! >>> >>> Just noticed that deletion of a trace_ra op attribute forces resource >>> to be restarted (that RA does not support reload). >>> >>> Logs sh

Re: [Pacemaker] Configuring Dependencies and Groups in CRM

2014-11-16 Thread Andrew Beekhof
> On 17 Nov 2014, at 12:54 am, Stephan wrote: > > Hello, > > I've a cluster with two nodes with a master-slave DRBD and filesystem. On top > of the filesystem there are several OpenVZ containers which are managed by > ocf:heartbeat:ManageVE. > > Currently I've configured the VEs within one g
